Fahad H

Web Data Quality: A 6 Step Process To Evolve Your Mental Model


It seems absolutely dumb to argue that while the quality of the data used to make decisions is important, having the highest possible data quality is actually not that important.

Generations of Analysts, Data "People", and Decision Makers have grown up with the principle of GIGO. Garbage in, garbage out.

It made a lot of sense for a very long time. Especially because we used to collect so little data, even a small lapse in quality crapified the decision a lot.

GIGO also fueled our ever-expanding quest for data perfection and data quality. There are entire companies built around helping you "clean up" your data. Especially if you look at the traditional offline business intelligence, ERP, CRM, and data warehouse worlds.

The web unfortunately threw a big spanner into the works.

A couple of important reasons.

First, it is important to realize that we collect a lot of data on the web (types of data, elements of data, what not).

Second, our beloved world wide web, remember, still a little baby, is imperfect at every turn. We use data collection methodologies that reflect our efforts to do the best we can, but they are inherently flawed. Just take JavaScript as an example. It is good at what it does. But not everyone has JavaScript turned on (typically around 2–3% of visitors don't). Zing: imperfection.

A lot of data. Imperfect data collection system.

Here is the most common result of this challenge: The "Director of Analytics" spends her meager resources in the futile quest for clean data.

Money is spent on consultants (especially the "scaredy cats" who deftly stir this issue to favor their personal businesses). Everyone tries to reconcile everything across systems and logs. Omniture gets kicked out and WebTrends gets put in, supposedly for its "far superior" data quality (!!).

Makes me sad.

In the debate over perfect data it is important to realize that the reality is a lot more nuanced.

No Possible Complete Data on Le Web.

I humbly believe that the world of data perfection ("clean auditable data") does not exist anymore. It did for a long time because life was cleaner, mistakes were human-made, sources were fewer, and there wasn't enough data to begin with (sure, terabytes of it, but across what, 300 fields? 600?).

On the web we now have too many sources of data. Quantitative, qualitative, hearsay (sorry, surveys :), competitive intelligence, and so much more. [Web Analytics 2.0] But these sources are "fragile".

Sometimes because of technology (tags / cookies / panels / ISP logs). Sometimes because of privacy reasons. Sometimes because we can't sample enough (surveys, usability tests). Sometimes because it is all so new, we don't even know what the heck we are doing and the world is changing too fast around us.

Killing the Holy Cows.

The old people who did BI (me for sure, maybe you?) and moved to the web have had to come to the realization that the old rules of making decisions are out the door. Not just because the mental model of what counts as "data" has changed, but also because what counts as "decisions" has changed, and the pace at which those decisions need to be made has changed. It took companies a long time to die in the past. That process happens at "web speed" now.

Given all that, if I don't change, I'll become a hurdle to progress. If I don't change, I can't help my company make the kind of progress it should.

You need to fundamentally rewire your brain, like I have had to rewire mine (it was painful): The data is not complete and clean, yet it is more data of more types, and it contains immense actionable insights.

If you would only get over yourself a little bit.

So how to do this if you really do want to be God's gift to web analysis?

Based on my own personal evolution in this space, I recommend you go through the following six step cleansing process to ensure that you are doing this right and move beyond the deeply counterproductive data obsession.

1) Follow best practices to collect data, don't do stupid stuff.

2) Audit your data periodically to ensure you are collecting as complete a data set as possible (and as accurate as possible, per #1).

3) Only collect as much data as you need: There is no upper limit to the amount of data you can collect and store on the web.

4) Ditch the old mental model of Accuracy, go for Precision (more here: Accuracy, Precision & Predictive Analytics). It might seem astonishing but your analysis will actually get more accurate if you go for precision.

5) Be comfortable, I mean really freaking comfortable, with incompleteness and learn to make decisions.

6) [In context of decision making] It used to be Think Smart, Move Fast. I think the next generation of true Analysis Ninjas will: Move Fast, Think Smart. Remember there is an opportunity cost associated with the quest for perfection.

Web Data Quality Cycle

An example of #1: if you are using third-party cookies in your web analytics tool, be it Omniture or CoreMetrics or WebTrends etc., then you deserve the crappy data you are getting. For #2, use website scanning tools to ensure complete implementation; each vendor has their own, just ask (a simple home-grown check is sketched below). #3 is the reason most attempts to data-warehouse web analytics data end up as massive, expensive failures, or why you then get trapped constantly "mowing the grass".
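To make #2 concrete, here is a minimal sketch of a home-grown audit in Python. It assumes your analytics tag can be recognized by a known substring in the page source; the URLs and the TAG_SIGNATURE value below are hypothetical placeholders, not anything from your actual setup. It simply flags pages where the tag is missing so you can close the gaps in your implementation.

import urllib.request

TAG_SIGNATURE = "s_code.js"   # hypothetical: substitute whatever your vendor's tag file is actually called
URLS_TO_AUDIT = [
    "https://www.example.com/",
    "https://www.example.com/products",
    "https://www.example.com/checkout",
]

missing = []
for url in URLS_TO_AUDIT:
    try:
        # Fetch the page and look for the tag signature anywhere in the HTML
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        if TAG_SIGNATURE not in html:
            missing.append(url)
    except Exception as exc:
        # A page we can't even fetch is also a page we can't measure
        missing.append(f"{url} (fetch failed: {exc})")

print(f"Pages missing the tag: {len(missing)} of {len(URLS_TO_AUDIT)}")
for page in missing:
    print(" -", page)

Run something like this against your sitemap every week or so. It won't replace a vendor's scanner, but it catches the most common problem, pages shipped without the tag, before a month of data quietly goes missing.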

You are not going to believe me, but with #4, if you actually go for precision your analysis will actually get more accurate over time (whoa!).
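Here is a toy illustration of that point (invented numbers, assuming a consistent ~8% undercount from blocked JavaScript): the absolute counts are wrong, but the trend, and therefore the decision, is exactly the same.

# Toy illustration: a consistent 8% undercount does not change the trend.
true_visits = {"Week 1": 10_000, "Week 2": 12_500}                  # unknowable in practice
tracked     = {w: int(v * 0.92) for w, v in true_visits.items()}    # what the tool reports

true_growth    = true_visits["Week 2"] / true_visits["Week 1"] - 1
tracked_growth = tracked["Week 2"] / tracked["Week 1"] - 1

print(f"True growth:    {true_growth:.1%}")     # 25.0%
print(f"Tracked growth: {tracked_growth:.1%}")  # 25.0% -- same answer, same decision

As long as the bias is consistent (precision), the week-over-week and segment-versus-segment comparisons that actually drive decisions survive the imperfection just fine.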

#5 is the hardest thing for Analysts (and for many Marketers) to accept. Especially those that have been doing data analysis in other fields. They are simply not comfortable with 90% complete data. Or even 95%. They work really, really hard to get the other 5% because without it they are unable to accept that they could make business recommendations. Sometimes this is because of how their mental model works. Sometimes it is because the company is risk averse (not the Analyst's fault). Sometimes it is out of a genuine, if misplaced, desire to give the perfect answer.

Of course the net result is that lots of data collection, processing, and perfection exercises happen while the business is starved of the insights it needs to make even the most mundane decisions. I have had to lay off Analysts who simply could not accept incompleteness and had to have data that was clean and complete. Very hard for me to do.

#6 is a huge challenge because it requires an experience that most of us don't possess. Of having been there. Of working in companies that plug us into the tribal knowledge and context. Because we work in massively multi-layered bureaucracies in large companies. In my heart of hearts I believe, sadly, that it will take a new generation of Analysts and a new generation of leaders in companies. Still we must try, even as I accept the criticism that the 10/90 rule is not followed and that we don't have enough Smart Analysts.

So: best practices that collect as complete a data set as possible, precisely, allowing you to look beyond the incompleteness, resulting in you moving fast while thinking smart.

Before You Jump All Over Me and Yell: Heretic!

Notice what I am not saying.

I am not saying make wrong decisions.

I am not saying accept bad data.

I am not saying don't do your damnedest to make sure your data is as clean as it can be.

What I am saying is that your job does not depend on data with 100% integrity on the web. Your job depends on helping your company Move Fast and Think Smart.

I am also not saying it is easy.

Reality Check:

We live in the most data-rich channel in the universe. We should be using that data to find insights, even if they are a little bit off from the perfect number.

Just consider this.

How do you measure the effectiveness of your magazine ad? Now compare that to the data you have from DoubleClick. How about measuring the ability of your TV ad to reach the right audience? Compare that with measuring reach through Paid Search (or Affiliate Marketing or …..). Do you think you get better data from Nielsen's TV panel of between 15k – 30k US residents, meant to represent the diversity of TV content consumption of 200 million TV-watching Americans?
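As a back-of-the-envelope illustration (my arithmetic, not Nielsen's): a 15k–30k panel looks respectable in aggregate, but the moment you slice it by show, geography, or demographic the effective sample collapses and the error balloons. Here is the textbook margin-of-error calculation for a simple random sample, ignoring design effects, non-response, and weighting, all of which make real panels worse:

import math

def margin_of_error(sample_size, p=0.5, z=1.96):
    # 95% confidence margin of error for a proportion from a simple random sample
    return z * math.sqrt(p * (1 - p) / sample_size)

for n in (15_000, 30_000):
    print(f"Panel of {n:,}: +/- {margin_of_error(n):.2%} on a 50% share")

# A niche show watched by 1% of the panel is measured from only ~150-300 households,
# and any further slice by region or demographic shrinks the sample even more.

Meanwhile your web analytics tool, imperfect as it is, sees something close to the whole population of your site.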

faith based initiatives

There is simply no comparison. So why waste our lives trying to get perfect data from our web sites and online marketing campaigns? Why does unsound, incomplete, and faith-based data from TV, Magazines, and Radio get a pass? Why be so harsh on your web channel? Just because you can collect data here, you won't do anything with it because it is imperfect?

Parting Words of Wisdom:

Stuart Gold is a VP at Omniture. Here's a quote from him:

"An educated mistake is better than no action at all."

Brilliant.

The web allows you to make educated mistakes. Fast. With each mistake you become smarter. With each mistake your next step becomes more intelligent.

Make educated mistakes.

EOM.

OK, now it's your turn.

What do you think of the web data quality issue? What are the flawed assumptions I have made in making my recommendation above? How do you ensure your data is as complete and as precise as it can be? Got tools or horror stories to share? What is the next data collection mechanism on the horizon that will be our salvation on the web?

I look forward to your comments and feedback. Thanks.
