
The Ultimate Web Analytics Data Reconciliation Checklist


Ideally you should only have one web analytics tool on your website.

If you have nothing and you are starting out, then sure, have a few different ones, stress test them, pick the one you love (just like in real life!), but then practice monogamy.

At the heart of that recommendation is a painful lesson I have learned: It is a long hard slog to convert an organization to be truly data driven.

And that's with one tool.

Having two tools just complicates life in many subtle and suboptimal ways. One "switch" that commonly occurs is the shift from fighting the good fight of getting the organization to use data to bickering about data not matching, having to do multiple sets of coding for campaigns (the page tagging work itself is trivial), and so on and so forth.

In a nutshell the efforts become all about data and not the quest for insights.

So if you can help it, have one tool. Bigamy, at least in this case, is undesirable. [If that does not convince you, remember the magnificent 10/90 rule from May 2006, when I was but a naïve web analytics manager.]

But.

Pontification aside the reality is that many people run more than one tool on their website (though hopefully they are all on their way to picking the best of the lot). That means the bane of every Analyst's existence: Data reconciliation!

[Image: ClickTracks, StatCounter and Google Analytics showing three different numbers for the same month]

It is a thankless task, takes way more time than needed, and the "game" is so rigged that 1] it is nearly impossible to get to a conclusion and 2] it is rarely rewarding – i.e. worth it.

But reconcile we must. So in this post I want to share my personal checklist of things I look for when going through a data reconciliation exercise. Usually this helps get things to within 95% and then I give up. It is so totally not worth it to get the rest!

* This post is a bit technical, but a Marketer should be able to understand it. And my goal is that you smile three times while you read it, either from the inside jokes or from the sheer pain on display! *

So if you are starting a data reconciliation project for your web analytics tools, make sure you check for these things:


#1: Comparing Web Logs vs. JavaScript driven tools. Don't.

#2: First & Third Party Cookies. The gift that keeps giving!

#3: Imprecise website tagging.

#4: Torture Your Vendor: Check Definitions of Key Metrics.

#5: Sessionization. One Tough Nut.

#6: URL Parameter Configuration. The Permanent Tripwire.

#7: Campaign Parameter Configuration. The Problem of the Big.

#8: Data Sampling. The Hidden "Angel".

#9: Order of the Tags. Love it, Hate it, Happens.

Intrigued? Got your cup of coffee or beer? Ready to become sexycool?

Let's deep dive. . . .

#1: Comparing Web Logs vs. JavaScript driven tools. Don't.

I know, I know, you all get it. Yes you understand that this is not just comparing apples and oranges but more like comparing apples and monkeys.

For the five of us that are not in that camp: these two methods of collecting data are very different, the processing and storage are different, and the things that impact each are very different.

[Image: Omniture vs. WebTrends]

So if you are using these two methods then know that your numbers might often not even come close (by that I mean within 85 – 90%).

The primary things that web logs have to deal with are effective and extensive filtering of robots (if you are not doing this you are screwed regardless) and the definition of a unique visitor (are you using cookies? just IP? IP + user agent?). And, though this is increasingly minor, caching (at the browser or server level) can also mean data goes missing from your logs.
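To make the unique visitor point concrete, here is a tiny sketch (the log hits are entirely made up) showing how the very same hits give you different "unique visitor" counts depending on which identifier the tool uses:

```python
# Same (made-up) log hits, three "unique visitor" definitions, three answers.
hits = [
    # (ip, user_agent, cookie_id)
    ("10.0.0.1", "Firefox", "aaa"),
    ("10.0.0.1", "Firefox", "bbb"),  # same proxy IP, different person
    ("10.0.0.1", "IE7",     "ccc"),
    ("10.0.0.2", "Firefox", "aaa"),  # same person, different IP (home vs. office)
    ("10.0.0.3", "Firefox", "aaa"),  # same person again, yet another IP
]

by_ip     = {ip for ip, ua, cookie in hits}
by_ip_ua  = {(ip, ua) for ip, ua, cookie in hits}
by_cookie = {cookie for ip, ua, cookie in hits}

# IP only: 3. IP + user agent: 4. Persistent cookie: 3 (but a *different* 3!).
print(len(by_ip), len(by_ip_ua), len(by_cookie))
```

Two tools, two identifier choices, and the numbers will never tie, even on perfect data.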

There is also the Very Important matter of Rich Media content: Flash, video, Flex, apps, whatever. Without extensive custom coding your web logs are clueless about your entire rich media experience (time spent, interactions, etc.). Most tag-based solutions now come with easy-to-implement options that will track rich media. So if you have a rich media site, know that this will cause lots of differences between the numbers you get from logs and the numbers you get from tags.

The primary thing that afflicts JavaScript tags, in this context (more later), is browsers that have JavaScript turned off (typically 2–3%); those visitors' data will simply be missing from your tag-based data.

Be careful when you try to compare these two sources.

#2: First & Third Party Cookies. The gift that keeps giving!

Notice the sarcasm there? : )

It turns out that whether you use first-party cookies or third-party cookies can have a huge impact on your metrics. Metrics like Unique Visitors, Returning Visits, etc.

So check that.

Typically if you are on third-party cookies, then your numbers will be higher (and of course wrong) compared to the numbers from your first-party-cookie-based tool.

Cookie flushing (cookies being cleared when the browser closes, or by your friendly "anti spyware" tool) affects both types the same way.

Cookie rejection is more complex. Many new browsers don't even accept 3rd party cookies (bad). Some users set their browsers to not accept any cookies, which hurts both types the same.
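If you want to see why the third-party numbers run high, here's a back-of-the-envelope sketch. All the numbers (visitor counts, visit frequency, rejection rates) are assumptions purely for illustration:

```python
# Back-of-the-envelope: how cookie rejection inflates "unique visitors".
# Every number below is assumed, purely for illustration.
real_visitors = 1_000
visits_each   = 3     # each person visits three times in the month
reject_3rd    = 0.15  # assumed share of browsers blocking 3rd-party cookies
reject_1st    = 0.02  # 1st-party rejection is far rarer

def reported_uniques(reject_rate):
    accepted = real_visitors * (1 - reject_rate)          # counted once, correctly
    rejected = real_visitors * reject_rate * visits_each  # no cookie -> "new" every visit
    return round(accepted + rejected)

print(reported_uniques(reject_1st))  # 1040 -- the 1st-party tool
print(reported_uniques(reject_3rd))  # 1300 -- the 3rd-party tool, nicely inflated
```

Same 1,000 humans, and the two tools are 25% apart before you've even opened a report.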

We should have done away with this a long time ago, but many vendors (including paid ones!) continue to use third-party cookies as the default. I was just talking to a customer of OmniCore yesterday; they had just finished implementation (eight months!!) and were using third-party cookies.

I wanted to pull my hair out.

There are rare exceptions where you should use third-party cookies. But unless you know what you are doing, demand first-party cookies. When even free web analytics tools now offer first-party cookies as standard, there is no reason for you not to use them.

End of soap box.

Check the type of cookies; it will explain a lot of your data differences.

#3: Imprecise website tagging.

Other than cookies I think this is your next BFF in data recon'ing.

Most of us use JavaScript tag based solutions. In the case of web log files the server at least collects the minimum data without much work, because that is just built into web servers.

In the case of JavaScript solutions, sadly, we are involved. We the people!

The problem manifests itself in two ways.

Incorrectly implemented tags:

The standard javascript tags are pretty easy to implement. Copy / paste and happy birthday.

But then you can add / adjust / caress them to do more things (now you know why it takes 8 months to implement). You can pass sprops and evars and user_defined_values and variables and bacteria.

You should make sure your WebTrends / Google Analytics / IndexTools / Unica tags are implemented correctly, i.e. passing data back to the vendor as you expect. Else, of course, woe be upon you!

To check that you have implemented the tags right, and the sprops are not passing evars and that user defined values are not sleeping with the vars, I like using tools like IEWatch Professional. [I am not affiliated with them in any way.]

[Update: From my friend Jennifer, if you are really really into this stuff, 3 more: Firebug, Web Developer Toolkit & Web Bug.]

Your tech person can use these to validate and assure you that the various tools implemented are passing correct data.
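If your tech person wants to script part of that validation, here is a minimal sketch. The beacon URL below is made up; the parameter names are from Google Analytics' classic __utm.gif request (your tool's will differ):

```python
from urllib.parse import urlparse, parse_qs

# A truncated, made-up Google Analytics __utm.gif request captured with
# IEWatch / Firebug -- the point is to eyeball what is actually being sent.
beacon = ("http://www.google-analytics.com/__utm.gif"
          "?utmac=UA-12345-1&utmp=%2Fproducts%2Fphones&utmdt=Phones")

params = parse_qs(urlparse(beacon).query)
print(params["utmac"])  # ['UA-12345-1']       -- right account?
print(params["utmp"])   # ['/products/phones'] -- right page path?
print(params["utmdt"])  # ['Phones']           -- right page title?
```

Unglamorous, yes. But it catches sprops passing evars before your reports do.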

Incompletely implemented tags:

This one's simple. Your IT department (or brother) implemented Omniture tags on some pages and Google Analytics on most pages. Well you have a problem.

Actually, this is the culprit in a majority of cases. Make sure you implement both tools on all the same pages (if not on all the pages on the site).

Mercifully your tech person (or dare I say you!) can use some affordable tools to check for this.

You would have noticed that in my book Web Analytics: An Hour A Day I recommended REL Software's Web Link Validator. I continue to like it. Of course WASP, from our good friend Stephane, did not exist then, and I am quite fond of it as well.
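And if you'd rather script a rough completeness check yourself, here's a minimal sketch. The page list and the "signature" strings are hypothetical placeholders; substitute your own pages and the actual snippets you deployed:

```python
import urllib.request

# Minimal completeness check: fetch each page and confirm both tools'
# snippets are present. Hypothetical URLs and signature strings below.
PAGES = ["http://www.example.com/", "http://www.example.com/products"]
SIGNATURES = {"Google Analytics": "urchinTracker", "Omniture": "s_code.js"}

for page in PAGES:
    html = urllib.request.urlopen(page).read().decode("utf-8", "ignore")
    for tool, marker in SIGNATURES.items():
        if marker not in html:
            print(f"MISSING {tool} tag on {page}")
```

A crawl like this (or the tools above) will surface the "Omniture on some pages, Google Analytics on most pages" problem in minutes.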

If you want to have a faster reconciliation between your tools, make sure you have implemented all your analytics tools correctly and completely.

#4: Torture Your Vendor: Check Definitions of Key Metrics.

Perhaps you noticed in the very first image that StatCounter, ClickTracks and Google Analytics were showing three completely different numbers for Feb. But notice that they also all give that metric a different name.

Visits. Visitors. Unique Visitors. Three different names for the same metric: "sessions".

How exasperating!

As an industry we have grown organically, and each vendor has hence created their own metrics, or at other times taken standard metrics and, just to mess with us, decided to call them something else.

Here, honest to God, are three definitions of conversion rate I have gotten from web analytics vendors:

Conversion = Orders / Unique Visitors

Conversion = Orders / Visits

Conversion = Items Ordered / Clicks

What! Items Ordered / Clicks? Oh, the Humanity!
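To see how much it matters, run one (made-up) month of data through all three definitions:

```python
# One made-up month of data, three vendor definitions of "conversion rate".
clicks, visits, unique_visitors = 50_000, 20_000, 12_000
orders, items_ordered = 600, 1_800

print(f"{orders / unique_visitors:.1%}")  # 5.0%  (Orders / Unique Visitors)
print(f"{orders / visits:.1%}")           # 3.0%  (Orders / Visits)
print(f"{items_ordered / clicks:.1%}")    # 3.6%  (Items Ordered / Clicks)
```

Same site, same month, and "conversion rate" just ranged from 3.0% to 5.0% depending on whose dictionary you opened.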

So before you tar and feather a particular web analytics tool (or, worse, listen to the vendor's talking points) and decide which is better, torture them to understand exactly what the precise definition is of the metric you are comparing.

It can be hard.

Early in my career (just a few years ago, I am not that old!) I called the top vendor and tried to get the definition of Unique Visitor. What I saw on the screen was Daily Unique Visitors. I wanted to know if it was the same as my tool ClickTracks's definition (which was a count of distinct persistent cookie IDs for whatever time period was chosen).

The VP's answer: "What do you want to measure? We can do it for you."

Me: "I am looking at Unique Visitors in CT for this month. I am looking at Unique Visitors for that month in OmniCoreTrends, I see a number, it does not tie."

VP: "We can measure Monthly Unique Visitors for you and add it to your account."

Me: "What if I want to compare Unique Visitors for a week?"

VP: "We can add Weekly Unique Visitors to your account."

Me: (Getting impressed at the savviness of the stonewalling) "What is your definition of Unique Visitors?"

VP: "What is it that you need to measure? We can add it to you account."

You have to give her / him this: they are very good at their job. But as a user my experience was bad.

Even if a metric has the same name between the tools, check with the vendor. It is possible you are comparing apples and pineapples.

Torture your vendor.

#5: Sessionization. One Tough Nut.

You can think of this as a unique case of a metric's definition but it is just so important that I wanted to pull it out separately.

"Sessions" are important because they essentially measure the metric we know as Visit (or Visitors).

But how your clicks get converted into a session on the website can be very different with each vendor.

Some vendors will time out a session after 29 minutes of inactivity. Some will do that after 15 minutes. Which means that, right there, the same visitor could be counted as one visit in one tool and two visits in the other.
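Here's a minimal sketch of that arithmetic, using invented click times and the two timeouts mentioned above:

```python
from datetime import datetime, timedelta

# One visitor's (invented) click times: gaps of 5, 20 and 23 minutes.
clicks = [datetime(2008, 2, 1, 9, 0) + timedelta(minutes=m)
          for m in (0, 5, 25, 48)]

def count_visits(clicks, timeout_minutes):
    visits, last = 0, None
    for t in sorted(clicks):
        if last is None or (t - last) > timedelta(minutes=timeout_minutes):
            visits += 1  # inactivity exceeded the timeout: a new visit begins
        last = t
    return visits

print(count_visits(clicks, 29))  # 1 visit  (no gap exceeds 29 minutes)
print(count_visits(clicks, 15))  # 3 visits (the 20- and 23-minute gaps both break)
```

Same person, same four clicks, and one tool reports three times as many visits as the other.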

Here's another place where how a vendor does sessionization could be a problem:

[Image: how two tools sessionize the same clickstream differently when a visitor leaves for a search engine and comes back]

The fact that someone went to a search engine and came back to your site "breaks" the first session and starts another in one tool, but not in the other.

One last thing: check the "max session timeout" settings between the tools. Some might have a hard limit of 30 minutes; others have none (using one top paid tool I found Visits that lasted 1,140 minutes or 2,160 minutes – visitors went to the site, left the page open, came back to work, clicked and kept browsing, or came back after the weekend).

Imagine what it does to Average Time on Site!

Probe this important process, because it affects the most foundational of all metrics (Visits or Visitors – yes, they are the same one, aarrrrhh!).

#6: URL Parameter Configuration. The Permanent Tripwire.

Life was so sweet when all the sites were static. URLs were simple:

http://www.bestbuy.com/video/hot_hot_hottie_hot.html

It was easy for any web analytics tool to understand visits to that page and hence count page views.

The problem is that the web became dynamic, and URLs for web pages now look more like these (illustrative examples):

http://www.bestbuy.com/phones?productCategoryId=pcmcat12345

or

http://www.bestbuy.com/phones?productCategoryId=pcmcat12345&skuId=8793861&type=product&tab=1

or

http://www.bestbuy.com/phones?productCategoryId=pcmcat12345&skuId=8793861&type=product&tab=1&id=9a7f3c&ref=06&loc=01
The problem is that while web analytics tools have gotten better and can probably understand that first page (phone category page), it is not quite as straightforward for the next two.

They contain "tracking parameters" or "system parameters" (crap from the server) or other junk. Different pieces of information, some worth ignoring and others you ignore at your own peril.

Your web analytics tool has a hard time taking all these pieces and painting the right portrait (i.e., counting the page views correctly).

So what you have to do is sit down with your beloved IT folks and first spend time documenting what all the junk in the URL is. Things like skuId, productCategoryId, type, tab, id.

Some of these make a web page unique, like, say, skuId, productCategoryId and tab. I.e., their presence and the values they contain mean it's a unique page. So skuId=8793861 means one phone and skuId=8824739 is another.

But there will be some that don't mean anything. For example it does not matter if type=product is in the URL or not.

Here's your To Do: Go teach your web analytics tool which parameters to use and which to ignore.

And here's what that part looks like for Google Analytics. . . .

[Image: URL query parameter configuration settings in Google Analytics]

In ClickTracks it is called "masking and unmasking" parameters. In Omniture it's called josephine. I kid, I kid. :)

If you don't do this then each tool will make its own guesses. Which means they'll do it imprecisely. Which means they won't tie. Much worse, they'll be living in the land of "truthiness"!

And make sure you do the same configuration in both tools! That will get you going in terms of ensuring that the all-important Page Views metric will be correct (or at least less inaccurate).
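If you want to sanity check the effect yourself, here is a minimal sketch of the kind of normalization the tools do. The keep-list is hypothetical; it's whatever comes out of that sit-down with your IT folks:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Normalize URLs the way you'd configure the tool: keep the parameters that
# make a page unique, drop the junk. Hypothetical keep-list below.
KEEP = {"skuId", "productCategoryId", "tab"}

def normalize(url):
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KEEP]
    return urlunparse(parts._replace(query=urlencode(sorted(kept))))

a = "http://www.bestbuy.com/phones?skuId=8793861&type=product&tab=1"
b = "http://www.bestbuy.com/phones?tab=1&skuId=8793861&id=9a7f3c"
print(normalize(a) == normalize(b))  # True -- one page, one page view count
```

If one tool treats a and b as the same page and the other treats them as two pages, your page view numbers never had a chance.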

I won't touch on it here but if you are using Event Logging for Web 2.0 / rich media experiences it adds more pain.

Or if you are generating fake page views to do various things, like tracking form submissions or outbound links (boo Google Analytics!) or other such stuff, then do that the same way between tools.

Just be aware of that. By doing the right configuration for your URL parameters in your web analytics tool, you are ensuring an accurate count of your page views, across all the tools you are comparing. Well worth investing some effort in this cause.

#7: Campaign Parameter Configuration. The Problem of the Big.

OK, maybe all of us run campaigns. But the "big" do this a lot more.

If you run lots of campaigns (email, affiliates, paid search, display, mobile, etc.) then it is very important that you tag your campaigns correctly, and then configure your web analytics tools correctly, to ensure your campaigns are reported correctly, your referrers are reported correctly, and your revenue and conversions are attributed correctly.

Here is a simple example.

If you search for Omniture in Yahoo:

[Image: paid search ad for Omniture on Yahoo]

[To the person running Omniture's paid search campaigns: I realize Omniture is important, but you might reconsider mentioning your company's name twice! I have seen this ad on Yahoo for months (yes, I search for Omniture that much!). :)]

You end up on a landing page whose URL is loaded with campaign tracking parameters.

If you search for Omniture on Google, you end up on a similar landing page, with a different set of campaign parameters in the URL.

You'll note that Omniture's done a great job of tagging their campaigns. Absolutely lovely. Now. . . .

Let's say Omniture is using WebTrends and IndexTools on their website to do web analytics. Then they would have to go into each of those tools and "teach" them all the campaign parameters they are using, the hierarchies, and whatnot.

That will ensure that when they click on the Paid Search tab / button / link in the tool, these campaigns will be reported correctly.

You'll have to repeat this for your affiliate and email and display and all other things you are doing.

If you have two tools you'll have to do it twice. And each tool might not accept this data in the same way. For WebTrends you might have to place it in the URL stem, in IndexTools you might have to put it in the cookies, in Google Analytics it might have to be a customized javascript.
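Here is a minimal sketch of tagging a landing URL consistently. The parameter names are Google Analytics' standard utm_* conventions; the base URL and campaign values are made up, and your other tool will want its own names (or a cookie, or custom JavaScript) carrying the same values:

```python
from urllib.parse import urlencode

# Build one consistently tagged landing URL (Google Analytics utm_* style).
def tag_landing_url(base, source, medium, campaign, term=None):
    params = {"utm_source": source, "utm_medium": medium,
              "utm_campaign": campaign}
    if term:
        params["utm_term"] = term  # the paid search keyword, if any
    return base + "?" + urlencode(params)

print(tag_landing_url("http://www.example.com/landing",
                      source="yahoo", medium="cpc",
                      campaign="feb_brand", term="omniture"))
# http://www.example.com/landing?utm_source=yahoo&utm_medium=cpc&utm_campaign=feb_brand&utm_term=omniture
```

Generating the URLs is the easy part; the real work is teaching every tool you run to interpret those same values the same way.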

Suffice it to say, not a walk in the park. (Now you'll understand why clean campaign tracking is the hardest thing to do.)

#8: Data Sampling. The Hidden "Angel".

This is a problem (see "angel" :)) that many people are not aware of, and underestimate in terms of its impact.

But I want to emphasize that it will, usually, only impact large to larger companies.

In a nutshell, there are two kinds of sampling in web analytics.

Data Sampling at Source:

Web Analytics is getting to be very expensive if you are a site of a decent size.

If you are a decent size (or then some), a typical strategy from the paid web analytics vendor is not to collect all your data – because your web analytics bill is based on the page views you send over.

So you don't tag all your pages, or you tag all your pages but they only store a sample of the data.

This can cause a data reconciliation issue.

Data Sampling at "Run Time":

In this case all the data is collected (by your free or paid tool), but when you run your reports / queries the data is sampled to make them run fast.

Sometimes you have control over the sampling (like in ClickTracks), at other times not quite (like in Omniture Discover or WebTrends Marketing Lab, etc.), and at other times no control at all (like in Google Analytics).

Sampling at "run time" is always better because you have all the data (should you be that paranoid).

But as you can imagine, depending on the tool you are using, data sampling can greatly impact the Key Performance Indicators you rely on. This means all / none / some of your data will not reconcile.
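Here is a minimal sketch of the effect, with simulated data (a made-up 2% true conversion rate and a 10% run-time sample):

```python
import random

# How run-time sampling can move a KPI: conversion rate on all visits
# vs. on a 10% sample. Numbers are simulated, purely for illustration.
random.seed(42)
visits = [1 if random.random() < 0.02 else 0 for _ in range(100_000)]  # 1 = converted

full_rate = sum(visits) / len(visits)
sample = random.sample(visits, 10_000)  # a 10% "run time" sample
sample_rate = sum(sample) / len(sample)

print(f"full data: {full_rate:.2%}, 10% sample: {sample_rate:.2%}")
# The sampled rate wobbles on every run -- and two tools sampling
# differently will wobble in different directions.
```

Now imagine each tool sampling at a different rate, with a different method, and you see why the KPIs drift apart.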

So investigate this; most vendors are not as transparent about it as they should be. Push 'em.

#9: Order of the Tags. Love it, Hate it, Happens.

This, being the last one, is not the hugest of deals. But on heavily trafficked websites, or ones that are just heavy (can sites be obese?), this can also contribute to differences in the data.

As your web page loads, the tags are the last thing to load (a very good thing – always put your tags just above the </body> tag, please). If you have more than one tag, they get executed in the order they are implemented.

Sometimes on fat pages some of the tags might just not get executed.

It happens because the user has already clicked away. It happens because you have custom hacked the bejesus out of the tag and it is now an obese tag that does not let the other, Heidi Klum type sexy and lean tags load in the time available.

If you want that last bit of extra checking, switch the order of the tags and see if it helps. It might explain the last percent of difference you are dying to get. :)

That's it! We are done!! Well I am. :)

I suspect you'll understand a lot better why I recommend having just one tool (after rigorous evaluation of many tools) and then actually spending time creating a data driven organization.

OK, it's your turn now.

What did I miss? What other things have you discovered that can cause data discrepancies between tools? Which one of the above nine is your absolute favorite? Cookies? URLs? Page parameters?

Please share your own delightful experiences, insights and help me and our community.

Thank you.

PS: In case you are in the process of considering a web analytics tool, I have written a truly comprehensive (more than you ever wanted to know) guide through that selection process.
