Fahad H

The Nagging Little Question Of Conversion Optimization


The drop in conversion rate was sudden and deep. After months of testing, we had seen a steady increase in the number of leads that one website generated. Then, in one day, all of our progress evaporated.

The Conversion Scientist on the account saw the drop. He may have soiled his lab coat. Usually, such a drop can be traced to a break in the analytics tracking code. Then the client called, wondering what had happened to their robust lead flow.

This was real.

When we tracked down the cause of this catastrophe, we found the answer to a question that is hard to answer for most optimizers.

The Nagging Question Of Conversion Optimization

The best metric for measuring the success of a website optimization project is the ASF, or Accountant Smile Frequency. If the online business is growing and marketing costs aren’t, then the people watching such things smile.

But there is always a nagging question in their mind: “How much of this gain was from optimization, and how much was from natural market forces?”

This is an impossible question to answer definitively. Conversion optimization doesn’t happen in a vacuum, though we would prefer that.

So, how do we know optimization is really making a difference?


You Don’t Get All Of Your Wins

Whenever your site is being optimized, you will see a series of reports as tests come to maturity:

“Ten percent gain!”

“Inconclusive test.”

“Six percent gain.”

“Wow, a twenty-nine percent gain!”

If you add up all of these wins, you would expect to see a 45% increase in revenue (a game-changer for any business). But you don’t. In your before-and-after analysis, you might see a 10%, 15% or 20% increase in actual conversion rate when the changes are rolled out to the site.
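To put the arithmetic in perspective, here is a quick sketch using the numbers from the reports above. Even compounding the reported wins, rather than simply adding them, predicts roughly a 50% gain, which is still far more than the 10% to 20% typically seen after rollout.

```ts
// A quick sketch of the arithmetic, using the reported wins above.
// Compounding the lifts (rather than simply adding them) predicts an
// even larger gain than 45% -- and the rollout still falls short of it.

const reportedLifts = [0.10, 0.06, 0.29]; // 10%, 6% and 29% winning tests

const summed = reportedLifts.reduce((a, b) => a + b, 0);
const compounded =
  reportedLifts.reduce((total, lift) => total * (1 + lift), 1) - 1;

console.log(`Summed lifts: ${(summed * 100).toFixed(0)}%`);         // 45%
console.log(`Compounded lifts: ${(compounded * 100).toFixed(1)}%`); // ~50.4%
```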

Did your optimization team lie to you?

The Massey Observer Effect

In physics, the Observer Effect states that “the act of observation will alter the state of what is being measured.”

I’ve created a corollary for the Web that states, “The act of measuring an audience will change the way the audience behaves.” The tools we use to measure a website change the results in subtle ways. When we stop measuring, the visitors’ behaviors revert to the previous state.

For example, let’s say that we are changing the text in a headline on a landing page. Our testing tool shows the original headline to the first visitor. Then it loads the page for the second visitor and instantly changes the headline to something new. Visitors with slow or overburdened browsers can see this near-instant change.

This subtly draws a visitor’s attention to the very thing that we are testing. Someone who would not normally study the headline will pay more attention. Over a large sample size, this can have a meaningful impact on the results, artificially swaying the test to favor the new headline.

If the new headline flash was responsible for ten percent of a twenty percent lift, we would only see a ten percent increase when the new headline was rolled out and the flash went away.
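To see why the flash happens, here is a minimal sketch of how a client-side testing tool might swap a headline. The selector, storage key and 50/50 split are hypothetical, not the actual tool used here; the point is that the swap can only run after the browser has already started to render the original page.

```ts
// A minimal sketch of a client-side headline test. The selector, storage
// key and 50/50 assignment are hypothetical.

type Variant = "control" | "treatment";

// Assign a variant on the first visit and remember it for later visits.
function getVariant(): Variant {
  const stored = window.localStorage.getItem("headline_variant");
  if (stored === "control" || stored === "treatment") return stored;
  const assigned: Variant = Math.random() < 0.5 ? "control" : "treatment";
  window.localStorage.setItem("headline_variant", assigned);
  return assigned;
}

// The swap can only run once the DOM is available. On a slow or busy
// browser, the original headline has already been painted, so the visitor
// sees it change -- the flash that draws extra attention to the headline.
document.addEventListener("DOMContentLoaded", () => {
  if (getVariant() !== "treatment") return;
  const headline = document.querySelector<HTMLHeadingElement>("h1.hero-headline");
  if (headline) headline.textContent = "The new headline being tested";
});
```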

There are any number of ways a measurement tool can influence visitors. It can change load times. It can affect the execution of code native to the page. It can interfere with browser plugins, like ad blockers.

The bottom line is that optimization tools may introduce an error larger than the margin of error the statistics report.


Statistics Lie

Mark Twain popularized the well-known saying, “There are three kinds of lies: lies, damned lies, and statistics.”

When we do a split test, we are taking a sample of the visitors and trying to statistically predict their behavior. We are using the methods you have forgotten from your Statistics 101 class to become oracles of visitor behavior.

When we conclude a test that shows a fifteen percent gain over a certain period of time, we are making a statistical prediction that the entire population of visitors will convert at a fifteen percent higher rate from then on. Unfortunately, your visitors don’t remember Statistics 101 class either. They don’t follow the rules perfectly.

It isn’t hard to see the limitations of statistical estimates. When we first engage with a client, we usually do what is called an A-A test to shake out their implementation. We split test two identical pages. Nothing changes. Yet, we never see the same result from each. One may convert at 2.1% while the other, identical page may convert at 1.9%.


While this may seem like a small amount at first glance, it represents a 10.5% artificial lift for the higher-performing treatment: 2.1% is roughly 10.5% higher than 1.9%. In this case, we would be suspicious of our setup. A relative difference of one or two percent would be considered normal.

The two pages saw similar samples of visitors, offered the exact same experience, and yet came to different conclusions about how visitors at large would behave.
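You can reproduce this for yourself. The simulation below uses hypothetical numbers, a true 2% conversion rate and 5,000 visitors per page; run it a few times and two identical pages will routinely report noticeably different results from sampling noise alone.

```ts
// A minimal A-A simulation: two identical "pages" with the same true
// conversion rate still report different observed rates. The rate and
// sample size are hypothetical.

function simulatePage(visitors: number, trueRate: number): number {
  let conversions = 0;
  for (let i = 0; i < visitors; i++) {
    if (Math.random() < trueRate) conversions++;
  }
  return conversions / visitors;
}

const trueRate = 0.02;  // both pages really convert at 2%
const visitors = 5_000; // visitors per page

const pageA = simulatePage(visitors, trueRate);
const pageB = simulatePage(visitors, trueRate);
const apparentLift = (pageA - pageB) / pageB;

console.log(`Page A: ${(pageA * 100).toFixed(2)}%`);
console.log(`Page B: ${(pageB * 100).toFixed(2)}%`);
console.log(`Apparent lift: ${(apparentLift * 100).toFixed(1)}%`);
// Nothing changed between the pages, yet the "lift" is rarely zero.
```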

We Test On Segments Of The Traffic

When testing, we never test against a sample of visitors that represents the entire population. If we are testing in the cart, we are only testing a segment of visitors that show high intent to buy. If we are testing on the home page, we are testing a different flavor of visitor than those that enter through a product page or blog post. None of these visitors is like the average visitor, because there is no such thing as an average visitor.

Your website is a collection of segments.

We routinely focus our testing tools on mobile visitors, returning visitors, and paid search visitors. When we find an increase for these segments, it is unlikely that we would see an equivalent increase for all visitors to the site.
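Here is a rough sketch of that dilution, with hypothetical numbers: a 20% win measured on a segment that produces 30% of your conversions shows up as roughly a 6% lift sitewide.

```ts
// A rough sketch of segment dilution. The segments, shares and lifts
// below are hypothetical.

interface Segment {
  name: string;
  shareOfConversions: number; // fraction of all conversions from this segment
  liftFromTest: number;       // relative lift measured within the segment
}

const segments: Segment[] = [
  { name: "mobile visitors", shareOfConversions: 0.3, liftFromTest: 0.2 },
  { name: "everyone else", shareOfConversions: 0.7, liftFromTest: 0.0 },
];

const sitewideLift = segments.reduce(
  (total, s) => total + s.shareOfConversions * s.liftFromTest,
  0
);

console.log(`Sitewide lift: ${(sitewideLift * 100).toFixed(1)}%`); // ~6.0%
```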

Coincident Events

Sometimes, things happen that affect a site’s performance. The market changes. Email campaigns go out. Changes in PPC ads and bidding can change the shape of the traffic. New products are released. All of this can drive conversion rates higher independent of the efforts of your optimization team.

If these things happen during a test, our results may be wrong. We may choose a winner that only worked during the World Cup. If one of these coincident events begins at the same time we launch a test, it can look like one of our ideas failed, when under normal circumstances it would have been a winner.

[Chart: Traffic Affects Conversion Rate]

If we had tested during the promotion period shown in the chart, we would have chosen the wrong winner. If we had tested during the email campaign, our gain would have been exaggerated.


We usually discover these maddening events in post-test analysis. Results may suddenly change during a test. Sometimes, coincident events are only revealed when we roll out a winning change and don’t see a lift in conversion rate.

We actively work to eliminate these effects. We end tests before promotions. We exclude email traffic to reduce the effects of promotional blasts. We retest wins that we find during peak periods — like the holiday shopping season — to see if they work in off-peak periods.

Coincidence is not a close friend of science.

The Light Switch Test

If all of these issues make testing seem like a crap shoot, let me assure you that it’s not. This was proved when we were confronted with a radical drop in conversion rate.

Let’s pick the story up again.

Our client had seen a sudden, deep drop in leads overnight. This wasn’t a change in traffic; the conversion rate dropped right along with the leads. There was no Google algorithm change to blame here.

As it turns out, this was an accidental “Light Switch” test. To do this type of test, you have to be able to revert all of your changes back to one of your controls. You then turn your changes off to see what happens.

The change in performance (the lead conversion rate or revenue per session) tells you how your changes have affected the bottom line. Then you turn them all back on, expecting a return to the optimized performance.

It’s like flipping a light switch off and on to see what light(s) it controls.

This is a test we would never recommend. It can be very expensive, and its only use is to validate the optimization efforts. You don’t learn anything new from it. In this case, it occurred inadvertently. Here’s how it happened.

This client had been slow to update their site, but they wanted to harvest the leads at the higher conversion rates we’d tested our way to. So, we simply used the testing tool and our JavaScript stack to morph their site in every visitor’s browser.

You can see where I’m going.

During routine maintenance, the script for the testing tool was removed from the site. Immediately, all of our changes disappeared. The light switch had been turned to “off,” and the site was almost exactly as it had been months before.
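For readers wondering how one script tag could carry all of that lift, here is a minimal sketch; the selectors and copy are hypothetical, not the client’s. When winning changes live inside the testing tool’s script rather than the site’s own templates, removing that single tag silently reverts every one of them.

```ts
// A minimal sketch of "morphing" a site through the testing tool's script.
// The selectors and copy are hypothetical. None of this runs if the
// script tag that loads it is removed from the page.

const winningChanges: Array<() => void> = [
  () => {
    const headline = document.querySelector<HTMLElement>("h1.hero-headline");
    if (headline) headline.textContent = "The headline that won our tests";
  },
  () => {
    const cta = document.querySelector<HTMLElement>("a.cta-button");
    if (cta) cta.textContent = "Get My Free Quote";
  },
];

// Apply every tested win on each page load, for 100% of visitors.
document.addEventListener("DOMContentLoaded", () => {
  winningChanges.forEach((apply) => apply());
});
```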

The conversion rate dropped to almost the exact level it had been before we started testing, and the drop matched the lift that we had claimed from our analysis.

It was several hours before the light was turned back on. As expected, the conversion rate returned to normal.

This kind of validation is rare.

There is no better way to improve the performance of a website than a sustained and disciplined conversion optimization program. However, it’s important to know that your results are colored by your tools, by statistical aberrations, by segmentation, and by unfortunate coincidences.
