Thursday, 23 June 2011

Web Analytics - Intro to multi-variate testing (MVT)

In my previous post, I talked about A/B testing, and in a future post I'll cover what multi-variate testing is. This post is an interim between the two; last week my wife had our second child, so blogging time is a little hard to come by at the moment!


In this brief post I'd like to list a few things that MVT is not.  There seem to be various ideas about what it is, most of them describing it as a panacea for all online woes.


It isn't having more than two versions of a test image on a page; that's just A/B/n testing


It isn't really about simultaneously optimising different parts of a page, either.  In its purest form, MVT is about measuring and studying how changes to multiple areas of a page affect conversion, including the interactions between the parts that are changing.  It's the combination of all the parts of the page that contributes to its performance; optimising each individual component in isolation may lead to reduced performance for the page as a whole.  Taking these interactions into account is, for me, the difference between MVT and just running multiple A/B tests on one page.  I'll cover this in more detail in my 'proper' post next time.
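To make the "interactions" point concrete, here's a minimal sketch of why a full-factorial MVT differs from separate A/B tests.  The page elements and copy below are entirely made up for illustration; the point is that MVT serves every combination of the changing parts, so you can see how they perform together, not just individually.

```python
from itertools import product

# Hypothetical page elements under test (the copy is illustrative).
headlines = ["Save today", "Free delivery"]
buttons = ["Buy now", "Add to basket"]

# A full-factorial MVT serves every combination, so the interaction
# between headline and button can be measured - a headline might only
# work well with one particular button, and vice versa.
combinations = list(product(headlines, buttons))
for headline, button in combinations:
    print(f"{headline!r} with {button!r}")
```

Two separate A/B tests would tell you the best headline and the best button in isolation; the full factorial tells you the best pairing, which isn't always the same thing.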


MVT isn't, by itself, the cure-all for a poor customer experience either.  Setting up test versions of pages on a website won't provide long-term help to a website, in the same way that a quick blast of keyword optimisation won't fix a poor Google ranking.  MVT is a long-term process, and it's prone, as all computer-related activities are, to the Garbage In, Garbage Out problem.  If you don't think about the testing and develop a proper testing program, then you won't learn anything or improve anything for yourself or for your site visitors.


Apologies that this post is so brief; think of it as a trailer or a primer for my next post!

Monday, 6 June 2011

Web Analytics: A/B testing - A Beginning

In my last post on iterative testing, I gave an example which was based on sequential testing.  With sequential testing, various creatives - text, headlines, images or whatever - are deployed to a website for a period of time, one after another, and their relative performance is compared once all of them have been tried.  I will, in future, discuss success metrics for these kinds of tests, but for now, in my previous posts on developing tests and building hypotheses, I've been assigning each example a points score (it's the principle that matters).

There are numerous drawbacks to running sequential tests.  The audience varies from one time period to the next, and you can't guarantee that you'll have the same kind of traffic for each of the tests.  Most importantly, though, various external factors will influence the performance of each creative - for example, if one creative is running while you or a competitor is running an offline campaign, then its performance will be affected unfairly (for better or worse).  There are influences such as sports events, current affairs, the weather (I've seen this one), school holidays and national holidays that affect website traffic and audience behaviour.

Fortunately, in the online environment, it's possible to test two or more creatives against each other at the same time.  Taking a simple example with two creatives, we can test them by randomly splitting the traffic into two groups and then showing one group (group A) the first creative and the other group (group B) the second.  In the example below, one version is red, and the other is blue.


The important aspect of any kind of online testing is that if a visitor comes back to the site, they have to be shown the same colour that they saw before.  This is not only to ensure a consistent visitor experience, but to make sure that the test remains valid.  You don't want to confuse a visitor by showing him a red page and then, on a later visit, a blue one, and you wouldn't know which version to credit with the success if he responds to one or the other.
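Testing tools handle this "stickiness" for you, typically via cookies, but the underlying principle can be sketched in a few lines.  Assuming each visitor carries some stable identifier (a cookie value, say), hashing that identifier gives a deterministic bucket, so a returning visitor always lands on the same version without any server-side state.  The function name and variant labels here are my own invention for illustration.

```python
import hashlib

def assign_variant(visitor_id: str, variants=("red", "blue")) -> str:
    """Deterministically bucket a visitor into a test variant.

    Hashing the visitor's ID means a returning visitor always gets
    the same variant, keeping the experience consistent and the
    test results attributable to one version.
    """
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The assignment is stable across visits:
print(assign_variant("visitor-123"))
print(assign_variant("visitor-123"))  # same answer on the return visit
```

In practice a platform like Test and Target or Google Website Optimiser does this bookkeeping for you; the sketch just shows why the same visitor can reliably see the same colour.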

So, having set up two different versions of a creative (red and blue), and split your visitors into two groups (at work we use Omniture's Test and Target, while on my own website I used Google Website Optimiser), you need to set up a success criterion.  How can you tell which version (red or blue) is the better of the two?  Are you going to measure click-through rate?  Or conversion to another success event, such as starting a checkout process, or completing the checkout?  There's some debate about this, but I have two personal opinions:

1.  Choose a success event that's close to the page with the creative on it.  Personally, I strongly recommend measuring click-through rate, while keeping an eye on conversion to other, later success events.  The effect of red versus blue is almost certain to be diluted the further you go from the test page towards the success event.  After all, if your checkout process is five pages long, then the effect of the red versus blue creative is going to be overtaken by the effect of which product the visitor has added to the cart, and other influences such as how efficient your checkout screens are.  Yes, keep an eye on conversion through the checkout process, and make sure that the creative with the higher click-through rate doesn't have a much lower conversion to the 'big' success events, but my view is to measure the results that are most likely to be directly influenced by your test.

2.  Choose one success criterion and stick to it.  If you do have more than one measure of success, then decide which is most important, and rank the rest in order of priority.  I can imagine nothing more frustrating than completing a test, presenting the results back to the great and the good, and saying, "The red version had the higher click-through and the higher conversion to 'add to cart', so we've rated it the winner", just for somebody to say, "Ah, but this one had the higher average order value, and so we should go with this one."  Perhaps not the perfect example, but you can see that choosing, and agreeing on, the success criteria is very important.  Otherwise, you've gone from having one person's opinion on the best creative for a web page to one person's opinion on the best success event for a test - and it may be no coincidence that this person's opinion on the key success metric leads to their opinion on the most successful creative.
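Once you've settled on click-through rate as the criterion, the red-versus-blue comparison comes down to whether the difference in rates is bigger than chance would produce.  A standard way to check this is a two-proportion z-test; the figures below are invented purely for illustration.

```python
from math import sqrt

def z_score(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int) -> float:
    """Two-proportion z-test for a difference in click-through rate.

    Compares the observed difference in rates against the standard
    error of a pooled rate; larger |z| means the difference is less
    likely to be random noise.
    """
    p_a = clicks_a / imps_a
    p_b = clicks_b / imps_b
    p_pool = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / imps_a + 1 / imps_b))
    return (p_a - p_b) / se

# Made-up figures: red creative vs blue creative.
z = z_score(120, 2000, 90, 2000)  # red: 6.0% CTR, blue: 4.5% CTR
# As a rule of thumb, |z| > 1.96 suggests significance at the 95% level.
print(round(z, 2))
```

Tools like Test and Target and Google Website Optimiser report significance for you; the point of the sketch is that "red looks higher" isn't a result until the gap clears the noise.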

One comment I would make is that it's possible to test three, four or five versions of an image or creative at the same time, and this would still be called A/B testing.  Technically, it'd probably be called A/B/C or A/B/C/D/E testing - usually, the shortcut is A/B/n testing, where n can be any number that suits.

In my next post, I intend to write about multi-variate testing, which for me is really where science collides with web analytics.  I'll explain how the results from A/B testing can be used as the basis for further testing, referring to my previous post on iterative testing, so that you can see in more detail what's working on a web page and what isn't, and what the key factors are.