Web Optimisation, Maths and Puzzles: September 2014

I've written at length about multi-variate tests. I've discussed barriers, complexity and design, and each time, I've concluded by saying that I would write an article about how to analyse the results from a multi-variate test. This is that article.

I'm going to use the example I set up last time: testing the components of a banner to optimise its effectiveness. The success metric has been decided and it's click-through rate (for the sake of argument).

There are three components that are going to be tested:
- should the picture in the banner be a man or a woman?
- should the text in the banner say "On Sale!" or "Buy now!"?
- should the text be black or red?

Here are a few example recipes from my previous post on MVT.

Recipe 1	Recipe 2
Recipe 3	Recipe 4

Recipe selection and test plan

When there are three components with two options for each, the total number of possible recipes is 2^3 = 8 recipes. However, by using MVT, we can run just four recipes and through analysis identify which of the combinations is the best (whether it was one of the original four we tested, or one that we didn't test), and we do this by looking at the effect each component has. The effect of each component is often called the element contribution.

In order to run the multi-variate test with four recipes (instead of an A/B/n test with all eight recipes) we need to carefully select the recipes we run - we can't just pick four at random. We need to make sure that the four recipes cover each variation of each element. For example, the set of four shown above (Recipes 1-4) does not have a version with a red 'On Sale!' element, so we can't compare red against black. It is possible to run a multi-variate test to cover 2^3 combinations with just four recipes, but we'll need to be slightly more selective. Using mathematical language, the set of recipes that we use has to be orthogonal (i.e. they "point" in different directions - in geometry, 90 degrees difference - so have almost nothing in common). In IT circles, it would be called orthogonal array testing (warning: the Wikipedia entry is full of technical vocabulary).

Many tools will identify the set of recipes to test - Adobe's Test and Target does this, for example; alternatively, I'm sure that your account manager with your tool provider will be able to work with you to identify the set you need.

Here, then, is the full set of eight recipes that I could have for my MVT, followed by the four recipes that I would need to run on my site:

The full set of eight recipes

Recipe	Gender	Colour	Wording
S	Man	Red	Sale
T	Man	Red	Buy
U	Man	Black	Sale
V	Man	Black	Buy
W	Woman	Red	Sale
X	Woman	Red	Buy
Y	Woman	Black	Sale
Z	Woman	Black	Buy

Recipes S, V, X and Y (highlighted in bold in the original table) represent one possible set of four recipes that would form a successful MVT set. There are others (for example, the remaining four - T, U, W and Z - are a complete set too).

An example set of four recipes that could be tested successfully

Recipe	Gender	Colour	Wording
A	Man	Red	Sale
B	Man	Black	Buy
C	Woman	Red	Buy
D	Woman	Black	Sale

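Notice that in the full set of eight recipes, each variation (man or woman, red or black, sale or buy) appears four times. In the subset of four recipes to be tested, each variation appears twice. That balance is necessary, but on its own it isn't quite enough: each pairing of options from two different elements (man with red, man with black, woman with sale, and so on) must also appear exactly once. The subset above passes both checks, and this confirms that it is suitable for testing.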
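The balance check described above can be sketched in a few lines of Python. This is only an illustration, not what any testing tool does internally: the function name `is_balanced` and the `confounded` counter-example are my own inventions. The counter-example has every option appearing twice, but the man always has red text, so gender and colour can't be separated.

```python
from collections import Counter
from itertools import combinations, product

# The three elements and their options, as in the tables above.
FACTORS = {
    "gender": ("Man", "Woman"),
    "colour": ("Red", "Black"),
    "wording": ("Sale", "Buy"),
}

def is_balanced(recipes):
    """True if every option appears equally often, and every pairing of
    options from two different elements appears equally often."""
    names = list(FACTORS)
    # Single elements: each option must appear the same number of times.
    for i, name in enumerate(names):
        counts = Counter(r[i] for r in recipes)
        if len(counts) != len(FACTORS[name]) or len(set(counts.values())) != 1:
            return False
    # Pairs of elements: every combination of options must appear equally often.
    for i, j in combinations(range(len(names)), 2):
        counts = Counter((r[i], r[j]) for r in recipes)
        expected = len(FACTORS[names[i]]) * len(FACTORS[names[j]])
        if len(counts) != expected or len(set(counts.values())) != 1:
            return False
    return True

full_set = list(product(*FACTORS.values()))  # all 2^3 = 8 recipes

tested = [
    ("Man", "Red", "Sale"),      # A
    ("Man", "Black", "Buy"),     # B
    ("Woman", "Red", "Buy"),     # C
    ("Woman", "Black", "Sale"),  # D
]
untested = [r for r in full_set if r not in tested]

# Hypothetical bad choice: each option appears twice, but gender and
# colour always move together.
confounded = [
    ("Man", "Red", "Sale"),
    ("Man", "Red", "Buy"),
    ("Woman", "Black", "Sale"),
    ("Woman", "Black", "Buy"),
]

print(is_balanced(tested))      # True
print(is_balanced(untested))    # True
print(is_balanced(confounded))  # False
```

The four recipes that were left out also pass, which matches the point that the complementary set is a valid test set too.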

The visuals for the four approved test recipes are:

Recipe A	Recipe B
Recipe C	Recipe D

And we can see by inspection that the four recipes do indeed have two with the man, two with the woman; two with red text and two with black; two with "Buy Now!" and two with "On Sale!"

The next step is to run the test as if it were an A/B/C/D test - with one difference: it's quite possible that one or more of the four test recipes may do very badly (or very well) compared to the others, and you should resist the temptation to switch it off early. It's highly recommended (but not essential) that you run all four recipes for the same length of time, and allow them to receive equal volumes of traffic. In an MVT test run, it's important to have a large enough population of visitors for each recipe - it's not just about running until one of the four is significantly better (or worse) than the others and calling a winner.

Analysis

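Let's assume that we've run the test, and obtained the following data (these are the same four recipes as before, but note that the letters A-D are assigned to them in a different order from the earlier table):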

Recipe	A	B	C	D
Gender	Man	Woman	Woman	Man
Wording	Buy Now	Buy Now	On Sale	On Sale
Colour	Black	Red	Black	Red
Impressions	1010	1014	1072	1051
Clicks	341	380	421	291
Click-through rate	34%	37%	39%	28%

It looks from these results as if the winner is Recipe C; the picture of the woman, with black text saying, "On Sale!". However, there are four other recipes that we didn't test, but we can infer their relative performance by doing some judicious arithmetic with the data we have.

To begin with, we can identify which colour is better, black or red, by comparing the two recipes which have black text against the two recipes which have red text.

This might seem dangerous or confusing, but let's think about it. The two recipes which have black text are A and C. For recipe A, we have a man with "Buy Now!" and for recipe C, we have a woman with "On Sale!". The net result of combining recipe A and C is to isolate everybody who saw black text, with the other elements being reduced to noise (no net contribution from either element). This works logically when we compare A and C with the combination of B and D. B and D both have red text, but half have a man and half have a woman; half have "On Sale!" and half have "Buy Now!". The consequence of this is that we can isolate the effect of black text against red text - the other factors are reduced to noise.

We could think of this mathematically, using simple expressions:

A+C = (Man + Buy Now + Black) + (Woman + On Sale + Black)
A+C = Man + Woman + Buy Now + On Sale + 2xBlack

B+D = (Woman + Buy Now + Red) + (Man + On Sale + Red)
B+D = Man + Woman + Buy Now + On Sale + 2xRed

Subtracting one from the other, and cancelling like terms...
(A+C) - (B+D) = 2xBlack - 2xRed
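The cancelling can be checked mechanically. Here's a small Python sketch that counts each option once per recipe and subtracts one pair of recipes from the other; the helper name `net_terms` is made up for this illustration, and the recipe definitions are taken from the results table.

```python
from collections import Counter

# Recipe definitions from the results table: (gender, wording, colour).
recipes = {
    "A": ("Man", "Buy Now", "Black"),
    "B": ("Woman", "Buy Now", "Red"),
    "C": ("Woman", "On Sale", "Black"),
    "D": ("Man", "On Sale", "Red"),
}

def net_terms(plus, minus):
    """Add each option once per recipe in `plus`, subtract it once per
    recipe in `minus`, and return only the terms that do not cancel."""
    terms = Counter()
    for name in plus:
        terms.update(recipes[name])
    for name in minus:
        terms.subtract(recipes[name])
    return {option: n for option, n in terms.items() if n != 0}

print(net_terms("AC", "BD"))  # {'Black': 2, 'Red': -2}
print(net_terms("AD", "BC"))  # {'Man': 2, 'Woman': -2}
print(net_terms("AB", "CD"))  # {'Buy Now': 2, 'On Sale': -2}
```

Whichever way the four recipes are paired up, exactly one element survives the subtraction, which is why each pairing isolates a single element.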

When we compare A+C and B+D, we get this:

Recipe	A+C (black)	B+D (red)
Total impressions	2082	2065
Total clicks	762	671
CTR	36.6%	32.5%

So we can see that A+C (black) is better than B+D (red) - and we can attribute an element contribution of +12.63% to the colour black (36.6% is 12.63% higher than 32.5%, in relative terms).

We can also do the maths to obtain the best gender and wording:

Gender: A+D = man, B+C = woman

Recipe	A+D	B+C
Total impressions	2061	2086
Total clicks	632	801
CTR	30.7%	38.4%

Result: woman is 25.2% better than man (on CTR in this test ;-) )

Wording: A+B = Buy Now, C+D = On Sale

Recipe	A+B	C+D
Total impressions	2024	2123
Total clicks	721	712
CTR	35.6%	33.5%

Result: Buy Now is 6.22% better than On Sale

Summarising our results:

Result: black is 12.63% better than red
Result: woman is 25.2% better than man
Result: Buy Now is 6.22% better than On Sale
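For anyone who wants to reproduce the arithmetic, here's a minimal Python sketch using the impressions and clicks from the results table. The function names `pooled_ctr` and `contribution` are my own, and "contribution" here simply means the relative difference between the two pooled click-through rates.

```python
# Results table: recipe -> (gender, wording, colour, impressions, clicks).
results = {
    "A": ("Man", "Buy Now", "Black", 1010, 341),
    "B": ("Woman", "Buy Now", "Red", 1014, 380),
    "C": ("Woman", "On Sale", "Black", 1072, 421),
    "D": ("Man", "On Sale", "Red", 1051, 291),
}
ELEMENTS = {"gender": 0, "wording": 1, "colour": 2}

def pooled_ctr(element, option):
    """Click-through rate across all recipes that used this option."""
    idx = ELEMENTS[element]
    rows = [r for r in results.values() if r[idx] == option]
    impressions = sum(r[3] for r in rows)
    clicks = sum(r[4] for r in rows)
    return clicks / impressions

def contribution(element, better, worse):
    """Relative lift, in percent, of `better` over `worse`."""
    return (pooled_ctr(element, better) / pooled_ctr(element, worse) - 1) * 100

print(round(contribution("colour", "Black", "Red"), 2))         # 12.63
print(round(contribution("gender", "Woman", "Man"), 2))         # 25.22
print(round(contribution("wording", "Buy Now", "On Sale"), 2))  # 6.22
```

Note that the pooled rates are calculated from total clicks over total impressions, not by averaging the two recipes' percentages, which is why traffic volumes matter.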

The winner!

The winning combination is the woman with black text saying "Buy Now!" (recipe Z in the full set of eight), which is one that we didn't actually include in our test recipes. The recommended follow-up is to test the winning recipe from the four that we did test against the proposed winner from the analysis we've just done. Where that isn't possible, for whatever reason, you could test your existing control design against the proposed winner. Alternatively, you could just go ahead and implement the theoretical winner without testing - it's up to you.
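Picking the theoretical winner amounts to taking the better option for each element and checking whether that combination was one of the recipes we ran. A rough Python sketch, using the pooled click-through rates from the three comparison tables, and assuming (as this whole approach does) that the elements act independently of each other:

```python
# Pooled CTRs (%) copied from the three comparison tables.
pooled = {
    "gender": {"Man": 30.7, "Woman": 38.4},
    "colour": {"Black": 36.6, "Red": 32.5},
    "wording": {"Buy Now": 35.6, "On Sale": 33.5},
}

# The four recipes that were run, as (gender, colour, wording).
tested = {
    "A": ("Man", "Black", "Buy Now"),
    "B": ("Woman", "Red", "Buy Now"),
    "C": ("Woman", "Black", "On Sale"),
    "D": ("Man", "Red", "On Sale"),
}

# Take the best-performing option for each element in turn.
predicted = tuple(max(options, key=options.get) for options in pooled.values())
was_tested = predicted in tested.values()

print(predicted)   # ('Woman', 'Black', 'Buy Now')
print(was_tested)  # False
```

Because the predicted winner wasn't one of the four recipes run, its performance is inferred rather than measured, which is the reason a follow-up test is worthwhile.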

A brief note on the analysis: this shows the importance of keeping all test recipes running for an equal length of time, so that they receive approximately equal volumes of traffic. Here, recipes A, B, C and D all received around 1000 impressions, but if one of them had significantly fewer (because it was switched off early because it "wasn't performing well") then that recipe would not have an equal weighting in the calculations where we compared the pairs of recipes. The other elements would no longer cancel out, and the pairs containing the weak recipe would appear to perform better than they actually do.
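To put some numbers on that, here's a hypothetical scenario in Python: suppose recipe D (the weakest) had been switched off after a quarter of its traffic, at the same click-through rate it actually achieved. The shortened figures are invented for illustration; the full-run figures come from the results table.

```python
def ctr(clicks, impressions):
    """Click-through rate as a percentage."""
    return 100 * clicks / impressions

# Figures from the results table: the two red-text recipes.
b_imps, b_clicks = 1014, 380  # recipe B
d_imps, d_clicks = 1051, 291  # recipe D

# Hypothetical: D is stopped after a quarter of its impressions,
# with clicks scaled down to keep the same click-through rate.
d_imps_short = d_imps // 4
d_clicks_short = round(d_clicks * d_imps_short / d_imps)

full_run = ctr(b_clicks + d_clicks, b_imps + d_imps)
cut_short = ctr(b_clicks + d_clicks_short, b_imps + d_imps_short)

print(round(full_run, 1))   # 32.5
print(round(cut_short, 1))  # 35.5
```

In this scenario red appears to score 35.5% instead of 32.5%, almost closing the gap to black (36.6%), simply because recipe B now dominates the red group and its woman and "Buy Now!" elements are no longer balanced out by recipe D.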

I hope I've been able to show in this article (and the previous one) how it's possible to set up and analyse a multi-variate test, starting with the principles of identifying the variables you want to test, then establishing which recipes are required, and then showing how to analyse the results you obtain.

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good
Hands on: How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!
---

Image credits:
man - http://www.findresumetemplates.com/job-interview
woman - http://www.sheknows.com/living

Wednesday, 10 September 2014

How to set up and analyse a multi-variate test