Web Optimisation, Maths and Puzzles: multi variate


Showing posts with label multi variate. Show all posts

Tuesday, 21 May 2019

Three-Factor Multi-Variate Testing

TESTING ALL POSSIBILITIES WITHOUT TESTING EVERYTHING

My favourite part of my job is determining what to test, and planning how to run a test. I enjoy the analysis afterwards, but the most enjoyable part of the testing process is deciding what the test recipes will actually be. I've covered - at length - test design and planning, and also multi-variate testing. I particularly enjoy multi-variate testing, since it allows you to test all possibilities without having to test everything.


In my previous posts, where I introduced MVT, I've only covered two-factor MVT, where each element has two options: should this variable be black or red? Should it be a picture of a man or a woman? Should it say 'Special offer' or 'Limited time'? Is it x or is it y? How do you analyse MVT results? In this post, I'm going to take the discussion of testing one step further, and look at three-factor multi-variate testing, where each element has three options: should it be x, y or z?


Just as there are limited opportunities for MVT, the range of opportunities for three-factor MVT is potentially even more limited.  However, I'd like to explain that this doesn't have to be the case, and that it just takes careful planning to determine when and how to set up a test where there are three possible 'best answers'.


SCENARIO


You run a domestic travel agency, which specialises in arranging domestic travel for customers across the country (this works better if you imagine it in the US, but it works for smaller countries too). You provide a full door-to-door service, handling everything from fuel and insurance to tickets and transfers - whatever it takes, you can do it. Consequently, you are in high demand around Christmas and Thanksgiving (see, I told you this worked better in the US), and potentially other holiday periods. Yes, you're a travel agency based on Planes, Trains and Automobiles.


It's the run-up to the largest sales time of the year, as you prepare to reunite distant family members across the country for those big family celebrations and parties and whatever else.  What do you lead with on your website's homepage?


Planes?

Trains?
Or automobiles?

If you want to include buses, look out for a not-yet-planned post on four-factor MVT.  I'll have it ready by Christmas.


So far, this would be a straightforward A/B/C test, with a plane, a car and a train.  Your company colours are yellow, so let's go with that:



Your marketing team are also unsure how to lead with their messaging - should they emphasise price, reliability, or an emotional connection?


They can't choose between

"Cross the country without costing the world" (price)
"Guaranteed door-to-door on time, every time" (reliability)
"Bring your smile to their doorstep this holiday" (emotional)

So now we have nine recipes, A-I.


A: Plane plus price
B: Plane plus reliability
C: Plane plus emotions

D: Car plus price
E: Car plus reliability
F: Car plus emotions

G: Train plus price
H: Train plus reliability
I: Train plus emotions


Now, somebody in the exec suite has decided that now might be the time to try out a new set of corporate colours.  Yellow is bright and cheery, but according to the exec, it can be seen as immature, and not very sophisticated.  The alternatives are red and blue (plus the original yellow).


Here goes:  there are now 3x3x3 possible variations - that's 27 altogether.  And you can't run a test with 27 recipes - for a start, there aren't enough letters in the alphabet.  There's also traffic and timing to consider - it will take months to run a test like that to get any level of significance.  Nevertheless, this is an executive request, so we'll have to make it happen.


Firstly, the visuals:  if this was just a two-variable test, then we'd have nine recipes, as you can see below.



















However, each of these vehicle/colour combinations has three more options (based on the marketing message that we select) - here is a small sample of the 27 total combinations, to give you an idea.










          
   
This is not a suitable testing set, but it gives you an idea of the total variations that we're looking at.  The next step, as we did with the more straightforward two-factor MVT, is to identify our orthogonal set - the minimum recipes that we could test that would give us sufficient information to infer the performance of the recipes that we don't test.  It's time to charge up your spreadsheet.

THE RECIPES - AN ORTHOGONAL SET

There are 3*3*3 = 27 different combinations of colour, text and vehicle... here's the list, since you're wondering ;-)



Recipe Colour Vehicle Message
A Red Plane Price
B Red Plane Reliability
C Red Plane Emotions
D Red Train Price
E Red Train Reliability
F Red Train Emotions
G Red Car Price
H Red Car Reliability
I Red Car Emotions
J Blue Plane Price
K Blue Plane Reliability
L Blue Plane Emotions
M Blue Train Price
N Blue Train Reliability
O Blue Train Emotions
P Blue Car Price
Q Blue Car Reliability
R Blue Car Emotions
S Yellow Plane Price
T Yellow Plane Reliability
U Yellow Plane Emotions
V Yellow Train Price
W Yellow Train Reliability
X Yellow Train Emotions
Y Yellow Car Price
Z Yellow Car Reliability
AA Yellow Car Emotions
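
For anyone who would rather generate the list than type it, here is a minimal Python sketch that reproduces the 27 rows above. The variable names and the `label` helper are illustrative assumptions; the only thing taken from the table is the ordering (colour changes slowest, message fastest) and the spreadsheet-style lettering that runs from A to Z and then AA.

```python
from itertools import product
from string import ascii_uppercase

colours = ["Red", "Blue", "Yellow"]
vehicles = ["Plane", "Train", "Car"]
messages = ["Price", "Reliability", "Emotions"]


def label(index):
    """Spreadsheet-style label: 0 -> A, 25 -> Z, 26 -> AA."""
    name = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        name = ascii_uppercase[remainder] + name
    return name


# product() varies the last list fastest, which matches the table order.
recipes = {
    label(i): combo
    for i, combo in enumerate(product(colours, vehicles, messages))
}

for name, (colour, vehicle, message) in recipes.items():
    print(name, colour, vehicle, message)
```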


The recipes with the faint green shading (A, E, I, K, O, P, U, V and Z) would form a simple orthogonal set; here they are for clarity:

Recipe Colour Vehicle Message
A Red Plane Price
E Red Train Reliability
I Red Car Emotions
K Blue Plane Reliability
O Blue Train Emotions
P Blue Car Price
U Yellow Plane Emotions
V Yellow Train Price
Z Yellow Car Reliability


Note that each colour, vehicle and message appears three times, and that each pairing of two options (Red with Plane, say, or Train with Price) appears exactly once; there are therefore nine recipes that we need. This is still a considerable number, but it's a significant saving from 27 in total.
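
The balance of the nine-recipe set can be checked mechanically. The sketch below is one possible check, not the method any particular testing tool uses: it treats a set as orthogonal when every pairing of options from two different elements is present and all pairings occur equally often. The function name `is_orthogonal` and the counter-example (the first nine recipes, A to I, which are all Red) are assumptions added for illustration.

```python
from collections import Counter
from itertools import combinations, product

levels = [
    ("Red", "Blue", "Yellow"),
    ("Plane", "Train", "Car"),
    ("Price", "Reliability", "Emotions"),
]

orthogonal_set = {
    "A": ("Red", "Plane", "Price"),
    "E": ("Red", "Train", "Reliability"),
    "I": ("Red", "Car", "Emotions"),
    "K": ("Blue", "Plane", "Reliability"),
    "O": ("Blue", "Train", "Emotions"),
    "P": ("Blue", "Car", "Price"),
    "U": ("Yellow", "Plane", "Emotions"),
    "V": ("Yellow", "Train", "Price"),
    "Z": ("Yellow", "Car", "Reliability"),
}


def is_orthogonal(rows, levels):
    """True if every pairing of options from two different elements
    appears in the rows, and all pairings appear equally often."""
    for f, g in combinations(range(len(levels)), 2):
        pair_counts = Counter((row[f], row[g]) for row in rows)
        all_pairs_present = len(pair_counts) == len(levels[f]) * len(levels[g])
        equally_often = len(set(pair_counts.values())) == 1
        if not (all_pairs_present and equally_often):
            return False
    return True


chosen = list(orthogonal_set.values())
first_nine = list(product(*levels))[:9]  # recipes A-I: every one is Red

print(is_orthogonal(chosen, levels))      # True
print(is_orthogonal(first_nine, levels))  # False
```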

THE ANALYSIS

Which colour?  How to find the best variation for each element


Select the recipes which will give us a reading on the best colour by choosing recipes where the other variants cancel to noise:


This is simple (and simpler than the two-factor version): we add the results for all the "red" recipes, and compare them with the totals for all the "blue" recipes and all the "yellow" recipes.


Let's take a look at some hypothetical data, based on the orthogonal recipe set shown above:


Recipe a e i k o p u v z
Visits 1919 1922 1932 1939 1931 1934 1915 1955 1944
Bookings 193 194 189 194 205 192 200 209 206
Revenue (k) £14.2 £14.6 £14.4 £14.3 £15.6 £13.94 £14.8 £15.7 £15.4
Conversion 10.1% 10.1% 9.8% 10.0% 10.6% 9.9% 10.4% 10.7% 10.6%
Lift - 0.4% -2.7% -0.5% 5.6% -1.3% 3.8% 6.3% 5.4%
Avg Booking Value £73.58 £75.26 £76.19 £73.71 £76.10 £72.60 £74.00 £75.12 £74.76
Lift - 2.3% 3.6% 0.2% 3.4% -1.3% 0.6% 2.1% 1.6%
RPV £7.40 £7.60 £7.45 £7.37 £8.08 £7.21 £7.73 £8.03 £7.92
Lift - 2.7% 0.7% -0.3% 9.2% -2.6% 4.4% 8.5% 7.1%
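
The calculated rows in the table come straight from the three raw metrics. As a rough sketch (the names `data` and `metrics` are illustrative, and recipe a is assumed to be the baseline for every lift, as the "-" in its column suggests), the figures can be reproduced like this:

```python
# Hypothetical test data from the table above:
# recipe -> (visits, bookings, revenue in pounds).
data = {
    "a": (1919, 193, 14200),
    "e": (1922, 194, 14600),
    "i": (1932, 189, 14400),
    "k": (1939, 194, 14300),
    "o": (1931, 205, 15600),
    "p": (1934, 192, 13940),
    "u": (1915, 200, 14800),
    "v": (1955, 209, 15700),
    "z": (1944, 206, 15400),
}


def metrics(visits, bookings, revenue):
    """Conversion, average booking value and revenue per visit."""
    return {
        "conversion": bookings / visits,
        "abv": revenue / bookings,
        "rpv": revenue / visits,
    }


baseline = metrics(*data["a"])  # lifts are measured against recipe a

for name, row in data.items():
    m = metrics(*row)
    lift = {key: m[key] / baseline[key] - 1 for key in m}
    print(
        f"{name}: conversion {m['conversion']:.1%} ({lift['conversion']:+.1%}), "
        f"ABV {m['abv']:.2f} ({lift['abv']:+.1%}), "
        f"RPV {m['rpv']:.2f} ({lift['rpv']:+.1%})"
    )
```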


I've shown the raw metrics and the calculated metrics for the recipes, but it's important to remember at this point:  the recipes shown here probably won't include the best recipe.  After all, we're testing nine recipes out of a total of 27, so we have only a one in three chance of selecting the optimum combination.
What we need to do next, as I mentioned above, is to combine the data for all the yellow recipes, and compare with the red and the blue.



Recipes aei kop uvz
Colour Red Blue Lift vs Red Yellow Lift vs Red
Visits 5773 5804 5814
Bookings 576 591 615
Revenue (k) £43.2 £43.84 £45.9
Conversion 9.98% 10.18% 2.1% 10.58% 6.0%
ABV 75.00 74.18 -1.1% 74.63 -0.5%
RPV 7.48 7.55 0.9% 7.89 5.5%


So we can see from our simple colour analysis (adding all the results for the recipes which contain Red, vs Blue, vs Yellow) that Yellow is the best.  The Conversion has a 6% lift, and while Average Booking Value is slightly lower, the Revenue Per Visit is still 5.5% higher for the yellow recipes than it is for the Red.
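
The pooling step is just a grouped sum followed by the same three ratios. Here is a minimal sketch under the assumption that each tested recipe is stored as one row of (colour, vehicle, message, visits, bookings, revenue); the `pool` function and its arguments are illustrative names. Passing a different column index pools by vehicle or by message instead of colour.

```python
from collections import defaultdict

# (colour, vehicle, message, visits, bookings, revenue in pounds)
rows = [
    ("Red", "Plane", "Price", 1919, 193, 14200),
    ("Red", "Train", "Reliability", 1922, 194, 14600),
    ("Red", "Car", "Emotions", 1932, 189, 14400),
    ("Blue", "Plane", "Reliability", 1939, 194, 14300),
    ("Blue", "Train", "Emotions", 1931, 205, 15600),
    ("Blue", "Car", "Price", 1934, 192, 13940),
    ("Yellow", "Plane", "Emotions", 1915, 200, 14800),
    ("Yellow", "Train", "Price", 1955, 209, 15700),
    ("Yellow", "Car", "Reliability", 1944, 206, 15400),
]


def pool(rows, factor):
    """Sum the raw metrics for each option of one element
    (0 = colour, 1 = vehicle, 2 = message), then derive the ratios."""
    totals = defaultdict(lambda: [0, 0, 0])
    for row in rows:
        for i, value in enumerate(row[3:]):
            totals[row[factor]][i] += value
    return {
        option: {
            "visits": v,
            "bookings": b,
            "revenue": r,
            "conversion": b / v,
            "abv": r / b,
            "rpv": r / v,
        }
        for option, (v, b, r) in totals.items()
    }


by_colour = pool(rows, 0)
red_rpv = by_colour["Red"]["rpv"]
for colour, m in by_colour.items():
    print(f"{colour}: RPV {m['rpv']:.2f}, lift vs Red {m['rpv'] / red_rpv - 1:+.1%}")
```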

Now we do the same for the vehicles: plane, train or car?
Recipes aku eov ipz
Vehicle Plane Train Lift vs Plane Car Lift vs Plane
Visits 5773 5808 5810
Bookings 587 608 587
Revenue (k) £43.3 £45.9 £43.74
Conversion 10.17% 10.47% 3.0% 10.10% -0.6%
ABV 73.76 75.49 2.3% 74.51 1.0%
RPV 7.50 7.90 5.4% 7.53 0.4%

Clear winner in this case:  it's Train, which is the best for conversion, average booking value and revenue per visit.

And finally, the messaging:  emotional, price or reliability?

Recipes apv iou ekz
Message Price Emotion Lift vs Price Reliability Lift vs Price
Visits 5808 5778 5805
Bookings 594 594 594
Revenue 43.84 44.8 44.3
Conversion 10.23% 10.28% 0.5% 10.23% 0.1%
ABV 73.80 75.42 2.2% 74.58 1.0%
RPV 7.55 7.75 2.7% 7.63 1.1%

And in this case, it's Emotion which is the best, with clearly better average booking value and revenue.  It would appear that price is not the best way to lead your messaging.

CONCLUSION AND THOUGHTS

The best combination is:
Yellow Train, with Emotion messaging.
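
Picking the proposed winner amounts to taking the best option for each element and checking whether that combination was among the recipes tested. A small sketch, using the pooled RPV figures from the three tables above (the dictionary layout and variable names are illustrative assumptions):

```python
# Pooled revenue per visit for each option, copied from the tables above.
pooled_rpv = {
    "colour": {"Red": 7.48, "Blue": 7.55, "Yellow": 7.89},
    "vehicle": {"Plane": 7.50, "Train": 7.90, "Car": 7.53},
    "message": {"Price": 7.55, "Emotions": 7.75, "Reliability": 7.63},
}

# The nine recipes that were actually run.
tested = {
    ("Red", "Plane", "Price"),
    ("Red", "Train", "Reliability"),
    ("Red", "Car", "Emotions"),
    ("Blue", "Plane", "Reliability"),
    ("Blue", "Train", "Emotions"),
    ("Blue", "Car", "Price"),
    ("Yellow", "Plane", "Emotions"),
    ("Yellow", "Train", "Price"),
    ("Yellow", "Car", "Reliability"),
}

# Best option per element, in colour / vehicle / message order.
winner = tuple(max(options, key=options.get) for options in pooled_rpv.values())

print(winner)            # ('Yellow', 'Train', 'Emotions')
print(winner in tested)  # False: the proposed winner was not one of the nine
```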


Notice that the performance of the recipes that we actually tested is in agreement with the winning combination (based on the calculations).


Recipes that contain none of the winning elements performed the worst:

A  - Red Plane, Price :  RPV £7.40
K - Blue Plane, Reliability:  RPV £7.37
P - Blue Car, Price  :  RPV £7.21

Recipes that contain just one of the winning elements produced slightly  better results:

E - Red Train, Reliability:  £7.60
I - Red Car, Emotions:  £7.45

Z - Yellow Car, Reliability: £7.92*

Recipes that contained two of the three winning elements were the best performers:

O - Blue Train, Emotions:  £8.08
U - Yellow Plane, Emotions:  £7.73
V - Yellow Train, Price: £8.03
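
The grouping above can be reproduced by counting how many winning elements each tested recipe contains. In this sketch the group averages are simple unweighted means of the nine RPV figures, which is an assumption made for brevity rather than a pooled calculation; the names are illustrative.

```python
winning = {"Yellow", "Train", "Emotions"}

# recipe -> (elements, revenue per visit in pounds), from the results table
tested = {
    "A": (("Red", "Plane", "Price"), 7.40),
    "E": (("Red", "Train", "Reliability"), 7.60),
    "I": (("Red", "Car", "Emotions"), 7.45),
    "K": (("Blue", "Plane", "Reliability"), 7.37),
    "O": (("Blue", "Train", "Emotions"), 8.08),
    "P": (("Blue", "Car", "Price"), 7.21),
    "U": (("Yellow", "Plane", "Emotions"), 7.73),
    "V": (("Yellow", "Train", "Price"), 8.03),
    "Z": (("Yellow", "Car", "Reliability"), 7.92),
}

# Group recipes by the number of winning elements they contain.
groups = {}
for name, (elements, rpv) in tested.items():
    hits = len(winning & set(elements))
    groups.setdefault(hits, []).append((name, rpv))

mean_rpv = {
    hits: sum(rpv for _, rpv in members) / len(members)
    for hits, members in groups.items()
}

for hits in sorted(groups):
    names = ", ".join(name for name, _ in groups[hits])
    print(f"{hits} winning element(s): {names} - mean RPV {mean_rpv[hits]:.2f}")
```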


I would strongly recommend running a follow-up test, with the two winners from the first selection (O and V) along with the proposed winner based on the analysis, Yellow Train with Emotions.  It's possible that this proposed winner will be the best; there's also the possibility that it may be close to but not as good as O or V. 

*There's also an argument for including Z (Yellow Car, Reliability) as an outlier, given its performance.  


There are some clear losers that do not need to be pursued: of the bottom three performing recipes (P, K and A), two contain Blue (K and P) and two contain Price (A and P). The Price recipes that we tested (A, P and V) had the lowest combined Average Booking Value of the three messages, and even recipe V, which was one of the best recipes, had the lowest Average Booking Value of the three Train recipes. With a different message (Emotions, most likely), Recipe V could well perform even better.

It's not surprising that a follow-up is needed; remember that we've only tested nine out of 27 combinations, and it's unlikely that we'll have hit the optimum design first time around.  However, by careful selection of our original recipes, we need only test four more (at the most) to identify the best from all 27.  Finding the best combination from 27, by only testing 13 is a definite winner.  This is the power of multi-variate testing: the ability to test all possibilities without having to test everything.

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing 
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good 
Hands on:  How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!

Wednesday, 10 September 2014

How to set up and analyse a multi-variate test

I've written at length about multi-variate tests.  I've discussed barriers, complexity and design, and each time, I've concluded by saying that I would write an article about how to analyse the results from a multi variate test.  This is that article.

I'm going to use the example I set up last time:  testing the components of a banner to optimise its effectiveness.  The success metric has been decided and it's click-through rate (for the sake of argument).

There are three components that are going to be tested:
- should the picture in the banner be a man or a woman?
- should the text in the banner say "On Sale!" or "Buy now!"
- should the text be black or red?

Here are a few example recipes from my previous post on MVT.


Recipe 1
Recipe 2
Recipe 3
Recipe 4

Recipe selection and test plan

When there are three components with two options for each, the total number of possible recipes is 2^3 = 8 recipes.  However, by using MVT, we can run just four recipes and through analysis identify which of the combinations is the best (whether it was one of the original four we tested, or one that we didn't test), and we do this by looking at the effect each component has.  The effect of each component is often called the element contribution.


In order to run the multi-variate test with four recipes (instead of an A/B/n test with all eight recipes) we need to carefully select the recipes we run - we can't just pick four at random. We need to make sure that the four recipes cover each variation of each element. For example, the set of four shown above (Recipes 1-4) does not have a version with a red 'On Sale!' element, so we can't compare red against black. It is possible to run a multi-variate test to cover 2^3 combinations with just four recipes, but we'll need to be slightly more selective. Using mathematical language, the set of recipes that we need to use has to be orthogonal (i.e. they "point" in different directions - in geometry, 90 degrees difference - so have almost nothing in common). In IT circles, it would be called orthogonal array testing (warning: the Wikipedia entry is full of technical vocabulary).

Many tools will identify the set of recipes to test - Adobe's Test and Target does this, for example; alternatively, I'm sure that your account manager with your tool provider will be able to work with you to identify the set you need.


Here, then, is the full set of eight recipes that I could have for my MVT, followed by the four recipes that I would need to run on my site:

The full set of eight recipes
Recipe Gender Colour Wording
S Man Red Sale
T Man Red Buy
U Man Black Sale
V Man Black Buy
W Woman Red Sale
X Woman Red Buy
Y Woman Black Sale
Z Woman Black Buy

The recipes highlighted in bold (S, V, X and Y) represent one possible set of four recipes that would form a successful MVT set. There are others (for example, those not highlighted in bold - T, U, W and Z - are a complete set too).

An example set of four recipes that could be tested successfully

Recipe Gender Colour Wording
A Man Red Sale
B Man Black Buy
C Woman Red Buy
D Woman Black Sale

Notice that in the full set of eight recipes, each variation (man or woman, red or black, sale or buy) appears four times each.  In the subset of four recipes to be tested, each variation appears twice, and this confirms that the subset is suitable for testing.
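
One common way to arrive at a balanced half of the eight recipes is a parity rule: number the two options of each element 0 and 1, and keep the recipes whose numbers add up to an even total. This is offered as a sketch of a standard technique, not as the way any particular tool selects its recipes; the names below are illustrative. With the option order used in the full table, the even half turns out to be S, V, X and Y, and the odd half is the other complete set.

```python
from collections import Counter
from itertools import product

factors = [("Man", "Woman"), ("Red", "Black"), ("Sale", "Buy")]

# product() yields the eight recipes in the same order as rows S-Z above.
full_set = dict(zip("STUVWXYZ", product(*factors)))


def parity(recipe):
    """Count the 'second options' used in the recipe, modulo 2."""
    return sum(
        options.index(choice) for options, choice in zip(factors, recipe)
    ) % 2


even_half = {name: r for name, r in full_set.items() if parity(r) == 0}
odd_half = {name: r for name, r in full_set.items() if parity(r) == 1}


def level_counts(recipes):
    """How many times each option appears across a set of recipes."""
    return Counter(choice for recipe in recipes.values() for choice in recipe)


print(sorted(even_half))        # ['S', 'V', 'X', 'Y']
print(sorted(odd_half))         # ['T', 'U', 'W', 'Z']
print(level_counts(even_half))  # every option appears exactly twice
```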

The visuals for the four approved test recipes are:


Recipe A
Recipe B
Recipe C
Recipe D

And we can see by inspection that the four recipes do indeed have two with the man, two with the woman; two with red text and two with black; two with "Buy Now!" and two with "On Sale!"


The next step is to run the test as if it were an A/B/C/D test - with one difference: it's quite possible that one or more of the four test recipes may do very badly (or very well) compared to the others, but you should resist the urge to switch any of them off early. It's highly recommended (but not essential) that you run all four recipes for the same length of time, and allow them to receive equal volumes of traffic. In an MVT test run, it's important to have a large enough population of visitors for each recipe - it's not just about running until one of the four is significantly better (or worse) than the others and calling a winner.

Analysis

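Let's assume that we've run the test, and obtained the following data (note that the recipe letters are defined afresh by this table, so A-D here do not line up with the labels in the earlier list of four):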

Recipe A B C D
Gender Man Woman Woman Man
Wording Buy Now Buy Now On Sale On Sale
Colour Black Red Black Red
Impressions 1010 1014 1072 1051
Clicks 341 380 421 291
Click-through rate 34% 37% 39% 28%

It looks from these results as if the winner is Recipe C; the picture of the woman, with black text saying, "On Sale!".  However, there are four other recipes that we didn't test, but we can infer their relative performance by doing some judicious arithmetic with the data we have.

To begin with, we can identify which colour is better, black or red, by comparing the two recipes which have black text against the two recipes which have red text.


This might seem dangerous or confusing, but let's think about it.  The two recipes which have black text are A and C.  For recipe A, we have a man with "Buy Now!" and for recipe C, we have a woman with "On Sale!".  The net result of combining recipe A and C is to isolate everybody who saw black text, with the other elements being reduced to noise (no net contribution from either element).  This  works logically when we compare A and C with the combination of B and D.  B and D both have red text, but half have a man and half have a woman; half have "On Sale!" and half have "Buy Now!".  The consequence of this is that we can isolate the effect of black text against red text - the other factors are reduced to noise.


We could think of this mathematically, using simple expressions:

A+C = (Man + Buy Now + Black) + (Woman + On Sale + Black)
A+C = Man + Woman + Buy Now + On Sale + 2xBlack


B+D =(Woman + Buy Now + Red) + (Man + On Sale + Red)
B+D = Man + Woman + Buy Now + On Sale + 2xRed

Subtracting one from the other, and cancelling like terms...
(A+C) - (B+D) = 2xBlack - 2xRed
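
The cancellation can be checked with a few lines of Python, treating each recipe as a bag of elements. This is only a bookkeeping sketch of the algebra above (it counts elements, it does not model clicks), and the variable names are illustrative.

```python
from collections import Counter

A = Counter(["Man", "Buy Now", "Black"])
B = Counter(["Woman", "Buy Now", "Red"])
C = Counter(["Woman", "On Sale", "Black"])
D = Counter(["Man", "On Sale", "Red"])

difference = A + C          # Man + Woman + Buy Now + On Sale + 2 x Black
difference.subtract(B + D)  # subtract() keeps zero and negative counts

# Drop the terms that cancelled to zero.
remaining = {term: count for term, count in difference.items() if count != 0}
print(remaining)  # {'Black': 2, 'Red': -2}
```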

When we compare A+C and B+D, we get this:

Recipe A+C (black) B+D (red)
Total impressions 2082 2065
Total clicks 762 671
CTR 36.6% 32.5%

So we can see that A+C (black) is better than B+D (red) - and we can attribute an element contribution of +12.63% to the colour black.

We can also do the maths to obtain the best gender and wording:

Gender:  A+D = man, B+C = woman
Recipe A+D B+C
Total impressions 2061 2086
Total clicks 632 801
CTR 30.7% 38.4%
Result:  woman is 25.2% better than man (on CTR in this test ;-) )

Wording: A+B = Buy Now, C+D = On Sale
Recipe A+B C+D
Total impressions 2024 2123
Total clicks 721 712
CTR 35.6% 33.5%
Result:  Buy Now is 6.22% better than On Sale


Summarising our results:

Result:  black is 12.63% better than red
Result:  woman is 25.2% better than man

Result:  Buy Now is 6.22% better than On Sale
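
All three element contributions follow the same recipe: pool impressions and clicks for each option, work out the click-through rate, and compare the better option with the worse one. A minimal sketch with illustrative names, using the data from the results table:

```python
# recipe -> (gender, wording, colour, impressions, clicks)
results = {
    "A": ("Man", "Buy Now", "Black", 1010, 341),
    "B": ("Woman", "Buy Now", "Red", 1014, 380),
    "C": ("Woman", "On Sale", "Black", 1072, 421),
    "D": ("Man", "On Sale", "Red", 1051, 291),
}


def pooled_ctr(element):
    """Pooled click-through rate per option (0 = gender, 1 = wording, 2 = colour)."""
    totals = {}
    for *options, impressions, clicks in results.values():
        entry = totals.setdefault(options[element], [0, 0])
        entry[0] += impressions
        entry[1] += clicks
    return {option: c / i for option, (i, c) in totals.items()}


def contribution(element):
    """Return (better option, worse option, relative lift in CTR)."""
    ctr = pooled_ctr(element)
    best = max(ctr, key=ctr.get)
    worst = min(ctr, key=ctr.get)
    return best, worst, ctr[best] / ctr[worst] - 1


for element in range(3):
    best, worst, lift = contribution(element)
    print(f"{best} is {lift:.2%} better than {worst}")
```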

The winner!
The winning combination is black, buy now with woman, which is one that we didn't actually include in our test recipes.  The recommended follow-up is to test the winning recipe from the four that we did test against the proposed winner from the analysis we've just done.  Where that isn't possible, for whatever reason, you could test your existing control design against the proposed winner.  Alternatively, you could just go implement the theoretical winner without testing - it's up to you.

A brief note on the analysis: this shows the importance of keeping all test recipes running for an equal length of time, so that they receive approximately equal volumes of traffic. Here, recipes A, B, C and D all received around 1000 impressions, but if one of them had significantly fewer (because it was switched off early because it "wasn't performing well") then that recipe would not have an equal weighting in the calculations where we compared the pairs of recipes. The combined figure for its pair would be skewed towards the other recipe, and the pair's perceived performance would be higher than its actual performance.
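
To see the size of the effect, here is a small sketch with a made-up scenario: suppose recipe A had been switched off after only 300 impressions, at the same click-through rate it achieved in the full run. The 300 figure is purely hypothetical; the other numbers come from the results table.

```python
def pooled_ctr(recipes):
    """Pooled click-through rate for a list of (impressions, clicks) pairs."""
    impressions = sum(i for i, _ in recipes)
    clicks = sum(c for _, c in recipes)
    return clicks / impressions


recipe_a = (1010, 341)  # black text, lower CTR
recipe_c = (1072, 421)  # black text, higher CTR

# Hypothetical: A stopped at 300 impressions with the same CTR as before.
recipe_a_short = (300, round(300 * 341 / 1010))

full_run = pooled_ctr([recipe_a, recipe_c])
cut_short = pooled_ctr([recipe_a_short, recipe_c])

print(f"Black, equal traffic:     {full_run:.1%}")
print(f"Black, A switched off early: {cut_short:.1%}")
```

With A under-weighted, the pooled figure for black drifts towards recipe C's higher rate, so black looks stronger than the balanced comparison says it is.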



I hope I've been able to show in this article (and the previous one) how it's possible to set up and analyse a multi-variate test, starting with the principles of identifying the variables you want to test, then establishing which recipes are required, and then showing how to analyse the results you obtain.

---

Image credits: 
man  - http://www.findresumetemplates.com/job-interview
woman - http://www.sheknows.com/living