My favourite part of my job is determining what to test, and planning how to run a test. I enjoy the analysis afterwards, but the most enjoyable part of the testing process is deciding what the test recipes will actually be. I've covered - at length - test design and planning, and also multi-variate testing. I particularly enjoy multi-variate testing, since it allows you to test all possibilities without having to test everything.
In my previous posts, where I introduced MVT, I've only covered two-factor MVT: should this variable be black or red? Should it be a picture of a man or a woman? Should it say 'Special offer' or 'Limited time'? Is it x or is it y? How do you analyse MVT results? In this post, I'm going to take the discussion of testing one step further, and look at three-factor multi-variate testing: should it be x, y or z?
Just as there are limited opportunities for MVT, the range of opportunities for three-factor MVT is potentially even more limited. However, I'd like to explain that this doesn't have to be the case, and that it just takes careful planning to determine when and how to set up a test where there are three possible 'best answers'.
SCENARIO
You run a domestic travel agency, which specialises in arranging travel for customers across the country (this works better if you imagine it in the US, but it works for smaller countries too). You provide a full door-to-door service, handling everything from fuel, insurance, tickets, transfers - whatever it takes, you can do it. Consequently, you are in high demand around Christmas and Thanksgiving (see, I told you this worked better in the US), and potentially other holiday periods. Yes, you're a travel agency based on Planes, Trains and Automobiles.
It's the run-up to the largest sales time of the year, as you prepare to reunite distant family members across the country for those big family celebrations and parties and whatever else. What do you lead with on your website's homepage?
Planes?
Trains?
Or automobiles?
If you want to include buses, look out for a not-yet-planned post on four-factor MVT. I'll have it ready by Christmas.
So far, this would be a straightforward A/B/C test, with a plane, a car and a train. Your company colours are yellow, so let's go with that.
Your marketing team are also unsure how to lead with their messaging - should they emphasise price, reliability, or an emotional connection?
They can't choose between:
"Cross the country without costing the world" (price)
"Guaranteed door-to-door on time, every time" (reliability)
"Bring your smile to their doorstep this holiday" (emotional)
So now we have nine recipes, A-I.
Now, somebody in the exec suite has decided that now might be the time to try out a new set of corporate colours. Yellow is bright and cheery, but according to the exec, it can be seen as immature, and not very sophisticated. The alternatives are red and blue (plus the original yellow).
Here goes: there are now 3x3x3 possible variations - that's 27 altogether. And you can't run a test with 27 recipes - for a start, there aren't enough letters in the alphabet. There's also traffic and timing to consider - it will take months to run a test like that to get any level of significance. Nevertheless, this is an executive request, so we'll have to make it happen.
Firstly, the visuals: if this were just a two-variable test (vehicle and message), then we'd have the nine recipes, A-I, that we counted earlier.
This is not a suitable testing set, but it gives you an idea of the total variations that we're looking at. The next step, as we did with the more straightforward two-factor MVT, is to identify our orthogonal set - the minimum recipes that we could test that would give us sufficient information to infer the performance of the recipes that we don't test. It's time to charge up your spreadsheet.
THE RECIPES - AN ORTHOGONAL SET
There are 3*3*3 = 27 different combinations of colour, text and vehicle... here's the list, since you're wondering ;-)
Recipe | Colour | Vehicle | Message |
A | Red | Plane | Price |
B | Red | Plane | Reliability |
C | Red | Plane | Emotions |
D | Red | Train | Price |
E | Red | Train | Reliability |
F | Red | Train | Emotions |
G | Red | Car | Price |
H | Red | Car | Reliability |
I | Red | Car | Emotions |
J | Blue | Plane | Price |
K | Blue | Plane | Reliability |
L | Blue | Plane | Emotions |
M | Blue | Train | Price |
N | Blue | Train | Reliability |
O | Blue | Train | Emotions |
P | Blue | Car | Price |
Q | Blue | Car | Reliability |
R | Blue | Car | Emotions |
S | Yellow | Plane | Price |
T | Yellow | Plane | Reliability |
U | Yellow | Plane | Emotions |
V | Yellow | Train | Price |
W | Yellow | Train | Reliability |
X | Yellow | Train | Emotions |
Y | Yellow | Car | Price |
Z | Yellow | Car | Reliability |
AA | Yellow | Car | Emotions |
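If you'd rather not type all 27 rows by hand, the list can be generated. This is a minimal Python sketch rather than the spreadsheet approach used in this post; the names `recipe_label` and `full_factorial` are made up for illustration, and the labels are assumed to follow spreadsheet column naming (A to Z, then AA).

```python
from itertools import product

COLOURS = ["Red", "Blue", "Yellow"]
VEHICLES = ["Plane", "Train", "Car"]
MESSAGES = ["Price", "Reliability", "Emotions"]


def recipe_label(index: int) -> str:
    """Spreadsheet-style label for a zero-based index: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


# product() varies the last factor fastest, which matches the order of the table:
# colour changes slowest, then vehicle, then message.
full_factorial = {
    recipe_label(i): combo
    for i, combo in enumerate(product(COLOURS, VEHICLES, MESSAGES))
}

for label, (colour, vehicle, message) in full_factorial.items():
    print(f"{label} | {colour} | {vehicle} | {message} |")
```

Adding a fourth option to any list (buses, say) only changes the input lists; the labelling carries on past AA.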
And here is the orthogonal set - the nine recipes that we will actually test:
Recipe | Colour | Vehicle | Message |
A | Red | Plane | Price |
E | Red | Train | Reliability |
I | Red | Car | Emotions |
K | Blue | Plane | Reliability |
O | Blue | Train | Emotions |
P | Blue | Car | Price |
U | Yellow | Plane | Emotions |
V | Yellow | Train | Price |
Z | Yellow | Car | Reliability |
Note that each colour, vehicle and message appears three times, and that every pairing of two elements (Red with Plane, Train with Price, and so on) appears exactly once; nine recipes are therefore all we need. This is still a considerable number, but it's a significant saving from 27 in total.
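The nine recipes listed here happen to follow a simple pattern: number the options in each column 0, 1, 2 in the order they appear in the full list, and the message number equals (colour number + vehicle number) mod 3. The sketch below builds the set from that rule and checks the balance. The rule is my reading of the table, not a stated method, other nine-recipe sets would work equally well, and `is_pairwise_balanced` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations, product

COLOURS = ["Red", "Blue", "Yellow"]
VEHICLES = ["Plane", "Train", "Car"]
MESSAGES = ["Price", "Reliability", "Emotions"]

# Assumed rule: message index = (colour index + vehicle index) mod 3.
orthogonal_set = [
    (COLOURS[c], VEHICLES[v], MESSAGES[(c + v) % 3])
    for c, v in product(range(3), repeat=2)
]


def is_pairwise_balanced(recipes):
    """True if, for every two factors, each of the 9 level pairings appears exactly once."""
    for i, j in combinations(range(3), 2):
        pair_counts = Counter((recipe[i], recipe[j]) for recipe in recipes)
        if len(pair_counts) != 9 or set(pair_counts.values()) != {1}:
            return False
    return True


for recipe in orthogonal_set:
    print(" | ".join(recipe))
print("Balanced:", is_pairwise_balanced(orthogonal_set))
```

The balance check is the part that matters: it is what lets the other two elements cancel out when we later compare, say, all the Red recipes with all the Blue ones.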
THE ANALYSIS
Which colour? How to find the best variation for each element
Select the recipes which will give us a reading on the best colour by choosing recipes where the other variants cancel out: each colour group contains every vehicle and every message exactly once.
This is simple (and simpler than the two-factor version): we add the results for all the "red" recipes, and compare them with the sum of all the "blue" recipes and the sum of all the "yellow" recipes.
Let's take a look at some hypothetical data, based on the orthogonal recipe set shown above:
Recipe | a | e | i | k | o | p | u | v | z |
Visits | 1919 | 1922 | 1932 | 1939 | 1931 | 1934 | 1915 | 1955 | 1944 |
Bookings | 193 | 194 | 189 | 194 | 205 | 192 | 200 | 209 | 206 |
Revenue (k) | £14.2 | £14.6 | £14.4 | £14.3 | £15.6 | £13.94 | £14.8 | £15.7 | £15.4 |
Conversion | 10.1% | 10.1% | 9.8% | 10.0% | 10.6% | 9.9% | 10.4% | 10.7% | 10.6% |
Conversion lift vs a | - | 0.4% | -2.7% | -0.5% | 5.6% | -1.3% | 3.8% | 6.3% | 5.4% |
Avg Booking Value | £73.58 | £75.26 | £76.19 | £73.71 | £76.10 | £72.60 | £74.00 | £75.12 | £74.76 |
ABV lift vs a | - | 2.3% | 3.6% | 0.2% | 3.4% | -1.3% | 0.6% | 2.1% | 1.6% |
RPV | £7.40 | £7.60 | £7.45 | £7.37 | £8.08 | £7.21 | £7.73 | £8.03 | £7.92 |
RPV lift vs a | - | 2.7% | 0.7% | -0.3% | 9.2% | -2.6% | 4.4% | 8.5% | 7.1% |
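For anyone who wants to reproduce the calculated rows, here is a rough sketch of the arithmetic in Python. It assumes the usual definitions (conversion = bookings / visits, average booking value = revenue / bookings, revenue per visit = revenue / visits) and measures lift against recipe A; the function and variable names are mine.

```python
# Raw results per recipe: (visits, bookings, revenue in £k), copied from the table.
RAW = {
    "A": (1919, 193, 14.2),
    "E": (1922, 194, 14.6),
    "I": (1932, 189, 14.4),
    "K": (1939, 194, 14.3),
    "O": (1931, 205, 15.6),
    "P": (1934, 192, 13.94),
    "U": (1915, 200, 14.8),
    "V": (1955, 209, 15.7),
    "Z": (1944, 206, 15.4),
}


def derived(visits, bookings, revenue_k):
    """Calculated metrics for one recipe, with revenue converted from £k to £."""
    revenue = revenue_k * 1000
    return {
        "conversion": bookings / visits,
        "abv": revenue / bookings,
        "rpv": revenue / visits,
    }


metrics = {recipe: derived(*row) for recipe, row in RAW.items()}


def lift(recipe, metric, baseline="A"):
    """Relative change in a metric compared with the baseline recipe."""
    return metrics[recipe][metric] / metrics[baseline][metric] - 1


for recipe, m in metrics.items():
    print(
        f"{recipe}: conversion {m['conversion']:.1%}, ABV £{m['abv']:.2f}, "
        f"RPV £{m['rpv']:.2f}, RPV lift {lift(recipe, 'rpv'):+.1%}"
    )
```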
I've shown the raw metrics and the calculated metrics for the recipes, but it's important to remember at this point: the recipes shown here probably won't include the best recipe. After all, we're testing nine recipes out of a total of 27, so we have only a one in three chance of selecting the optimum combination.
What we need to do next, as I mentioned above, is to combine the data for all the yellow recipes, and compare with the red and the blue.
Recipes | aei | kop | | uvz | |
Colour | Red | Blue | Lift vs Red | Yellow | Lift vs Red |
Visits | 5773 | 5804 | | 5814 | |
Bookings | 576 | 591 | | 615 | |
Revenue (k) | £43.2 | £43.84 | | £45.9 | |
Conversion | 9.98% | 10.18% | 2.1% | 10.58% | 6.0% |
ABV | 75.00 | 74.18 | -1.1% | 74.63 | -0.5% |
RPV | 7.48 | 7.55 | 0.9% | 7.89 | 5.5% |
So we can see from our simple colour analysis (adding all the results for the recipes which contain Red, vs Blue, vs Yellow) that Yellow is the best. The Conversion has a 6% lift, and while Average Booking Value is slightly lower, the Revenue Per Visit is still 5.5% higher for the yellow recipes than it is for the Red.
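The same pooling can be done outside a spreadsheet. The sketch below is one possible version in Python: it sums the raw visits, bookings and revenue for each colour first, and only then calculates the rates. The `pool_by` helper and the tuple layout are assumptions of mine.

```python
from collections import defaultdict

# (recipe, colour, vehicle, message, visits, bookings, revenue in £k) from the results table
RESULTS = [
    ("A", "Red", "Plane", "Price", 1919, 193, 14.2),
    ("E", "Red", "Train", "Reliability", 1922, 194, 14.6),
    ("I", "Red", "Car", "Emotions", 1932, 189, 14.4),
    ("K", "Blue", "Plane", "Reliability", 1939, 194, 14.3),
    ("O", "Blue", "Train", "Emotions", 1931, 205, 15.6),
    ("P", "Blue", "Car", "Price", 1934, 192, 13.94),
    ("U", "Yellow", "Plane", "Emotions", 1915, 200, 14.8),
    ("V", "Yellow", "Train", "Price", 1955, 209, 15.7),
    ("Z", "Yellow", "Car", "Reliability", 1944, 206, 15.4),
]


def pool_by(results, column):
    """Sum raw totals for each level in the given column, then derive the rates."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # visits, bookings, revenue (£k)
    for row in results:
        level = row[column]
        totals[level][0] += row[4]
        totals[level][1] += row[5]
        totals[level][2] += row[6]
    return {
        level: {
            "visits": visits,
            "bookings": bookings,
            "conversion": bookings / visits,
            "abv": revenue_k * 1000 / bookings,
            "rpv": revenue_k * 1000 / visits,
        }
        for level, (visits, bookings, revenue_k) in totals.items()
    }


by_colour = pool_by(RESULTS, column=1)
for colour, m in by_colour.items():
    print(f"{colour}: conversion {m['conversion']:.2%}, ABV {m['abv']:.2f}, RPV {m['rpv']:.2f}")
```

Summing the raw numbers before dividing means each colour's rate is weighted by its actual traffic, rather than being an average of three separate percentages.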
Now we do the same for the vehicles: plane, train or car?
Recipes | aku | eov | | ipz | |
Vehicle | Plane | Train | Lift vs Plane | Car | Lift vs Plane |
Visits | 5773 | 5808 | | 5810 | |
Bookings | 587 | 608 | | 587 | |
Revenue (k) | £43.3 | £45.9 | | £43.74 | |
Conversion | 10.17% | 10.47% | 3.0% | 10.10% | -0.6% |
ABV | 73.76 | 75.49 | 2.3% | 74.51 | 1.0% |
RPV | 7.50 | 7.90 | 5.4% | 7.53 | 0.4% |
Clear winner in this case: it's Train, which is the best for conversion, average booking value and revenue per visit.
And finally, the messaging: emotional, price or reliability?
Recipes | apv | iou | | ekz | |
Message | Price | Emotion | Lift vs Price | Reliability | Lift vs Price |
Visits | 5808 | 5778 | | 5805 | |
Bookings | 594 | 594 | | 594 | |
Revenue (k) | £43.84 | £44.8 | | £44.3 | |
Conversion | 10.23% | 10.28% | 0.5% | 10.23% | 0.1% |
ABV | 73.80 | 75.42 | 2.2% | 74.58 | 1.0% |
RPV | 7.55 | 7.75 | 2.7% | 7.63 | 1.1% |
And in this case, it's Emotion which is the best, with clearly better average booking value and revenue. It would appear that price is not the best way to lead your messaging.
CONCLUSION AND THOUGHTS
The best combination is:
Yellow Train, with Emotion messaging.
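As a cross-check, the three comparisons can be run in one pass. This is a hedged sketch that assumes revenue per visit is the deciding metric (the post also looks at conversion and average booking value); `best_level` and `FACTOR_COLUMNS` are illustrative names.

```python
from collections import defaultdict

# (recipe, colour, vehicle, message, visits, revenue in £k) from the results table
RESULTS = [
    ("A", "Red", "Plane", "Price", 1919, 14.2),
    ("E", "Red", "Train", "Reliability", 1922, 14.6),
    ("I", "Red", "Car", "Emotions", 1932, 14.4),
    ("K", "Blue", "Plane", "Reliability", 1939, 14.3),
    ("O", "Blue", "Train", "Emotions", 1931, 15.6),
    ("P", "Blue", "Car", "Price", 1934, 13.94),
    ("U", "Yellow", "Plane", "Emotions", 1915, 14.8),
    ("V", "Yellow", "Train", "Price", 1955, 15.7),
    ("Z", "Yellow", "Car", "Reliability", 1944, 15.4),
]
FACTOR_COLUMNS = {"colour": 1, "vehicle": 2, "message": 3}


def best_level(results, column):
    """Return the level in this column with the highest pooled revenue per visit."""
    visits = defaultdict(int)
    revenue = defaultdict(float)
    for row in results:
        visits[row[column]] += row[4]
        revenue[row[column]] += row[5] * 1000
    return max(visits, key=lambda level: revenue[level] / visits[level])


winning_combination = {
    factor: best_level(RESULTS, column) for factor, column in FACTOR_COLUMNS.items()
}
print(winning_combination)
```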
Notice that the performance of the recipes that we actually tested is in agreement with the winning combination (based on the calculations).
Recipes that contain none of the winning elements performed the worst:
A - Red Plane, Price: RPV £7.40
K - Blue Plane, Reliability: RPV £7.37
P - Blue Car, Price: RPV £7.21
Recipes that contain just one of the winning elements produced slightly better results:
E - Red Train, Reliability: RPV £7.60
I - Red Car, Emotions: RPV £7.45
Z - Yellow Car, Reliability: RPV £7.92*
Recipes that contained two of the three winning elements were the best performers:
O - Blue Train, Emotions: RPV £8.08
U - Yellow Plane, Emotions: RPV £7.73
V - Yellow Train, Price: RPV £8.03
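The grouping above can be checked with a few lines of Python. This sketch simply counts how many of the three winning elements each tested recipe contains and averages the RPV figures quoted in this post for each group; the data layout is my own.

```python
from collections import defaultdict

WINNING_ELEMENTS = {"Yellow", "Train", "Emotions"}

# Tested recipes: (elements, RPV in £ as quoted in the results table)
TESTED = {
    "A": (("Red", "Plane", "Price"), 7.40),
    "E": (("Red", "Train", "Reliability"), 7.60),
    "I": (("Red", "Car", "Emotions"), 7.45),
    "K": (("Blue", "Plane", "Reliability"), 7.37),
    "O": (("Blue", "Train", "Emotions"), 8.08),
    "P": (("Blue", "Car", "Price"), 7.21),
    "U": (("Yellow", "Plane", "Emotions"), 7.73),
    "V": (("Yellow", "Train", "Price"), 8.03),
    "Z": (("Yellow", "Car", "Reliability"), 7.92),
}

# Group recipes by the number of winning elements they contain.
groups = defaultdict(list)
for label, (elements, rpv) in TESTED.items():
    groups[len(WINNING_ELEMENTS & set(elements))].append((label, rpv))

mean_rpv = {
    count: sum(rpv for _, rpv in members) / len(members)
    for count, members in groups.items()
}

for count in sorted(groups):
    labels = ", ".join(label for label, _ in groups[count])
    print(f"{count} winning element(s): {labels} - mean RPV £{mean_rpv[count]:.2f}")
```

No tested recipe contains all three winning elements, which is exactly why the proposed winner needs its own follow-up test.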
I would strongly recommend running a follow-up test, with the two winners from the first selection (O and V) along with the proposed winner based on the analysis, Yellow Train with Emotions. It's possible that this proposed winner will be the best; there's also the possibility that it may be close to but not as good as O or V.
*There's also an argument for including Z (Yellow Car, Reliability) as an outlier, given its performance.
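To get a rough feel for how the untested Yellow Train with Emotions recipe might perform, one option is a simple main-effects estimate: start from the average RPV of the nine tested recipes and add the amount by which each chosen element beats that average. This is a sketch under a strong assumption, namely that the three elements act independently of each other (no interactions), and it is not a calculation from the original analysis. `predict_rpv` is a hypothetical name.

```python
from statistics import mean

# (colour, vehicle, message, visits, revenue in £k) from the results table
RESULTS = [
    ("Red", "Plane", "Price", 1919, 14.2),
    ("Red", "Train", "Reliability", 1922, 14.6),
    ("Red", "Car", "Emotions", 1932, 14.4),
    ("Blue", "Plane", "Reliability", 1939, 14.3),
    ("Blue", "Train", "Emotions", 1931, 15.6),
    ("Blue", "Car", "Price", 1934, 13.94),
    ("Yellow", "Plane", "Emotions", 1915, 14.8),
    ("Yellow", "Train", "Price", 1955, 15.7),
    ("Yellow", "Car", "Reliability", 1944, 15.4),
]

# Revenue per visit (£) for each tested combination.
rpv = {(c, v, m): revenue_k * 1000 / visits for c, v, m, visits, revenue_k in RESULTS}
grand_mean = mean(rpv.values())


def level_mean(position, level):
    """Average RPV of the tested recipes that use this level (0=colour, 1=vehicle, 2=message)."""
    return mean(value for combo, value in rpv.items() if combo[position] == level)


def predict_rpv(colour, vehicle, message):
    """Additive estimate: grand mean plus each element's difference from the grand mean."""
    effects = [
        level_mean(position, level) - grand_mean
        for position, level in enumerate((colour, vehicle, message))
    ]
    return grand_mean + sum(effects)


estimate = predict_rpv("Yellow", "Train", "Emotions")
print(f"Estimated RPV for Yellow Train, Emotions: £{estimate:.2f}")
print(f"Best tested RPV: £{max(rpv.values()):.2f}")
```

On these numbers the estimate comes out above every tested recipe, but it is only an estimate; if the elements interact (say, Emotions works less well on a Train picture), the real result could differ, which is another reason to run the follow-up test.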
There are some clear losers that do not need to be pursued: notice how each of the bottom three performing recipes (A, K and P) contains Blue, Price or both. The Price recipes also did poorly on Average Booking Value: A and P had the two lowest values of all nine recipes, and even recipe V, which was one of the best recipes overall, had the lowest Average Booking Value of the three Train recipes. With a different message (Emotions, most likely), Recipe V could be a runaway success.
It's not surprising that a follow-up is needed; remember that we've only tested nine out of 27 combinations, and it's unlikely that we'll have hit the optimum design first time around. However, by careful selection of our original recipes, we need only test four more (at the most) to identify the best from all 27. Finding the best combination from 27 by testing only 13 is a definite winner. This is the power of multi-variate testing: the ability to test all possibilities without having to test everything.