
Showing posts with label optimisation. Show all posts

Friday, 24 April 2026

"Click For Free Money" on the Customer Journey

A few years ago, a colleague of mine highlighted a crucial insight: engagement on a button isn't always a reliable measure of success.  I know it's hardly an earth-shattering revelation, but it came at a time when KPIs were focusing on engagement, when he and I were looking at testing CTA wording.  Should we go for "Buy Now" or "Customise and Buy"?  Which is better - "View Details" or "Learn More"?  In short, we discovered that the best wording depended on the point in the purchase path, and we had to separate the analysis, because the button that achieved the highest click-through rate (CTR) wasn't always the one that led to the best overall conversion rate. 

In simpler terms, success isn't just about shuffling customers to the next page in our online purchase path. While persuasive wording can certainly encourage users to click and advance, we need to critically ask ourselves: does this genuinely help them progress beyond that initial stage, or are we just creating a false sense of momentum? This brings us to what I call the "Click for Free Money" fallacy. You can absolutely get people to click a button, but the subsequent page must deliver on the expectation set by that button's wording.  Click for Free Money will certainly get a lot of clicks, but there's no benefit to this approach, as people realise that the next page is actually just more information about our products, and maybe a coupon code to validate the 'free money' offer.  


The Promise and the Reality of the Click

Consider the user's mental model. When a button says "Add to Cart," the user fully expects to be taken to their cart fairly quickly. A brief, relevant detour to offer a highly personalized upsell might be acceptable, but anything more, or anything irrelevant, will create friction. Similarly, a "Customize" button should seamlessly lead to customization options, allowing the user to tailor their product without unnecessary distractions. And perhaps most critically, a "Checkout" button must initiate the checkout process. This is not the time for additional upsells, cross-sells, or any other interruptions that could derail a customer who is ready to complete their purchase. Each click builds an expectation, and failing to meet that expectation, even subtly, erodes trust and increases the likelihood of abandonment.  A lack of 'free money' will erode trust remarkably quickly, although the abandonment will probably show up on the 'free money' page, not the previous page.  Still, if you and your team are prepared to do anything to get users to move forwards from your page, you could deploy this tactic.  It depends on what your KPI actually is (and who's looking at the long-term journey).

The Detrimental Cost of Rushing Customers

Pushing customers forward too quickly isn't just inefficient; it can be detrimental to long-term customer relationships and conversion rates. If you don't continually reinforce the value proposition and reassure the user along their journey, they're highly likely to drop out at a later, more critical stage. For instance, if you enticed a user with a 10% discount while they were casually browsing your site, that discount needs to be prominently displayed and automatically applied on the cart page. Hiding it, or making them jump through hoops to redeem it, is a surefire way to lose a sale and disappoint a customer.

It's vital to shift our focus: moving customers towards a purchase isn't as important as guiding them through their journey to choose the best product for them. This is a fundamental difference between a transactional mindset and a customer-centric approach.

Think of it like a pushy sales assistant in a physical store. Imagine walking in, and immediately, an assistant shoves an item into your hands, puts an arm around your shoulder, and starts steering you towards the checkout. When you haltingly ask, "But is this my size?" they might dismissively respond, "Probably, yes." If you inquire, "Will it perform better than my previous widget?" they might curtly say, "Who cares? Cash or card?" Or if you try one more time, "But is it quieter and more efficient than the old model?" they might simply reply, "Yes. Is the shipping address the same as the billing address?"

While this sales assistant might technically "drag" the customer to the start of the checkout process, what was the true cost? A frustrated, confused, and potentially alienated customer who probably won't complete the purchase and certainly won't return (worse still, they'll tell their friends).

Preparing for the Next Step: The True Purpose of the Funnel

Often, all we accomplish by being overly aggressive or deceptive in the early stages of the funnel is simply shifting the exit point to a later stage in the user's journey. The customer still leaves, but now they're more annoyed because they've invested more time and effort. Instead, it's far more important and beneficial to leverage each step of the customer journey to authentically prepare users for what they'll encounter next. Each stage should provide value to the user, clarify information, and build confidence and trust.  In a situation where different parts of the journey belong to different developers, teams and managers, 

When a user is genuinely informed, reassured, and ready to take the next step—because they understand the value and feel confident in their choice—then and only then, should we make it effortless for them to move forward. This approach fosters trust, reduces abandonment rates, and ultimately leads to more satisfied customers and higher conversion rates in the long run.  We may not offer 'free money', but by subtly pushing users forwards in a path that they aren't fully prepared to take - by providing quicker paths or by removing content that is actually useful to users - we will merely persuade them to move forwards from our page into a situation where they're increasingly likely to leave from the next page.

But still, our next-step page flow metrics show a lovely funnel that shows we moved 10% more users to the next step.  Our analysis, then, needs to show how many users move forwards to the next-next step, and how many actually complete the full purchase journey.

Consider a simplified conversion funnel, from a landing page through a website to completing the checkout process.


Similar to the 'did you just code a distraction?' question - we see that users in Recipe C (the 'free money' button) move forwards to the next step at a much higher rate:  950/998 = 95%, compared to 75% for control, and 65% for Recipe B.  See, you coded a winner, it just happened to be a different one from the one you expected!

However, when we look further down the funnel, we find that there's a massive 50% drop-off for Recipe C.  We failed to deliver free money, and people left.  Unsurprisingly, fewer visitors then reach the lower funnel stages, and Recipe B is actually the winner (with Recipe C coming in behind control).
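To make the comparison concrete, here is a minimal sketch of the funnel analysis in Python. The only figures taken from the example above are the first-step rates (950/998 for Recipe C, 75% for control, 65% for Recipe B) and the 50% drop-off for Recipe C; every other count, and the four stage names, are hypothetical numbers I've made up to illustrate the shape of the result.

```python
# Hypothetical visitor counts at each funnel stage (illustrative only).
# Only the first-step rates and Recipe C's 50% drop-off come from the post.
STAGES = ["landing", "next page", "cart", "order"]
FUNNEL = {
    "Control": [1000, 750, 450, 225],
    "Recipe B": [1000, 650, 455, 273],
    "Recipe C": [998, 950, 475, 190],
}


def step_rates(counts):
    """Share of visitors at each stage who reach the following stage."""
    return [after / before for before, after in zip(counts, counts[1:])]


def end_to_end(counts):
    """Share of landing-page visitors who complete the final stage."""
    return counts[-1] / counts[0]


def winner(funnel, metric):
    """Name of the recipe with the highest value of the given metric."""
    return max(funnel, key=lambda name: metric(funnel[name]))


first_step_winner = winner(FUNNEL, lambda counts: step_rates(counts)[0])
overall_winner = winner(FUNNEL, end_to_end)

for name, counts in FUNNEL.items():
    rates = ", ".join(f"{rate:.0%}" for rate in step_rates(counts))
    print(f"{name}: step rates {rates}; end-to-end {end_to_end(counts):.1%}")
print("Best first-step rate:", first_step_winner)
print("Best end-to-end rate:", overall_winner)
```

The point of keeping both metrics side by side is that the recipe which wins the first step and the recipe which wins the whole journey are reported separately, so a 'free money' button can't hide behind its click-through rate.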

And if you think that's far-fetched, then perhaps replace 'Free money' with something more believable... like, perhaps, "Add to Basket."  Do customers really get to add the item to basket when they click that button?  Or do they have to select a size, a colour, an upgrade, a guarantee, a warranty or something else first?

Or do you promise things with your CTAs that you aren't really delivering?  "Find out more" needs to show more product information about the product it's connected to.  Getting clicks is easy, but you need to keep an eye on the next step, not just the one you're testing.





Friday, 17 May 2024

Multi-Armed Bandit Testing

 I have worked in A/B testing for over 12 years, and blogged about it extensively.  I've covered how to set up a hypothesis, how to test iteratively and even summarized the basics of A/B testing.  I ran my first A/B test on my own website (long since deleted and now only in pieces on a local hard-drive) about 14 years ago.  However, it has taken me this long to actually look into other ways of running online A/B tests apart from the equal 50-50 split that we all know and love.

My recent research led me to discover multi-armed bandit testing, which sounds amazing, confusing and possibly risky (don't bandits wear black eye-masks and operate outside the law??). 

What is multi-armed bandit testing?

The term multi-armed bandit comes from a mathematical problem, which can be phrased like this:

A gambler must choose between multiple slot machines, or "one-armed bandits", each of which has a different, unknown likelihood of winning. The aim is to find the best or most profitable outcome through a series of choices. At the beginning of the experiment, when odds and payouts are unknown, the gambler must try each one-armed bandit to measure its payout rate, and then find a strategy to maximize winnings.


Over time, this will mean putting more money into the machine(s) which provide the best return.

Hence, the multiple one-armed bandits make this the “multi-armed bandit problem,” from which we derive multi-armed bandit testing.

The solution - to put more money into the machine which returns the best prizes most often - translates to online testing: the testing platform dynamically changes the allocation of new test visitors to favour the recipes which are showing the best performance so far.  Normally, traffic is allocated randomly in fixed proportions between the recipes, but with multi-armed bandit testing traffic is skewed towards the winning recipe(s).  Instead of the normal fixed 50-50 split (or 25-25-25-25, or whichever), the traffic split changes on a daily (or visit-by-visit) basis.

We see two phases of traffic distribution while the test is running:  initially, we have the 'exploration' phase, where the platform tests and learns, measuring which recipe(s) are providing the best performance (insert your KPI here).  After a potential winner becomes apparent, the percentage of traffic to that recipe starts to increase, while the losers see less and less traffic.  Eventually, the winner will see the vast majority of traffic - although the platform will continue to send a very small proportion of traffic to the losers, to continue to validate its measurements, and this is the 'exploitation' phase.
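The explore-then-exploit behaviour can be sketched with an "epsilon-greedy" rule, which is one of the simplest bandit strategies. I'm not claiming this is what any particular testing platform does (many use more sophisticated methods); the recipe names, the true conversion rates and the 10% exploration share below are all hypothetical values chosen for illustration.

```python
import random


def choose_recipe(visits, conversions, epsilon, rng):
    """Pick the recipe for the next visitor using an epsilon-greedy rule."""
    untried = [name for name, count in visits.items() if count == 0]
    if untried:
        # Every recipe needs at least one visitor before rates can be compared.
        return rng.choice(untried)
    if rng.random() < epsilon:
        # Exploration: a small share of traffic still goes to a random recipe.
        return rng.choice(list(visits))
    # Exploitation: send the visitor to the best observed conversion rate.
    return max(visits, key=lambda name: conversions[name] / visits[name])


def run_test(true_rates, n_visitors, epsilon=0.1, seed=0):
    """Simulate a bandit test and return visits and conversions per recipe."""
    rng = random.Random(seed)
    visits = {name: 0 for name in true_rates}
    conversions = {name: 0 for name in true_rates}
    for _ in range(n_visitors):
        recipe = choose_recipe(visits, conversions, epsilon, rng)
        visits[recipe] += 1
        if rng.random() < true_rates[recipe]:
            conversions[recipe] += 1
    return visits, conversions


# Hypothetical "true" conversion rates, unknown to the platform.
TRUE_RATES = {"A": 0.05, "B": 0.15, "C": 0.07}
visits, conversions = run_test(TRUE_RATES, n_visitors=20000)
for name in TRUE_RATES:
    print(f"Recipe {name}: {visits[name]} visits, {conversions[name]} conversions")
```

The `epsilon` parameter is the "very small proportion of traffic" that keeps going to the losers: set it to zero and the platform stops validating its measurements; set it too high and you are back to something close to an even split.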

The graph for the traffic distribution over time may look something like this:


...where Recipe B is the winner.

So, why do a multi-armed bandit test instead of a normal A/B test?

If you need to test, learn and implement in a short period of time, then multi-armed bandit testing may be the way forwards.  For example, if marketing want to know which of two or three banners should accompany the current sales campaign (back to school; Labour Day; holiday weekend), you aren't going to have time to run the test, analyze the results and push the winner: the campaign will have ended while you were tinkering with your spreadsheets.  With a multi-armed bandit test, the platform identifies the best recipe while the test is running, and sends most of the traffic to it while the campaign is still active.  When the campaign has ended, you will have maximized your sales performance by showing the winner while the campaign was active.

Thursday, 25 August 2022

Testing Towards The Future State

Once or twice in the past, I've talked about how your testing program needs to align with various departments in your company if it's going to build momentum.  For example, you need to test a design that's approved by your site design and branding teams (bright orange CTA buttons might be a big winner for you, but if your brand colour is blue, you're not going to get very far).  

Or what happens if you test a design that wins but isn't approved by the IT team - they just aren't heading towards Flash animations and video clips, and they're going to start using 360-degree interactive images?  The answer - you compiled and coded a very complicated dead-end.

But what about the future state of your business model?  Are you trying to work out the best way to promote your best-selling product?  Are you testing whether to show discounts as £s off or % off?  This kind of testing assumes that pricing is important, but take a look at the Rolls Royce website, which doesn't have any price information on it at all.  Scary, isn't it?  But apparently that's what a luxury brand looks like (and for a second example, try this luxury restaurant guide).

Apart from sharing the complicated and counter-intuitive navigation of the Rolls Royce site, the restaurant guide also shares its distinct lack of price information.  Even the sorting and filtering excludes any kind of sorting by price - it's just not there.

So, if you're testing the best way of showing price information on your site while the business as a whole is moving towards a luxury status, then it's time to start rethinking your testing program and moving into line with the business.

Conversely, if you're moving your business model towards the mainstream audience in order to increase volumes, then it's time to start looking at pricing (for example) and making your site simpler, less ethereal and less vague, with content that's focused more on the actual features and benefits of the product, and less on the lifestyle.  Take, for example, the luxury perfume adverts that proliferate in the run-up to Christmas.  You can't convey a smell on television, or online, so instead we get these abstract adverts with people dancing on the moon; bathing in golden liquid or whatever, against a backdrop of classical music.  Does it tell you the price?  Does it tell you what it smells like?  In some cases, does it even tell you what the product is called?  Okay, it usually does, but it's a single word at the end, which they say out loud so you know how to pronounce it when you go shopping on the high street.

Compare those with, for example, toy adverts.  Simple, bright, noisy, clear images of the product, repetition of the brand and product name and with the prices (recommended retail price) running constantly throughout, and at the end.  Yes, there are legal requirements regarding toy adverts, but even so, no-one would ever think of a toy as a premium product. Yet somehow, toys sell extremely well year after year, whether cheap or expensive, new or established brand.

So, make sure your testing is in line with business goals - not just KPIs, but the wider business strategy, branding and positioning. Don't go testing price presentation if the prices are being removed from your site; don't test colours of buttons which contravene your marketing guidelines for a classy monochrome site, and so on. Business goals are not always financial, so keep in touch with marketing!


Friday, 13 May 2022

Website Compromization

Test data, just like any other data, is open to interpretation.  The more KPIs you have, the more the analysis can be pointed towards one winning test recipe or another.  I've discussed this before, and used my long-suffering imaginary car salespeople to show examples of this.

Instead of a clear-cut winner, which is the best in all cases, we often find that we have to select the recipe which is the best for most of the KPIs, or the best for the main KPI, and appreciate that maybe it's not the best design overall.  Maybe the test recipe could be improved if additional design changes were made - but there isn't time to test these extra changes before the marketing team need to get their new campaign live (or the IT team need to deploy the winner in their next launch).

Do we have enough time to actually identify the optimum design for the site?  Or the page?  Or the element we're testing?  

Anyways - is this science, or is it marketing?  Do we need to make everything on the site perfectly optimized?  Is 'better than control' good enough, or are we aiming for 'even better'?

What do we have?  Is this site optimization, a compromise, or compromization?

Or maybe you have a test result that shows that your users liked a new feature - they clicked on it, they purchased your product.  Does this sound like a success story?  It does, but only until you realise that the new feature you promoted has diverted users' attention away from your most profitable path.  To put it another way, you coded a distraction. 

For example - your new banner promotes new sports laces for your new range of running shoes... so users purchase them but spend less on the actual running shoes.  And the less expensive shoes have a lower margin, so you actually make less profit. Are you trying to sell new laces, or running shoes?

Or you have a new feature that improves the way you sort your search results, with "Featured" or "Recommended" or "Most Relevant" now serving up results that are genuinely what customers want to see.  The problem is, they're the best quality but lowest-priced products in your inventory, so your conversion rate is up by 10% but your average order value is down by 15%.  What do you do?
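It helps to put a number on that trade-off. Here is a quick sketch, assuming that traffic is unchanged and that revenue per visit is simply conversion rate multiplied by average order value (the function name is my own, for illustration).

```python
def revenue_per_visit_change(conversion_change, order_value_change):
    """Relative change in revenue per visit, given relative changes in
    conversion rate and average order value (0.10 means +10%)."""
    return (1 + conversion_change) * (1 + order_value_change) - 1


# Conversion up 10%, average order value down 15%.
change = revenue_per_visit_change(0.10, -0.15)
print(f"Revenue per visit changes by {change:+.1%}")
```

Under those assumptions, 1.10 x 0.85 = 0.935, so revenue per visit falls by 6.5% even though more customers are buying.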

Are you following customer experience optimization, or compromization?

Sometimes, you'll need to compromise. You may need to sell the new range of shiny accessories with a potential loss of overall profit in order to break into a new market.  You may decide that a new feature should not be launched because although it clearly improves overall customer experience and sales volumes, it would bring down revenue by 5%.  But testing has shown what the cost of the new feature would be (and perhaps a follow-up test with some adjustments would lead to a drop in revenue of only 2%... would you take that?).    In the end, it's going to be a matter of compromization.

Tuesday, 8 December 2020

A/B testing without a 50-50 split

Whenever people ask me what I do for a living, I [try not to] launch off into a little speech about how I improve website design and experience by running tests, where we split traffic 50-50 between test and control, and mathematically determine which is better.  Over the years, it's been refined and dare I say optimized, but that's the general theme, because that's the easiest way of describing what I do.  Simple.

There is nothing in the rules, however, that says you have to split traffic 50-50.  We typically say 50-50 split because it's a random chance of being split into one of two groups - like tossing a coin, but that's just tradition (he says, tearing up the imaginary rule book).

Why might you want to test on a different split setting?

1.  Maybe your test recipe is so completely 'out-there' and different from control that you're worried that it'll affect your site's KPIs, and you want to test more cautiously.  So, why not do a 90-10?  You only risk 10% of your total traffic, and providing that 10% is large enough to produce a decent sample size, why risk a further 40%?  And if it starts winning, then maybe you increase to an 80-20 split, and move towards 50-50 eventually?

2.  Maybe your test recipe is based on a previous winner, and you want to get more of your traffic into a recipe that should be a winner as quickly as possible (while also checking that it is still a winner).  So you have the opportunity to test on a 10-90 split, with most of your traffic on the test experience and 10% held back as a control group to confirm your previous winner.

3.  Maybe you need test data quickly - you are confident you can use historic data for the control group, but you need to get data on the test page/site/experience, and for that, you'll need to funnel more traffic into the test group.  You can use a combination of historic data and control group data to measure the current state performance, and then get data on how customers interact with the new page (especially if you're measuring clicks on a new widget on the page, and how customers like or dislike it).

4.  Maybe you're running a Multi-Armed Bandit test.

Things to watch out for

If you decide to run an A/B test on uneven splits, then beware:

- You need to emphasise conversion rates, and calculate your KPIs as "per visitor" or "per impression".  I'm sure you do this already with your KPIs, but absolute numbers of orders or clicks, or revenue values will not be suitable here.  If you have twice as much traffic in B compared to A (a 66-33 split), then you should expect twice as many success events from an identical success rate; you'll need to divide by visit, visitor or page view (depending on your metric, and your choice).

- You can't do multivariate analysis on uneven splits - as I mentioned in my articles on MVT analysis, you need equal-ish numbers of visits in order to combine the data from the different recipes.
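The first of those warnings is easy to demonstrate. This is a small sketch with made-up visitor and order counts for a roughly 33-67 split; the dictionary keys and the helper name are hypothetical.

```python
def per_visitor(successes, visitors):
    """Success events per visitor, so that unequal groups can be compared."""
    return successes / visitors


# Hypothetical results from an uneven split: B has about twice A's traffic.
control = {"visitors": 3300, "orders": 165}
test = {"visitors": 6700, "orders": 335}

control_rate = per_visitor(control["orders"], control["visitors"])
test_rate = per_visitor(test["orders"], test["visitors"])

raw_ratio = test["orders"] / control["orders"]
lift = test_rate / control_rate - 1

print(f"Raw orders: test has {raw_ratio:.2f}x as many as control")
print(f"Per-visitor conversion: control {control_rate:.1%}, test {test_rate:.1%}")
print(f"Lift in conversion rate: {lift:+.1%}")
```

On absolute numbers the test recipe looks like it has doubled orders; divided by visitors, the two recipes are performing identically.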

Some of my other posts on online A/B testing:

Did you just code a distraction?
Do you know how well your test will perform?
Over-specific targeting and segmentation


Tuesday, 21 May 2019

Three-Factor Multi-Variate Testing

TESTING ALL POSSIBILITIES WITHOUT TESTING EVERYTHING

My favourite part of my job is determining what to test, and planning how to run a test.  I enjoy the analysis afterwards, but the most enjoyable part of the testing process is deciding what the test recipes will actually be.  I've covered - at length - test design and planning, and also multi-variate testing.  I particularly enjoy multi-variate testing, since it simply allows you to test all possibilities without having to test everything.


In my previous posts, where I introduced MVT, I've only covered two-factor MVT: should this variable be black or red?  Should it be a picture of a man or a woman?  Should it say 'Special offer' or 'Limited time'?  Is it x or is it y?  How do you analyse MVT results? In this post, I'm going to take the discussion of testing one step further, and look at three-factor multi-variate testing:  should it be x, y or z?


Just as there are limited opportunities for MVT, the range of opportunities for three-factor MVT is potentially even more limited.  However, I'd like to explain that this doesn't have to be the case, and that it just takes careful planning to determine when and how to set up a test where there are three possible 'best answers'.


SCENARIO


You run a domestic travel agency, which specialises in arranging domestic travel for customers across the country (this works better if you imagine it in the US, but it works for smaller countries too).  You provide a full door-to-door service, handling everything from fuel, insurance, tickets, transfers  - whatever it takes, you can do it.  Consequently, you are in high demand around Christmas and Thanksgiving (see, I told you this worked better in the US), and potentially other holiday periods.  Yes, you're a travel agency based on Planes, Trains and Automobiles.


It's the run-up to the largest sales time of the year, as you prepare to reunite distant family members across the country for those big family celebrations and parties and whatever else.  What do you lead with on your website's homepage?


Planes?

Trains?
Or automobiles?

If you want to include buses, look out for a not-yet-planned post on four-factor MVT.  I'll have it ready by Christmas.


So far, this would be a straightforward A/B/C test, with a plane, a car and a train.  Your company colours are yellow, so let's go with that:



Your marketing team are also unsure how to lead with their messaging - should they emphasise price, reliability, or an emotional connection?


They can't choose between

"Cross the country without costing the world" (price)
"Guaranteed door-to-door on time, every time" (reliability)
"Bring your smile to their doorstep this holiday" (emotional)

So now we have nine recipes, A-I.


A: Plane plus price
B: Plane plus reliability
C: Plane plus emotions

D: Car plus price
E: Car plus reliability
F: Car plus emotions

G: Train plus price
H: Train plus reliability
I: Train plus emotions


Now, somebody in the exec suite has decided that now might be the time to try out a new set of corporate colours.  Yellow is bright and cheery, but according to the exec, it can be seen as immature, and not very sophisticated.  The alternatives are red and blue (plus the original yellow).


Here goes:  there are now 3x3x3 possible variations - that's 27 altogether.  And you can't run a test with 27 recipes - for a start, there aren't enough letters in the alphabet.  There's also traffic and timing to consider - it will take months to run a test like that to get any level of significance.  Nevertheless, this is an executive request, so we'll have to make it happen.


Firstly, the visuals:  if this was just a two-variable test, then we'd have nine recipes, as you can see below.



















However, each of these vehicle/colour combinations has three more options (based on the marketing message that we select) - here is a small sample of the 27 total combinations, to give you an idea.










          
   
This is not a suitable testing set, but it gives you an idea of the total variations that we're looking at.  The next step, as we did with the more straightforward two-factor MVT, is to identify our orthogonal set - the minimum recipes that we could test that would give us sufficient information to infer the performance of the recipes that we don't test.  It's time to charge up your spreadsheet.

THE RECIPES - AN ORTHOGONAL SET

There are 3*3*3 = 27 different combinations of colour, text and vehicle... here's the list, since you're wondering ;-)



Recipe Colour Vehicle Message
A Red Plane Price
B Red Plane Reliability
C Red Plane Emotions
D Red Train Price
E Red Train Reliability
F Red Train Emotions
G Red Car Price
H Red Car Reliability
I Red Car Emotions
J Blue Plane Price
K Blue Plane Reliability
L Blue Plane Emotions
M Blue Train Price
N Blue Train Reliability
O Blue Train Emotions
P Blue Car Price
Q Blue Car Reliability
R Blue Car Emotions
S Yellow Plane Price
T Yellow Plane Reliability
U Yellow Plane Emotions
V Yellow Train Price
W Yellow Train Reliability
X Yellow Train Emotions
Y Yellow Car Price
Z Yellow Car Reliability
AA Yellow Car Emotions


The recipes with the faint green shading would form a simple orthogonal set; here they are for clarity:

Recipe Colour Vehicle Message
A Red Plane Price
E Red Train Reliability
I Red Car Emotions
K Blue Plane Reliability
O Blue Train Emotions
P Blue Car Price
U Yellow Plane Emotions
V Yellow Train Price
Z Yellow Car Reliability


Note that each colour, vehicle and message appear three times each; there are therefore nine recipes that we need.  This is still a considerable number, but it's a significant saving from 27 in total.
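If you'd rather not pick the nine recipes by hand in a spreadsheet, the same set can be generated with a Latin-square rule. This is a sketch of one way to do it: the rule "message index = (colour index + vehicle index) mod 3" happens to reproduce the nine recipes listed above, but it is my own reconstruction rather than the only valid orthogonal set, and the helper names are hypothetical.

```python
from itertools import product

COLOURS = ["Red", "Blue", "Yellow"]
VEHICLES = ["Plane", "Train", "Car"]
MESSAGES = ["Price", "Reliability", "Emotions"]


def recipe_label(index):
    """Spreadsheet-style label: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


# All 27 combinations, labelled in the same order as the full list above.
full_set = {
    recipe_label(i): combo
    for i, combo in enumerate(product(COLOURS, VEHICLES, MESSAGES))
}


def in_orthogonal_set(combo):
    """Latin-square rule: message index = (colour + vehicle index) mod 3."""
    colour, vehicle, message = combo
    expected = (COLOURS.index(colour) + VEHICLES.index(vehicle)) % 3
    return MESSAGES.index(message) == expected


orthogonal = {
    label: combo for label, combo in full_set.items() if in_orthogonal_set(combo)
}

for label, combo in orthogonal.items():
    print(label, *combo)
```

The property that matters is balance: every colour meets every vehicle exactly once, every colour meets every message exactly once, and every vehicle meets every message exactly once, which is what lets the other factors "cancel to noise" when you add the results up.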

THE ANALYSIS

Which colour?  How to find the best variation for each element


Select the recipes which will give us a reading on the best colour by choosing recipes where the other variants cancel to noise:


This is simple (and simpler than the two-factor version):  we add the results for all the "red" recipes, and compare them with the sums for all the "blue" recipes and all the "yellow" recipes.


Let's take a look at some hypothetical data, based on the orthogonal recipe set shown above:


Recipe  a  e  i  k  o  p  u  v  z
Visits  1919  1922  1932  1939  1931  1934  1915  1955  1944
Bookings  193  194  189  194  205  192  200  209  206
Revenue (k)  £14.2  £14.6  £14.4  £14.3  £15.6  £13.94  £14.8  £15.7  £15.4
Conversion  10.1%  10.1%  9.8%  10.0%  10.6%  9.9%  10.4%  10.7%  10.6%
Lift vs a  -  0.4%  -2.7%  -0.5%  5.6%  -1.3%  3.8%  6.3%  5.4%
Avg Booking Value  £73.58  £75.26  £76.19  £73.71  £76.10  £72.60  £74.00  £75.12  £74.76
Lift vs a  -  2.3%  3.6%  0.2%  3.4%  -1.3%  0.6%  2.1%  1.6%
RPV  £7.40  £7.60  £7.45  £7.37  £8.08  £7.21  £7.73  £8.03  £7.92
Lift vs a  -  2.7%  0.7%  -0.3%  9.2%  -2.6%  4.4%  8.5%  7.1%


I've shown the raw metrics and the calculated metrics for the recipes, but it's important to remember at this point:  the recipes shown here probably won't include the best recipe.  After all, we're testing nine recipes out of a total of 27, so we have only a one in three chance of selecting the optimum combination.
What we need to do next, as I mentioned above, is to combine the data for all the yellow recipes, and compare with the red and the blue.



Colour  Red  Blue  Lift vs Red  Yellow  Lift vs Red
Recipes  aei  kop  -  uvz  -
Visits  5773  5804  -  5814  -
Bookings  576  591  -  615  -
Revenue (k)  £43.2  £43.84  -  £45.9  -
Conversion  9.98%  10.18%  2.1%  10.58%  6.0%
ABV  £75.00  £74.18  -1.1%  £74.63  -0.5%
RPV  £7.48  £7.55  0.9%  £7.89  5.5%


So we can see from our simple colour analysis (adding all the results for the recipes which contain Red, vs Blue, vs Yellow) that Yellow is the best.  The Conversion has a 6% lift, and while Average Booking Value is slightly lower, the Revenue Per Visit is still 5.5% higher for the yellow recipes than it is for the Red.

Now we do the same for the vehicles: plane, train or car?

Vehicle  Plane  Train  Lift vs Plane  Car  Lift vs Plane
Recipes  aku  eov  -  ipz  -
Visits  5773  5808  -  5810  -
Bookings  587  608  -  587  -
Revenue (k)  £43.3  £45.9  -  £43.74  -
Conversion  10.17%  10.47%  3.0%  10.10%  -0.6%
ABV  £73.76  £75.49  2.3%  £74.51  1.0%
RPV  £7.50  £7.90  5.4%  £7.53  0.4%

Clear winner in this case:  it's Train, which is the best for conversion, average booking value and revenue per visit.

And finally, the messaging:  emotional, price or reliability?

Message  Price  Emotion  Lift vs Price  Reliability  Lift vs Price
Recipes  apv  iou  -  ekz  -
Visits  5808  5778  -  5805  -
Bookings  594  594  -  594  -
Revenue (k)  £43.84  £44.8  -  £44.3  -
Conversion  10.23%  10.28%  0.5%  10.23%  0.1%
ABV  £73.80  £75.42  2.2%  £74.58  1.0%
RPV  £7.55  £7.75  2.7%  £7.63  1.1%

And in this case, it's Emotion which is the best, with clearly better average booking value and revenue.  It would appear that price is not the best way to lead your messaging.
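The three roll-ups (colour, vehicle and message) all follow the same recipe, so they can be reproduced with one small function instead of three spreadsheet tabs. This is a sketch using the hypothetical per-recipe data from the table earlier in this post; the function and variable names are my own.

```python
# label: (colour, vehicle, message, visits, bookings, revenue in £k)
RECIPES = {
    "a": ("Red", "Plane", "Price", 1919, 193, 14.2),
    "e": ("Red", "Train", "Reliability", 1922, 194, 14.6),
    "i": ("Red", "Car", "Emotions", 1932, 189, 14.4),
    "k": ("Blue", "Plane", "Reliability", 1939, 194, 14.3),
    "o": ("Blue", "Train", "Emotions", 1931, 205, 15.6),
    "p": ("Blue", "Car", "Price", 1934, 192, 13.94),
    "u": ("Yellow", "Plane", "Emotions", 1915, 200, 14.8),
    "v": ("Yellow", "Train", "Price", 1955, 209, 15.7),
    "z": ("Yellow", "Car", "Reliability", 1944, 206, 15.4),
}
FACTORS = {"colour": 0, "vehicle": 1, "message": 2}


def roll_up(factor):
    """Sum visits, bookings and revenue over the recipes sharing each level,
    then derive conversion, average booking value and revenue per visit."""
    totals = {}
    for row in RECIPES.values():
        level = row[FACTORS[factor]]
        visits, bookings, revenue = totals.get(level, (0, 0, 0.0))
        totals[level] = (visits + row[3], bookings + row[4], revenue + row[5] * 1000)
    return {
        level: {
            "visits": visits,
            "bookings": bookings,
            "conversion": bookings / visits,
            "abv": revenue / bookings,
            "rpv": revenue / visits,
        }
        for level, (visits, bookings, revenue) in totals.items()
    }


def best_level(factor, metric="rpv"):
    """Level of the factor with the highest value of the chosen metric."""
    summary = roll_up(factor)
    return max(summary, key=lambda level: summary[level][metric])


for factor in FACTORS:
    print(f"Best {factor} by RPV: {best_level(factor)}")
```

I've ranked on revenue per visit here because it combines conversion and booking value in one number, but passing `metric="conversion"` or `metric="abv"` will rank on the other KPIs if those matter more to you.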

CONCLUSION AND THOUGHTS

The best combination is:
Yellow Train, with Emotion messaging.


Notice that the performance of the recipes that we actually tested is in agreement with the winning combination (based on the calculations).


Recipes that contain none of the winning elements performed the worst:

A - Red Plane, Price:  RPV £7.40
K - Blue Plane, Reliability:  RPV £7.37
P - Blue Car, Price:  RPV £7.21

Recipes that contain just one of the winning elements produced slightly better results:

E - Red Train, Reliability:  £7.60
I - Red Car, Emotions:  £7.45
Z - Yellow Car, Reliability:  £7.92*

Recipes that contained two of the three winning elements were the best performers:

O - Blue Train, Emotions:  £8.08
U - Yellow Plane, Emotions:  £7.73
V - Yellow Train, Price: £8.03
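The grouping above can be checked with a few lines of code: count how many winning elements each tested recipe contains, then average the RPV for each count. This is a sketch using the RPV figures from the results table; the names are my own.

```python
WINNING_ELEMENTS = {"Yellow", "Train", "Emotions"}

# label: (elements of the recipe, revenue per visit in £)
TESTED = {
    "A": ({"Red", "Plane", "Price"}, 7.40),
    "E": ({"Red", "Train", "Reliability"}, 7.60),
    "I": ({"Red", "Car", "Emotions"}, 7.45),
    "K": ({"Blue", "Plane", "Reliability"}, 7.37),
    "O": ({"Blue", "Train", "Emotions"}, 8.08),
    "P": ({"Blue", "Car", "Price"}, 7.21),
    "U": ({"Yellow", "Plane", "Emotions"}, 7.73),
    "V": ({"Yellow", "Train", "Price"}, 8.03),
    "Z": ({"Yellow", "Car", "Reliability"}, 7.92),
}

# Group the RPV values by the number of winning elements in each recipe.
groups = {}
for label, (elements, rpv) in TESTED.items():
    count = len(elements & WINNING_ELEMENTS)
    groups.setdefault(count, []).append(rpv)

mean_rpv = {count: sum(values) / len(values) for count, values in sorted(groups.items())}

for count, value in mean_rpv.items():
    print(f"{count} winning element(s): mean RPV £{value:.2f}")
```

The average RPV climbs with each extra winning element, which is the pattern you'd hope to see if the three factors are contributing more or less independently.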


I would strongly recommend running a follow-up test, with the two winners from the first selection (O and V) along with the proposed winner based on the analysis, Yellow Train with Emotions.  It's possible that this proposed winner will be the best; there's also the possibility that it may be close to but not as good as O or V. 

*There's also an argument for including Z (Yellow Car, Reliability) as an outlier, given its performance.  


There are some clear losers that do not need to be pursued:  of the bottom three performing recipes (P, K and A), two contain Blue and two contain Price.  The Price recipes that we tested - A, P and V - had the lowest combined Average Booking Value (£73.80), even though they include recipe V, which was one of the best recipes.  With a different message (Emotions, most likely), Recipe V could well be a runaway success.

It's not surprising that a follow-up is needed; remember that we've only tested nine out of 27 combinations, and it's unlikely that we'll have hit the optimum design first time around.  However, by careful selection of our original recipes, we need only test four more (at the most) to identify the best from all 27.  Finding the best combination from 27 by testing only 13 is a definite winner.  This is the power of multi-variate testing: the ability to test all possibilities without having to test everything.

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing 
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good 
Hands on:  How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!