
Tuesday 27 January 2015

Pitfalls of Online Optimisation and Testing 1: Are your results really flat?

I've previously covered the trials of starting and maintaining an online optimisation program, and once you've reached a critical mass it seems as if the difficulties are over and it's plain sailing. Each test provides valuable business insights, yields a conversion lift (or points to a future opportunity) and you've reached a virtuous cycle of testing and learning. Except when it doesn't. There are some key pitfalls to avoid, or, having hit them, to conquer.

1. Obtaining flat results (a draw)
2. Too little meaningful data
3. Misunderstanding discrete versus continuous testing

The highest-scoring draw in Premier League history was a 5-5 draw between West Bromwich Albion and Manchester United in May 2013.  Just last weekend, the same mighty Manchester United were held to a goalless draw by Cambridge United, a team three divisions below them in the English league, in an FA Cup match.  Are these games the same? Are the two sides really equal? In both games the teams finished level, so at face value you would think so (and perhaps they are; Manchester United are really not having a great season).  It's time to consider the underlying data to extract an unbiased and fuller story of what happened (the Cambridge press recorded it as a great draw; one Manchester-based website saw it slightly differently).

Let's look at the recent match between Cambridge and Manchester United, borrowing a diagram from Cambridge United's official website.

One thing is immediately clear:  Cambridge didn't score any goals because they didn't get a single shot on target.  Manchester United, on the other hand, had five shots on target but a further ten that missed - only 33% of their shots were on target.  Analysis of the game would probably show that these were long-range efforts, as Cambridge kept Manchester United at a 'safe' distance from their goal.  Although the game was a goalless draw, it's clear that the two sides have different issues to address if they are to score in the replay next week.

Now let's look at the high-scoring draw between West Brom and Man Utd.  Which team was better, and which was lucky to get a single point from the game? In each case, it would also be beneficial to analyse how each of the ten goals was scored - that's ten goals (one every nine minutes on average), which is invaluable data compared to the goalless draw.

The image on the right is borrowed from the Guardian's website, and shows the key metrics for the game (I've discussed key metrics in football matches before).  What can we conclude?

- It was a close match, with both teams seeing similar levels of ball possession.

- West Brom achieved 15 shots in total, compared to just 12 for Man Utd.

- If West Brom had been able to improve the quality and accuracy of their goal attempts, they might have won the game.

- For Man Utd, the problem was not the quality of their goal attempts (they had 66% accuracy, compared to just over 50% for West Bromwich) but the quantity of them.  Their focus should be on creating more shooting opportunities.

- As a secondary metric, West Brom should probably look at the causes of all those fouls.  I didn't see the game directly, but further analysis would show what happened there and how the situation could be improved.

There is a tendency to analyse our losing tests to find out why they lost (if only so we can explain it to our managers), and with thorough planning and a solid hypothesis we should be able to identify why a test did not do well.  It's also human nature to briefly review our winners to see if we can do even better in future.  But draws? They get ignored and forgotten - the test recipe had no impact and isn't worth pursuing.  And because it didn't lose, we don't apply the same level of scrutiny that we would if it had suffered a disastrous defeat.  If wins are green and losses are red, then somehow the draws just fade to grey.  But it shouldn't be that way.

So what should we look for in our test data?  Firstly, revisit the hypothesis.  You expected to see an overall improvement in a particular metric, but that didn't happen: was it because something happened in the pages between the test page and the success page?  For example, did you reduce a page's exit rate by apparently improving the relevance of its banners, only to lose all of those clickers on the very next page instead?  The net result is that order conversion is flat, but the story needs to be told more thoroughly.  Think about how Manchester United and Cambridge United need different strategies to improve their performance in the next match.
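
To make that concrete, here's a minimal sketch in Python of the kind of per-step funnel comparison I mean.  The step names and numbers are entirely made up, and it isn't tied to any particular analytics tool - it just shows how a flat overall conversion rate can hide two offsetting changes.

```python
# Hypothetical funnel counts for each recipe: visitors who saw the test page,
# visitors who reached the next page, and visitors who went on to order.
funnels = {
    "control": {"test_page": 10_000, "next_page": 4_000, "order": 400},
    "variant": {"test_page": 10_000, "next_page": 5_000, "order": 400},
}

steps = ["test_page", "next_page", "order"]

for name, counts in funnels.items():
    print(name)
    # Conversion rate of each individual step in the funnel.
    for upper, lower in zip(steps, steps[1:]):
        print(f"  {upper} -> {lower}: {counts[lower] / counts[upper]:.1%}")
    # Overall conversion from the test page through to an order.
    print(f"  overall: {counts['order'] / counts['test_page']:.1%}")
```

In this made-up example the variant pushes 25% more visitors through to the next page but converts them at a lower rate, so overall conversion is identical in both recipes - 'flat' overall, but two very different stories step by step.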

But what if absolutely all the metrics are flat?  No real change in exit rate, bounce rate, click-through rate, time on page, or any other page metric or sales figure you care to mention?  It is quite likely that the test you ran was not significant enough: the change in wording, colour, design or banner just wasn't dramatic enough to affect your visitors' perceptions and intentions.  There may still be something useful to learn from this: your visitors aren't bothered whether your banners feature pictures of your product or a family photo, a single person or a group of people... or whichever it may be.
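
Before settling on that conclusion, it's worth a quick check that the numbers actually support calling the result flat - that the gap you measured is indistinguishable from zero given the traffic the test received.  Here's a rough sketch of that sanity check, in plain Python with invented figures and a standard two-proportion z-test (rather than anything from a specific testing tool):

```python
from math import sqrt, erf

def two_proportion_z(orders_a, visitors_a, orders_b, visitors_b):
    """Standard two-proportion z-test: how surprising is the observed gap?"""
    p_a = orders_a / visitors_a
    p_b = orders_b / visitors_b
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approximation
    return p_a, p_b, z, p_value

# Invented figures: roughly 3.1% conversion on both recipes, 10,000 visitors each.
p_a, p_b, z, p_value = two_proportion_z(310, 10_000, 318, 10_000)
print(f"control {p_a:.2%}, variant {p_b:.2%}, z = {z:.2f}, p = {p_value:.2f}")
# A gap of 0.08 percentage points is well inside the noise at this sample size,
# so 'flat' means 'no detectable difference with this much data', not 'proven equal'.
```

A flat result at this scale only tells you there was no detectable difference with the data you had - which is exactly where more data, or a re-test, comes in.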

FA Cup matches have the advantage of a replay before there's extra time and penalties (the first may be an option for a flat test, the second sounds interesting!), so we're guaranteed a re-test, more data and a definite result in the end - something we can all look for in our tests.