Web Optimisation, Maths and Puzzles: May 2014

Friday, 30 May 2014

Digital Analytics Hub 2014 - Preview

Next week, I'll be returning to the Digital Analytics Hub in Berlin, to once again lead discussion groups on online testing. Last year, I led discussions on "Why does yesterday's winner become today's loser?" and "Risk and Reward: Iterative vs Creative Testing." I've been invited to return this year, and I'll be discussing "Why does Average Order Value change in checkout tests?" and "Is MVT really all that great?" - the second one based on my recent blog post asking if multi-variate testing is an online panacea. I'm looking forward to catching up with some of the friends I made last year, and to meeting some new friends in the world of online analytics.

An extra bonus for me is that the Berlin InterContinental Hotel, where the conference is held, has a Chess theme for their conference centre. The merging of Chess with online testing and analytics? Something not to be missed.

The colour scheme for the rooms is very black-and-white; the rooms have names like King, Rook and Knight; there are Chess sets in every room, and each room has at least one enlarged photo print of a Chess-themed portrait. You didn't think a Chess-themed portrait was possible? Here's a sample of the pictures from last year (it's unlikely that they've been changed). From left to right, top to bottom: white bishop, white king; white king; black queen; white knight; white knight with black rook (I think).

Thursday, 29 May 2014

Changing Subject

I have never written politically before. I don't really hold strong political views, and while I vote almost every time there's an election, I don't consider myself strongly affiliated with any political party.

However, the recent statement by the Secretary of State for Education, Rt Hon Michael Gove, has really irritated me. He has said that he wants to reduce the range of books and literature that is studied in high schools, so that pupils will only study British authors - Shakespeare and Austen, for example. Academics and teachers have drawn particular attention to the axing of "To Kill A Mockingbird" (which I haven't studied) since it was written by an American author.

This has particular resonance for me since I'm married to an English teacher, and she was annoyed by this decision. I'm not particularly interested in English Literature - I passed my exam when I was 16, and that's it. Yes, I read - fiction and non-fiction alike - but only because I enjoy occasional reading, not because I studied literature in depth.

Quoting from the Independent newspaper's website, where they quote the Department for Education:

"In the past, English literature GCSEs were not rigorous enough and their content was often far too narrow. We published the new subject content for English literature in December."

Does anybody else find it ironic that they think reducing the scope of the literature to be studied will prevent the GCSE from becoming too narrow?

Aside from that, it occurred to me - what if this is the thin end of the wedge? What if this British-centredness is to continue throughout all the other subjects? What might they look like? As I said, I have no personal specific interest in English Literature, but I wonder if Mr Gove has plans for the rest of the syllabus. Could you imagine the way the DfE would share his latest ideas? Highlighting how strange his decision on English Literature is, here is a view of how other subjects could be affected.

The New 'British' GCSE Syllabus

Chemistry

Only the chemical elements that have been discovered by British scientists will be studied. Oxygen, hydrogen, barium, tungsten and chlorine are all out, having been discovered by the Swede Carl Wilhelm Scheele, even though other scientists published their findings first. Scottish scientist William Ramsay discovered the noble gases, so they can stay in the syllabus, and so can most of the Group 1 and 2 metals, which were isolated by Sir Humphry Davy. Lead, iron, gold and silver are all out, since they were discovered before British scientists were able to identify and isolate them. And this brings me to the next subject:

History

Only historical events pertaining to the UK are to be included in the new syllabus. The American Civil War is to be removed. The First World War is to be reduced to a chapter, and the Second World War to a paragraph, with much more emphasis to be given to the Home Front.

Biology

Only plants and animals which are native to the UK are to be studied, because previously, science "GCSEs were not rigorous enough and their content was often far too narrow." All medicine which can be attributed to Hippocrates is out. Penicillin (Alexander Fleming) to stay in.

Maths

Fibonacci - out. da Vinci - out. Most geometry (Pythagoras, Euclid) - out. Calculus to focus exclusively on Newton, and all mention of Leibniz is to be removed. In order to aid integration with Europe, emphasis must be shared between British imperial measurements and the more modern metrics which our European colleagues use.

Physics

Astronomy to be taught with the earth-centric model, since the heliocentric view of the Earth going around the Sun was devised by an Italian, Galileo Galilei. The Moon landing (American) is out. The Higgs Boson can stay, although its discovery in Switzerland is a borderline case. Gravity, having been explained by Isaac Newton, can stay in.

Foreign Languages

By their very nature, foreign languages are not British, and their study will probably not be rigorous enough, with content that's far too narrow. However, in order to aid integration with our European business colleagues and government, foreign languages are to be kept. However, this is to be limited to relevant business and economic vocabulary, and more time is to be spent learning the history of the English language instead. Preferably by rote.

Economics

In a move which follows Mr Gove's moves towards a 1940s syllabus, economics will now focus on pounds, shillings and pence. Extra maths lessons will be given to explain how the pre-decimalised system works. The modern pounds and pence system is to be studied, but only to enable pupils to understand how European exchange rates work.

Changes are not planned for 'easier' GCSEs like Media Studies; Leisure and Tourism; Hospitality or Health and Social Care, since they're being axed anyway.

So, having made a few minor tweaks to the syllabus, we now have one which Mr Gove would approve of, and which would probably be viewed by the DfE as more rigorous and less narrow. Frightening, isn't it?

Wednesday, 14 May 2014

Testing - which recipe got 197% uplift in conversion?

We've all seen them. Analytics agencies and testing software providers alike use them: the headline that says, 'our customer achieved 197% conversion lift with our product'. And with good reason. After all, if your product can give a triple-digit lift in conversion, revenue or sales, then it's something to shout about and is a great place to start a marketing campaign.

Here are just a few quick examples:

Hyundai achieve a 62% lift in conversions by using multi-variate testing with Visual Website Optimizer.

Maxymiser show how a client achieved a 23% increase in orders

100 case studies, all showing great performance uplift

It's great. Yes, A/B testing can revolutionise your online performance and you can see amazing results. There are only really two questions left to ask: why and how?

Why did recipe B achieve a 197% lift in conversions compared to recipe A? How much effort, thought and planning went into the test? How did you achieve the uplift? Why did you measure that particular metric? Why did you test on this page? How did you choose which part of the page to test? How many hours went into the planning for the test?

There is no denying that the final results make for great headlines, and we all like to read the case studies and play spot-the-difference between the winning recipe and the defeated control recipe, but it really isn't all about the new design. It's about the behind-the-scenes work that went into the test. Which page should be tested? It's about how the design was put together, why the elements of the page were selected and why the decision was taken to run the test. There are hours of planning, analysing data and writing a hypothesis that sit behind the good tests. Or perhaps the testing team just got lucky?

How much of this amazing uplift was down to the tool, and how much of it was due to the planning that went into using the tool? If your testing program isn't doing well, and your tests aren't showing positive results, then probably the last thing you need to look at is the tool you're using. There are a number of other things to look at first (quality of hypothesis and quality of analysis come to mind as starting points).

Let me share a story from a different situation which has some interesting parallels. There was considerable controversy around the Team GB Olympic Cycling team's performance in 2012. The GB cyclists achieved remarkable success in 2012, winning medals in almost all the events they entered. This led to some questions around the equipment they were using - the British press commented that other teams thought they were using 'magic' wheels. Dave Brailsford, the GB cycling coach during the Olympics, once joked that some of the competitors were complaining about the British team's wheels being more round.

Image: BBC

However, Dave Brailsford previously mentioned (in reviewing the team's performance in the 2008 Olympics, four years earlier) that the team's successful performances there were due to the "aggregation of marginal gains" in the design of the bikes and equipment, which is perhaps the most concise description of the role of the online testing manager. To quote again from the Team Sky website:

"The skinsuit did not win Cooke [GB cyclist] the gold medal. The tyres did not win her the gold medal. Nor did her cautious negotiation of the final corner. But taken together, alongside her training and racing programme, the support from her team-mates, and her attention to many other small details, it all added up to a significant advantage - a winning advantage."

Source: http://www.teamsky.com/article/0,27290,17547_5792058,00.html#zuO6XzKr1Q3hu87X.99

It's not about wild new designs that are going to single-handedly produce 197% uplifts in performance; it's about the steady, methodical work of improving performance step by step by step, understanding what's working and what isn't, and then going on to build on those lessons. As an aside, was the original design really that bad, that it could be improved by 197% - and who approved it in the first place?
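To put that 197% figure in context, here is a minimal sketch of the arithmetic behind a relative uplift. The function names and the 1% control conversion rate are purely illustrative assumptions, not figures from any of the case studies above.

```python
def relative_lift(control_rate, variant_rate):
    """Relative uplift of the variant over the control (1.97 means 197%)."""
    return (variant_rate - control_rate) / control_rate


def rate_after_lift(control_rate, lift):
    """Conversion rate implied by applying a relative lift to the control."""
    return control_rate * (1 + lift)


# Hypothetical control converting at 1.0%; a 197% lift nearly triples it.
control = 0.01
variant = rate_after_lift(control, 1.97)
print(f"control {control:.2%} -> variant {variant:.2%}")
print(f"relative lift: {relative_lift(control, variant):.0%}")
```

In other words, a 197% lift means the winning recipe converts at almost three times the rate of the control, which says as much about how weak the control was as it does about the new design.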

It's certainly not about the testing tool that you're using, whether it's Maxymiser, Adobe's Test and Target, or Visual Website Optimizer, or even your own in-house solution. I would be very wary of changing to a new tool just because the marketing blurb says that you should start to see 197% lift in conversion just by using it.

In conclusion, I can only point to this cartoon as a summary of what I've been saying.

Wednesday, 7 May 2014

Building Testing Program Momentum

I have written previously about getting a testing program off the ground, and selling the idea of testing to management. It's not easy, but hopefully you'll be able to start making progress and getting a few quick wins under your belt. Alternatively, you may have some seemingly disastrous tests where everything goes negative, and you wonder if you'll ever get a winner. I hope that either way, your testing program is starting to provide some business intelligence for you and your company, and that you're demonstrating the value of testing. Providing positive direction for the future is nice; providing negative direction ("don't ever implement this") is less pleasant but still useful for the business.

In this article, I'd like to suggest ways of building testing momentum - i.e. starting to develop from a few ad-hoc tests into a more systematic way of testing. I've talked about iterative testing a few times now (I'm a big believer) but I'd like to offer practical advice on starting to scale up your testing efforts.

Firstly, you'll find that you need to prioritise your testing efforts. Which tests are - potentially - going to give you the best return? It's not easy to say; after all, if you knew the answer you wouldn't have to test. But look at the high traffic pages, the high entry pages (lots of traffic landing) and the major leaking points in your funnel. Fixing these pages will certainly help the business. You'll need to look at potential monetary losses for not fixing the pages (and remember that management typically pays more attention to £ and $ than they do to % uplift).
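As a rough illustration of ranking test candidates in monetary terms, here is a small sketch. The page names, traffic, conversion rates, average order value and the assumed 5% relative uplift are all made-up numbers, and the formula is just one simple way of estimating what a win might be worth.

```python
def monthly_value_of_uplift(visits, conversion_rate, average_order_value,
                            relative_uplift=0.05):
    """Estimated extra monthly revenue if a test lifts conversion by relative_uplift."""
    extra_orders = visits * conversion_rate * relative_uplift
    return extra_orders * average_order_value


# Hypothetical candidates: (monthly visits, conversion rate, average order value in GBP)
candidates = {
    "home page": (120_000, 0.010, 80.0),
    "product page": (60_000, 0.030, 80.0),
    "basket": (15_000, 0.200, 80.0),
}

# Rank pages by the estimated monetary value of the same relative uplift.
ranked = sorted(candidates,
                key=lambda name: monthly_value_of_uplift(*candidates[name]),
                reverse=True)

for name in ranked:
    value = monthly_value_of_uplift(*candidates[name])
    print(f"{name}: about £{value:,.0f} per month")
```

With these invented figures the basket comes out on top despite having the least traffic, which is the sort of result that is easy to miss when you only look at % uplift.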

Secondly - consider the capacity of your testing team. Is your testing team made up of you, a visual designer and a single JavaScript developer, or perhaps a share of a development team when they can spare some capacity? There's still plenty of potential there, but plan accordingly. I've mentioned previously that there's plenty of testing opportunity available in the wording, position and colour of CTA buttons, and that you don't always need to have major design changes to see big improvements in site performance.

Thirdly - it's possible to dramatically increase the speed (and therefore capacity) of your testing program by testing in two different areas or directions at the same time. Not simultaneously, but in parallel. For example, let's suppose you want to test the call to action buttons on your product pages, and you also want to test how you show discounted prices. These should be relatively easy to design and develop - it's mostly text and colour changes that you're focusing on. Do you show the new price in green, and the original price in red? Do you add a strikethrough on the original price? What do you call the new price - "offer" or "reduced"? There's plenty to think about, and it seems everybody does it differently. And for the call-to-action button - there's wording, shape (rounded or square corners), border, arrow... the list goes on.

Now, if you want to test just call-to-action buttons, you have to develop the test (two weeks), run the test (two weeks), analyse the results (two weeks) and then develop the next test (two weeks more). This is a simplified timeline, but it shows you that you'll only be testing on your site for two weeks out of six (the other four are spent analysing and developing). Similarly, your development resource is only going to be working for two out of six weeks, and if there's capacity available, then it makes sense to use it.

I have read a little on critical path analysis (and that's it - nothing more), but it occurred to me that you could double the speed of your testing program by running two mini-programs alongside each other; let's call them Track A and Track B. While Track A is testing, Track B could be in development, and then, when the test in Track A is complete, you can switch it off and launch the test in Track B. It's a little oversimplified, so here's a more plausible timeline:

Start with Track A first, and design the hypothesis. Then, submit it to the development team to write the code, and when it's ready, launch the test - Test A1. While the test is running, begin on the design and hypothesis for the first test in Track B - Test B1. Then, when it's time to switch off Test A1, you can swap over and launch Test B1. That test will run, accumulating data and then, when it's complete, you can switch it off. While test B1 is running, you can review the data in test A1, work out what went well, what went badly - review the hypothesis and improve, then design the next iteration.

If everything works perfectly, you'll reach point X on my diagram and Test A2 will be ready to launch when Test B1 is switched off.
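As a quick check on the "double the speed" idea, here is a minimal sketch using the simplified two-week phases from earlier. It assumes that only one test is live at a time and that the tracks are perfectly staggered; the function name and the model itself are illustrative assumptions, not a reading of the diagram.

```python
def traffic_utilisation(tracks, run_weeks=2, analyse_weeks=2, develop_weeks=2):
    """Fraction of weeks with a live test, for perfectly staggered tracks.

    Each track cycles through run -> analyse -> develop, and only one
    test is live on the site at any time (a simplifying assumption).
    """
    cycle = run_weeks + analyse_weeks + develop_weeks
    return min(1.0, tracks * run_weeks / cycle)


for tracks in (1, 2, 3):
    share = traffic_utilisation(tracks)
    print(f"{tracks} track(s): a test is live {share:.0%} of the time")
```

Under those assumptions, one track keeps a test live for a third of the time and two tracks for two thirds. In practice tests often need to run for longer than two weeks, which pushes the live share up further.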

However, we live in the real world, and test A2 isn't quite as successful as it was meant to be. It takes quite some time to obtain useful data, and the conversion uplift that you anticipated has not happened - it's taking time to reach statistical significance, and so you have to keep it running for longer. Meanwhile, Test B2 is ready - you've done the analysis, submitted the new design for development, and the developers have completed the work. This means that test B2 is now pending. Not a problem - you're still utilising all your site traffic for testing, and that's surely an improvement on the 33% usage (two weeks testing, four weeks other activity) you had before.
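To show why a small uplift can take a long time to reach significance, here is a hedged sketch of a two-proportion z-test, which is one common way to compare conversion rates. It is my assumption for illustration only: your testing tool may use a different method, and the visitor and order counts are invented.

```python
import math


def two_proportion_z_test(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 2.0% vs 2.3% conversion, after two weeks and after eight.
for label, counts in [("after 2 weeks", (200, 10_000, 230, 10_000)),
                      ("after 8 weeks", (800, 40_000, 920, 40_000))]:
    z, p = two_proportion_z_test(*counts)
    verdict = "significant" if p < 0.05 else "not yet significant"
    print(f"{label}: z = {z:.2f}, p = {p:.3f} ({verdict} at the 5% level)")
```

With these made-up numbers the same underlying uplift is not significant on the early data but is on the larger sample, which is exactly the situation where a test has to keep running while the next one waits.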

Eventually, at point Y, test A2 is complete, you switch it off and launch Test B2, which has been pending for a few days/weeks. However, Test B2 is a disaster and conversion goes down very quickly; there's no option to keep it running. (If it was trending positively, then you could keep it running). Even though the next Track A test is still in development, you have got to pull the test - it's clearly hurting site performance and you need to switch it off as soon as possible.

I'm sure parallel processing has been applied in a wide range of other business projects, but this idea translates really well into the world of testing, especially if you're planning to start increasing the speed and capacity of your testing program. I will give some thought to other ways of increasing test program capacity, and - hopefully - write about this in the near future.

Thursday, 1 May 2014

Iterative Testing - Follow the Numbers

Testing, as I have said before, is great. It can be adventurous, exciting and rewarding to try out new ideas for the site (especially if you're testing something that IT can't build out yet) with pie-in-the-sky designs that address every customer complaint that you've ever faced. Customers and visitors want bigger pictures, more text and clearer calls to action, with product videos, 360 degree views and a new Flash or Scene 7 interface that looks like something from Minority Report or Lost in Space.

Your new user interface, Minority Report style? Image credit

That's great - it's exciting to be involved in something futuristic and idealised, but how will it benefit the business teams who have sales targets to reach for this month, quarter or year? They will accept that some future-state testing is necessary, but will want to optimise current state, and will probably have identified some key areas from their sales and revenue data. They can see clearly where they need to focus the business's optimisation efforts and they will start synthesising their own ideas.

And this is all good news. You're reviewing your web analytics tools to look at funnels, conversion, page flow and so on; you may also have session replay and voice-of-the-customer information to wade through periodically, looking for a gem of information that will form the basis of a test hypothesis. Meanwhile, the business and sales teams have already done this (from their own angle, with their own data) and have come up with an idea.

So you run the test - you have a solid hypothesis (either from your analytics, or from the business's data) and a good idea on how to improve site performance.

But things don't go quite to plan; the results are negative, conversion is down or the average order value hasn't gone up. You carry out a thorough post-test analysis and then get everybody together to talk it through. Everybody gathers around a table (or on a call, with a screen-share ;-) ) - everybody turns up: the design team, the business managers, the analysts... everybody with a stake in the test, and you talk it through. Sometimes, good tests fail. Sometimes, the test wins (this is also good, but for some reason, wins never get quite as much scrutiny as losses).

And then there's the question: "Well, we did this in this test recipe, and things improved a bit, and we did that in the other test recipe and this number changed: what happens if we change this and that?" Or, "Can we run the test again, but make this change as well?"

These are great questions. As a test designer, you'll come to love these questions, especially if the idea is supported by the data. Sometimes, iterative testing isn't sequential testing towards an imagined optimum; sometimes it's brainstorming based on data. To some extent, iterative testing can be planned out in advance as a long-term strategy where you analyse a page, look at the key elements in it and address them methodically. Sometimes, iterative testing can be exciting (it's always exciting, just more so) and take you in directions you weren't expecting. You may have thought that one part of the page (the product image, the ratings and reviews, the product blurb) was critical to the page's performance, but during the test review meeting, you find yourself asking "Can we change this and that? Can we run the test with a smaller call to action and more peer reviews?" And why not? You already have the makings of a hypothesis and the data to support it - your own test data, in fact - and you can sense that your test plan is going in the right direction (or maybe totally the wrong direction, but at least you know which way you should be going!).

It reminds me of a quote, attributed to a famous scientist (though I can't recall which one): "The development of scientific theory is not like the construction of fairy castles, but more like the methodical laying of one brick on another." It's okay - in fact it's good - to have a test strategy lined up, focusing on key page elements or on page templates, but it's even more interesting when a test result throws up questions like, "Can we test X as well as Y?" or "Can we repeat the test with this additional change included?"

Follow the numbers, and see where they take you. It's a little like a dot-to-dot picture, where you're drawing the picture and plotting the new dots as you go, which is not the same as building the plane while you're flying in it ;-).

Follow the numbers. Image credit

One thing you will have to be aware of is that you are following the numbers. During the test review, you may find a colleague who wants to test their idea because it's their pet idea (recall the HIPPOthesis I've mentioned previously). Has the idea come from the data, or an interpretation of it, or has it just come totally out of the blue? Make sure you employ a filter - either during the discussion phase or afterwards - to understand if a recipe suggestion is backed by data or if it's just an idea. You'll still have to do all the prep work - and thankfully, if you're modifying and iterating, your design and development team will be grateful that they only need to make slight modifications to an existing test design.

Yes, there's scope for testing new ideas, but be aware that they're ideas, backed by intuition more than data, and are less likely (on average) to be successful; I've blogged on this before when I discussed iterating versus creating. If your testing program has limited resource (and whose doesn't?) then you'll want to focus on the test recipes that are more likely to win - and that means following the numbers.
