What makes testing iterative?
When I was about eight or nine years old, my dad began to teach me the fundamentals of BASIC programming. He'd been on a course and learned the basics, and I was eager to learn - especially how to write games. One of the first programs he demonstrated was a simple game called Go-Karts. The screen loads up: "You are on a go-kart and the steering isn't working. You must find the right letter to operate the brakes before you crash. You have five goes. Enter letter?" You then enter a letter, and the program tells you whether your guess is correct, or whether the right letter comes before or after the one you've entered.
"J"
"After J"
"P"
"Before P"
"L"
"L is correct - you have stopped your go-kart without crashing! Another game (Y/N)?"
"After J"
"P"
"Before P"
"L"
"L is correct - you have stopped your go-kart without crashing! Another game (Y/N)?"
I was reminded of this simple game during the Omniture Summit EMEA 2011 last week, when one of the breakout presenters started talking about testing, and in particular ITERATIVE testing. Iterative testing should be the natural development from basic testing, and I've alluded to it in my previous posts on the subject. At its simplest, basic testing involves just comparing one version of a page, creative, banner, text or call-to-action (or whatever) against another and seeing which one works best. Iterative testing works in a similar but more advanced way, much like my dad's Go-Karts game: start with an answer that's close to the best, then build on it to develop something better still. I've talked about coloured shapes as simplified versions of page elements in my previous posts on testing, so I guess it's time to develop a different example!
Suppose I've tested the following five page headlines, running each one for a week (so the total test lasted five weeks), and achieved these points scores per day:
"Cheap hi-quality widgets on sale now" - 135 points
"Discounted quality widgets available now" - 180 points
"Cheap widgets reduced now" - 110 points
"Advanced widgets available now" - 210 points
"Exclusive advanced widgets on sale now" - 200 points
"Discounted quality widgets available now" - 180 points
"Cheap widgets reduced now" - 110 points
"Advanced widgets available now" - 210 points
"Exclusive advanced widgets on sale now" - 200 points
What would you test next?
This question is the kind of open question that will help you to see whether you're doing iterative testing or just basic testing. What can we learn from the five tests we've run so far? Anything? Or nothing? Do we have to make another random guess, or can we use these results to guide us towards something that should do well?
Looking at the results from these preliminary tests, the best headline, "Advanced widgets available now", scored almost twice as many points per day as the worst, "Cheap widgets reduced now". At the very least, we should run with this high-performing headline, which is doing marginally better than the most recent attempt, "Exclusive advanced widgets on sale now". This shouldn't pose a problem for a web development team - after all, the creative has already been designed and just needs to be re-published. All that's needed is to admit that the latest version isn't as good as an earlier idea, and to go backwards in order to go forwards.
Anyway: we can see that "Advanced widgets available now" got the best score, and it's the best place to start from. We can also see that the two lowest-performing headlines both include the word "cheap", so this looks like a word to avoid. From this, "Advanced widgets on sale now" and "Exclusive advanced widgets available now" look like the variations to try next - we've eliminated the word 'cheap', and now we can see how 'available now' compares with 'on sale now'. This is the time for developing test variations on these ideas, following the general principles established by the first round of testing. It is not the time for trying a whole set of new ideas; that would mean ignoring all the potential learning and making sub-optimal decisions, as they're sometimes known.
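To make that reasoning concrete, here's a rough Python sketch of the iterative step. The scores are the ones from above, but the word-filtering heuristic is just my illustration of the principle, not a standard algorithm:

# First-round scores (points per day) from the test above.
first_round = {
    "Cheap hi-quality widgets on sale now": 135,
    "Discounted quality widgets available now": 180,
    "Cheap widgets reduced now": 110,
    "Advanced widgets available now": 210,
    "Exclusive advanced widgets on sale now": 200,
}

# Rank the headlines: the winner is the starting point for the next round.
ranked = sorted(first_round, key=first_round.get, reverse=True)
winner = ranked[0]

# Flag words that appear only in the bottom two performers as words to avoid.
bottom_words = {w for h in ranked[-2:] for w in h.lower().split()}
top_words = {w for h in ranked[:3] for w in h.lower().split()}
words_to_avoid = bottom_words - top_words

print("Start from:", winner)     # Advanced widgets available now
print("Avoid:", words_to_avoid)  # {'cheap', 'hi-quality', 'reduced'}

# Second-round candidates: vary one element of the winning theme at a time.
second_round = [
    "Advanced widgets on sale now",              # the winner, with 'on sale now'
    "Exclusive advanced widgets available now",  # the runner-up, with 'available now'
]

Note that 'hi-quality' and 'reduced' get flagged too, simply because they only appear in losing headlines - with only five data points, a heuristic like this is a prompt for hypotheses, not proof.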
Referring back to my earlier post, this is the point in the process for forming hypotheses about your data. I have to disagree with the speaker at the Omniture EMEA Summit, who gave this example of a hypothesis: "We believe that changing the page headline will drive more engagement with the site, and therefore better conversion." That's just a theory. A hypothesis says all that, and then adds, "because visitors read the page headline first when they see a page, and use that as a primary influence in deciding whether the page meets their needs."
So, here's a hypothesis based on the data: "Including the word 'cheap' in our headline puts visitors off because they're after premium products, not inexpensive ones. We need to focus on premium-type words because these are more attractive to our visitors." In fact, as you can see, I've even added a recommendation after my hypothesis (I couldn't resist).
And that's the foundation of iterative testing - using what's gone before and refining and improving it. Yes, it's possible that a later iteration might throw up results that are completely unexpected - and worse than before - but that's the time to improve and refine your hypothesis. Interestingly, the shallower hypothesis - "We believe that changing the page headline will drive more engagement with the site, and therefore better conversion" - will still hold true, simply because it isn't specific enough to be disproved.
Anyway, that's enough on iterative testing for now; I'm off to go and play my dad's second iteration of the Go-Karts game, which went something like, "You are floating down a river, and the crocodiles are gathering. You must guess how many crocodiles there are in order to escape. How many (1-50)?"