Web Optimisation, Maths and Puzzles

Friday, 5 August 2011

Transformers 3: Dark of the Moon

Having reviewed Transformers: Revenge of the Fallen some time ago, I thought it was about time I reviewed the latest Transformers film. I remained almost entirely spoiler-free before I saw the film, other than having inadvertently seen a picture of Optimus pulling his trailer, and a picture of one of the characters who was being compared to one of the original G1 cartoon characters (I can't remember which). Being spoiler-free - in fact, I even avoided the trailers for the movie - meant that I approached the film completely open-minded, although a number of people who'd seen it told me that it was significantly better than the second. I was very optimistic, and I wasn't disappointed.

There are various reasons that this film was better than the second: the parents' roles and screen time were significantly scaled down, which is a double bonus; the film was intelligently tied in to a number of 'real life' events; the number of faceless Decepticons was reduced (in fact, there were vastly more in this film than the second, but it didn't seem like it as they were handled with intelligence); and more time and care was taken to provide the Autobots and Decepticons with identities, vehicle modes, names and even a small dose of personality - to put it another way, they had character. The film had a complicated but understandable plot with a number of twists (compared to the second film, which was boringly linear); killed off a number of characters, which I found very surprising and which developed interest in the story, especially with characters we care about; and delivered a number of other surprises too (which you may or may not predict in advance).

The plot begins with the Autobots' discovery of Cybertronian spacecraft technology in a building near the Chernobyl reactor in Ukraine; then develops to the revelation of the humans' discovery of alien technology on the far side of the moon. They refer to it in the film as 'the dark side of the moon', which is a bit of a misnomer - technically, the moon doesn't have a dark side because it turns on its axis in the same way as the Earth does, and the moon has days and nights as we do. What they really mean is the far side of the moon (as seen from Earth), but hey, "Far of the moon" doesn't have any kind of ring to it. Come to think of it, "Dark of the Moon" sounds like it's missing a word somewhere, but I suspect that Pink Floyd using "Dark Side of the Moon" in 1973 meant that Dreamworks had to leave well alone. Or perhaps the Dark of the Moon was not just the spaceship, but all the villainy and subterfuge that came from it too. Or maybe the title writers got lazy.

Along the way, we see Optimus Prime's trailer put to good use (a scene that quite obviously screams, "New toy alert!") and a batch of new Autobots who get names (I wish I could remember them). We get to see the Autobots walking on the moon, as they recover the body of Sentinel Prime - a very impressive character, voiced by the extremely impressive Leonard Nimoy. Nimoy lends the film some sci-fi credibility (as does the appearance of Buzz Aldrin), as long-time fans will remember him voicing Galvatron in the original "The Transformers: The Movie" from 1986, while Trekkies will appreciate his delivery of the line, "You never understood that the needs of the many outweigh the needs of the few!" towards the end of the film. We also see robots in disguise. There are at least two scenes where vehicles which were previously assumed to be Earth vehicles and nothing more suddenly transform and engage in battle - and this was a very welcome change from the second film where we saw robots that didn't transform at all. This film definitely won on its ability to deliver surprises and shocks.

We also get some character development as Optimus and Sentinel discuss the leadership of the Autobots, and we also get to see a decrepit and suffering Megatron in another new vehicle form which befits his current situation (and again screams "New toy alert!"). The story unfolds from the discovery at Chernobyl, from Sentinel's reactivation and his change of heart, and the plot develops in dramatic and unexpected ways, as the Autobots are expelled from the Earth; the Decepticons bring in reinforcements from the Moon (and subsequently from further afield) and start their plan for world conquest.

Quite a lot of the second half of the story feels a lot like a throwback to the G1 cartoon story "The Ultimate Doom!" - in fact, large parts of the story were almost completely pulled out of that script: humans collaborate with Decepticons to build a space bridge to bring Cybertron into Earth orbit; human slaves are coerced into co-operating; and so on. I wish I could remember if Cybertron was completely destroyed by the aborted attempt to bring it to Earth; I just know it seemed to suffer considerable damage!

On the subject of borrowed material, I can safely say I didn't notice that at least two scenes in this film were ripped directly (and I mean taken wholesale frame by frame) from some of Michael Bay's previous films, namely The Island and Pearl Harbor. It didn't affect my enjoyment, and even now I'm not bothered; it seems like a clever way of reducing costs in order to put more robots on the screen for longer. And there are no complaints there: plenty of Autobots, transforming; plenty of new characters, with names and identities; vast numbers of explosions, action, fights and more explosions.

One of the down-sides for me was the stupid mechanised earthworm that was featured at the start of the film, and extensively towards the end. Does it transform? No. Does it belong in a film called Transformers? No. There is absolutely no precedent that I'm aware of in the Transformers universe for a robotic earthworm. And if it's that destructive, why didn't it completely level the skyscraper that the humans were trying to climb? Too big, too destructive, and yet somehow didn't manage to finish off the humans. Also, I do think that the final sequence was overly long and could easily have been shortened. In my view, the whole Decepticon aircraft vehicle thing, despite its jointed parts, was completely unnecessary. Transformers don't fly aircraft; they transform into them! And yet the story dictates that we have a rescue sequence that depends on Sam and Bumblebee piloting one of these vehicles: this was not a high point for me. Nor was Laserbeak's multitude of alternative forms: throughout the story, he changes forms more often than I change my socks - really not a great part of the story for me (despite what I said about robots in disguise, this was a step too far).

The main high points, in my view, were:

* Sam, arguing with the guards as he tries to enter the secret Autobot compound: "Sir, what about your car?" "That's not my car... ... ... That's my car."
* Ironhide's character arc. Won't say any more, but I was genuinely surprised at how his character developed.
* No more Megan Fox, and a fairly small amount of her replacement, who despite the wooden acting had a small but key part to play in the story, just towards the end.
* Starscream's demise at the hands of... well, yes. A very well-written set of scenes - I didn't see it coming (and neither did Starscream).

Overall - an excellent film, with outstanding special effects, good story and plot, understandable characters (and if they did just service the plot, I'm not complaining) and a body count that exceeds the previous two films put together. It remains to be seen if the Decepticon remains are going to be blasted off into space, where they might meet up with Unicron and come back re-energised, but I for one will most certainly be looking forward to the next instalment!

Monday, 25 July 2011

Web Analytics: Who holds the steering wheel?

I'll admit now that I don't much care for Formula 1 motor racing. My brother-in-law is a massive fan, however, and on any given Sunday afternoon during the F1 season, he'll be watching it very closely while my mother-in-law puts the final touches on Sunday lunch. He's interested in the racing, following his favourite drivers and who's managed to execute the most daring overtaking manoeuvre. Since it's on the TV, I'll end up watching it, and what I've found most interesting is the number of people in the team who support the driver. There's a whole squad of team members to carry out the tyre changes; refuelling; visor-wiping and so on, and another squad who spend most of the race time staring at computer screens and reports, studying them extremely closely. You'd think that, having gone to the track for the race, they'd want to watch it live, but no, they seem more interested in watching it on the TV, just like my brother-in-law and me.

I don't know exactly what they're monitoring, but I imagine there are sensors all over the car, reporting data on the car's tyre pressure and temperature; fuel load; the engine temperature; revs; speed and so on. Occasionally, the call goes out over the team radio, "You need to slow down and conserve fuel...", "Your engine is getting very hot, ease off and use fewer revs...", "Prepare for a tyre change...", "Your fuel load is fine, and you're gaining on the driver in front...", "Move over and let your team mate come past, he needs the championship points." Perhaps not the last one, but based on the screen-loads of data coming from the car, the support team are able to work out what's happening to the car, so that the driver can drive in the race. Talk about having too many KPIs to monitor!

So the call goes out, "Slow down, you're running hot and we need to get you into the pit lane." However, the driver is the one driving the car, and if he fancies his chances at a risky overtaking manoeuvre, then he'll put his foot on the accelerator and get that little bit extra from the car to squeeze through on the exit of a bend. He risks overheating his engine, and possibly causing it to break completely, but he successfully overtakes his competitor.

Then the engine starts producing large clouds of blue smoke. The car starts to lose speed. It's a bit like a scene in Pixar's "Cars".

It rarely happens like this, from what I've seen. Everybody on the team wants the car to win, from the driver to the guy who stands in the pit lane and holds the stop-go lollipop, to the team manager, and everybody understands their role. If the screen-watchers see that the engine is running hot, then they have to decide how important this is - is it a show-stopper? - and then tell the driver. Ultimately, the decision lies with the driver on how to drive the car, and he hasn't got time to check all the data that the car is producing - he's the one holding the steering wheel, and he's the one with his foot on the accelerator.

Is web analytics like watching the speedometer but not having the steering wheel or the brake? As web analysts, we're responsible for reviewing the data being produced by visitors to our sites, but the task of editing a site and making changes usually falls to another team or a colleague with HTML, Java or programming skills. We can see how traffic, conversions and other success metrics or KPIs are changing, and we can set alerts and warnings when the figures start to move in an unwanted direction. We can send the messages to our colleague, but unless he (or she) understands what the warning means, and why it's being sent, and what to do about it, the colleague is unlikely to understand and acknowledge that action needs to be taken.

Yes, the F1 team have a few advantages to help them along: the data they have is immediate, and is understood by all the engineers (and the driver) in the team. For example, with oil temperature, there are agreed levels in place for the volume of oil and its temperature. Everybody knows what 'too hot' looks like, and they know what to do if the temperature starts to rise; the driver knows what this means and what to do about it. The team members also know what the risks are if the temperature continues to rise - will the engine start to burn oil, will the engine explode, or seize up? Is it a minor inconvenience as the cockpit temperature warms up, or is it a total show-stopper that might end the race completely?

Most web analysts don't have that level of success or failure hanging on their recommendations, but the whole team may miss out if a recommendation isn’t made. They may miss out on those incremental improvements that lead to further success, or they may let a poor-performing campaign run on for longer than it should. And the blame may not lie with the HTML team – although we may think it does, as they’re the ones who have built the site and are able to make the changes to it. The analyst spots a trend in the data, “This figure is going up week-on-week and that other figure is staying the same.” And? Or, as Avinash Kaushik puts it in his book, “So what?” Do we continue with the campaign? Do we increase our keyword bid? Do we change the page layout? It’s important – vital, even – that our data leads to a recommendation. We may not achieve the change we think is required, but without a recommendation in our insight, we’re not making it easy for the HTML team to consider making the changes. What’s my recommendation? Identify the issue. Keep it brief. Make a proposal.

Does the F1 engineer come on the radio and say to the driver, “Your engine temperature has risen for the last three laps in a row, and your fuel consumption is below average.”? No, he says, “You’re overheating, slow down.” He identifies the issue, keeps it concise, and adds a recommendation for action. And now, having done that myself, I’ll stop.

Monday, 18 July 2011

Web Analytics: Multi Variate Testing

ABOUT MULTI VARIATE TESTING

The online sales and marketing channel is unique compared to other sales channels, because online it’s possible to test new marketing images, text and layouts and measure their effectiveness very quickly indeed, and at significantly less cost than the other channels. Performance data is readily available, can be studied and analysed, and based on these results, changes and improvements can also be tested. Not only can tests be carried out quickly and with minimal expense, but learnings from online can be used offline – for example, the best creative and message can be used in a direct mailing or a series of press or magazine adverts. This gives online marketing a significant advantage over other channels, where advertising and testing can be considerably more expensive, and where it can take much longer to obtain meaningful results from tests.

A/B testing

In order to carry out this testing effectively and with scientific accuracy, two or more different sets of creative need to be tested simultaneously (not consecutively). This type of testing is called A/B testing, which I've discussed previously. As a recap: A/B testing, also known as Split Run Testing, is the comparative testing of two different versions of a page or a single page element. The most popular page elements that are tested are graphics and images, the offer or promotion, and call to action text.

Unless we split traffic into groups through A/B testing, we can only run different sets of creative in sequence, one after the other, and then measure the results, and this has limited usefulness and reliability. This is because external factors (such as competitor action, other marketing, current events and the economy, for example) affect the results and make a proper comparison difficult, or even impossible.

In order to carry out A/B testing, it can be very beneficial to secure the services of a third party software provider which specialises in this area. It's possible to build your own solution, and I’ve been involved in custom-written A/B tests in the past, but the availability of free solutions such as Google Website Optimizer means that it’s often easier and quicker to sign up for an account and start testing.

Analytics alone cannot tell the marketer how improvements (or should I say 'changes') to a web site are delivering new business. With fluctuating traffic, high sales growth and developing propositions, content improvements need to be isolated to understand what is really working. Analytics cannot directly connect site improvements to increased sales conversions while content optimisation (which I’ve referred to as iterative testing) can. By using multivariate testing, visitor segmentation, and personalisation (more on these in the future), we can optimise web site marketing communications based on real-time behaviour of customers rather than using educated guesses or relying on historical information. This leads to reduced acquisition costs; improved conversion rates and ultimately increased sales and profit.

Multi-Variate Testing (MVT)

Multi-variate testing (or optimisation) is the next step from A/B testing. MVT simultaneously tests the effect of a range of elements on a success event, and some suppliers offer a service which looks to maximise the content during the testing stage. In multi-variate testing, different combinations of elements are displayed, the combination is recorded and the visitors’ behaviour is tracked.

MVT is not just simultaneous A/B testing. It means testing two or more versions of content for at least two different regions of a page. Unlike a set of separate A/B tests, however, MVT can take account of synergy and interaction between variables, where one headline works particularly well with one page layout, or where a colour scheme for the page supports a particular image choice.

True MVT will not only be able to test millions of content variations, it will also be able to determine the impact each variable has on conversion, by itself, and in conjunction with other variables. Testing multiple creatives in multiple areas on a page can lead to many, many possible variations. If three different page elements are to be tested (for example, headline, image and call to action), and there are two different options for each element, then this leads to eight different combinations or “recipes” that can be tested (2 x 2 x 2). Some of the recipes may work better than others because the elements are not entirely independent, and instead, they interact. Different MVT providers have different ways of handling any possible interactions between page elements, varying from considering them in detail to completely ignoring them.
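To make the 2 x 2 x 2 arithmetic concrete, here is a minimal Python sketch that lists every recipe for three page elements with two options each. The element names and option labels are placeholders I've made up for illustration; they aren't taken from any real test or from any provider's software.

```python
from itertools import product

# Hypothetical page elements and options, for illustration only.
elements = {
    "headline": ["Headline A", "Headline B"],
    "image": ["Image A", "Image B"],
    "call_to_action": ["Button A", "Button B"],
}


def build_recipes(elements):
    """Return every combination of exactly one option per page element."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


recipes = build_recipes(elements)

# Three elements with two options each give 2 x 2 x 2 = 8 recipes.
print(len(recipes))
for number, recipe in enumerate(recipes, start=1):
    print(number, recipe)
```

Adding a third option to each element would take the count from 8 to 27 (3 x 3 x 3), which is why the number of recipes grows so quickly.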

In this example, there is a strong positive interaction between the image and the text in Recipe 1, and a weaker positive interaction in Recipe 4 (Volvos are very safe compared to other cars which disintegrate following high-speed contact with trees). There is a strong negative interaction between the text and the image in Recipe 2.

Before any optimisation exercise is carried out, a clear plan must be developed, containing a hypothesis (which elements we suspect will have an effect on the conversion metric) and multiple target objectives (metrics which we will be looking to improve). Site operators should focus attention on testing and optimising areas of web sites that have the highest propensity to positively affect users’ experience and influence conversion of high-yield activities. These could be landing pages and the home page; high traffic pages, such as hub pages; or other pages that directly influence visitor decisions.

Risks

Before any optimisation exercise is carried out, a clear plan must be developed, containing a hypothesis (which elements we think can improve the success metric) and target objectives (metrics which we will be looking to improve). There must be a focus of attention on testing and optimising areas of the site which have the highest potential to positively affect users’ experience and influence conversion of high-yield activities. These could be landing pages and the home page; high traffic pages, such as section heading pages; or other pages that directly influence visitor decisions (such as checkout or payment pages).

One of the key risks of setting up MVT is that an online content team will not have sufficient resource to make the testing worthwhile. The volume of work for the content developers will be increased, as they will need to design different versions of each new home page promotion (for example) that will be tested. Alternatively, some paid-for A/B testing providers provide a consultancy and design service (at a cost) to design the additional content. In order to maximise the value from an annual contract with a paid-for provider, tests need to be run as frequently as possible (while working within an iterative test program), which will require considerable resource and time.

Benefits

MVT and A/B testing are guaranteed not to worsen conversion – if all the other test versions (or ‘recipes’) perform at a lower level than the existing control version, then the control version is retained and conversion rates are not worsened. This is a powerful assurance - although it should be balanced with the alternative view that testing may serve sub-optimal content to some visitors. This guarantee means that the performance on the website is certain to improve, and that conversion, and ultimately sales performance, will be improved, especially if follow-up tests are carried out. This in turn means that MVT will provide a positive return on investment; the predicted timescales in which the testing will provide positive ROI vary from supplier to supplier (and all depend on what's being tested), but all believe that the uplift in conversion will lead to a positive ROI within six months.

This also gives analysts an opportunity to test multiple ideas in a genuine scientific environment, and to demonstrate which is best. It means that we can use numbers to test and then confirm (or, equally, disprove) our theories and ideas about which content is most effective, rather than guessing. Testing in this way enables us to better understand what works online (it won't tell us why, and that's where analysts need to start thinking), so that we can work more effectively and more efficiently in producing future content for the website. It's key to learn from tests, and move on to improving things further. Additionally, a number of the MVT software systems work out which version of the creative is working most effectively, and begin to display this more often, to improve the conversion of the page even while the test is running.

Initial outlay and costs

A number of companies provide A/B and MVT services; some will provide the alternative creative to be tested (saving us the time of developing the new creative in-house). The cost of these companies depends on the level of service that you select, and the costs vary between suppliers. I've done a little research into a small number of providers - this is only a starting point, and if you're looking at doing MVT, I would suggest carrying out further research.

Optimost, which is offered by Autonomy (previously known as Interwoven) (www.autonomy.com/optimost), have testing services which include development of strategic optimisation plans and programs prior to starting the testing. They also provide best practices recommendations based on their own prior experience. They also help with recommendations of persona definition and targeting – which leads into advice on designing the creative for the testing. As well as providing a dashboard system to review the testing and its progress, they also offer statistical analysis and interpretation of the results.

Maxymiser also typically offer a 12-month engagement. At the time of research, they charged a flat rate fee for carrying out one A/B or MVT test at a time, building up to a larger fee per year for carrying out unlimited testing on the site for one year. This includes an initial meeting to discuss plans for testing, the key areas of the site, and establishing a testing roadmap. In addition to this, they provide quarterly review sessions, and can, if requested, develop alternative content for testing (which will be signed off before going live).

Omniture’s offering, Test and Target, also works through a 12-month engagement and, being an enterprise solution, is more expensive than other providers, but it provides scope for unlimited tests, one day’s consultation per month, an allocated account manager and a standard level of technical support. Extra consultancy hours are available in various packages, and are charged in addition to the initial cost.

There's also Google Website Optimizer - it's free, and there's online support, which may suit self-starters (I use GWO on my Red Arrows website) and consultancy can be obtained from approved consultants.

All of the suppliers offer an online dashboard system or console, which allows users to observe the progress of the test. These vary in complexity, but are generally variations on a basic model. They show how long the test has run; which combinations of creative are working most effectively, and the degree of statistical significance (confidence) in the final result. Some providers (such as Optimost) optimise the test as they go along, rather than using the Taguchi method (which I may explain in a future post). They use “iterative waves of testing” to improve the test as it progresses. In order to do MVT, we need to be able to measure the success criteria accurately (whether they are sales, order value, CTR etc). This is done by having downstream tags on the success pages (where the success occurs).
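The "confidence" figure on these dashboards can be illustrated with a standard pooled two-proportion z-test. This is a generic textbook calculation, sketched in Python on the assumption that a recipe is being compared with the control on conversion rate; I don't know exactly which statistics each provider uses, and the visitor and conversion numbers below are invented.

```python
from math import erf, sqrt


def confidence_of_difference(control_visitors, control_conversions,
                             test_visitors, test_conversions):
    """Two-sided confidence (0 to 1) that two conversion rates really differ.

    Uses a pooled two-proportion z-test; this is an illustrative choice,
    not a description of any particular provider's method.
    """
    p_control = control_conversions / control_visitors
    p_test = test_conversions / test_visitors
    pooled = (control_conversions + test_conversions) / (control_visitors + test_visitors)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / test_visitors))
    if std_err == 0:
        return 0.0
    z = abs(p_test - p_control) / std_err
    # For a standard normal variable Z, P(|Z| <= z) = erf(z / sqrt(2)).
    return erf(z / sqrt(2))


# Hypothetical figures: control converts at 2.0%, the test recipe at 2.5%.
confidence = confidence_of_difference(10_000, 200, 10_000, 250)
print(f"Confidence that the recipes differ: {confidence:.1%}")
```

With identical conversion rates the function returns 0, and the figure climbs towards 1 as the gap between the rates widens or the visitor numbers grow.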

The timescale for a test, from launch to identifying a successful ‘winner’ depends on the traffic levels required, which in turn depend on the number of recipes (the number of variations of content that can be produced). More variations means more recipes, which means more traffic in order to produce a clear winner which we can be confident in.
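As a rough illustration of how the number of recipes drives the timescale, the Python sketch below assumes that every recipe needs the same number of visitors before a winner can be called, and that traffic is shared evenly between recipes. Both assumptions, and all of the figures, are mine and purely illustrative; real providers will calculate this differently.

```python
from math import ceil


def days_to_complete(recipe_count, visitors_per_recipe, daily_visitors):
    """Estimate test duration in days when traffic is split evenly across recipes."""
    total_visitors_needed = recipe_count * visitors_per_recipe
    return ceil(total_visitors_needed / daily_visitors)


# Hypothetical page with 4,000 visitors a day, needing 5,000 visitors per recipe.
for options_per_element in (2, 3):
    recipe_count = options_per_element ** 3  # three page elements under test
    days = days_to_complete(recipe_count, visitors_per_recipe=5_000, daily_visitors=4_000)
    print(f"{recipe_count} recipes: about {days} days")
```

Going from two options to three for each of the three elements more than triples the length of the test on the same traffic.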

In order to set up an A/B or multi variate test, you will need to insert a piece of JavaScript code in the header of the test page, and enclose each of the test areas on the page with further specific lines of code. This code enables the test system to pull in the appropriate version of the creative. The precise nature of this code varies slightly from one supplier to another, but the general principle is the same – technical code is used to track each visitor, and to determine which version of the content they are to be shown. Some providers call these ‘test areas’ or ‘mboxes’ or ‘maxyboxes’ but the principle is the same: surrounding a part of the page (or the whole page, even) with some JavaScript enables the testing software to decide which version of the content to serve, and to track the visitor so that they see the same selected version of the content if they revisit the page.
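The "same version on every visit" behaviour can be sketched without any provider's code. The providers do this with their own JavaScript and, typically, a cookie; the Python below swaps in a different technique, hashing a visitor identifier, purely to show the principle. The function name, the visitor ID format and the test name are all hypothetical.

```python
import hashlib


def assign_recipe(visitor_id, test_name, recipe_count):
    """Map a visitor to a recipe number (0 to recipe_count - 1), stable across visits.

    Hashing the visitor ID together with the test name means the same visitor
    always lands on the same recipe for this test, but may land on a different
    recipe in another test.
    """
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % recipe_count


# A returning visitor is given the same recipe each time.
first_visit = assign_recipe("visitor-12345", "homepage-promo", recipe_count=8)
return_visit = assign_recipe("visitor-12345", "homepage-promo", recipe_count=8)
print(first_visit, return_visit, first_visit == return_visit)
```

The recipe number would then be used to choose which version of each test area to serve, and would be recorded alongside the visitor's later behaviour.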

Code will also need to be placed on the success screens to measure the success of the creative. Although placing the code in the success page is often a tricky business (it’s usually in a secure area, where deployments can be difficult to agree, arrange and co-ordinate), the advantage is that once this code has been deployed, it can be used for subsequent tests.

MVT is a very powerful tool; but having said that, so is a JCB excavator or a Black and Decker power drill. It’s important to use it wisely, with thought and consideration, and to realise that the autopilot setting is probably not the best!

Multi Variate Testing: The Series:

Preview of Multi Variate testing
Web Analytics: Multi Variate testing (that's this article)
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good - I defend MVT again...
Hands on: How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!

Sunday, 17 July 2011

Chess game: defending the Patzer's Opening

Here's an example of a game where I successfully defended the Patzer's opening, using the tactics I've listed before for negating and defeating a player who starts out with the Patzer's opening. This is also a test run of the Chess Videos Replayer software, which seems to be a success.

The Patzer's Opening, sometimes also known as the Wayward Queen Attack, is an unconventional chess opening that begins with 1. e4 e5 2. Qh5. This early queen move goes against standard opening principles, as bringing the queen out too soon can make it vulnerable to attacks.

This opening is aggressive but comes with considerable risks. White's queen immediately threatens Scholar’s Mate by targeting the f7 square, hoping for a quick checkmate. However, experienced players can easily defend against this tactic, making the move less effective against skilled opponents.

Despite its drawbacks, the opening does force Black to respond carefully, as I did in this game. A common defence is 2...Nc6, which protects the e5 pawn and prepares for rapid development. If Black plays inaccurately, White might gain an advantage, but generally, strong players consider this opening unsound.

While the Patzer's Opening might catch beginners off guard, it is rarely used in serious competitive play. Advanced players prefer openings that follow solid strategic principles, focusing on piece development and control of the centre.

In this game, I was black, and playing white was k-ermin. I have to confess to making a number of blunders in this game (I might annotate them at a later date, this is really just a test run on the replayer software) but won at the end with a bishop sacrifice to clear the way for my queen to mate on a1. Now that I've found this software, I'll try and publish a few more of my more illustrative games (and not just the ones where I win, honest!).

The Patzer Chess Series

What is the Patzer's Opening in Chess?
Defending the Patzer as Black
Another game playing the Patzer as Black

Some of my other Chess games:

My very earliest online Chess game
My most bizarre Chess game
My favourite Chess game

Thursday, 23 June 2011

Web Analytics - Intro to multi variate testing (MVT)

In my previous post, I talked about A/B testing, and in a future post, I'll cover what multi variate testing is. This post is an interim between the two; last week my wife had our second child, so blogging time is a little hard to come by at the moment!

In this brief post I'd like to list a few things that MVT is not. There seem to be various ideas about it, most of them described as a panacea for all online woes.

It isn't having more than two versions of a test image on a page; that's just A/B/C/n testing.

It isn't really about simultaneously optimising different parts of a page, either. In its purest form, MVT is about measuring and studying how changes to multiple areas of a page affect conversion, including the interactions between the parts that are changing. It's the collective sum of all the parts of the page that contribute; optimising each individual component may lead to reduced performance for the page as a whole. Taking these interactions into account, for me, is the difference between MVT and just running multiple A/B tests on one page. I'll cover this in more detail in my 'proper' post next time.
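A small made-up example shows how optimising each component on its own can reduce performance for the page as a whole. The Python sketch below uses invented conversion rates for two headlines and two images; the labels and numbers are hypothetical and chosen only to show a negative interaction.

```python
# Hypothetical conversion rates for each (headline, image) recipe.
conversion = {
    ("Headline 1", "Image 1"): 0.020,  # the existing page (control)
    ("Headline 2", "Image 1"): 0.024,
    ("Headline 1", "Image 2"): 0.023,
    ("Headline 2", "Image 2"): 0.019,  # the two "winners" clash when combined
}
control = ("Headline 1", "Image 1")

# Two separate A/B tests, each changing one element against the control page.
best_headline = max(["Headline 1", "Headline 2"],
                    key=lambda headline: conversion[(headline, control[1])])
best_image = max(["Image 1", "Image 2"],
                 key=lambda image: conversion[(control[0], image)])
piecewise_choice = (best_headline, best_image)

# Testing every recipe together, as MVT does, finds the best whole page.
best_recipe = max(conversion, key=conversion.get)

print("Piece-by-piece choice:", piecewise_choice, conversion[piecewise_choice])
print("Best whole recipe:    ", best_recipe, conversion[best_recipe])
```

In this invented data, each A/B test picks a genuine winner against the control, but putting the two winners together gives a page that converts worse than the one we started with.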

MVT isn't, by itself, the cure-all for a poor customer experience either. Setting up test versions of pages on a website won't provide long term help to a website, in the same way as a quick blast of keyword optimisation won't fix a poor Google ranking. MVT is a long-term process, and it's prone, as all computer-related activities are, to the Garbage In Garbage Out problem. If you don't think about the testing, and develop a proper testing program, then you won't learn anything or improve anything for yourself or for your site visitors.

Apologies that this post is so short, and brief; think of it as a trailer or a primer for my next post!

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good - (that's this article)
Hands on: How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!

Monday, 6 June 2011

Web Analytics: A/B testing - A Beginning

In my last post on iterative testing, I gave an example which was based on sequential testing. With sequential testing, various creatives, text, headlines, images or whatever are deployed to a website for a period of time, one after another, and the relative performance of each is compared when all of them have been tried. I will, in future, discuss success metrics for these kinds of tests, but for now, in my previous posts on developing tests and building hypotheses, I've been giving each example a points score (it's the principle that matters).

There are numerous drawbacks with running sequential tests. The audience varies for each time period, and you can't guarantee that you'll have the same kind of traffic for each of the tests. Most importantly, though, there are various external factors that will influence the performance of each creative - for example, if one creative is running while you or a competitor is running an offline campaign, then its performance will be affected unfairly (for better or worse). There are influences such as sports events, current affairs, the weather (I've seen this one), school holidays or national holidays that affect website traffic and audience behaviour.

Fortunately, in the online environment, it's possible to test two or more creatives against each other at the same time. Taking a simple example with two creatives, you can test them by randomly splitting your traffic into two groups and then showing one group (group A) the first creative and the other group (group B) the other creative. In the example below, one version is red, and the other is blue.

The important aspect of any kind of online testing is that if a visitor comes back to the site, they have to be shown the same colour that they saw before. This is not only to ensure consistent visitor experience, but to make sure that the test remains valid. You don't want to confuse a visitor by showing him a red page and then, on a later visit, a blue one, and you won't know which version to credit with the success if he responds to one version or the other.
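
Testing tools normally take care of this for you, but the idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name assign_variant, the visitor ID string and the test name are all hypothetical, and hashing is just one way of doing it (a cookie that stores the assigned colour is another).

```python
import hashlib

def assign_variant(visitor_id: str, test_name: str, variants=("red", "blue")) -> str:
    """Map a visitor to one variant, always the same one for the same visitor."""
    # Combine the test name with the visitor ID so that different tests
    # split the same audience independently of each other.
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Treat the hash as a large integer and use it to pick a variant.
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# A returning visitor gets the same colour on every visit.
first_visit = assign_variant("visitor-42", "red-vs-blue")
later_visit = assign_variant("visitor-42", "red-vs-blue")
print(first_visit == later_visit)
```

Because the colour is worked out from the visitor ID rather than chosen afresh on each visit, the split behaves like a random one across the audience but stays fixed for any individual visitor.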

So, having set up two different versions of a creative (red and blue), and then split your visitors into two (at work we use Omniture's Test and Target, while on my own website I used Google Website Optimiser), you need to set up a success criterion. How can you tell which version (red or blue) is the better of the two? Are you going to measure click-through-rate? Or conversion to another success event, such as starting a checkout process, or completing the checkout event? There's some debate about this, but I have two personal opinions:

1. Choose a success event that's close to the page with the creative on it. Personally, I strongly recommend measuring click-through rate, but keeping an eye on conversion to other, later success events. The effect of red versus blue is almost certain to be diluted as you go further from the test page through to the success event. After all, if your checkout process is five pages long, then the effect of the red versus blue creative is going to be overtaken by the effect of what product the visitor has added to the cart, and other influences such as how efficient your checkout screens are. Yes, keep an eye on conversion through the checkout process, and make sure that the creative with the higher click-through-rate doesn't have a much lower conversion to the 'big' success events, but my view is to measure the results that are most likely to be directly influenced by your test.

2. Choose one success criterion and stick to it. If you do have more than one measure of success, then decide which is most important, and rank the rest in order of priority. I can imagine nothing more frustrating than completing a test, and presenting the results back to the great and the good, and saying, "The red version had the higher click through and the higher conversion to 'add to cart' so we've rated it the winner", just for somebody to say, "Ah, but this one had the higher average order value, and so we should go with this one." Perhaps not the perfect example, but you can see that choosing, and agreeing on, the success criteria is very important. Otherwise, you've gone from having one person's opinion on the best creative for a web page to one person's opinion on the best success event for a test - and it may be coincidental that this person's opinion on the key success metric leads to their opinion on the most successful creative.

One comment I would make is that it's possible to test three, four or five versions of an image or creative at the same time, and this would still be called A/B testing. Technically, it'd probably be called A/B/C or A/B/C/D/E testing - usually, the shortcut is A/B/n testing, where n can be any number that suits. It's not multi-variate testing - that's a whole separate process, and not just 'multiple recipes in a test'.

In my next post, I intend to write about multi-variate testing, which for me is really where science collides with web analytics. I'll explain how the results from A/B testing can be used as the basis for further testing, referring to my previous post on iterative testing, so that you can see in more detail what's working on a web page and what isn't, and what the key factors are.

Tuesday, 31 May 2011

Web Analytics: What makes testing iterative?

What makes testing iterative?

When I was about eight or nine years old, my dad began to teach me the fundamentals of BASIC programming. He'd been on a course, and learned the basics, and I was eager to learn - especially how to write games. One of the first programs he demonstrated was a simple game called Go-Karts. The screen loads up: "You are on a go-kart and the steering isn't working. You must find the right letter to operate the brakes before you crash. You have five goes. Enter letter?" You then enter a letter, and the program works out if the input is correct, or if it's before or after the letter you've entered.

"J"
"After J"
"P"
"Before P"
"L"
"L is correct - you have stopped your go-kart without crashing! Another game (Y/N)?"

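I no longer have the original BASIC listing, so here is a rough reconstruction of the game's logic in Python rather than the program my dad wrote. The function names respond and play are my own, and the "You crashed!" message for running out of goes is an assumption, as I can't recall the original wording.

```python
def respond(secret: str, guess: str) -> str:
    """Return the program's reply to a single guess."""
    guess = guess.upper()
    if guess == secret:
        return f"{guess} is correct - you have stopped your go-kart without crashing!"
    # Letters compare alphabetically, so this tells the player which way to go.
    if secret > guess:
        return f"After {guess}"
    return f"Before {guess}"

def play(secret: str, guesses, max_goes: int = 5):
    """Play one game with a fixed list of guesses and return all the replies."""
    replies = []
    for guess in list(guesses)[:max_goes]:
        replies.append(respond(secret, guess))
        if guess.upper() == secret:
            return replies
    # Assumed losing message; the original wording is not known.
    replies.append("You crashed!")
    return replies

# Replaying the game shown above, where the secret letter is L.
for reply in play("L", ["J", "P", "L"]):
    print(reply)
```

Each reply narrows the range of letters left to try, which is exactly the property that matters later in this post.
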
I was reminded of this simple game during the Omniture Summit EMEA 2011 last week, when one of the breakout presenters started talking about testing, and in particular ITERATIVE testing. Iterative testing should be the natural development from testing, and I've alluded to it in my previous posts about testing. At its simplest, basic testing involves just comparing one version of a page, creative, banner, text or call-to-action (or whatever) against another and seeing which one works best. Iterative testing works in a similar but more advanced way, much like my dad's Go-Karts game: start with an answer which is close to the best, and then build on that answer to develop something better still. I've talked about coloured shapes as simplified versions of page elements in my previous posts on testing, so I guess it's time to develop a different example!

Suppose I've tested the five following page headlines, and achieved the following points scores (per day), running each one for a week, so that the total test lasted five weeks.

"Cheap hi-quality widgets on sale now" - 135 points
"Discounted quality widgets available now" - 180 points
"Cheap widgets reduced now" - 110 points
"Advanced widgets available now" - 210 points
"Exclusive advanced widgets on sale now" - 200 points

What would you test next?

This question is the kind of open question which will help you to see if you're doing iterative testing, or just basic testing. What can we learn from the five tests that we've run so far? Anything? Or nothing? Do we have to make another random guess, or can we use these results to guide us towards something that should do well?

Looking at the results from these preliminary tests, the best headline, "Advanced widgets available now" scored almost twice as many points per day as "Cheap widgets reduced now". At the very worst, we should run with this high-performing headline, which is doing marginally better than the most recent attempt, "Exclusive advanced widgets on sale now." This shouldn't pose a problem for a web development team - after all, the creative has already been designed and just needs to be re-published. All that's needed is to admit that the latest version isn't as good as an earlier idea, and to go backwards in order to go forwards.

Anyway: we can see that "Advanced..." got the best score, and is the best place to start from. We can also see that the two lowest performing headlines include the word "Cheap" so this looks like a word to avoid. From this, it looks like "Advanced widgets on sale now" and "Exclusive advanced widgets available now" are places to start from - we've eliminated the word 'cheap' and now we can look at how 'available now' compares to 'on sale now'. This is the time for developing test variations on these ideas - following the general principles that have been established by the first round of testing. This is not the time for trying a whole set of new ideas; this would mean ignoring all the potential learning and starting to make sub-optimal decisions (as they're sometimes known).
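
The word-by-word comparison above can be done by eye, but it's easy to sketch in Python too. The scores are the five from this post; the function name average_score is my own invention. With only five headlines, each word appears alongside different neighbours, so treat the averages as a rough guide rather than a proper analysis.

```python
# Points per day for each headline, taken from the five tests above.
results = {
    "Cheap hi-quality widgets on sale now": 135,
    "Discounted quality widgets available now": 180,
    "Cheap widgets reduced now": 110,
    "Advanced widgets available now": 210,
    "Exclusive advanced widgets on sale now": 200,
}

def average_score(phrase, results):
    """Average the scores of every headline containing the phrase."""
    scores = [
        score
        for headline, score in results.items()
        if phrase.lower() in headline.lower()
    ]
    # Return None if no headline used the phrase at all.
    return sum(scores) / len(scores) if scores else None

for phrase in ("cheap", "advanced", "on sale now", "available now"):
    print(phrase, average_score(phrase, results))
```

On these figures, headlines containing 'cheap' average 122.5 points and those containing 'advanced' average 205, while 'available now' (195) is ahead of 'on sale now' (167.5). That last gap is partly down to which other words appeared in those headlines, which is why the two phrases are worth testing head to head.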

Referring back to my earlier post, this is the time in the process for making hypotheses on your data. I have to disagree with the speaker at the Omniture EMEA Summit, when she gave an example hypothesis as, "We believe that changing the page headline will drive more engagement with the site, and therefore better conversion." This is just a theory. A hypothesis says all that, and then adds, "because visitors read the page headline first when they see a page, and use that as a primary influencer to decide if the page meets their needs."

So, here's a hypothesis on the data: "Including the word 'cheap' in our headline puts visitors off because they're after premium products, not inexpensive ones. We need to focus on premium-type words because these are more attractive to our visitors." In fact, as you can see, I've even added a recommendation after my hypothesis (I couldn't resist).

And that's the foundation of iterative testing - using what's gone before and refining and improving it. Yes, it's possible that a later iteration might throw up results that are completely unexpected - and worse than before - but then that's the time to improve and refine your hypothesis. Interestingly, the shallower hypothesis, "We believe that changing the page headline will drive more engagement with the site, and therefore better conversion," will still hold true, as it isn't specific enough to be proved wrong.

Anyway, that's enough on iterative testing for now; I'm off to go and play my dad's second iteration of the Go-Karts game, which went something like, "You are floating down a river, and the crocodiles are gathering. You must guess how many crocodiles there are in order to escape. How many (1-50)?"
