Web Optimisation, Maths and Puzzles: July 2011

Monday 25 July 2011

Web Analytics: Who holds the steering wheel?

I'll admit now that I don't much care for Formula 1 motor racing. My brother-in-law is a massive fan, however, and on any given Sunday afternoon during the F1 season, he'll be watching it very closely while my mother-in-law puts the final touches on Sunday lunch. He's interested in the racing, following his favourite drivers and who's managed to execute the most daring overtaking manoeuvre. Since it's on the TV, I'll end up watching it, and what I've found most interesting is the number of people in the team who support the driver. There's a whole squad of team members to carry out the tyre changes; refuelling; visor-wiping and so on, and another squad who spend most of the race time staring at computer screens and reports, studying them extremely closely. You'd think that, having gone to the track for the race, they'd want to watch it live, but no, they seem more interested in watching it on the TV, just like my brother-in-law and me.

I don't know exactly what they're monitoring, but I imagine there are sensors all over the car, reporting data on the car's tyre pressure and temperature; fuel load; the engine temperature; revs; speed and so on. Occasionally, the call goes out over the team radio, "You need to slow down and conserve fuel...", "Your engine is getting very hot, ease off and use fewer revs...", "Prepare for a tyre change...", "Your fuel load is fine, and you're gaining on the driver in front...", "Move over and let your team mate come past, he needs the championship points." Perhaps not the last one, but based on the screen-loads of data coming from the car, the support team are able to work out what's happening to the car, so that the driver can drive in the race.

So the call goes out, "Slow down, you're running hot and we need to get you into the pit lane." However, the driver is the one driving the car, and if he fancies his chances at a risky overtaking manoeuvre, then he'll put his foot on the accelerator and get that little bit extra from the car to squeeze through on the exit of a bend. He risks overheating his engine, and possibly causing it to break completely, but he successfully overtakes his competitor.

Then the engine starts producing large clouds of blue smoke. The car starts to lose speed. It's a bit like a scene in Pixar's "Cars".

It rarely happens like this, from what I've seen. Everybody on the team wants the car to win, from the driver to the guy who stands in the pit lane and holds the stop-go lollipop, to the team manager, and everybody understands their role. If the screen-watchers see that the engine is running hot, then they have to decide how important this is - is it a show-stopper? - and then tell the driver. Ultimately, the decision lies with the driver on how to drive the car, and he hasn't got time to check all the data that the car is producing - he's the one holding the steering wheel, and he's the one with his foot on the accelerator.

Is web analytics like watching the speedometer but not having the steering wheel or the brake? As web analysts, we're responsible for reviewing the data being produced by visitors to our sites, but the task of editing a site and making changes usually falls to another team or a colleague with HTML, Java or programming skills. We can see how traffic, conversions and success metrics are changing, and we can set alerts and warnings when the figures start to move in an unwanted direction. We can send the messages to our colleague, but unless he (or she) understands what the warning means, why it's being sent, and what to do about it, the colleague is unlikely to acknowledge that action needs to be taken.

Yes, the F1 team have a few advantages to help them along: the data they have is immediate, and is understood by all the engineers (and the driver) in the team. For example, with oil temperature, there are agreed levels in place for the volume of oil and its temperature. Everybody knows what 'too hot' looks like, and they know what to do if the temperature starts to rise; the driver knows what this means and what to do about it. The team members also know what the risks are if the temperature continues to rise - will the engine start to burn oil, will the engine explode, or seize up? Is it a minor inconvenience as the cockpit temperature warms up, or is it a total show-stopper that might end the race completely?

Most web analysts don't have that level of success or failure hanging on their recommendations, but the whole team may miss out if a recommendation isn’t made. They may miss out on those incremental improvements that lead to further success, or they may let a poor-performing campaign run on for longer than it should. And the blame may not lie with the HTML team – although we may think it does, as they’re the ones who have built the site and are able to make the changes to it. The analyst spots a trend in the data, “This figure is going up week-on-week and that other figure is staying the same.” And? Or, as Avinash Kaushik puts it in his book, “So what?” Do we continue with the campaign? Do we increase our keyword bid? Do we change the page layout? It’s important – vital, even – that our data leads to a recommendation. We may not achieve the change we think is required, but without a recommendation in our insight, we’re not making it easy for the HTML team to consider making the changes. What’s my recommendation? Identify the issue. Keep it brief. Make a proposal.

Does the F1 engineer come on the radio and say to the driver, “Your engine temperature has risen for the last three laps in a row, and your fuel consumption is below average.”? No, he says, “You’re overheating, slow down.” He identifies the issue, keeps it concise, and adds a recommendation for action. And now, having done that myself, I’ll stop.

Monday 18 July 2011

Web Analytics: Multi Variate Testing

MORE ABOUT MULTI VARIATE TESTING

The online sales and marketing channel is unique compared to other sales channels, because online it’s possible to test new marketing images, text and layouts and measure their effectiveness very quickly indeed, and at significantly less cost than the other channels. Performance data is readily available, can be studied and analysed, and based on these results, changes and improvements can also be tested. Not only can tests be carried out quickly and with minimal expense, but learnings from online can be used offline – for example, the best creative and message can be used in a direct mailing or a series of press or magazine adverts. This gives online marketing a significant advantage over other channels, where advertising and testing can be considerably more expensive, and where it can take much longer to obtain meaningful results from tests.

A/B testing

In order to carry out this testing effectively and with scientific accuracy, two or more different sets of creative need to be tested simultaneously (not consecutively). This type of testing is called A/B testing, which I've discussed previously. As a recap: A/B testing, also known as Split Run Testing, is the comparative testing of two different versions of a page or a single page element. The most popular page elements that are tested are graphics and images; the offer or promotion; and call to action text.

Unless we split traffic into groups through A/B testing, we can only run different sets of creative in sequence, one after the other, and then measure the results, and this has limited usefulness and reliability. This is because external factors (such as competitor action, other marketing, current events and the economy, for example) affect the results and make a proper comparison difficult, or even impossible.

In order to carry out A/B testing, it can be very beneficial to secure the services of a third party software provider which specialises in this area. It's possible to build your own solution, and I’ve been involved in custom-written A/B tests in the past, but the availability of free solutions such as Google Website Optimizer means that it’s often easier and quicker to sign up for an account and start testing.

Analytics alone cannot tell the marketer how improvements (or should I say 'changes') to a web site are delivering new business. With fluctuating traffic, high sales growth and developing propositions, content improvements need to be isolated to understand what is really working. Analytics cannot directly connect site improvements to increased sales conversions while content optimisation (which I’ve referred to as iterative testing) can. By using multivariate testing, visitor segmentation, and personalisation (more on these in the future), we can optimise web site marketing communications based on real-time behaviour of customers rather than using educated guesses or relying on historical information. This leads to reduced acquisition costs; improved conversion rates and ultimately increased sales and profit.

Multi-Variate Testing (MVT)

Multi-variate testing (or optimisation) is the next step from A/B testing. MVT simultaneously tests the effect of a range of elements on a success event, and some suppliers offer a service which looks to maximise the content during the testing stage. In multi-variate testing, different combinations of elements are displayed, the combination is recorded and the visitors’ behaviour is tracked.

MVT is not simultaneous A/B testing. It means testing two or more versions of content for at least two different regions of the page at the same time. Unlike separate A/B tests, MVT can capture synergy and interaction between variables, where one headline works particularly well with one page layout, or where a colour scheme for the page supports a particular image choice.

True MVT will not only be able to test millions of content variations, it will also be able to determine the impact each variable has on conversion, by itself, and in conjunction with other variables. Testing multiple creatives in multiple areas on a page can lead to many, many possible variations. If three different page elements are to be tested (for example, headline, image and call to action), and there are two different options for each element, then this leads to eight different combinations or “recipes” that can be tested (2 x 2 x 2). Some of the recipes may work better than others because the elements are not entirely independent, and instead, they interact. Different MVT providers have different ways of handling any possible interactions between page elements, varying from considering them in detail to completely ignoring them.
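The 2 x 2 x 2 arithmetic above can be sketched in a few lines of Python. This is only an illustration: the element names and the option labels are invented for the example, and a real MVT tool would build its recipes from whatever content you supply.

```python
from itertools import product

# Hypothetical page elements and options, invented for illustration only.
elements = {
    "headline": ["Headline A", "Headline B"],
    "image": ["Image A", "Image B"],
    "call_to_action": ["Buy now", "Find out more"],
}


def build_recipes(elements):
    """Return every combination of options, one dict per recipe."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


recipes = build_recipes(elements)
for number, recipe in enumerate(recipes, start=1):
    print(f"Recipe {number}: {recipe}")
print(f"Total recipes: {len(recipes)}")
```

Adding a third option to just one element takes the total from 8 to 12 (3 x 2 x 2), which is why the number of recipes grows so quickly as a test becomes more ambitious.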

In this example, there is a strong positive interaction between the image and the text in Recipe 1, and a weaker positive interaction in Recipe 4 (Volvos are very safe compared to other cars which disintegrate following high-speed contact with trees). There is a strong negative interaction between the text and the image in Recipe 2.

Before any optimisation exercise is carried out, a clear plan must be developed, containing a hypothesis (which elements we suspect will have an effect on the conversion metric) and multiple target objectives (metrics which we will be looking to improve). Site operators should focus attention on testing and optimising areas of web sites that have the highest propensity to positively affect users’ experience and influence conversion of high-yield activities. These could be landing pages and the home page; high traffic pages, such as hub pages; or other pages that directly influence visitor decisions.

Risks

Before any optimisation exercise is carried out, a clear plan must be developed, containing a hypothesis (which elements we think can improve the success metric) and target objectives (metrics which we will be looking to improve). There must be a focus of attention on testing and optimising areas of the site which have the highest potential to positively affect users’ experience and influence conversion of high-yield activities. These could be landing pages and the home page; high traffic pages, such as section heading pages; or other pages that directly influence visitor decisions (such as checkout or payment pages).

One of the key risks of setting up MVT is that an online content team will not have sufficient resource to make the testing worthwhile. The volume of work for the content developers will be increased, as they will need to design different versions of each new home page promotion (for example) that will be tested. Alternatively, some paid-for A/B testing providers provide a consultancy and design service (at a cost) to design the additional content. In order to maximise the value from an annual contract with a paid-for provider, tests need to be run as frequently as possible (while working within an iterative test program), which will require considerable resource and time.

Benefits

MVT and A/B testing are guaranteed not to worsen conversion – if all the other test versions (or ‘recipes’) perform at a lower level than the existing control version, then the control version is retained and conversion rates are not worsened. This is a powerful assurance - although it should be balanced with the alternative view that testing may serve sub-optimal content to some visitors. This guarantee means that the performance on the website is certain to improve, and that conversion, and ultimately sales performance, will be improved, especially if follow-up tests are carried out. This in turn means that MVT will provide a positive return on investment; the predicted timescales in which the testing will provide positive ROI vary from supplier to supplier (and all depend on what's being tested), but all believe that the uplift in conversion will lead to a positive ROI within six months.

This also gives analysts an opportunity to test multiple ideas in a genuine scientific environment, and to demonstrate which is best. It means that we can use numbers to test and then confirm (or, equally, disprove) our theories and ideas about which content is most effective, rather than guessing. Testing in this way enables us to better understand what works online (it won't tell us why, and that's where analysts need to start thinking), so that we can work more effectively and more efficiently in producing future content for the website. It's key to learn from tests, and move on to improving things further. Additionally, a number of the MVT software systems work out which version of the creative is working most effectively, and begin to display this more often, to improve the conversion of the page even while the test is running.
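As a rough illustration of how a system might "display the best version more often" while a test is still running, here is a minimal sketch of one well-known approach, usually called epsilon-greedy. I don't know which method any particular supplier uses, so treat this as an assumption rather than a description of their software; the function name and the 10% exploration share are made up for the example.

```python
import random


def choose_recipe(visitors, conversions, explore_share=0.1, rng=random):
    """Pick the index of the recipe to show to the next visitor.

    visitors[i] and conversions[i] are the running totals for recipe i.
    A small share of visitors still sees a random recipe, so that the
    weaker-looking recipes keep collecting data.
    """
    if rng.random() < explore_share:
        return rng.randrange(len(visitors))
    rates = [c / v if v else 0.0 for c, v in zip(conversions, visitors)]
    return rates.index(max(rates))


# Hypothetical running totals for three recipes.
visitors = [1000, 1000, 1000]
conversions = [50, 80, 65]
print(choose_recipe(visitors, conversions))
```

With these figures, most visitors will be shown the second recipe (index 1), because it has the highest conversion rate so far, while roughly one visitor in ten is still sent to a recipe chosen at random.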

Initial outlay and costs

A number of companies provide A/B and MVT services; some will provide the alternative creative to be tested (saving us the time of developing the new creative in-house). The cost of these companies depends on the level of service that you select, and the costs vary between suppliers. I've done a little research into a small number of providers - this is only a starting point, and if you're looking at doing MVT, I would suggest carrying out further research.

Optimost, which is offered by Autonomy (previously known as Interwoven) (www.autonomy.com/optimost), have testing services which include development of strategic optimisation plans and programs prior to starting the testing. They also provide best practices recommendations based on their own prior experience. They also help with recommendations of persona definition and targeting – which leads into advice on designing the creative for the testing. As well as providing a dashboard system to review the testing and its progress, they also offer statistical analysis and interpretation of the results.

Maxymiser typically offer a 12-month engagement. At the time of research, they charged a flat rate fee for carrying out one A/B or MVT test at a time, building up to a larger fee per year for carrying out unlimited testing on the site for one year. This includes an initial meeting to discuss plans for testing, the key areas of the site, and establishing a testing roadmap. In addition to this, they provide quarterly review sessions, and can, if requested, develop alternative content for testing (which will be signed off before going live).

Omniture’s offering, Test and Target, also works through a 12-month engagement. Being an enterprise solution, it is more expensive than other providers, but it provides scope for unlimited tests, one day’s consultation per month, an allocated account manager and a standard level of technical support. Extra consultancy hours are available in various packages, and are charged in addition to the initial cost.

There's also Google Website Optimizer - it's free, and there's online support, which may suit self-starters (I use GWO on my Red Arrows website) and consultancy can be obtained from approved consultants.

All of the suppliers offer an online dashboard system or console, which allows users to observe the progress of the test. These vary in complexity, but are generally variations on a basic model. They show how long the test has run; which combinations of creative are working most effectively, and the degree of statistical significance (confidence) in the final result. Some providers (such as Optimost) optimise the test as they go along, rather than using the Taguchi method (which I may explain in a future post). They use “iterative waves of testing” to improve the test as it progresses. In order to do MVT, we need to be able to measure the success criteria accurately (whether they are sales, order value, CTR etc). This is done by having downstream tags on the success pages (where the success occurs).
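To give a feel for what the "confidence" figure on a dashboard might represent, here is a minimal sketch using a standard two-proportion z-test. This is my assumption about how such a number could be calculated, not the method of any particular supplier; the function name and the visitor figures are invented for the example.

```python
from math import erf, sqrt


def confidence_that_rates_differ(control_visitors, control_conversions,
                                 test_visitors, test_conversions):
    """Return 1 minus the two-sided p-value of a two-proportion z-test."""
    control_rate = control_conversions / control_visitors
    test_rate = test_conversions / test_visitors
    # Pooled rate: the conversion rate if both versions really performed the same.
    pooled = (control_conversions + test_conversions) / (control_visitors + test_visitors)
    standard_error = sqrt(pooled * (1 - pooled)
                          * (1 / control_visitors + 1 / test_visitors))
    if standard_error == 0:
        return 0.0
    z = (test_rate - control_rate) / standard_error
    # For a standard normal variable, 1 - two-sided p-value = erf(|z| / sqrt(2)).
    return erf(abs(z) / sqrt(2))


# Hypothetical figures: control converts 50 of 1,000 visitors, test 80 of 1,000.
print(f"{confidence_that_rates_differ(1000, 50, 1000, 80):.1%}")
```

A figure above 95% is a common (though arbitrary) threshold for declaring a winner. With small visitor numbers, the same difference in conversion rate will give a much lower figure.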

The timescale for a test, from launch to identifying a successful ‘winner’, depends on the traffic levels required, which in turn depend on the number of recipes (the number of variations of content that can be produced). More variations mean more recipes, which means more traffic is needed to produce a clear winner that we can be confident in.
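The link between recipes and traffic can be estimated with the textbook sample-size formula for comparing two conversion rates. The sketch below assumes a full test where every recipe receives an equal share of visitors, and it ignores refinements such as corrections for comparing many recipes at once, or designs that test only a subset of recipes. The function names, the 5% baseline and the 6% target are all invented for the example.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_recipe(baseline_rate, expected_rate, confidence=0.95, power=0.8):
    """Approximate visitors needed per recipe to detect the given uplift."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    uplift = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / uplift ** 2)


def total_visitors(recipe_count, baseline_rate, expected_rate):
    """Total traffic if every recipe gets an equal share of visitors."""
    return recipe_count * visitors_per_recipe(baseline_rate, expected_rate)


# Hypothetical test: 5% baseline conversion, hoping to detect a rise to 6%.
for recipe_count in (2, 8, 27):
    print(recipe_count, "recipes:", total_visitors(recipe_count, 0.05, 0.06), "visitors")
```

The per-recipe figure stays the same however many recipes there are, so the total traffic (and therefore the duration of the test) scales directly with the number of recipes.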

In order to set up an A/B or multi variate test, you will need to insert a piece of JavaScript code in the header of the test page, and enclose each of the test areas on the page with further specific lines of code. This code enables the test system to pull in the appropriate version of the creative. The precise nature of this code varies slightly from one supplier to another, but the general principle is the same – technical code is used to track each visitor, and to determine which version of the content he is to be shown. Some providers call these ‘test areas’ or ‘mboxes’ or ‘maxyboxes’ but the principle is the same: by surrounding a part of the page (or the whole page, even) with some JavaScript, this enables the testing software to decide which version of the content to serve, and to track the visitor so that they see the same selected version of the content if they revisit the page.
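The suppliers' own code is JavaScript and typically relies on cookies, and I'm not reproducing any of it here. Instead, here is a small Python sketch of the underlying idea: given some identifier for the visitor, pick a recipe in a way that always gives the same answer for the same visitor. The function name, the visitor ID format and the test name are all hypothetical.

```python
import hashlib


def assign_recipe(visitor_id, test_name, recipe_count):
    """Map a visitor to a recipe number from 0 to recipe_count - 1.

    Hashing the test name together with the visitor ID means the same
    visitor always gets the same recipe within a test, but may land in
    a different recipe in a different test.
    """
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % recipe_count


# Hypothetical visitor ID, as might be stored in a cookie.
first_visit = assign_recipe("visitor-12345", "homepage-promo", 8)
return_visit = assign_recipe("visitor-12345", "homepage-promo", 8)
print(first_visit == return_visit)  # the same recipe on both visits
```

The advantage of this approach is that nothing needs to be stored to remember which recipe a visitor saw; a cookie-based system would instead store the chosen recipe and read it back on the next visit.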

Code will also need to be placed on the success screens to measure the success of the creative. Although placing the code in the success page is often a tricky business (it’s usually in a secure area, where deployments can be difficult to agree, arrange and co-ordinate), the advantage is that once this code has been deployed, it can be used for subsequent tests.

MVT is a very powerful tool; but having said that, so is a JCB excavator or a Black and Decker power drill. It’s important to use it wisely, with thought and consideration, and to realise that the autopilot setting is probably not the best!

Sunday 17 July 2011

Chess game: defending the Patzer's Opening

Here's an example of a game where I successfully defended the Patzer's opening, using the tactics I've listed before for negating and defeating a player who starts out with the Patzer's opening. This is also a test run of the Chess Videos Replayer software, which seems to be a success.

I was black, and playing white was k-ermin. I have to confess to making a number of blunders in this game (I might annotate them at a later date, this is really just a test run on the replayer software) but won at the end with a bishop sacrifice to clear the way for my queen to mate on a1. Now that I've found this software, I'll try and publish a few more of my more illustrative games (and not just the ones where I win, honest!).
