
Friday, 5 August 2011

Web Analytics - A Medical Emergency

One of my favourite TV programmes at the moment is Casualty.  Or perhaps it's Holby City (they're both the same, really).  A typical episode unfolds with all the drama and angst between the main characters, which is suddenly broken up by the paramedics unloading a patient from an ambulance.  Perhaps the patient is the victim of a fire, or a road traffic accident, or another emergency.  Whatever it is, the paramedics come in, wheeling the patient along, giving a brief description of who they've got, the main symptoms, and start rattling off a list of numbers.  "Patient is RTA victim, aged 56, BP is 100 over 50, pulse is 58 and weak, 100 mls of adrenaline given..." the list goes on.  The senior consultant who is receiving the patient hasn't really got time to be asking questions like, "Is that bad?" and certainly not, "Is this important?"  The questions he's already asking himself are, "What can we do to help this patient?" and "What's been done already?"

Regular readers will already know where I'm going with this analogy, so I'll try to keep it brief.  In a life-or-death situation (and no, web analysts are hardly ever going to have that degree of responsibility) there isn't really time to start asking and answering the trivial questions.  The executive dashboard, report or update needs to state what the results are at the moment, how they look against target, normal or threshold, and what action needs to be taken.  The executive, in a similar way to the Formula 1 driver I mentioned last time, hasn't got time to look through all the data and decide what's important, what isn't, and what needs to be looked at.

As an aside, I should comment that reporting dying figures to an executive is likely to lead to a series of questions back to the analyst, so be ready to answer them.  Better still, include a commentary that states the reasons for the change in the figures and the action that's being taken to address them.  Otherwise, all you'll achieve is an unfortunate way of generating action from the content team, who probably won't be too pleased to receive a call from a member of the executive team asking why their figures are dying, and will want to know why you didn't tell them first.

Another skill comes in determining the key figures to report - the vital statistics.  The paramedics know that time is of the essence and keep it deliberately brief and to the point.  No waffle.  Clear.  The thresholds for each KPI are already understood - after all, they have the advantage that all medical staff know what typical temperature, pulse, blood pressure and blood sugar levels are.  As a web analyst (or a business analyst), you'll need to gain agreement from your stakeholders on what these are.  Otherwise you may find yourself reporting the height and weight of a patient who has severe blood loss, where the metrics are meaningless and don't reflect the current situation.  If you give a number for a KPI, and the reply is, "Is that a lot?" then you have some work to do - and I have some answers for you too.
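To make that concrete, here's a minimal sketch in Python of the kind of 'vital signs' check I have in mind.  The metric names and thresholds are entirely hypothetical - the point is simply that the acceptable ranges are agreed in advance, just as medics agree on what a normal pulse looks like, so the answer to "Is that a lot?" is built into the report.

```python
# Hypothetical KPIs and agreed thresholds -- names and numbers are illustrative only.
AGREED_RANGES = {
    "conversion_rate": (0.02, 0.05),
    "bounce_rate": (0.20, 0.45),
    "avg_order_value": (35.0, 60.0),
}

def vital_signs(latest):
    """Report each KPI against its agreed range, flagging anything outside it."""
    report = []
    for kpi, value in latest.items():
        low, high = AGREED_RANGES[kpi]
        status = "OK" if low <= value <= high else "OUTSIDE AGREED RANGE"
        report.append(f"{kpi}: {value} (agreed range {low}-{high}) -> {status}")
    return report

print("\n".join(vital_signs({
    "conversion_rate": 0.012,   # below the agreed range -> needs a commentary and an action
    "bounce_rate": 0.38,
    "avg_order_value": 41.0,
})))
```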


Now, all I've covered so far is the reporting - the paramedics' role.  If we were (or are) web reporters, then that would be the sum of our role: to look at the site, take the measurements, blurt out all the relevant figures and then go back to our desks.  However, as web analysts, we now need to take on the role of the medical consultant, and start looking at the stats - the raw data - and working out why they're too high (or too low), and most importantly, what to do about them.  Could you imagine the situation where the consultant identifies the cause of the problem - say an infection in the lungs - and goes over to the patient, saying, "That's fine Mr Smith, we have found the cause of your breathlessness.  It's just a bacterial infection in your left lung."  There would then be a hesitant pause, until the patient says something like, "Can you treat it?" or "What can you do for me?".  

Good web analysts go beyond the reporting, through to identifying the cause of any problems (or, if your patient is in good health, the potential for improvements) and then working out what can be done to improve them.  This takes time, and skill, and a good grasp of the web analytics tool you're using.  You may have to look at your website too - actually look at the pages and see what's going on.  Look at the link text; the calls to action; read the copy, and study the images.  Compare this with the data you've obtained from your analytics tools.  This may not provide all the answers, so you may have to persevere.  Go on to look at traffic sources - the referrers, the keywords, the campaign codes.  Track down the source of the problem - or the likely causes - and follow the data to its conclusion, even if it takes you outside your site to a search engine and you start trying various keywords in Google to see how your site ranks, and what your PPC actually looks like.


Checking pages on a site is just the equivalent of a doctor actually looking at his patient.  He may study the screens, take a pulse, measure blood pressure or check the patient's temperature, but unless he actually looks at the patient - the patient's general appearance, any wounds, scars, marks, rashes or whatever else - he'll be working in the dark.  This isn't House (another medical drama that I never really took to), this is real medicine.  Similarly, doctors may consider environmental factors - what has the patient eaten, drunk, inhaled, come into contact with?  What's going on outside the body that might affect something inside it?

There's plenty of debate about the difference between reporting and analysis - in fact I've commented on this before - but I think the easiest answer I could give now is the difference between the paramedic and the doctor.  What do you think?







Transformers 3: Dark of the Moon

Having reviewed Transformers: Revenge of the Fallen some time ago, I thought it was about time I reviewed the latest Transformers film.  I remained almost entirely spoiler-free before I saw the film, other than having inadvertently seen a picture of Optimus pulling his trailer, and a picture of one of the characters who was being compared to one of the original G1 cartoon characters (I can't remember which).  Being spoiler-free - in fact, I even avoided the trailers for the movie - meant that I approached the film completely open-minded, although a number of people who'd seen it told me that it was significantly better than the second.  I was very optimistic, and I wasn't disappointed.


There are various reasons why this film was better than the second:  the parents' roles and screen time were significantly scaled down, which is a double bonus; the film was intelligently tied in to a number of 'real life' events; the problem of faceless Decepticons was reduced (in fact, there were vastly more Decepticons in this film than in the second, but it didn't seem like it, as they were handled with intelligence); and more time and care was taken to provide the Autobots and Decepticons with identities, vehicle modes, names and even a small dose of personality - to put it another way, they had character.  The film had a complicated but understandable plot with a number of twists (compared to the second film, which was boringly linear); killed off a number of characters, which I found very surprising and which developed interest in the story, especially with characters we care about; and delivered a number of other surprises too (which you may or may not predict in advance).

The plot begins with the Autobots' discovery of Cybertronian spacecraft technology in a building near the Chernobyl reactor in Ukraine, and then develops into the revelation that the humans discovered alien technology on the far side of the moon.  They refer to it in the film as 'the dark side of the moon', which is a bit of a misnomer - technically, the moon doesn't have a permanently dark side: it rotates on its axis (once per orbit, which is why the same face always points towards Earth), so it has days and nights just as we do.  What they really mean is the far side of the moon (the side we never see from Earth), but hey, "Far of the Moon" doesn't have any kind of ring to it.  Come to think of it, "Dark of the Moon" sounds like it's missing a word somewhere, but I suspect that Pink Floyd using "Dark Side of the Moon" in 1973 meant that DreamWorks had to leave well alone.  Or perhaps the Dark of the Moon was not just the spaceship, but all the villainy and subterfuge that came from it too.  Or maybe the title writers got lazy.

Along the way, we see Optimus Prime's trailer put to good use (a scene that quite obviously screams, "New toy alert!") and a batch of new Autobots who get names (I wish I could remember them).  We get to see the Autobots walking on the moon, as they recover the body of Sentinel Prime - a very impressive character, voiced by the equally impressive Leonard Nimoy.  Nimoy lends the film some sci-fi credibility (as does the appearance of Buzz Aldrin): long-time fans will remember him voicing Galvatron in the original "The Transformers: The Movie" from 1986, while Trekkies will appreciate his delivery of the line, "You never understood that the needs of the many outweigh the needs of the few!" towards the end of the film.  We also see robots in disguise.  There are at least two scenes where vehicles which were previously assumed to be Earth vehicles and nothing more suddenly transform and engage in battle - a very welcome change from the second film, where we saw robots that didn't transform at all.  This film definitely won on its ability to deliver surprises and shocks.

We also get some character development as Optimus and Sentinel discuss the leadership of the Autobots, and we also get to see a decrepit and suffering Megatron in another new vehicle form which befits his current situation (and again screams "New toy alert!").  The story unfolds from the discovery at Chernobyl, from Sentinel's reactivation and his change of heart, and the plot develops in dramatic and unexpected ways, as the Autobots are expelled from the Earth; the Decepticons bring in reinforcements from the Moon (and subsequently from further afield) and start their plan for world conquest.  


Quite a lot of the second half of the story feels like a throwback to the G1 cartoon story "The Ultimate Doom" - in fact, large parts of the story were almost completely pulled out of that script: humans collaborating with Decepticons to build a space bridge to bring Cybertron into Earth orbit, human slaves coerced into co-operating, and so on.  I wish I could remember if Cybertron was completely destroyed by the aborted attempt to bring it to Earth; I just know it seemed to suffer considerable damage!

On the subject of borrowed material, I can safely say I didn't notice at the time that at least two scenes in this film were ripped directly (and I mean taken wholesale, frame by frame) from some of Michael Bay's previous films, namely The Island and Pearl Harbour.  It didn't affect my enjoyment, and even now I'm not bothered; it seems like a clever way of reducing costs in order to put more robots on the screen for longer.  And there are no complaints there:  plenty of Autobots, transforming; plenty of new characters, with names and identities; vast numbers of explosions, action, fights and more explosions.

One of the down-sides for me was the stupid mechanised earthworm that was featured at the start of the film, and extensively towards the end.  Does it transform?  No.  Does it belong in a film called Transformers?  No.  There is absolutely no precedent that I'm aware of in the Transformers universe for a robotic earthworm.  And if it's that destructive, why didn't it completely level the skyscraper that the humans were trying to climb?  Too big, too destructive, and yet somehow didn't manage to finish off the humans.  Also, I do think that the final sequence was overly long and could easily have been shortened.  In my view, the whole Decepticon aircraft vehicle thing, despite its jointed parts, was completely unnecessary.  Transformers don't fly aircraft; they transform into them!  And yet the story dictates that we have a rescue sequence that depends on Sam and Bumblebee piloting one of these vehicles:  this was not a high point for me.  Nor was Laserbeak's multitude of alternative forms:  throughout the story, he changes forms more often than I change my socks - really not a great part of the story for me (despite what I said about robots in disguise, this was a step too far).

The main high points, in my view, were:

*  Sam, arguing with the guards as he tries to enter the secret Autobot compound:  "Sir, what about your car?"  "That's not my car...  ... ... That's my car."
*  Ironhide's character arc.  Won't say any more, but I was genuinely surprised at how his character developed.
*  No more Megan Fox, and a fairly small amount of her replacement, who despite the wooden acting had a small but key part to play in the story, just towards the end.
*  Starscream's demise at the hands of... well, yes.  A very well-written set of scenes - I didn't see it coming (and neither did Starscream).

Overall - an excellent film, with outstanding special effects, good story and plot, understandable characters (and if they did just service the plot, I'm not complaining) and a body count that exceeds the previous two films put together.  It remains to be seen if the Decepticon remains are going to be blasted off into space, where they might meet up with Unicron and come back re-energised, but I for one will most certainly be looking forward to the next instalment!

Monday, 25 July 2011

Web Analytics: Who holds the steering wheel?

I'll admit now that I don't much care for Formula 1 motor racing.  My brother-in-law is a massive fan, however, and on any given Sunday afternoon during the F1 season, he'll be watching it very closely while my mother-in-law puts the final touches on Sunday lunch.  He's interested in the racing, following his favourite drivers and who's managed to execute the most daring overtaking manoeuvre.  Since it's on the TV, I'll end up watching it, and what I've found most interesting is the number of people in the team who support the driver.  There's a whole squad of team members to carry out the tyre changes, refuelling, visor-wiping and so on, and another squad who spend most of the race staring at computer screens and reports, studying them extremely closely.  You'd think that, having gone to the track for the race, they'd want to watch it live, but no, they seem more interested in watching it on the TV, just like my brother-in-law and me.

I don't know exactly what they're monitoring, but I imagine there are sensors all over the car, reporting data on the car's tyre pressure and temperature; fuel load; the engine temperature; revs; speed and so on.  Occasionally, the call goes out over the team radio, "You need to slow down and conserve fuel...", "Your engine is getting very hot, ease off and use fewer revs...", "Prepare for a tyre change...", "Your fuel load is fine, and you're gaining on the driver in front...", "Move over and let your team mate come past, he needs the championship points."  Perhaps not the last one, but based on the screen-loads of data coming from the car, the support team are able to work out what's happening to the car, so that the driver can drive in the race.  Talk about having too many KPIs to monitor!


So the call goes out, "Slow down, you're running hot and we need to get you into the pit lane."  However, the driver is the one driving the car, and if he fancies his chances at a risky overtaking manoeuvre, then he'll put his foot on the accelerator and get that little bit extra from the car to squeeze through on the exit of a bend.  He risks overheating his engine, and possibly causing it to break completely, but he successfully overtakes his competitor.


Then the engine starts producing large clouds of blue smoke.  The car starts to lose speed.  It's a bit like a scene in Pixar's "Cars".


It rarely happens like this, from what I've seen.  Everybody on the team wants the car to win, from the driver to the guy who stands in the pit lane and holds the stop-go lollipop, to the team manager, and everybody understands their role.  If the screen-watchers see that the engine is running hot, then they have to decide how important this is - is it a show-stopper? - and then tell the driver.  Ultimately, the decision lies with the driver on how to drive the car, and he hasn't got time to check all the data that the car is producing - he's the one holding the steering wheel, and he's the one with his foot on the accelerator.


Is web analytics like watching the speedometer without holding the steering wheel or the brake?  As web analysts, we're responsible for reviewing the data produced by visitors to our sites, but the task of editing a site and making changes usually falls to another team or a colleague with HTML, Java or programming skills.  We can see how traffic, conversions and other success metrics or KPIs are changing, and we can set alerts and warnings when the figures start to move in an unwanted direction.  We can send the messages to our colleague, but unless he (or she) understands what the warning means, why it's being sent, and what to do about it, it's unlikely that any action will be taken.


Yes, the F1 team have a few advantages to help them along:  the data they have is immediate, and is understood by all the engineers (and the driver) in the team.  For example, there are agreed levels in place for the oil - its volume and its temperature.  Everybody knows what 'too hot' looks like, and they know what to do if the temperature starts to rise; the driver knows what this means and what to do about it.  The team members also know what the risks are if the temperature continues to rise - will the engine start to burn oil, will it explode, or seize up?  Is it a minor inconvenience as the cockpit warms up, or is it a total show-stopper that might end the race completely?


Most web analysts don't have that level of success or failure hanging on their recommendations, but the whole team may miss out if a recommendation isn't made.  They may miss out on those incremental improvements that lead to further success, or they may let a poorly performing campaign run on for longer than it should.  And the blame may not lie with the HTML team – although we may think it does, as they're the ones who have built the site and are able to make the changes to it.  The analyst spots a trend in the data: "This figure is going up week-on-week and that other figure is staying the same."  And?  Or, as Avinash Kaushik puts it in his book, "So what?"  Do we continue with the campaign?  Do we increase our keyword bid?  Do we change the page layout?  It's important – vital, even – that our data leads to a recommendation.  We may not achieve the change we think is required, but without a recommendation in our insight, we're not making it easy for the HTML team to consider making the changes.  What's my recommendation?  Identify the issue.  Keep it brief.  Make a proposal.

Does the F1 engineer come on the radio and say to the driver, “Your engine temperature has risen for the last three laps in a row, and your fuel consumption is below average.”?  No, he says, “You’re overheating, slow down.”  He identifies the issue, keeps it concise, and adds a recommendation for action.  And now, having done that myself, I’ll stop.



Monday, 18 July 2011

Web Analytics: Multi Variate Testing

ABOUT MULTI VARIATE TESTING

The online sales and marketing channel is unique compared to other sales channels, because online it's possible to test new marketing images, text and layouts and measure their effectiveness very quickly indeed, and at significantly less cost than in the other channels.  Performance data is readily available, can be studied and analysed, and based on these results, changes and improvements can also be tested.  Not only can tests be carried out quickly and with minimal expense, but learnings from online can be used offline – for example, the best creative and message can be used in a direct mailing or a series of press or magazine adverts.  This gives online marketing a significant advantage over other channels, where advertising and testing can be considerably more expensive, and where it can take much longer to obtain meaningful results from tests.

A/B testing

In order to carry out this testing effectively and with scientific accuracy, two or more different sets of creative need to be tested simultaneously (not consecutively).  This type of testing is called A/B testing, which I've discussed previously.  As a recap:  A/B testing, also known as Split Run Testing, is the comparative testing of two different versions of a page or a single page element.  The most popular page elements to test are graphics and images, the offer or promotion, and the call-to-action text.

Unless we split traffic into groups through A/B testing, we can only run different sets of creative in sequence, one after the other, and then measure the results, and this has limited usefulness and reliability.  This is because external factors (such as competitor action, other marketing, current events and the economy, for example) affect the results and make a proper comparison difficult, or even impossible.  

In order to carry out A/B testing, it can be very beneficial to secure the services of a third-party software provider which specialises in this area.  It's possible to build your own solution, and I've been involved in custom-written A/B tests in the past, but the availability of free solutions such as Google Website Optimiser means that it's often easier and quicker to sign up for an account and start testing.

Analytics alone cannot tell the marketer how improvements (or should I say 'changes') to a web site are delivering new business. With fluctuating traffic, high sales growth and developing propositions, content improvements need to be isolated to understand what is really working. Analytics cannot directly connect site improvements to increased sales conversions while content optimisation (which I’ve referred to as iterative testing) can.  By using multivariate testing, visitor segmentation, and personalisation (more on these in the future), we can optimise web site marketing communications based on real-time behaviour of customers rather than using educated guesses or relying on historical information.  This leads to reduced acquisition costs; improved conversion rates and ultimately increased sales and profit.


Multi-Variate Testing (MVT)


Multi-variate testing (or optimisation) is the next step up from A/B testing.  MVT simultaneously tests the effect of a range of page elements on a success event, and some suppliers offer a service which optimises the mix of content served even during the testing stage.  In multi-variate testing, different combinations of elements are displayed, the combination shown is recorded and the visitors' behaviour is tracked.


MVT is not just simultaneous A/B testing: it means testing two or more versions of content in at least two different regions of the page.  Unlike A/B testing, it can also capture synergy and interaction between variables, where one headline works particularly well with one page layout, or where a colour scheme for the page supports a particular image choice.


True MVT will not only be able to test millions of content variations, it will also be able to determine the impact each variable has on conversion, by itself, and in conjunction with other variables.  Testing multiple creatives in multiple areas on a page can lead to many, many possible variations.  If three different page elements are to be tested (for example, headline, image and call to action), and there are two different options for each element, then this leads to eight different combinations or “recipes” that can be tested (2 x 2 x 2).  Some of the recipes may work better than others because the elements are not entirely independent, and instead, they interact.  Different MVT providers have different ways of handling any possible interactions between page elements, varying from considering them in detail to completely ignoring them.


In this example, there is a strong positive interaction between the image and the text in Recipe 1, and a weaker positive interaction in Recipe 4 (Volvos are very safe compared  to other cars which disintegrate following high-speed contact with trees).  There is a strong negative interaction between the text and the image in Recipe 2.
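To make the recipe-counting concrete, here's a quick sketch in Python (with entirely made-up element names, separate from the Volvo example above) that enumerates the 2 x 2 x 2 combinations described two paragraphs back:

```python
from itertools import product

# Hypothetical test: two options for each of three page elements.
headlines = ["Save money today", "Switch in five minutes"]
images = ["family.jpg", "product.jpg"]
calls_to_action = ["Get a quote", "Start now"]

recipes = list(product(headlines, images, calls_to_action))
print(len(recipes))  # 2 x 2 x 2 = 8 recipes

for number, (headline, image, cta) in enumerate(recipes, start=1):
    print(f"Recipe {number}: {headline} | {image} | {cta}")
```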


Before any optimisation exercise is carried out, a clear plan must be developed, containing a hypothesis (which elements we suspect will have an effect on the conversion metric) and multiple target objectives (metrics which we will be looking to improve).  Site operators should focus attention on testing and optimising areas of web sites that have the highest propensity to positively affect users’ experience and influence conversion of high-yield activities.  These could be landing pages and the home page; high traffic pages, such as hub pages; or other pages that directly influence visitor decisions.


Risks




One of the key risks of setting up MVT is that an online content team will not have sufficient resource to make the testing worthwhile.  The volume of work for the content developers will increase, as they will need to design different versions of each new home page promotion (for example) that is to be tested.  Alternatively, some paid-for testing providers offer a consultancy and design service (at a cost) to produce the additional content.  In order to maximise the value from an annual contract with a paid-for provider, tests need to be run as frequently as possible (while working within an iterative test program), which will require considerable resource and time.


Benefits


MVT and A/B testing are designed not to worsen conversion – if all the other test versions (or ‘recipes’) perform at a lower level than the existing control version, then the control version is retained and conversion rates are not worsened.  This is a powerful assurance - although it should be balanced with the alternative view that testing does serve sub-optimal content to some visitors while it runs.  It means that performance on the website shouldn't get worse, and that conversion, and ultimately sales performance, is likely to improve, especially if follow-up tests are carried out.  This in turn means that MVT should provide a positive return on investment; the predicted timescales in which the testing will deliver positive ROI vary from supplier to supplier (and all depend on what's being tested), but all the suppliers claim that the uplift in conversion will lead to a positive ROI within six months.


This also gives analysts an opportunity to test multiple ideas in a genuine scientific environment, and to demonstrate which is best.  It means that we can use numbers to test and then confirm (or, equally, disprove) our theories and ideas about which content is most effective, rather than guessing.  Testing in this way enables us to better understand what works online (it won't tell us why, and that's where analysts need to start thinking), so that we can work more effectively and more efficiently in producing future content for the website.  It's key to learn from tests, and move on to improving things further.  Additionally, a number of the MVT software systems work out which version of the creative is working most effectively, and begin to display this more often, to improve the conversion of the page even while the test is running.
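I don't know exactly how each supplier shifts traffic towards the better-performing creative, but the general idea is a 'bandit'-style allocation.  Here's a minimal epsilon-greedy sketch in Python, with made-up numbers, that mostly serves the current leader while still giving the other recipes a chance - a general illustration rather than any vendor's actual method:

```python
import random

# Hypothetical running totals for three recipes: (successes, impressions).
results = {
    "recipe_A": (30, 1000),
    "recipe_B": (48, 1000),
    "recipe_C": (25, 1000),
}

def choose_recipe(results, epsilon=0.1):
    """Epsilon-greedy: usually serve the best recipe so far, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(list(results))  # keep exploring the other recipes
    # Otherwise exploit the recipe with the best conversion rate so far.
    return max(results, key=lambda recipe: results[recipe][0] / results[recipe][1])

print(choose_recipe(results))  # most of the time this will be recipe_B
```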

Initial outlay and costs


A number of companies provide A/B and MVT services; some will provide the alternative creative to be tested (saving us the time of developing the new creative in-house).  The cost of these companies depends on the level of service that you select, and the costs vary between suppliers.  I've done a little research into a small number of providers - this is only a starting point, and if you're looking at doing MVT, I would suggest carrying out further research.


Optimost, offered by Autonomy (which acquired Interwoven; www.autonomy.com/optimost), has testing services which include the development of strategic optimisation plans and programs before the testing starts.  They also provide best-practice recommendations based on their own prior experience, and help with persona definition and targeting – which leads into advice on designing the creative for the testing.  As well as providing a dashboard system to review the testing and its progress, they offer statistical analysis and interpretation of the results.


Maxymiser typically offer a 12-month engagement.  At the time of research, they charge a flat-rate fee for carrying out one A/B or MVT test at a time, building up to a larger annual fee for unlimited testing on the site.  This includes an initial meeting to discuss plans for testing and the key areas of the site, and to establish a testing roadmap.  In addition, they provide quarterly review sessions, and can, if requested, develop alternative content for testing (which will be signed off before going live).


Omniture’s offering, Test and Target, also works through a 12-month engagement; being an enterprise solution it is more expensive than other providers, but provides scope for unlimited tests, one day’s consultation per month, an allocated account manager and a standard level of technical support.  Extra consultancy hours are available in various packages, and are charged in addition to the initial cost.


There's also Google Website Optimizer - it's free, and there's online support, which may suit self-starters (I use GWO on my Red Arrows website) and consultancy can be obtained from approved consultants.


All of the suppliers offer an online dashboard system or console, which allows users to observe the progress of the test.  These vary in complexity, but are generally variations on a basic model.  They show how long the test has run; which combinations of creative are working most effectively, and the degree of statistical significance (confidence) in the final result. Some providers (such as Optimost) optimise the test as they go along, rather than using the Taguchi method (which I may explain in a future post).  They use “iterative waves of testing” to improve the test as it progresses.  In order to do MVT, we need to be able to measure the success criteria accurately (whether they are sales, order value, CTR etc).  This is done by having downstream tags on the success pages (where the success occurs).
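The 'degree of statistical significance' those consoles report is, in the simplest case, something like a two-proportion z-test.  Here's a hedged sketch in Python with made-up numbers - a general illustration of the calculation, not any particular supplier's method:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a, visitors_a, successes_b, visitors_b):
    """Two-sided z-test: is recipe B's conversion rate really different from A's?"""
    rate_a = successes_a / visitors_a
    rate_b = successes_b / visitors_b
    pooled = (successes_a + successes_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: control (A) versus challenger (B).
z, p = two_proportion_z_test(successes_a=200, visitors_a=10_000,
                             successes_b=246, visitors_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p below 0.05 is roughly '95% confidence'
```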


The timescale for a test, from launch to identifying a successful ‘winner’, depends on the traffic levels required, which in turn depend on the number of recipes (the number of combinations of content being tested).  More variations mean more recipes, which means more traffic is needed to produce a clear winner that we can be confident in.
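For a rough feel of why more recipes need more traffic, here's a standard sample-size approximation in Python - a sketch with made-up figures, not a substitute for your supplier's own calculation.  Each additional recipe needs roughly this much traffic again.

```python
from statistics import NormalDist

def visitors_per_recipe(baseline_rate, relative_uplift, alpha=0.05, power=0.80):
    """Approximate visitors needed per recipe to detect a given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)            # statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 2% baseline conversion, hoping to detect a 10% relative uplift.
n = visitors_per_recipe(baseline_rate=0.02, relative_uplift=0.10)
print(f"~{n} visitors per recipe; with 8 recipes that's ~{8 * n} visitors in total")
```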


In order to set up an A/B or multi variate test, you will need to insert a piece of JavaScript code in the header of the test page, and enclose each of the test areas on the page with further specific lines of code.  This code enables the test system to pull in the appropriate version of the creative.  The precise nature of this code varies slightly from one supplier to another, but the general principle is the same – code is used to track each visitor, and to determine which version of the content he or she is to be shown.  Some providers call these ‘test areas’ or ‘mboxes’ or ‘maxyboxes’, but the principle is the same: by surrounding a part of the page (or even the whole page) with some JavaScript, the testing software can decide which version of the content to serve, and track the visitor so that they see the same version of the content if they revisit the page.


Code will also need to be placed on the success screens to measure the success of the creative.  Although placing the code in the success page is often a tricky business (it’s usually in a secure area, where deployments can be difficult to agree, arrange and co-ordinate), the advantage is that once this code has been deployed, it can be used for subsequent tests. 


MVT is a very powerful tool; but having said that, so is a JCB excavator or a Black and Decker power drill.  It’s important to use it wisely, with thought and consideration, and to realise that the autopilot setting is probably not the best!

Multi Variate Testing:  The Series:

Preview of Multi Variate testing
Web Analytics: Multi Variate testing (that's this article)
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good - I defend MVT again...
Hands on:  How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!


Sunday, 17 July 2011

Chess game: defending the Patzer's Opening

Here's an example of a game where I successfully defended the Patzer's opening, using the tactics I've listed before for negating and defeating a player who starts out with the Patzer's opening. This is also a test run of the Chess Videos Replayer software, which seems to be a success.

The Patzer's Opening, sometimes also known as the Wayward Queen Attack, is an unconventional chess opening that begins with 1. e4 e5 2. Qh5. This early queen move goes against standard opening principles, as bringing the queen out too soon can make it vulnerable to attacks.

This opening is aggressive but comes with considerable risks. White's queen immediately threatens Scholar’s Mate by targeting the f7 square, hoping for a quick checkmate. However, experienced players can easily defend against this tactic, making the move less effective against skilled opponents.

Despite its drawbacks, the opening does force Black to respond carefully, as I did in this game. A common defence is 2...Nc6, which protects the e5 pawn and prepares for rapid development. If Black plays inaccurately, White might gain an advantage, but generally, strong players consider this opening unsound.
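If you'd like to play the line through yourself, here's a small sketch using the python-chess library (assuming it's installed); the moves are the well-known 2...Nc6 defence rather than the exact moves of my game.

```python
import chess  # pip install python-chess

board = chess.Board()

# The Patzer's Opening / Wayward Queen Attack, met by the standard defence.
for move in ["e4", "e5", "Qh5", "Nc6", "Bc4", "g6", "Qf3", "Nf6"]:
    board.push_san(move)

# White's early threat was mate on f7; after ...g6 and ...Nf6 the queen is blocked,
# Black has developed two pieces, and White's queen has already moved twice.
print(board.is_attacked_by(chess.WHITE, chess.F7))  # True - the bishop still eyes f7...
print(board)                                        # ...but Scholar's Mate is off the table
```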

While the Patzer's Opening might catch beginners off guard, it is rarely used in serious competitive play. Advanced players prefer openings that follow solid strategic principles, focusing on piece development and control of the centre.

In this game, I was black, and playing white was k-ermin.  I have to confess to making a number of blunders in this game (I might annotate them at a later date, this is really just a test run on the replayer software) but won at the end with a bishop sacrifice to clear the way for my queen to mate on a1.  Now that I've found this software, I'll try and publish a few more of my more illustrative games (and not just the ones where I win, honest!).


The Patzer Chess Series

What is the Patzer's Opening in Chess?
Defending the Patzer as Black
Another game playing the Patzer as Black

Some of my other Chess games:

My very earliest online Chess game
My most bizarre Chess game
My favourite Chess game

Thursday, 23 June 2011

Web Analytics - Intro to multi variate testing (MVT)

In my previous post, I talked about A/B testing, and in a future post I'll cover what multi variate testing is.  This post is an interim between the two; last week my wife had our second child, so blogging time is a little hard to come by at the moment!



In this brief post I'd like to list a few things that MVT is not.  There seem to be various ideas about it, most of them described as a panacea for all online woes.


It isn't having more than two versions of a test image on a page; that's just A/B/C/n testing


It isn't really about simultaneously optimising different parts of a page, either.  In its purest form, MVT is about measuring and studying how changes to multiple areas of a page affect conversion, including the interactions between the parts that are changing.  It's the collective sum of all the parts of the page that contribute; optimising each individual component may lead to reduced performance for the page as a whole.  Taking these interactions into account, for me, is the difference between MVT and just running multiple A/B tests on one page.  I'll cover this in more detail in my 'proper' post next time.
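A made-up numerical example of what 'taking the interactions into account' means: with two elements, each with two options, the interaction is the part of a recipe's performance that the two individual effects don't explain.  A minimal sketch in Python, with hypothetical conversion rates:

```python
# Hypothetical conversion rates for a 2 x 2 test (image option x headline option).
rates = {
    ("image_A", "headline_A"): 0.020,  # control
    ("image_B", "headline_A"): 0.024,  # image effect alone: +0.004
    ("image_A", "headline_B"): 0.023,  # headline effect alone: +0.003
    ("image_B", "headline_B"): 0.032,  # together: +0.012, not +0.007
}

image_effect = rates[("image_B", "headline_A")] - rates[("image_A", "headline_A")]
headline_effect = rates[("image_A", "headline_B")] - rates[("image_A", "headline_A")]
combined_effect = rates[("image_B", "headline_B")] - rates[("image_A", "headline_A")]
interaction = combined_effect - (image_effect + headline_effect)

print(f"interaction = {interaction:+.3f}")  # +0.005: this pair works better together
```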


MVT isn't, by itself, the cure-all for a poor customer experience either.  Setting up test versions of pages on a website won't provide long term help to a website, in the same way as a quick blast of keyword optimisation won't fix a poor Google ranking.  MVT is a long-term process, and it's prone, as all computer-related activities are, to the Garbage In Garbage Out problem.  If you don't think about the testing, and develop a proper testing program, then you won't learn anything or improve anything for yourself or for your site visitors.

Apologies that this post is so brief; think of it as a trailer or a primer for my next post!

Monday, 6 June 2011

Web Analytics: A/B testing - A Beginning

In my last post on iterative testing, I gave an example which was based on sequential testing.  With sequential testing, the various creatives, text, headlines, images or whatever else are deployed to a website for a period of time, one after another, and the relative performance of each is compared once all of them have been tried.  I will, in future, discuss success metrics for these kinds of tests, but for now, as in my previous posts on developing tests and building hypotheses, I've been giving each example a points score (it's the principle that matters).

There are numerous drawbacks to running sequential tests.  The audience varies for each time period, and you can't guarantee that you'll have the same kind of traffic for each of the tests.  Most importantly, though, there are various external factors that will influence the performance of each creative - for example, if one creative is running while you or a competitor is running an offline campaign, then its performance will be affected unfairly (for better or worse).  There are influences such as sports events, current affairs, the weather (I've seen this one myself), school holidays or national holidays that affect website traffic and audience behaviour.

Fortunately, in the online environment, it's possible to test two or more creatives against each other at the same time.  Taking a simple example with two creatives, we can test them by randomly splitting our traffic into two groups, showing one group (group A) the first creative and the other group (group B) the second.  In the example below, one version is red, and the other is blue.


The important aspect of any kind of online testing is that if a visitor comes back to the site, they have to be shown the same colour that they saw before.  This is not only to ensure consistent visitor experience, but to make sure that the test remains valid.  You don't want to confuse a visitor by showing him a red page and then, on a later visit, a blue one, and you won't know which version to credit with the success if he responds to one version or the other.
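One common way to keep a returning visitor on the same version is to derive the assignment deterministically from a visitor ID stored in a cookie.  Here's a minimal sketch of the idea in Python, with hypothetical IDs - the testing tools themselves handle this for you.

```python
import hashlib

def assign_variant(visitor_id, variants=("red", "blue")):
    """Deterministically map a visitor ID to a variant, so repeat visits match."""
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same (hypothetical) cookie value always gets the same colour.
print(assign_variant("visitor-12345"))
print(assign_variant("visitor-12345"))  # identical to the line above
print(assign_variant("visitor-67890"))  # may be red or blue, but is also stable
```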

So, having set up two different versions of a creative (red and blue), and then splitting your visitors into two (at work we use Omniture's Test and Target, while on my own website I used Google Website Optimiser), you need to set up a success criterion.  How can you tell which version (red or blue) is the better of the two?  Are you going to measure click-through-rate?  Or conversion to another success event, such as starting a checkout process, or completing the checkout event?  There's some debate about this, but I have two personal opinions:

1.  Choose a success event that's close to the page with the creative on it.  Personally, I strongly recommend measuring click-through rate, but keeping an eye on conversion to other, later success events.  The effect of red versus blue is almost certain to be diluted as you go further from the test page through to the success event (the rough numbers in the sketch after this list illustrate why).  After all, if your checkout process is five pages long, then the effect of the red versus blue creative is going to be overtaken by the effect of what product the visitor has added to the cart, and other influences such as how efficient your checkout screens are.  Yes, keep an eye on conversion through the checkout process, and make sure that the creative with the higher click-through rate doesn't have a much lower conversion to the 'big' success events, but my view is to measure the results that are most likely to be directly influenced by your test.

2.  Choose one success criterion and stick to it.  If you do have more than one measure of success, then decide which is most important, and rank the rest in order of priority.  I can imagine nothing more frustrating than completing a test, presenting the results back to the great and the good, and saying, "The red version had the higher click-through and the higher conversion to 'add to cart', so we've rated it the winner", just for somebody to say, "Ah, but this one had the higher average order value, and so we should go with this one."  Perhaps not the perfect example, but you can see that choosing, and agreeing on, the success criteria is very important.  Otherwise, you've gone from having one person's opinion on the best creative for a web page to one person's opinion on the best success event for a test - and it may not be entirely coincidental that this person's opinion on the key success metric leads to their opinion on the most successful creative.
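Here's one way to see the dilution mentioned in point 1 in numbers - a sketch with made-up figures: the further the success event sits from the test page, the fewer events there are to compare, and the noisier the comparison becomes.

```python
# Hypothetical funnel for one test variant.
visitors = 10_000
click_through_rate = 0.10        # clicks on the test creative
checkout_completion = 0.02       # completes a five-page checkout after clicking

clicks = int(visitors * click_through_rate)                        # ~1,000 events
orders = int(visitors * click_through_rate * checkout_completion)  # ~20 events

print(clicks, orders)
# A 10% difference across ~1,000 clicks is far easier to detect (and attribute to
# the creative) than a 10% difference across ~20 orders, where other influences
# such as product choice and checkout efficiency dominate.
```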

One comment I would make is that it's possible to test three, four or five versions of an image or creative at the same time, and this would still be called A/B testing.  Technically, it'd probably be called A/B/C or A/B/C/D/E testing - usually the shortcut is A/B/n testing, where n can be any number that suits.  It's not multi-variate testing - that's a whole separate process, and not just 'multiple recipes in a test'.

In my next post, I intend to write about multi-variate testing, which for me is really where science collides with web analytics.  I'll explain how the results from A/B testing can be used as the basis for further testing, referring to my previous post on iterative testing, so that you can see in more detail what's working on a web page and what isn't, and what the key factors are.