
Monday, 3 March 2014

Chess: Ruy Lopez Exchange Variation

Some games are classics.  Some are so disastrously filled with blunders that the only way you're going to win is by committing fewer blunders and, ideally, not being the last person to commit one.

This is one of those games.  It started off well enough, with the Ruy Lopez Exchange Variation (I knew a few moves, which got me started), but then the game descended into a number of unusual moves.  Or blunders.  And I missed at least one key opportunity to secure a big win... again.

Here goes.

Dave Johnson vs David Leese, Kidsgrove Chess Club, Roy Bennett Cup. 25 Feb 2014

1. e4  e5
2. Nf3  Nc6
3. Bb5  a6
4. Bxc6 dxc6 


So far, so good.  All by-the-book.  I've had to recapture with my d-pawn (away from the centre) to ensure I don't lose the e-pawn (I can meet Nxe5 with Qd4, forking pawn and knight and subsequently regaining the pawn).

5. d3 Bd6    (White plays to protect his e-pawn, so the threat of Nxe5 is back on.  I protect my e-pawn with my bishop, which seems okay to me.)
6. O-O Bg4 
7. d4 exd4 (White's move surprised me.  What's he doing, moving a pawn twice during development?  I've pinned his knight on f3, so I am eyeing up the possibility of doubling his f-pawns with an exchange).
8. Qxd4 Bxf3 (... so far, so good, it seems, all going according to my plan).
9. Qxg7 Qh4 (I missed Qxg7.  But I've decided I'm not playing cautiously, and if I can't save my rook, I'm going to get some counterplay out of it.  Here, I'm threatening Qxh2#, and I've decided that this game is not going to be a draw).

At this stage, I'm thinking of various threats, apart from Qxh2#.  I'm anticipating various defences, including 10. h3, where I'll then play Bxg2 and possibly Qxe4.  I really need to think ahead (I was playing on adrenaline, which is never a good idea) and I didn't see White's defence (which was also a mistake). 

10. e5? Be2? 

An error from White, followed by a massive, massive mistake by me.  After 10. e5, the best answer was Bf8, threatening all kinds of nastiness.  Here's the position after 10. ... Bf8, the best move for Black.

Note that Black is currently a piece up (White has not recaptured on f3, as he hasn't had time and opted to capture on g7 instead).  And his Queen is attacked, so he can't recapture on f3 just yet either.  I wish I'd seen this move at the time - I was too busy working out how to save my pieces, and this was the obvious answer: if I can't threaten the King, then chase the Queen.  Here, White can't capture the rook, because Qxh8 falls to ... Qg4, and after g3, Qh3 has no reply.  White's only answer here is Qg5, and after an exchange of queens and Black's Bd5 or Be4, Black is a piece up for a pawn (although his pawn structure is a mess).

But no, I played Be2.  I moved my bishop away from its prime location in front of White's king, and attacked the rook instead... it's become a desperado.  In fact, both bishops have ... what am I thinking?
11. exd6 Bxf1
12. Qxh8 O-O-O

I have a plan here, working around Kxf1, Qc4+, and then either Re8+ or Qxc2 with various threats.  However, White isn't bothered with the Bishop at the moment.  I should probably have played Qg4 to press the issue (threatening Qxg2#) and force the capture.  Did I see this?  No.

13. Qe5 Rxd6  (here comes the rook...)
14. Nc3 Rf6?  (White plays Nc3, developing the knight and denying my rook the d1 square.  I decide, after some thought, to threaten checkmate - the threat is Qxf2 and then Qxg2#).
15. Qe8#

If there's only one thing quicker than 'checkmate next move', it's 'checkmate this move'.  I have got my king into trouble, and then disconnected the queen and the rook from the back rank.

All in all, I would have to categorise this as a series of missed opportunities, finished off with a disastrous mistake, and all because I got rattled by the Qxg7 move (which could have paved the way for me to win).  If I think clearly, and avoid panicking, I can probably be a much better player.
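
As an aside, if you'd like to step through the game yourself, here's a rough sketch using the python-chess library (my choice of tool for this sketch - nothing to do with the original game record) that replays the score above and confirms the final position really is checkmate.

import chess

# The full game score, as played.
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6", "dxc6",
         "d3", "Bd6", "O-O", "Bg4", "d4", "exd4", "Qxd4", "Bxf3",
         "Qxg7", "Qh4", "e5", "Be2", "exd6", "Bxf1", "Qxh8", "O-O-O",
         "Qe5", "Rxd6", "Nc3", "Rf6", "Qe8#"]

board = chess.Board()
for san in moves:
    board.push_san(san)      # raises an error if any move in the score is illegal

print(board.is_checkmate())  # True - 15. Qe8 is indeed mate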

Here are some of my Ruy Lopez games (seems everybody wants to play this if they aren't going to use the Patzer), and a stray Sicilian.

Ruy Lopez with 2 ... f6
Ruy Lopez game with 3 ... Nf6 4 O-O
Ruy Lopez Exchange Variation
Sicilian, Smith Morra Gambit

Multi Variate Testing - Online Panacea?

I've discussed multi variate testing previously - outlining the theory, the ideas, the maths and ways in which it can be done.  But, in my discussions with other web analytics and optimisation professionals, it seems that MVT isn't really being used all that widely.  This surprised me at first - after all, the number of tool vendors and suppliers who offer MVT is growing all the time, and I assumed from their sales material that it was the next level of A/B testing and the future of online optimisation.  Additionally, it's often marketed as an online panacea that will point the way forward for your ecommerce business and bring in double-digit growth (in whichever metric you'd care to measure).

However, out of a dozen or so online professionals that I've spoken to in EMEA, only one had tried it, and had obtained mixed results.  So, why isn't it being taken up and used as widely as I'd expected?  Here are some possibilities:

1.  It's difficult to code
2.  It's difficult to identify MVT opportunities
3.  It's quicker to do an A/B test
4.  It's difficult to explain to the Boss

Let's take a look at a simple example of MVT, which will hopefully address the first two challenges that online optimisation professionals face.  I say 'simple', but it's easier than most test ideas because it concerns making some straightforward changes to a web page:  taking things away.

Our content pages; our product pages; our shopping and ecommerce pages are all full of the most important content we can produce for our visitors - glossy images; descriptive text; eye-catching call-to-action buttons; all working together to produce the perfect digital shopping experience.  Or perhaps they aren't.  Perhaps it's a huge mish-mash of competing elements, some of which are helping, and some of which are distracting users and putting them off.  So:  what's working, and what isn't?

Let's take an example from maplin.co.uk  - they sell a wide range of electronics and electrical items.  I've selected one at random, a keyring torch.  I've highlighted below various parts of the page which could be removed as part of a test (I should probably say at this point that this test will require access to the global template for product description pages - if this isn't going to work for you, read ahead to another example). 



 The product page is very similar to many other ecommerce pages (similar layouts are used on various sites to sell clothes, furniture, games, toys... you name it).  But what's the value of each component, and how do they work together?  I've covered interactions between elements in MVT previously.  The easiest way of working out the optimum combination of elements is to selectively remove them in a multi-variate test.

Here's the recipe definition for each of the various combinations that are possible:


Recipe   Reviews   Social   Tabs   Banner
A        Yes       Yes      Yes    Yes
B        Yes       Yes      Yes    No
C        Yes       Yes      No     Yes
D        Yes       Yes      No     No
E        Yes       No       Yes    Yes
F        Yes       No       Yes    No
G        Yes       No       No     Yes
H        Yes       No       No     No
I        No        Yes      Yes    Yes
J        No        Yes      Yes    No
K        No        Yes      No     Yes
L        No        Yes      No     No
M        No        No       Yes    Yes
N        No        No       Yes    No
O        No        No       No     Yes
P        No        No       No     No

Note that Recipe A is the control state (with all elements present) and Recipe P removes everything; in between the two are the various combinations of the four elements.  (If you're feeling mathematical, you can see how the pattern for each recipe changes in a binary-type way - 1111, 1110, 1101, and so on, with Yes as 1 and No as 0 - and how the table has certain symmetries.)  The number of recipes is the number of options for each element (yes or no means 2 options), raised to the power of the number of elements (four elements), so 2⁴ = 2 x 2 x 2 x 2 = 16 recipes.
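
If you'd like to see how those sixteen recipes come about, here's a quick sketch in Python (purely illustrative - nothing to do with any particular testing tool) that generates the combinations in the same order as the table above.

from itertools import product
from string import ascii_uppercase

elements = ["Reviews", "Social", "Tabs", "Banner"]

# Two options (Yes/No) for each of the four elements gives 2 x 2 x 2 x 2 = 16 recipes,
# produced here in the same order as the table: A = everything in, P = everything out.
recipes = list(product(["Yes", "No"], repeat=len(elements)))

for letter, combo in zip(ascii_uppercase, recipes):
    print(letter, dict(zip(elements, combo)))

print(len(recipes))   # 16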


So:  testing sixteen recipes like this is simply not realistic for a normal A/B/C/D/n test.  The traffic requirements are far too high, and you'd probably be waiting six months for results.  However, because the elements are independent (you don't have to have the reviews included to have the social bar), we can carry out a multivariate test which uses only a sample of these recipes, selected to ensure even coverage of the four elements, and which will (with the appropriate tools) enable you to work out the optimal combination, even if you didn't test it.
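
To illustrate what 'a sample of these recipes, selected to ensure even coverage' might look like, here's one classical way of halving the table: keep only the recipes with an even number of "No"s (a 2-to-the-power-(4-1) fractional factorial).  Your tool may well pick its subset differently; this sketch is just to show the idea.

from itertools import product

elements = ["Reviews", "Social", "Tabs", "Banner"]
all_recipes = list(product(["Yes", "No"], repeat=4))

# Keep the recipes with an even number of "No"s: each element is then shown
# ("Yes") in exactly four of the eight remaining recipes, so its effect can
# still be estimated even though half the combinations are never served.
half_fraction = [r for r in all_recipes if r.count("No") % 2 == 0]

print(len(half_fraction))    # 8 recipes instead of 16
for r in half_fraction:
    print(dict(zip(elements, r)))
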
This example was on a product information page, and as I mentioned above, if you want to test here, your coders will need access to the global template file so that you can run the test across all product information pages.  There are, however, single-page options that would work just as well:
- landing pages for online/offline marketing campaigns
- your home page
- checkout pages

In these cases, each page is (typically) built for a specific purpose and with specific content, so you have much more flexibility on what you can test.  For example, should you have a "Chat online" option and a telephone number on your landing page, as well as an option for online feedback?  Are all three really needed?

This testing has some key advantages: 

1.  You can test a large number of element changes on the page in one go.
2.  You can understand (with accurate analysis) the contribution each element makes to page performance - there's a sketch of this just after the list.
3.  There's no new content required from the design or marketing teams - you're only handling existing content - so no reliance on them for images or content.
4.  It's usually easier to remove page elements with code than it is to insert them, so your code developers will be happier.
5.  It's relatively easy to explain what you've tested to the Boss.
6.  In this case, it's definitely quicker than A/B testing, and the more elements you choose to test, the larger the advantage becomes.
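
Here's a rough sketch of what I mean by working out each element's contribution (point 2 above): compare the average conversion of the recipes that included an element against those that excluded it.  The conversion figures below are invented purely for illustration.

elements = ["Reviews", "Social", "Tabs", "Banner"]

# Conversion rate observed for each tested recipe - these numbers are made up.
results = {
    ("Yes", "Yes", "Yes", "Yes"): 0.031,
    ("Yes", "No",  "No",  "No" ): 0.027,
    ("No",  "Yes", "No",  "Yes"): 0.033,
    ("No",  "No",  "Yes", "No" ): 0.029,
    # ... the rest of the tested recipes go here
}

for i, name in enumerate(elements):
    with_it    = [rate for combo, rate in results.items() if combo[i] == "Yes"]
    without_it = [rate for combo, rate in results.items() if combo[i] == "No"]
    effect = sum(with_it) / len(with_it) - sum(without_it) / len(without_it)
    print(f"{name}: {effect:+.4f}")   # positive = the element helps, negative = it hurts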

It also has some key requirements:

A.  You're going to need to be able to interpret the results.  This will require some careful analysis and understanding of the maths behind multi-variate testing, in order to work out what each element is contributing (in a positive or negative way).  Many of the tools that are available (here's a list of some of them) offer and promise this kind of analysis, but I'm not aware of it being widely used, so it may be prudent to discuss your requirements with your account manager (I don't work for a tool provider).  You don't really want to get to the end of a test and discover that you have spent eight weeks collecting a mountain of data that you can't climb... that would really require some explaining to the boss.


B.  You're going to need more traffic than a typical A/B test, even if you're using a mathematical method (such as the Taguchi method) to reduce the recipe requirements, so be prepared to wait longer than usual for your results.  There's a rough illustration of the numbers just below.
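
To give a feel for the traffic question, here's a back-of-the-envelope sketch using the standard two-proportion sample-size formula.  All the numbers (3% baseline conversion, looking for a 10% relative uplift, roughly 95% confidence and 80% power) are assumptions for illustration, not recommendations.

from math import sqrt, ceil

def visitors_per_recipe(base_rate, rel_uplift, z_alpha=1.96, z_beta=0.84):
    # Standard two-proportion sample-size estimate: visitors needed per recipe
    # to detect the given relative uplift at ~95% confidence and ~80% power.
    p1 = base_rate
    p2 = base_rate * (1 + rel_uplift)
    p_bar = (p1 + p2) / 2
    top = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(top / (p1 - p2) ** 2)

n = visitors_per_recipe(base_rate=0.03, rel_uplift=0.10)
print(n)              # roughly 50,000 visitors per recipe
print(n * 2)          # a simple A/B test (control plus one challenger)
print(n * 8, n * 16)  # an 8-recipe fraction vs the full 16 recipes, read recipe-by-recipe

In practice a good MVT analysis pools recipes to estimate each element's effect, so the real requirement sits somewhere between the A/B figure and the recipe-by-recipe figure - but the direction of travel is clear: more recipes means more traffic and a longer wait.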


I hope in this blog post I've been able to encourage you to think about using MVT, and shown you how to overcome some of the initial hurdles to getting an MVT idea together - and hopefully into execution.  Please do let me know (either in the comments, or by contacting me) how your efforts go!

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing 
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good - (that's this article)
Hands on:  How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!
 

Thursday, 9 January 2014

When Good Tests Fail

Seth Godin, the marketing author, recently stated simply that, 'The answer to the question, "What if I fail?" is "You will."  The real question is, "What after I fail?"'

Despite rigorous analytics, careful usability studies and thoughtful designing, the results from your latest A/B test are bad.  Conversion worsened; average order value plummeted and people bounced off your home page like it was a trampoline.  Your test failed.  And, if you're taking it personally (and most online professionals do take it very personally), then you failed too.

But, before the boss slashes your optimisation budget, you have the opportunity to rescue the test, by reviewing all the data and understanding the full picture.  Your test failed - but why?  I've mentioned before that tests which fail draw far more attention than those which win - it's just human nature to explore why something went wrong, and we like to attribute blame or responsibility accordingly.  That's why I pull apart my Chess games to find out why I lost.  I want to improve my Chess (I'm not going to stop playing, or fire myself from playing Chess).

So, the boss asks the questions: Why did your test fail?  (And it's suddenly stopped being his test, or our test... it's yours).  Where's the conversion uplift we expected?  And why aren't profits rising?

It's time to review the test plan, the hypothesis and the key questions. Which of these apply to your test?

Answer 1.  The hypothesis was not entirely valid.  I have said before that, "If I eat more chocolate, I'll be able to run faster because I will have more energy."  What I failed to consider is the build-up of fat in my body: eating all that chocolate has made me heavier, and hence I'm actually running more slowly.  I'm not training enough to convert all that extra energy into movement, so it's being stored as fat.

Or, in an online situation:  the idea was proved incorrect.  Somewhere, one of the assumptions that was made was wrong.  This is where the key test questions come in.  The analysis that comes from answering these key questions will help retrieve your test from 'total failure' to 'learning experience'.

Sometimes, in an online context, the change we made in the test had an unforeseen side-effect.  We thought we were driving more people from the product pages to the cart, but they just weren't properly prepared.  We had the button at the bottom of the page, and people who scrolled to the bottom of the page saw the full specs of the new super-toaster and how it needs an extra battery-pack for super-toasting.  We moved the button up the page, more people clicked on it, but realised only at the cart page that it needed the additional battery pack.  We upset more people than we helped, and overall conversion went down.

(Image: unforeseen side-effects in testing leading to adverse performance - too much chocolate slows down 100m run times due to increased body mass.)
Answer 2.  The visual design of the test recipe didn't address the test hypothesis or the key test questions.  In any lab-based scientific experiment, you would expect to set up the apparatus and equipment and take specific measurements based on the experiment you were doing.  You would also set up the equipment to address the hypothesis - otherwise you're just messing about with lab equipment.  For example, if you wanted to measure the force of gravity and how it affects moving objects, you wouldn't design an experiment with a battery, a thermometer and a microphone. 

However, in an online environment, this sort of situation becomes possible, because different people possess the skills required to analyse data and the skills to design banners etc, and the skills to write the HTML or JavaScript code.  The analyst, the designer and the developer need to work closely together to make sure that the test design which hits the screen is going to answer the original hypothesis, and not something else that the designer believes will 'look nice' or that the developer finds easier to code.  Good collaboration between the key partners in the testing process is essential - if the original test idea doesn't meet brand guidelines, or is extremely difficult to code, then it's better to get everybody together and decide what can be done that will still help prove or disprove the hypothesis.


To give a final example from my chocolate-eating context, I wouldn't expect to prove that chocolate makes me run faster by eating crisps (potato chips) instead.  Unless they were chocolate-coated crisps?  Seriously.


Answer 3.  Sometimes, the test design and execution was perfect, and we measured the right metrics in the right way.  However, the test data shows that our hypothesis was completely wrong.  It's time to learn something new...!

My hypothesis said that chocolate would make me run faster; but it didn't.  Now, I apologise that I'm not a biology expert and this probably isn't correct, but let's assume it is, review the 'data' and find out why.  


For a start, I put on weight (because chocolate contains fat), but worse still, the sugar in chocolate was also converted to fat, and it wasn't converted back into sugar quickly enough for me to benefit from it while running the 100 metres.  Measurements of my speed show I got slower, and measurements of my blood sugar levels before and after the 100 metres showed that the blood sugar levels fell, because the fat in my body wasn't converted into glucose and transferred to my muscles quickly enough.  Additionally, my body mass rose 3% during the testing period, and further analysis showed this was fat, not muscle.  This increased mass also slowed me down.



Back to online:  you thought people would like it if your product pages looked more like Apple's.  But Apple sell a limited range of products - one phone, one MP3 player, one desktop PC, etc. while you sell 15-20 of each of those, and your test recipe showed only one of your products on the page (the rest were hidden behind a 'View More' link), when you get better financial performance from a range of products.  Or perhaps you thought that prompting users to chat online would help them go through checkout... but you irritated them and put them off.  Perhaps your data showed that people kept leaving your site to talk to you on the phone.  However, when you tested hiding the phone number, in order to get people to convert online, you found that sales through the phone line went down, as expected, but your online sales also fell because people were using the phone line for help completing the online purchase.  There are learnings in all cases that you can use to improve your site further - you didn't fail, you just didn't win ;-)

In conclusion: yes, sometimes test recipes lose.  Hypotheses were incorrect, assumptions were invalid, side-effects were missed and sometimes the test just didn't ask the question it was meant to.  The difference between a test losing and a test failing is in the analysis, and that comes from planning - having a good hypothesis in the first place, and asking the right questions up front which will show why the test lost (or, let's not forget, why a different test won).  Until then, fail fast and learn quickly!