Web Optimisation, Maths and Puzzles

Monday, 3 March 2014

Chess: Ruy Lopez Exchange Variation

Some games are classics. Some are so disastrously filled with blunders that the only way you're going to win is by committing fewer blunders and, ideally, not being the last person to commit one.

This is one of those games. It started off well enough, with the Ruy Lopez Exchange Variation (I knew a few moves, which got me started), but then the game descended into a number of unusual moves. Or blunders. And I missed at least one key opportunity to secure a big win... again.

Here goes.

Dave Johnson vs David Leese, Kidsgrove Chess Club, Roy Bennett Cup. 25 Feb 2014

1. e4 e5

2. Nf3 Nc6

3. Bb5 a6

4. Bxc6 dxc6

So far, so good. All by-the-book. I've had to recapture with my d-pawn (away from the centre) to ensure I don't lose the e-pawn (I can meet Nxe5 with Qd4, forking pawn and knight and subsequently regaining the pawn).

5. d3 Bd6 (White plays to protect his e-pawn, so the threat of Nxe5 is back on. I protect my e-pawn with my bishop, which seems okay to me.)

6. O-O Bg4

7. d4 exd4 (White's move surprised me. What's he doing, moving a pawn twice during development? I've pinned his knight on f3, so I am eyeing up the possibility of doubling his f- and g-pawns with an exchange).

8. Qxd4 Bxf3 (... so far, so good, it seems, all going according to my plan).

9. Qxg7 Qh4 (I missed Qxg7. But I've decided I'm not playing cautiously, and if I can't save my rook, I'm going to get some counterplay out of it. Here, I'm threatening Qxh2#, and I've decided that this game is not going to be a draw).

At this stage, I'm thinking of various threats, apart from Qxh2#. I'm anticipating various defences, including 10. h3, where I'll then play Bxg2 and possibly Qxe4. I really need to think ahead (I was playing on adrenaline, which is never a good idea) and I didn't see White's defence (which was also a mistake).

10. e5? Be2?

An error from White, followed by a massive, massive mistake by me. After 10. e5, the best answer was Bf8, threatening all kinds of nastiness. Here's the position after 10. ... Bf8, the best move for Black.

Note that Black is currently a piece up (White has not recaptured on f3, as he hasn't had time and opted to capture on g7 instead). And his Queen is attacked, so he can't recapture on f3 just yet either. I wish I'd seen this move at the time - I was too busy working out how to save my pieces, and this was the obvious answer: if I can't threaten the King, then chase the Queen. Here, White can't capture the rook, because Qxg8 falls to ... Qg4, and after g3, Qh3 has no reply. White's only answer here is Qg5, and after an exchange of queens and Black's Bd5 or Be4, Black is a piece up for a pawn (although his pawn structure is a mess).

But no, I played Be2. I moved my bishop away from its prime location in front of White's king, and attacked the rook instead... it's become a desperado. In fact, both bishops have ... what am I thinking?

11. exd6 Bxf1

12. Qxh8 O-O-O

I have a plan here, working around Kxf1, Qc4+, and then either Re8+ or Qxc2 with various threats. However, White isn't bothered with the Bishop at the moment. I should probably have played Qg4 to press the issue (threatening Qxg2#) and force the capture. Did I see this? No.

13. Qe5 Rxd6 (here comes the rook...)

14. Nc3 Rf6? (White plays Nc3, developing the knight and denying my rook the d1 square. I decide, after some thought, to threaten checkmate - the threat is Qxf2 and then Qxg2#).

15. Qe8#

If there's only one thing quicker than 'checkmate next move', it's 'checkmate this move'. I have got my king into trouble, and then disconnected the queen and the rook from the back rank.

All in all, I would have to categorise this as a series of missed opportunities, finished off with a disastrous mistake, and all because I got rattled by the Qxg7 move (which could have paved the way for me to win). If I think clearly, and avoid panicking, I can probably be a much better player.

Here are some of my Ruy Lopez games (seems everybody wants to play this if they aren't going to use the Patzer), and a stray Sicilian.

Ruy Lopez with 2 ... f6
Ruy Lopez game with 3 ... Nf6 4 O-O
Ruy Lopez Exchange Variation
Sicilian, Smith Morra Gambit

Multi Variate Testing - Online Panacea?

I've discussed multi variate testing previously - outlining the theory, the ideas, the maths and ways in which it can be done. But, in my discussions with other web analytics and optimisation professionals, it seems that MVT isn't really being used all that widely. This surprised me at first - after all, the number of tools vendors and suppliers who offer MVT is growing all the time, and I assumed from their sales material that it was the next level of A/B testing and the future of online optimisation. Additionally, it's often marketed as an online panacea, that will highlight the way forwards for your ecommerce business, and bring in double-digit growth (in whichever metric you'd care to measure).

However, out of a dozen or so online professionals that I've spoken to in EMEA, only one had tried it, and had obtained mixed results. So, why isn't it being taken up and used as widely as I'd expected? Here are some possibilities:

1. It's difficult to code
2. It's difficult to identify MVT opportunities
3. It's quicker to do an A/B test
4. It's difficult to explain to the Boss

Let's take a look at a simple example of MVT, which will hopefully address the first two challenges that online optimisation professionals face. I say 'simple', and it really is easier than most test ideas, because it concerns making some straightforward changes to a web page: taking things away.

Our content pages; our product pages; our shopping and ecommerce pages are all full of the most important content we can produce for our visitors - glossy images; descriptive text; eye-catching call-to-action buttons; all working together to produce the perfect digital shopping experience. Or perhaps they aren't. Perhaps it's a huge mish-mash of competing elements, some of which are helping, and some of which are distracting users and putting them off. So: what's working, and what isn't?

Let's take an example from maplin.co.uk - they sell a wide range of electronics and electrical items. I've selected one at random, a keyring torch. I've highlighted below various parts of the page which could be removed as part of a test (I should probably say at this point that this test will require access to the global template for product description pages - if this isn't going to work for you, read ahead to another example).


The product page is very similar to many other ecommerce pages (similar layouts are used on various sites to sell clothes, furniture, games, toys... you name it). But what's the value of each component, and how do they work together? I've covered interactions between elements in MVT previously. The easiest way of working out the optimum combination of elements is to selectively remove them in a multi-variate test.

Here's the recipe definition for each of the various combinations that are possible:

Recipe	Reviews	Social	Tabs	Banner
A	Yes	Yes	Yes	Yes
B	Yes	Yes	Yes	No
C	Yes	Yes	No	Yes
D	Yes	Yes	No	No
E	Yes	No	Yes	Yes
F	Yes	No	Yes	No
G	Yes	No	No	Yes
H	Yes	No	No	No
I	No	Yes	Yes	Yes
J	No	Yes	Yes	No
K	No	Yes	No	Yes
L	No	Yes	No	No
M	No	No	Yes	Yes
N	No	No	Yes	No
O	No	No	No	Yes
P	No	No	No	No

Note that Recipe A is the control state (with all elements present) and Recipe P is removing everything; there are then the various combinations of the four elements in between the two. (If you're feeling mathematical, you can review how the patterns for each of the four elements change in a binary-type way - 1000, 1001, 1010, etc. - and how the table has certain symmetries). The number of recipes can be calculated by taking the number of options for each element (yes or no means 2 options), raised to the power of the number of elements (four elements), so 2⁴ = 2 x 2 x 2 x 2 = 16 recipes.
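
For anyone who would rather generate the table than type it, the counting rule above can be sketched in a few lines of Python. This is only an illustration: the element names are taken from the table, while the function name full_factorial is my own invention and not part of any testing tool.

```python
from itertools import product
from string import ascii_uppercase

ELEMENTS = ["Reviews", "Social", "Tabs", "Banner"]

def full_factorial(elements):
    """Return {recipe letter: {element: shown?}} for every on/off combination."""
    # product((True, False), ...) yields "shown" before "removed",
    # which reproduces the row order of the recipe table.
    combos = product((True, False), repeat=len(elements))
    return {
        letter: dict(zip(elements, combo))
        for letter, combo in zip(ascii_uppercase, combos)
    }

recipes = full_factorial(ELEMENTS)
for letter, recipe in recipes.items():
    row = "\t".join("Yes" if recipe[name] else "No" for name in ELEMENTS)
    print(f"{letter}\t{row}")
```

Adding a fifth element to the list doubles the number of rows to 32, which is why the recipe count gets out of hand so quickly.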

So: sixteen recipes like this is simply not realistic for a normal A/B/C/D/n test. The traffic requirements are far too high, and you'd probably be waiting six months for results. However, because the elements are independent (you don't have to have the reviews included to have the social bar), we can carry out a multivariate test which has only a sample of these recipes, selected to ensure even coverage of the four elements, and which will (with the appropriate tools) enable you to work out the optimal combination, even if you didn't test it.
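
As a rough sketch of what 'a sample of these recipes with even coverage' can look like, here is one textbook choice: keep only the recipes that remove an even number of elements. I'm assuming this particular half-fraction purely for illustration; your testing tool may pick a different subset, and the function names are hypothetical.

```python
from itertools import combinations, product

ELEMENTS = ["Reviews", "Social", "Tabs", "Banner"]

def half_fraction(elements):
    """Keep the recipes that remove an even number of elements."""
    runs = []
    for combo in product((True, False), repeat=len(elements)):
        if combo.count(False) % 2 == 0:
            runs.append(dict(zip(elements, combo)))
    return runs

def is_balanced(runs, elements):
    """Check each element, and each pair of elements, is covered evenly."""
    # Every element should be shown in exactly half of the recipes.
    for name in elements:
        if sum(run[name] for run in runs) * 2 != len(runs):
            return False
    # Every yes/no pairing of two elements should appear equally often.
    for a, b in combinations(elements, 2):
        pairs = [(run[a], run[b]) for run in runs]
        counts = {pairs.count(p) for p in product((True, False), repeat=2)}
        if counts != {len(runs) // 4}:
            return False
    return True

sample = half_fraction(ELEMENTS)
print(len(sample), is_balanced(sample, ELEMENTS))
```

In terms of the table, this keeps recipes A, D, F, G, J, K, M and P: eight recipes instead of sixteen, with every element present in four of them and absent in the other four.
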
This example was on a product information page, and as I mentioned above, if you want to test here, your coders will need access to the global template file so that you can run the test across all product information pages. There are, however, single-page options that would work just as well:
- landing pages for online/offline marketing campaigns
- your home page
- checkout pages

In these cases, each page is (typically) built for a specific purpose and with specific content, so you have much more flexibility on what you can test. For example, should you have a "Chat online" option and a telephone number on your landing page, as well as an option for online feedback? Are all three really needed?

This testing has some key advantages:

1. You can test a large number of element changes on the page in one go
2. You can understand (with accurate analysis) the contribution each element makes to page performance
3. There's no new content required from the design or marketing teams - you're only handling existing content - so no reliance on them for images or content.
4. It's usually easier to remove page elements with code than it is to insert them, so your code developers will be happier
5. It's relatively easy to explain what you've tested to the Boss.
6. In this case, it's definitely quicker than A/B testing, and the more elements you choose to test, the larger the advantage becomes.

It also has some key requirements:

A. You're going to need to be able to interpret the results. This will require some careful analysis and understanding of the maths behind multi-variate testing, in order to work out what each element is contributing (in a positive or negative way). Many of the tools that are available (here's a list of some of them) offer and promise this kind of analysis, but I'm not aware of it being widely used, so it may be prudent to discuss your requirements with your account manager (I don't work for a tool provider). You don't really want to get to the end of a test and discover that you have spent eight weeks collecting a mountain of data that you can't climb... that would really require some explaining to the boss.
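
To give a flavour of that analysis, here is a minimal sketch of the simplest version: compare the average conversion rate of the recipes where an element was shown against the recipes where it was removed. Everything numerical here is invented. The lifts, the base rate and the purely additive model (no interactions between elements) are all assumptions made for illustration, and the eight 'tested' recipes are simply those that remove an even number of elements.

```python
from itertools import product

ELEMENTS = ["Reviews", "Social", "Tabs", "Banner"]

# Invented "true" contributions to conversion rate, for illustration only.
ASSUMED_LIFT = {"Reviews": 0.010, "Social": -0.002, "Tabs": -0.004, "Banner": -0.006}
BASE_RATE = 0.030  # hypothetical conversion rate with every element removed

def simulated_rate(combo):
    """Conversion rate for one recipe under a purely additive model."""
    return BASE_RATE + sum(
        ASSUMED_LIFT[name] for name, shown in zip(ELEMENTS, combo) if shown
    )

# Only the eight recipes that remove an even number of elements are "tested".
tested = {
    combo: simulated_rate(combo)
    for combo in product((True, False), repeat=len(ELEMENTS))
    if combo.count(False) % 2 == 0
}

def main_effects(results):
    """Average rate with the element shown minus average rate with it removed."""
    effects = {}
    for i, name in enumerate(ELEMENTS):
        shown = [rate for combo, rate in results.items() if combo[i]]
        hidden = [rate for combo, rate in results.items() if not combo[i]]
        effects[name] = sum(shown) / len(shown) - sum(hidden) / len(hidden)
    return effects

effects = main_effects(tested)
# Predicted best recipe: keep only the elements with a positive contribution.
best = tuple(effects[name] > 0 for name in ELEMENTS)
print(effects)
print(best, best in tested)
```

With these made-up numbers the predicted best recipe is 'reviews only' (Recipe H in the table), which was not one of the eight recipes tested. Real data is noisier, and elements can interact, so treat this as the idea rather than the method your tool uses.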

B. You're going to need more traffic than a typical A/B test, even if you're using a mathematical method (such as the Taguchi method) to reduce the recipe requirements, so be prepared to wait longer than usual for your results.
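
For a rough feel of the traffic involved, the sketch below uses the standard normal-approximation formula for comparing two conversion rates. The 3.0% baseline, the 3.3% target, the 5% significance level and the 80% power are all assumptions I've picked for illustration, and treating every recipe as its own stand-alone comparison is a deliberately pessimistic simplification, since a proper multi-variate analysis pools data across recipes.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_recipe(base_rate, target_rate, alpha=0.05, power=0.8):
    """Approximate visitors needed per recipe to detect base -> target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = base_rate * (1 - base_rate) + target_rate * (1 - target_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / (target_rate - base_rate) ** 2)

# Hypothetical figures: 3.0% baseline conversion, hoping to detect 3.3%.
per_recipe = visitors_per_recipe(0.030, 0.033)
for recipe_count in (2, 8, 16):
    print(recipe_count, "recipes:", per_recipe * recipe_count, "visitors")
```

The total scales directly with the number of recipes, which is the argument for cutting sixteen recipes down to a balanced sample in the first place.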

I hope in this blog post I've been able to encourage you to think about using MVT, and shown you how to overcome some of the initial hurdles to getting an MVT idea together - and hopefully into execution. Please do let me know (either in the comments, or by contacting me) how your efforts go!

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good - (that's this article)
Hands on: How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!

Thursday, 9 January 2014

When Good Tests Fail

Seth Godin, online usability expert, recently stated simply that, 'The answer to the question, "What if I fail?" is "You will." The real question is, "What after I fail?"'

Despite rigorous analytics, careful usability studies and thoughtful designing, the results from your latest A/B test are bad. Conversion worsened; average order value plummeted and people bounced off your home page like it was a trampoline. Your test failed. And, if you're taking it personally (and most online professionals do take it very personally), then you failed too.

But, before the boss slashes your optimisation budget, you have the opportunity to rescue the test, by reviewing all the data and understanding the full picture. Your test failed - but why? I've mentioned before that tests which fail draw far more attention than those which win - it's just human nature to explore why something went wrong, and we like to attribute blame or responsibility accordingly. That's why I pull apart my Chess games to find out why I lost. I want to improve my Chess (I'm not going to stop playing, or fire myself from playing Chess).

So, the boss asks the questions: Why did your test fail? (And it's suddenly stopped being his test, or our test... it's yours). Where's the conversion uplift we expected? And why aren't profits rising?

It's time to review the test plan, the hypothesis and the key questions. Which of these apply to your test?

Answer 1. The hypothesis was not entirely valid. I have said before that, "If I eat more chocolate, I'll be able to run faster because I will have more energy." What I failed to consider is the build up of fat in my body, and that eating all that chocolate has made me heavier, and hence I'm actually running more slowly. I'm not training enough to convert all that fat into movement, and the energy is being stored as fat.

Or, in an online situation: the idea was proved incorrect. Somewhere, one of the assumptions that was made was wrong. This is where the key test questions come in. The analysis that comes from answering these key questions will help retrieve your test from 'total failure' to 'learning experience'.

Sometimes, in an online context, the change we made in the test had an unforeseen side-effect. We thought we were driving more people from the product pages to the cart, but they just weren't properly prepared. We had the button at the bottom of the page, and people who scrolled to the bottom of the page saw the full specs of the new super-toaster and how it needs an extra battery-pack for super-toasting. We moved the button up the page, more people clicked on it, but realised only at the cart page that it needed the additional battery pack. We upset more people than we helped, and overall conversion went down.

Unforeseen side-effects in testing leading to adverse performance:
too much chocolate slows down 100m run times due to increased body mass

Answer 2. The visual design of the test recipe didn't address the test hypothesis or the key test questions. In any lab-based scientific experiment, you would expect to set up the apparatus and equipment and take specific measurements based on the experiment you were doing. You would also set up the equipment to address the hypothesis - otherwise you're just messing about with lab equipment. For example, if you wanted to measure the force of gravity and how it affects moving objects, you wouldn't design an experiment with a battery, a thermometer and a microphone.

However, in an online environment, this sort of situation becomes possible, because different people possess the skills required to analyse data and the skills to design banners etc, and the skills to write the HTML or JavaScript code. The analyst, the designer and the developer need to work closely together to make sure that the test design which hits the screen is going to answer the original hypothesis, and not something else that the designer believes will 'look nice' or that the developer finds easier to code. Good collaboration between the key partners in the testing process is essential - if the original test idea doesn't meet brand guidelines, or is extremely difficult to code, then it's better to get everybody together and decide what can be done that will still help prove or disprove the hypothesis.

To give a final example from my chocolate-eating context, I wouldn't expect to prove that chocolate makes me run faster by eating crisps (potato chips) instead. Unless they were chocolate-coated crisps? Seriously.

Answer 3. Sometimes, the test design and execution was perfect, and we measured the right metrics in the right way. However, the test data shows that our hypothesis was completely wrong. It's time to learn something new...!

My hypothesis said that chocolate would make me run faster; but it didn't. Now, I apologise that I'm not a biology expert and this probably isn't correct, but let's assume it is, review the 'data' and find out why.

For a start, I put on weight (because chocolate contains fat), but worse still, the sugar in chocolate was also converted to fat, and it wasn't converted back into sugar quickly enough for me to benefit from it while running the 100 metres. Measurements of my speed show I got slower, and measurements of my blood sugar levels before and after the 100 metres showed that the blood sugar levels fell, because the fat in my body wasn't converted into glucose and transferred to my muscles quickly enough. Additionally, my body mass rose 3% during the testing period, and further analysis showed this was fat, not muscle. This increased mass also slowed me down.

Back to online: you thought people would like it if your product pages looked more like Apple's. But Apple sell a limited range of products - one phone, one MP3 player, one desktop PC, etc. while you sell 15-20 of each of those, and your test recipe showed only one of your products on the page (the rest were hidden behind a 'View More' link), when you get better financial performance from a range of products. Or perhaps you thought that prompting users to chat online would help them go through checkout... but you irritated them and put them off. Perhaps your data showed that people kept leaving your site to talk to you on the phone. However, when you tested hiding the phone number, in order to get people to convert online, you found that sales through the phone line went down, as expected, but your online sales also fell because people were using the phone line for help completing the online purchase. There are learnings in all cases that you can use to improve your site further - you didn't fail, you just didn't win ;-)

In conclusion: yes, sometimes test recipes lose. Hypotheses were incorrect, assumptions were invalid, side-effects were missed and sometimes the test just didn't ask the question it was meant to. The difference between a test losing and a test failing is in the analysis, and that comes from planning - having a good hypothesis in the first place, and asking the right questions up front which will show why the test lost (or, let's not forget, the reason why a different test won). Until then, fail fast and learn quickly!

Tuesday, 7 January 2014

The Key Questions in Online Testing

As you begin the process of designing an online test, the first thing you'll need is a solid test hypothesis. My previous post outlined this, looking at a hypothesis, HIPPOthesis and hippiethesis. To start with a quick recap, I explained that a good hypothesis says something like, "IF we make this change to our website, THEN we expect to see this improvement in performance BECAUSE we will have made it easier for visitors to complete their task." Often, we have a good idea about what the test should be - make something bigger, have text in red instead of black... whatever.

Stating the hypothesis in a formal way will help to draw the ideas together and give the test a clear purpose. The exact details of the changes you're making in the test, the performance change you expect, and the reasons for the expected changes will be specific to each test, and that's where your web analytics data or usability studies will support your test idea. For example, if you're seeing a large drop in traffic between the cart page and the checkout pages, and your usability study shows people aren't finding the 'continue' button, then your hypothesis will reflect this.

In between the test hypothesis and the test execution are the key questions. These are the key questions that you will develop from your hypothesis, and which the test should answer. They should tie very closely to the hypothesis, and they will direct the analysis of your test data, otherwise you'll have test data that will lack a focus and you'll struggle to tell the story of the test. Think about what your test should show - what you'd like it to prove - and what you actually want to answer, in plain English.

Let's take my offline example from my previous post. Here's my hypothesis: "If I eat more chocolate, then I will be able to run faster because I will have more energy."

It's good - but only as a hypothesis (I'm not saying it's true, or accurate, but that's why we test!). But before I start eating chocolate and then running, I need to confirm the exact details of how much chocolate, what distance and what times I can achieve at the moment. If this was an ideal offline test, there would be two of me, one eating the chocolate, and one not. And if it was ideal, I'd be the one eating the chocolate :-)

So, the key questions will start to drive the specifics of the test and the analysis. In this case, the first key question is this: "If I eat an additional 200 grams of chocolate each day, what will happen to my time for running the 100 metres sprint?"

It may be 200 grams or 300 grams; the 100m or the 200m, but in this case I've specified the mass of chocolate and the distance. Demonstrating the 'will have more energy' will be a little harder to do. In order to do this, I might add further questions, to help understand exactly what's happening during the test - perhaps questions around blood sugar levels, body mass, fat content, and so on. Note at this stage that I haven't finalised the exact details - where I'll run the 100 metres, what form the chocolate will take (Snickers? Oreos? Mars?), and so on. I could specify this information at this stage if I needed to, or I could write up a specific test execution plan as the next section of my test document.

In the online world I almost certainly will be looking at additional metrics - online measurements are rarely as straightforward as offline. So let's take an online example and look at it in more detail.

"If I move the call-to-action button on the cart page to a position above the fold, then I will drive more people to start the checkout process because more people will see it and click on it."

And the key questions for my online test?

"How is the click-through rate for the CTA button affected by moving it above the fold?"
"How is overall cart-to-complete conversion affected by moving the button?"
"How are these two metrics affected if the button is near the top of the page or just above the fold?"

As you can see, the key questions specify exactly what's being changed - maybe not to the exact pixel, but they provide clear direction for the test execution. They also make it clear what should be measured - in this case, there are two conversion rates (one at page level, one at visit level). This is perhaps the key benefit of asking these core questions: they drive you to the key metrics for the test.
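
Here is a small sketch of how those two conversion rates might be calculated side by side for the control and the test recipe. The counts are made up, and the class and field names are my own rather than anything from a particular analytics package.

```python
from dataclasses import dataclass

@dataclass
class RecipeCounts:
    cart_page_views: int  # page-level denominator
    cta_clicks: int       # clicks on the call-to-action button
    cart_visits: int      # visit-level denominator
    orders: int           # visits that completed checkout

    @property
    def click_through_rate(self):
        return self.cta_clicks / self.cart_page_views

    @property
    def cart_to_complete(self):
        return self.orders / self.cart_visits

def relative_change(test_value, control_value):
    """Change in the test recipe as a fraction of the control value."""
    return (test_value - control_value) / control_value

# Hypothetical counts, for illustration only.
control = RecipeCounts(cart_page_views=10_000, cta_clicks=3_000,
                       cart_visits=8_000, orders=1_600)
variant = RecipeCounts(cart_page_views=10_000, cta_clicks=3_600,
                       cart_visits=8_000, orders=1_520)

ctr_change = relative_change(variant.click_through_rate, control.click_through_rate)
conv_change = relative_change(variant.cart_to_complete, control.cart_to_complete)
print(f"CTA click-through: {ctr_change:+.1%}")
print(f"Cart-to-complete:  {conv_change:+.1%}")
```

In this invented case the button gets more clicks while overall conversion slips, which is exactly why it pays to decide on both metrics before the test starts.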

"Yes, but we want to measure revenue and sales for our test."

Why? Is your test meant to improve revenue and sales? Or are you looking to reduce bounce rate on a landing page, or improve the consumption of learn content (whitepapers, articles, user reviews etc) on your site? Of course, your site's reason-for-being is to generate sales and revenue. Your test data may show a knock-on improvement on revenue and sales, and yes, you'll want to make sure that these vital site-wide metrics don't fall off a cliff while you're testing, but if your hypothesis says, "This change should improve home page bounce rate because..." then I propose that it makes sense to measure bounce rate as the primary metric for the test success. I also suspect that you can quickly tie bounce rate to a financial metric through some web analytics - after all, I doubt that anyone would think of trying to improve bounce rate without some view of how much a successful visitor generates.

So: having written a valid hypothesis which is backed by analysis, usability or other data (and not just a go-test-this mentality from the boss), you are now ready to address the critical questions for the test. These will typically be, "How much....?" and "How does XYZ change when...?" questions that will focus the analysis of the test results, and will also lead you very quickly to the key metrics for the test (which may or may not be money-related).

I am not proposing to pack away an extra 100 grams of chocolate per day and start running the 100 metres. It's rained here every day since Christmas and I'm really not that dedicated to running. I might, instead, start on an extra 100 grams of chocolate and measure my body mass, blood cholesterol and fat content. All in the name of science, you understand. :-)

Monday, 6 January 2014

Chess: King's Gambit 1. e4 e5 2. f4

After my most recent post, where I played as Black against the Bird Opening 1. f4, in this post I'd like to cover another game where I again faced White playing an f4 opening - in this case, the King's Gambit. I played this against one of my Kidsgrove Chess Club team mates, and this time, I lost. Badly. I had just suffered a difficult 32-move defeat against the team's top player, and I will make the excuse that I wasn't playing my best here. I will cover that game in a later blog post.

I'm covering this game here, because interestingly, in the previous game where I played against 1. f4, I won by playing Qh4+ early on in the game. If I'd been more aware, I might have seen it here, too.

David Johnson vs Dave Leese, 17 December 2013 Kidsgrove Chess Club (Friendly)

1.e4 e5
2.f4 Nc6
3.fxe5

It's unusual to have a critical point in a game so early, but here it is. I played the natural recapture with Nxe5 and slowly got into all sorts of trouble. I missed the Qh4+ move that I played (and won with) just two weeks earlier.

3. ... Nxe5 ( 3...Qh4+ )

Let's look at 3. ... Qh4+ before reviewing the actual game in full. There are two replies - to move the King to e2 or to block with a Pawn on g3.

3. ... Qh4+
4. Ke2 Qxe4+
5. Kf2 Bc5+

If 6. Kg3, then Chessmaster 9000 (my preferred analysis tool) is already giving a mate in 8, starting with 6. ... h5. The other option is 6. d4 and this is going to mean a significant loss of material - 6 ... Bxd4+ 7. Qxd4 Qxd4+ and White has delayed the inevitable at the cost of his Queen.

However, I completely missed this overwhelming attack, and instead went through a painful game where I fell into all sorts of trouble. Let's resume after 3. ... Nxe5

4.d4 Ng6 (I could still have played ... Qh4+, or ... Bb4+ at this point).
5.Nf3 d6 (there are no more chances of ... Qh4+ now, and Chessmaster recommends ... d5).
6.Bc4 h6 (preventing Ng5 and an attack on f7 - in theory, anyway).
7.O-O Bg4? (a big mistake, as we shall see. Better was Nf6 or Be6... something - anything - to protect f7. White's decision to castle was not just a natural move at this stage, it moved the Rook onto the semi-open and dangerous f-file).

The position after 7. ... Bg4 and immediately before Bxf7+!

8.Bxf7+! Ke7 ( not 8...Kxf7 which leads to Ng5++ and Nf7 forking Queen and Rook)
9.Bxg6 Nf6 (finally!)
10.Nc3 c6 (opening a diagonal for my Queen, and providing space for my King)
11.Qe1 Kd7 (taking advantage of the slow pace of the game to improve my King and Queen)
12.e5 Nd5
13.Nxd5 cxd5

White played Qg3? and missed exd6 with a large material gain.

14.Qg3? Be6 (my wayward Bishop finally gets a decent square, even if the Pawn on f7 is gone)
15.Nh4 Qb6 (developing, and attacking the newly-unprotected d4 Pawn)
16.c3 Be7 (perhaps ... dxe5 was better)
17.Nf5 Rhf8
18.Nxe7 Rxf1+
19.Kxf1 Rf8+ (No, I'm not sure why I threw this in. I needed as many pieces as possible on the board, but at least I got rid of White's active Rook in exchange for my inactive one).
20.Kg1 Kxe7
21.exd6+ Kd7
22.Qe5 Qxd6 (offering an exchange, but also protecting the Rook on f8).
23.Qxg7+ Kc6
24.Qe5 Qxe5
25.dxe5 Rg8
( 25...Bh3 26.gxh3 Rg8 27.Bf4 Rxg6+ )
26.Bh5 Bh3
27.Bf3 Bf5
28.Bxh6 1-0

The final position. I've run out of ideas, I'm a piece and three pawns down, and I've had enough! Seeing afterwards that I missed several opportunities for a massive attack in the first few moves has made me more confident in my attacking options, and seeing how I missed my opponent's attack developing (in this game and the previous one) has made me even more aware of the need to defend accurately too. Yes, I put up a fight, but really I was defending a lost cause due to some daft blunders. On with the next game!
