Web Optimisation, Maths and Puzzles: online

Showing posts with label online. Show all posts

Wednesday, 27 August 2025

Do You Know How Well Your Test Will Perform?

There are various ways of running tests - or more specifically, there are various ways of generating test hypotheses. One that I've come across over the years, and more recently, is the 'guess how well your test is going to perform' approach. It's not called that, but it seems to me to be the most succinct description.

"If we change the pictures on our site from cats to dogs, then we'll see a 3.5% increase in conversion."
"If we promote construction toys ahead of action figures, then we'll see a 4% lift in revenue."

If you know that's going to happen, why don't you do it anyway?

The main underlying challenge I have is that it's almost impossible to quantify the improvement you're going to get. How do you know?

Well, let's attempt the calculation (with hypothetical numbers all the way through).

Let's say our latest campaign landing page has a bounce rate (user lands on page, then exits without visiting any other pages) of 75%. 10% engage with site search, 10% click on the menus at the top of the page, and 5% click on the content on the page (there are a few banners and a few links).

We've identified that most users aren't scrolling past the first set of banners and links, and we therefore hypothesise that if we make the banners smaller, and reduce the amount of padding around the links, that we can increase engagement with the content in the lower half of the page, and therefore improve the bounce rate. We believe we can get 50% more links above the fold, and therefore increase the in-page engagement rate from 5% to 7.5%. We will assume (and this is the fun bit) that this additional traffic converts at the same rate as the 5% we have so far, and therefore, we'll get a revenue lift of 50%. This sounds like a lot, but given that the engagement rate is going up from a small number to a slightly larger number, it's unlikely to be a huge revenue lift in dollar terms (unless you're pouring in huge volumes of traffic - and watching it bounce at a rate of 75%).

Perhaps that was an over-simplification. But if we knew that our test will give us a 5% lift (and we've still decided to test it), what happens when we launch the test? Presumably, we'll stop it when it reaches the 5% lift, irrespective of the confidence level. But what happens if it doesn't get to 5%? What if it stubbornly sits at 4%? Or maybe just 3%? Did the test win, or did it lose? In classical scientific terms, it lost, since we disproved our overly-specific hypothesis. But from a business perspective, it still won, just not by as much as we had originally expected. Would you go into a meeting with the marketing manager and say, "Sorry, Jim, our test only achieved a 3% revenue lift, so we've decided it was a failure."?

For me, it comes down to two arguments:

If you can forecast your test result with a high degree of certainty, based on considerable evidence for your hypothesis, it's probably not worth testing and you should implement already. Testing is best used for edge-cases with some degree of uncertainty.

If, on the other hand, you have identified a customer problem with your site, and you can see that fixing it will give you a revenue lift - but you don't know how to fix it - then that's very good grounds for testing. The hypothesis is not, "If we fix this problem, we'll get a 6% revenue lift," but, "If we fix this problem in this way then we'll get a revenue lift". And that's where you need to encourage the website analysts and the customer feedback department (or the complaints department, or whoever advocates for customers within your company) to come together and find out where the problems are, and what they are, and how to address them.

That will undoubtedly bring good test ideas, and that's what you're looking for, even if you don't know how much revenue lift it will provide.

Friday, 21 September 2018

Email Etiquette

I'm going to go completely off-topic in this post, and talk about something that I've started noticing more and more over recent months: poor email etiquette. Not poor spelling, or grammar, or style, but just a low standard of communication from people and businesses who send me emails. Things like missing images, poor titles, wonky meta tags, and pre-header text (the part of an email that you see in your browser after the subject title). This is all stuff that can be accepted, ignored or overlooked - it's fine. But sometimes the content of the email - the writing style or lack of it - begins to speak more loudly than the text in it.

Way back in the annals of online history, internet etiquette ("netiquette") was a buzz-word that was bandied around chat rooms, HTML web pages, and the occasional online guide.

According to the BBC, netiquette means, "Respecting other users' views and displaying common courtesy when posting your views to online discussion groups." while Wikipedia defines it as, "is a set of social conventions that facilitate interaction over networks, ranging from Usenet and mailing lists to blogs and forums." Which is fair enough. In short, netiquette means "Play nicely!"

Email etiquette is something else - similar, but different. Email is personal, while online posting is impersonal and has a much wider audience. Email is, to all intents and purposes, the modern version of writing a letter, and we were all taught how to write a letter, right? No? Except that the speed of email means that much of the thought and care that goes into writing a letter (or even word-processing one) has also started to disappear. Here, then are my suggestions for good email etiquette.

- Check your typing. You might be banging out a 30-second email, but it's still worth taking an extra five seconds to check that everything is spelt correctly. "It is not time to launch the product" and "It is now time to launch the product" will both beat a spell-checker, but only one of them is what you meant to say.

- Use the active tense instead of the passive. Saying "I understand," or "I agree" just reads better and conveys more information than "Understood." or "Agreed." You're not a robot, and you don't have to lose your personality to communicate effectively via email.

- Write in complete sentences. Just because you're typing as fast as you think doesn't mean that your recipients will read the incomplete sentences you've written and correctly extrapolate them back to your original thoughts. The speed of email delivery does not require speedier responses. Take your time. If you start dropping I, you, me, then, that, if and other important nouns and pronouns from your sentences, and replacing them with full stops, then you're going to confuse a lot of people. This ties in with the previous point - just because the passive tense is shorter than the active doesn't mean that it will be easier to understand. You will also irritate those who are having to increase their effort in order to understand you.

"Take your time..."

- Don't use red text, unless you know what you're doing. Red text says "This is an error", which is fine if you're highlighting an error, but will otherwise frustrate and irritate your readers. Full capitals is still regarded as shouting (although have you ever noticed that comic book characters shout in almost all their speech bubbles?), which is okay if you want to shout, but not recommended if you want to improve the readability of your message.

- Shorter sentences are better than long ones. Obviously, your sentences still need to be complete, but this suggestion applies especially if your readers don't read English as their first language. Break up your longer sentences into shorter ones. Keep the language concise. Split your sentences instead of carrying on with an "and...". You're not writing a novel, you're writing a message, so you can probably lose subordinate clauses, unnecessary adverbs and parenthetical statements. Keep it concise, keep it precise. This also applies to reports, analyses and recommendations. Stick to the point, and state it clearly.

"Keep it concise, keep it precise."

- Cool fingers on a calm keyboard. If you have to reply to an email which has annoyed, irritated or frustrated you, then go away and think about your reply for a few minutes. Keep calm instead of flying off the handle and hammering your keyboard. Pick out the key points that need to be addressed, and handle them in a cool, calm and factual manner. "Yes, my idea is better than yours, and no, I don't agree with your statements, because..." is going to work better in the long term than lots of red text and block capitals.

- Remember that sarcasm and irony will be almost completely lost by the time your message reaches its recipient(s). If you're aiming to be sarcastic or ironic, then you'd better be very good at it, or dose it with plenty of smileys or emoticons to help get the message across. Make use of extra punctuation, go for italics and capital letters, and try not to be too subtle. If in doubt, or if you're communicating with somebody who doesn't know you very well, then avoid sarcasm completely. Sometimes, this can even apply over the phone, too. Subtlety can be totally lost over a phone conversation, so work out what you want to say, and say it clearly.  Obviously!

- Please and thank you go a long, long way. If you want to avoid sounding heavy handed and rude, then use basic manners. If you're making a request, then say please. If you're acknowledging somebody's work, then say thank you. You'll be amazed at how this improves working relationships with everybody around you - a little appreciation goes a long way. I know this is hardly earth-shattering, nor specific to email, but it's worth repeating.

- When you've finished, stop.  Don't start wandering around the discussion, bringing up new subjects or changing topic. Start another email instead.

FOR EXAMPLE

A potential worst case? You could start (and potentially end) an email with "Disagree."

Monday, 30 November 2015

Don't Make Me Think = Don't Make Me Read?

We all know (because we've all been told) that online selling must follow the eternal (pre 1990) mantra that 'less is more'. Less clutter, less text and less mess means less confusion equals more clarity equals more sales. Yet in the offline world, shop shelves are stacked with 23 different brands of toothpaste, 32 different types of shampoo, and aisles and aisles of goods. And if you think the bricks-and-mortar model is dead, and you'd prefer a modern comparison: why are Amazon's warehouse shelves so full of so many different types of identical products? And why do they have so many web pages?

Ignoring that argument, we know less is more because if there's "more" then people will have to think about the product they're buying. And we know thinking is bad because there's a book about online usability called "Don't make me think," and in this era of online publishing and blogging, you must surely be a respected authority on a subject if you can get a book published. We may not have read your book, but we've heard of it, and we know the title. [Steve Krug is a respected authority, and was even before he wrote his book].

But could you imagine if this less-is-more attitude was taken a step further in the offline world?

Imagine going to a bricks-and-mortar store, reaching the checkout with your selection, and then making eye contact with the checkout assistant. He (or she) looks away, grabs your shopping off you; scans the barcodes and runs up the total, then looks at you. You look back. He points to the total displayed on the cash register. You reach for your credit card. He waves his hand towards the card machine.

You put your card in the machine. He does whatever it is they do on the cash register to make the magical words “Enter your PIN” appear. There’s brief delay, the words “Transaction approved” appear in black on green on the dot-matrix screen, the assistant hands you your receipt and without any further delay turns to face the next customer.

Delightful, isn’t it? After all, you bought your items and paid for it successfully, didn’t you? And did you have to think? Perhaps you would have preferred to use one of those self-service checkouts, complete with 'unexpected item in the bagging area'.

This is the epitome of decluttered pages, where removing anything and everything that isn’t part of the main purchase experience is the absolute aim and surely conversion improvements will follow. As online analysts, we've trained our managers and superiors to think that less is more, that clutter is bad and that we should just get people to press the button and buy the stuff. However, I believe that kind of thinking is out-of-date and needs to be revisited.

Our visitors are intelligent people. They are not all high-speed 'I'd buy this even quicker if you got out of my way' purchasers. Some of them actually want to read about your product - its unique selling point, its benefits, why it's not the cheapest product you offer. They want to view the specifications (Is it waterproof? Is it dishwasher safe?) and they won't buy the product unless you reassure them that it's the one they want. We can reduce our product specifications to icons (yes, it's ultra secure, and it's got central locking and built-in satnav) but if your icon isn't either well-known or intuitive, then you made things worse by removing the words "with a built-in satnav" and replacing them with a picture of ... well... a wifi point? Have we have started to think (subconsciously) that we should just show glossy pictures of our products and that this would be enough?

Is text bad? And is more text worse? I don't think so - after all, you're reading this, and despite my best efforts, it's pretty text-heavy with only a couple of images thrown in to break up all these words. I'm making some pretty heavy demands on you (to read this much text, paragraphed but largely unformatted), but you've stuck with me this far. I know online selling is different from online reading, but if you've managed to read this much, then it's fair to assume you can read some sales copy on a web page for a product that you are genuinely interested in.

So, is less more? I am calling for a more balanced approach towards website content. We need to understand that it isn't simply about "less" and "more". It's about the right content in the right place, at the right time to support a visitor's intentions (which may be 'to buy the product' or may be 'to find the best one for me'.) And having said this much, I'll stop.

Wednesday, 1 July 2015

Online Optimisation is a Game of Chess

It occurred to me recently that there are some similarities between playing a game of Chess and optimising a website through testing (apart from the considerable differences between them).

Many years ago, when I started this blog, I envisaged it as a place where I would type up my chess games, with commentary, lessons and points for improvement. A quick glance through my archive of blog posts will quickly show that the blog really hasn't matched its original plan. It evolved and changed, in particular in 2011, when I shared my first few posts about web analytics and I realised that I could attract significantly more readers by blogging and sharing about web analytics than I have ever managed with Chess. I figure that Chess is a much more mature subject, with many more people who are considerably more experienced than me; whereas web analytics (and especially my main area of interest, testing) is a much younger field, and there's scope for sharing ideas and experiences which are still novel and interesting. However, there are some similarities between the two - here are a few thoughts.

Strategies and Tactics

Black prepares a tactic.

Plans in Chess can be broadly separated into strategies (long-term aims) and tactics (short term two-or-three move plans). The long-term aim is to checkmate your opponent's king, and throughout the game that will become the more prominent goal. In the short term, as you progress through the opening moves, you'll identify potential opportunities to capture your opponent's pieces, to put your pieces on good squares and to restrict your opponent's chances of beating you. Some of these are short term, some are long term. For example, it may be possible to win one of your opponent's bishops with a cunning trap (if your opponent is not vigilant). This will make it easier to achieve your long-term goal of winning the game by checkmating your opponent.

Testing offers the same range of opportunities: are you looking for quick wins (who isn't?) or are you planning to redesign an entire page, or even an entire site? Will you be implementing wins as soon as you have confirmed results, or are you going to iterate and try to do even better? Are you compiling wins to launch in one big bang, or are you testing, implementing and then repeating? How far ahead are you planning? Neither approach is necessarily better, as long as you're planning, and everybody is agreed on the plan!

Aims and Goals

White threatens checkmate.

As I've just mentioned, in Chess there's a clear goal: checkmate your opponent's king, by threatening to capture his king and leaving him with no way to escape. The aim for your online testing program may not be so clear cut, and if it isn't, then it might be time to find one single aim for it. You might call it the mission statement for your optimisation program, but whatever you call it, it's important that everybody who's involved in your program understands the purpose of the testing team. The same applies to each test, as I've mentioned before - each test should have a clear aim, that everybody understands and that can be measured in some clearly-defined way.

Values and KPIs

Each piece in Chess has a certain nominal value; the values aren't precise, but they provide a meaningful comparison of the strength and ability of each piece. For example, the rook is worth five pawns, the knight and bishop are worth three pawns each, and the queen is worth nine. This enables players to quickly evaluate a position in a game and say which player is winning, or if the position is roughly equal. It's slightly more complicated than that, as you have to take into consideration the position of the pieces and so on, but a quick comparison of the total material value that each player has will give a good idea of who's winning.

Rooks are worth five pawns; bishops are worth three pawns and queens are worth nine.

This also means that it's possible to determine if a plan or a strategy is likely to win and is worth pursuing. It may be possible to trap your opponent's rook (worth five pawns), but if doing so will mean losing two knights (each worth three pawns) and a pawn, then the trap is not really beneficial to you. If, on the other hand, you could trap your opponent's king (winning the game) at the cost of two rooks and a knight, then that's definitely worth doing.

The key performance indicator for a game of Chess is your opponent's king, and if you can measure how close you are to capturing (or checkmating) your opponent's king, then you can see how close you are to winning the game. You also need to keep your own king safe, but that's where the analogy breaks down :-)

White wins despite less material.

In online testing, your plan, strategy and tests all need to have KPIs. Once you've established your long-term aim, you can set KPIs against each test which are connected to achieving the long-term aim. If you want to improve the return on investment (ROI) of your online marketing, you could look at the landing page... improve the bounce rate and the exit rate, and encourage more people to move further into your site and view your products. Alternatively, you could look at improving conversion of visitors from cart (basket) to checkout. Or perhaps the flow of visitors through your checkout process. Providing you can tie each of your tests and your tactics to the overall strategy: "Improve ROI for online marketing" then you can measure whether or not it's succeeding.

Classifying your KPIs in order of importance is also important - as we saw in Chess, if you can win your opponent's pieces but lose several of yours in the process, then it's probably not a good idea. In testing, what would you do if your test recipe had a worse bounce rate but higher overall conversion (a situation that's not impossible)? Which is the more important metric - conversion, or bounce rate? Would your answer be the same if it was improved conversion but lower revenue (people not spending as much per order)? Are you going to capture your opponent's knight but lose your queen?

Win, lose or draw?

Queen takes King: checkmate!

In Chess, there are clear rules that determine the outcome of a game. Either one player wins (so the other loses), or it's a draw, and there are various ways of drawing: including by agreement (both players decide neither can win); by stalemate (one player cannot make any legal moves) or a drawn position where it's clear that neither player has enough material to checkmate the other.

I hate losing at Chess. However, the truth be told, I'm not much above average as a Chess player, and I'm the weakest player at my club (this has not deterred me, and I still play for fun). This means that I get plenty of chances to analyse my losses, and see where I could improve. Do I make the same mistakes in future games? Not usually, no.

However, I'm not satisfied to stay as the weakest player in my club - I just see this as an opportunity to do some giant-slaying in my future matches. I read books, I visit Chess websites, I practice against other people and against computers, but I especially review my own games. Occasionally, I win. And do I analyse my winning games? Absolutely - I may have already seen during the game that my opponent missed a chance to beat me, but did I also miss a chance to win more easily?

In online optimisation, the rules for calling a test a win, lose or draw are still up for debate, and they vary between companies. And so they should. For example, how long should you run your test? Each company will have its own testing program with its own tactics and strategies, and its own requirements. Do you want to have 99.9% confidence that a test will win, or do you want some directional test data to support something you already believed based on other data sources? How quickly do you want to run the next test, or implement the winner? Providing that the rules for calling the win, lose or draw are agreed in advance, I might even suggest that they could vary between tests. This is, of course, totally different from Chess, where the centuries-old rules of the game clearly state the requirements for a win or a draw. Otherwise though, I think it's fair to say that KPIs, metrics and strategy have their approximate equivalents in pieces, pawns and plans - and that thinking and planning are definitely the way forward!

Chess cartoons taken from the 1971 printing of Chess for Children, originally published 1960.

Monday, 1 June 2015

What is a "growth hacker"?

Okay, I admit it: I'm confused. I kept up to date with "experts", "gurus", "rock stars" and "ninjas", but I've reached the limit of my understanding. Why are we (as the online analytics community) now using the term 'hacker'? When modems were dial-up, and going online meant connecting your computer to your telephone headset, hackers were bad people who illegally broke into (or 'hacked') networks

Nowadays, though, hackers are everywhere, and one of the main culprits (especially online) are the "growth hackers". I'm just going to borrow from Wikipedia to set some context for what these new hackers actually are:

Hacker (term), is a term used in computing that can describe several types of persons
- Hacker (computer security) someone who seeks and exploits weaknesses in a computer system or computer network
- Hacker (hobbyist), who makes innovative customizations or combinations of retail electronic and computer equipment
- Hacker (programmer subculture), who combines excellence, playfulness, cleverness and exploration in performed activities

Hacker image credit: Pinsoft Studios
So we have, "excellence, playfulness, cleverness and exploration" in performed activities? Really? That's what a hacker is nowadays? Can't we just be good at what we do? We have to be playful and creative while we're at it? Perhaps I'm an optimisation hacker and I never realised.

My research into growth hacking indicates that the term was first coined in 2010 by Sean Ellis. Why he chose 'hacking', I'm not sure (especially given its previous connotations), but here's how he described growth hacking:

"A growth hacker is a person whose true north is growth. Everything they do is scrutinized by its potential impact on scalable growth. ... I’ve met great growth hackers with engineering backgrounds and others with sales backgrounds; the common characteristic seems to be an ability to take responsibility for growth and an entrepreneurial drive. The right growth hacker will have a burning desire to connect your target market with your must have solution ... The problem is that not all people are cut out to be growth hackers."

So: a growth hacker is a marketer whose key performance indicator is growth. So why 'hacker'? Perhaps it's about cracking the code for growth and finding a short cut to success? Perhaps it's about carving a way through a jungle filled with bad ideas for growth, with an instinctive true north and a sharp blade to cut through all the erroneous ideas?

Image credit: Recode.net

It's an imaginitive either way. And now, five years after Mr Ellis's post, it seems we have growth hackers everywhere (ironic, considering the URL for the original blog post is /where-are-all-the-growth-hackers/ - they now have multiple websites and Twitter accounts ;-)

So - my curiousity has been satisfied: a [good] growth hacker is a marketer who will help rapidly accelerate growth for a small or start-up company by rapidly analysing what's working for its audience and focusing on those strategies with agility and velocity. Why are they so popular now? Because - I'm guessing - after the global financial issues of 2008-2009, there is now much more interest and emphasis in start-ups and the entrepreneurial spirit - and every start-up needs a growth hacker to crack the code to accelerated growth rates.

Wednesday, 16 July 2014

When to Quit Iterative Testing: Snakes and Ladders

I have blogged a few times about iterative testing, the process of using one test result to design a better test and then repeating the cycle of reviewing test data and improving the next test. But there are instances when it's time to abandon iterative testing, and play analytical snakes and ladders instead. Surely not? Well, there are some situations where iterative testing is not the best tool (or not a suitable tool) to use in online optimisation, and it's time to look at other options.

Three situations where iterative testing is totally unsuitable:

1. You have optimised an area of the page so well that you're now seeing the law of diminshing returns - your online testing is showing smaller and smaller gains with each test and you're reaching the top of the ladder.
2. The business teams have identified another part of the page or site that is a higher priority than the area you're testing on.
3. The design teams want to test something game-changing, which is completely new and innovative.

This is no bad thing.

After all, iterative testing is not the be-all-and-end-all of online optimization. There are other avenues that you need to explore, and I've mentioned previously the difference between iterative testing and creative testing. I've also commented that fresh ideas from outside the testing program (typically from site managers who have sales targets to hit) are extremely valuable. All you need to work out is how to integrate these new ideas into your overall testing strategy. Perhaps your testing strategy is entirely focused on future-state (it's unlikely, but not impossible). Sometimes, it seems, iterative testing is less about science and hypotheses, and more like a game of snakes and ladders.

Three reasons I've identified for stopping iterative testing.

1. It's quite possible that you reach the optimal size, colour or design for a component of the page. You've followed your analysis step by step, as you would follow a trail of clues or footsteps, and it's led you to the top of a ladder (or a dead end) and you really can't imagine any way in which the page component could be any better. You've tested banners, and you know that a picture of a man performs better than a woman, that text should be green, the call to action button should be orange and that the best wording is "Find out more." But perhaps you've only tested having people in your banner - you've never tried having just your product, and it's time to abandon iterative testing and leap into the unknown. It's time to try a different ladder, even if it means sliding down a few snakes first.

2. The business want to change focus. They have sales performance data, or sales targets, which focus on a particular part of the catalogue: men's running shoes; ladies' evening shoes, or high-performance digital cameras. Business requests can change far more quickly than test strategies, and you may find yourself playing catch-up if there's a new priority for the business. Don't forget that it's the sales team who have to maintain the site, meet the targets and maximise their performance on a daily basis, and they will be looking for you to support their team as much as plan for future state. Where possible, transfer the lessons and general principles you've learned from previous tests to give yourself a head start in this new direction - it would be tragic if you have to slide down the snake and start right at the bottom of a new ladder.

3. On occasions, the UX and design teams will want to try something futuristic, that exploits the capabilities of new technology (such as Scene 7 integration, AJAX, a new API, XHTML... whatever). If the executive in charge of online sales, design or marketing has identified or sponsored a brand new online technology that will probably revolutionise your site's performance, and he or she wants to test it, then it'll probably get fast-tracked through the testing process. However, it's still essential to carry out due diligence in the testing process, to make sure you have a proper hypothesis and not a HIPPOthesis. When you test the new functionality, you'll want to be able to demonstrate whether or not it's helped your website, and how and why. You'll need to have a good hypothesis and the right KPIs in place. Most importantly - if it doesn't do well, then everybody will want to know why, and they'll be looking to you for the answers. If you're tracking the wrong metrics, you won't be able to answer the difficult questions.

As an example, Nike have an online sports shoe customisation option - you can choose the colour and design for your sports shoes, using an online palette and so on. I'm guessing that it went through various forms of testing (possibly even A/B testing) and that it was approved before launch. But which metrics would they have monitored? Number of visitors who tried it? Number of shoes configured? Or possibly the most important one - how many shoes were purchased? Is it reasonable to assume that because it's worked for Nike, that it will work for you, when you're looking to encourage users to select car trim colours, wheel style, interior material and so on? Or are you creating something that's adding to a user's workload and making it less likely that they will actually complete the purchase?

So, be aware: there are times when you're climbing the ladder of iterative testing that it may be more profitable to stop climbing, and try something completely different - even if it means landing on a snake!

Tuesday, 24 June 2014

Why Does Average Order Value Change in Checkout Tests?

The first discussion huddle I led at the Digital Analytics Hub in 2014 looked at why average order value changes in checkout tests, and was an interesting discussion. With such a specific title, it was not surprising that we wandered around the wider topics of checkout testing and online optimisation, and we covered a range of issues, tips, troubles and pitfalls of online testing.

But first: the original question - why does average order value (AOV) change during a checkout test? After all, users have completed their purchase selection, they've added all their desired items to the cart, and they're now going through the process of paying for their order. Assuming we aren't offering upsells at this late stage, and we aren't encouraging users to continue shopping, or offering discounts, then we are only looking at whether users complete their purchase or not. Surely any effect on order value should be just noise?

For example, if we change the wording for a call to action from 'Continue' to 'Proceed' or 'Go to payment details', then would we really expect average order value to go up or down? Perhaps not. But, in the light of checkout test results that show AOV differences, we need to revisit our assumptions.

After all, it's an oversimplification to say that all users are affected equally, irrespective of how much they're intending to spend. More analysis is needed to look at conversion by basket value (cart value) to see how our test recipe has affected different users based on their cart value. If conversion is affected equally across all price bands, then we won't see a change in AOV. However, how likely is that?

Other alternatives: perhaps there's no real pattern in conversion changes: low-price-band, mid-price-band, high-price-band and ultra-high-price-band users show a mix of increases and decreases. Any overall AOV change is just noise, and the statistical significance of the change is low.

But let's suppose that the higher price-band users don't like the test recipe, and for whatever reason, they decide to abandon. The AOV for the test recipe will go down - the spread of orders for the test recipe is skewed to the lower price bands. Why could this be? We discussed various test scenarios:

- maybe the test recipe missed a security logo? Maybe the security logo was moved to make way for a new design addition - a call to action, or a CTA for online chat - a small change but one that has had significant consequences.

- maybe the test recipe was too pushy, and users with high ticket items felt unnecessarily pressured or rushed? Maybe we made the checkout process feel like express checkout, and we inadvertantly moved users to the final page too quickly. For low-ticket items, this isn't a problem - users want to move through with minimum fuss and feel as if they're making rapid progress. Conversely, users who are spending a larger amount want to feel reassured by a steady checkout process which allows the user to take time on each page without feeling rushed?

- sometimes we deliberately look to influence average order value - to get users to spend more, add another item to their order (perhaps it's batteries, or a bag, or the matching ear-rings, or a warranty). No surprises there then, that average order value is influenced; sometimes it may go down, because users felt we were being too pushy.

Here's how those changes might look as conversion rates per price band, with four different scenarios:

Scenario 1: Conversion (vertical axis) is improved uniformly across all price bands (low - very high), so we see a conversion lift and average order value is unchanged.

Scenario 2: Conversion is decreased uniformly across all price bands; we see a conversion drop with no change in order value.

Scenario 3: Conversion is decreased for low and medium price bands, but improved for high and very-high price bands. Assuming equal order volumes in the baseline, this means that conversion is flat (the average is unchanged) but average order value goes up.

Scenario 4: Conversion is improved selectively for the lowest price band, but decreases for the higher price bands. Again, assuming there are similar order volumes (in the baseline) for each price band, this means that conversion is flat, but that average order value goes down.

There are various combinations that show conversion up/down with AOV up/down, but this is the mathematical and logical reason for the change.

Explaining why this has happened, on the other hand, is a whole different story! :-)

Friday, 30 May 2014

Digital Analytics Hub 2014 - Preview

Next week, I'll be returning to the Digital Analytics Hub in Berlin, to once again lead discussion groups on online testing. Last year, I led discussions on "Why does yesterday's winner become today's loser?" and "Risk and Reward: Iterative vs Creative Testing." I've been invited to return this year, and I'll be discussing "Why does Average Order Value change in checkout tests?" and "Is MVT really all that great?" - the second one based on my recent blog post asking if multi-variate testing is an online panacea. I'm looking forward to catching up with some of the friends I made last year, and to meeting some new friends in the world of online analytics.

An extra bonus for me is that the Berlin InterContinental Hotel, where the conference is held, has a Chess theme for their conference centre. The merging of Chess with online testing and analytics? Something not to be missed.

The colour scheme for the rooms is very black-and-white; the rooms have names like King, Rook and Knight; there are Chess sets in every room, and each room has at least one enlarged photo print of a Chess-themed portrait. You didn't think a Chess-themed portrait was possible? Here's a sample of the pictures from last year (it's unlikely that they've been changed). From left to right, top to bottom: white bishop, white king; white king; black queen; white knight; white knight with black rook (I think).

Tuesday, 7 January 2014

The Key Questions in Online Testing

As you begin the process of designing an online test, the first thing you'll need is a solid test hypothesis. My previous post outlined this, looking at a hypothesis, HIPPOthesis and hippiethesis. To start with a quick recap, I explained that a good hypothesis says something like, "IF we make this change to our website, THEN we expect to see this improvement in performance BECAUSE we will have made it easier for visitors to complete their task." Often, we have a good idea about what the test should be - make something bigger, have text in red instead of black... whatever.

Stating the hypothesis in a formal way will help to draw the ideas together and give the test a clear purpose. The exact details of the changes you're making in the test, the performance change you expect, and the reasons for the expected changes will be specific to each test, and that's where your web analytics data or usability studies will support your test idea. For example, if you're seeing a large drop in traffic between the cart page and the checkout pages, and your usability study shows people aren't finding the 'continue' button, then your hypothesis will reflect this.

In between the test hypothesis and the test execution are the key questions. These are the key questions that you will develop from your hypothesis, and which the test should answer. They should tie very closely to the hypothesis, and they will direct the analysis of your test data, otherwise you'll have test data that will lack a focus and you'll struggle to tell the story of the test. Think about what your test should show - what you'd like it to prove - and what you actually want to answer, in plain English.

Let's take my offline example from my previous post. Here's my hypothesis: "If I eat more chocolate, then I will be able to run faster because I will have more energy."

It's good - but only as a hypothesis (I'm not saying it's true, or accurate, but that's why we test!). But before I start eating chocolate and then running, I need to confirm the exact details of how much chocolate, what distance and what times I can achieve at the moment. If this was an ideal offline test, there would be two of me, one eating the chocolate, and one not. And if it was ideal, I'd be the one eating the chocolate :-)

So, the key questions will start to drive the specifics of the test and the analysis. In this case, the first key question is this: "If I eat an additional 200 grams of chocolate each day, what will happen to my time for running the 100 metres sprint?"

It may be 200 grams or 300 grams; the 100m or the 200m, but in this case I've specified the mass of chocolate and the distance. Demonstrating the 'will have more energy' will be a little harder to do. In order to do this, I might add further questions, to help understand exactly what's happening during the test - perhaps questions around blood sugar levels, body mass, fat content, and so on. Note at this stage that I haven't finalised the exact details - where I'll run the 100 metres, what form the chocolate will take (Snickers? Oreos? Mars?), and so on. I could specify this information at this stage if I needed to, or I could write up a specific test execution plan as the next section of my test document.

In the online world I almost certainly will be looking at additional metrics - online measurements are rarely as straightforward as offline. So let's take an online example and look at it in more detail.

"If I move the call-to-action button on the cart page to a position above the fold, then I will drive more people to start the checkout process because more people will see it and click on it."

And the key questions for my online test?

"How is the click-through rate for the CTA button affected by moving it above the fold?"
"How is overall cart-to-complete conversion affected by moving the button?"
"How are these two metrics affected if the button is near the top of the page or just above the fold?"

As you can see, the key questions specify exactly what's being changed - maybe not to the exact pixel, but they provide clear direction for the test execution. They also make it clear what should be measured - in this case, there are two conversion rates (one at page level, one at visit level). This is perhaps the key benefit of asking these core questions: they drive you to the key metrics for the test.

"Yes, but we want to measure revenue and sales for our test."

Why? Is your test meant to improve revenue and sales? Or are you looking to reduce bounce rate on a landing page, or improve the consumption of learn content (whitepapers, articles, user reviews etc) on your site? Of course, your site's reason-for-being is to general sales and revenue. Your test data may show a knock-on improvement on revenue and sales, and yes, you'll want to make sure that these vital site-wide metrics don't fall off a cliff while you're testing, but if your hypothesis says, "This change should improve home page bounce rate because..." then I propose that it makes sense to measure bounce rate as the primary metric for the test success. I also suspect that you can quickly tie bounce rate to a financial metric through some web analytics - after all, I doubt that anyone would think of trying to improve bounce rate without some view of how much a successful visitor generates.

So: having written a valid hypothesis which is backed by analysis, usability or other data (and not just a go-test-this mentality from the boss), you are now ready to address the critical questions for the test. These will typically be, "How much....?" and "How does XYZ change when...?" questions that will focus the analysis of the test results, and will also lead you very quickly to the key metrics for the test (which may or may not be money-related).

I am not proposing to pack away an extra 100 grams of chocolate per day and start running the 100 metres. It's rained here every day since Christmas and I'm really not that dedicated to running. I might, instead, start on an extra 100 grams of chocolate and measure my body mass, blood cholesterol and fat content. All in the name of science, you understand. :-)

Wednesday, 24 July 2013

The Science of A Good Hypothesis

Good testing requires many things: good design, good execution, good planning. Most important is a good idea - or a good hypothesis, but many people jump into testing without a good reason for testing. After all, testing is cool, it's capable of fixing all my online woes, and it'll produce huge improvements to my online sales, won't it?

I've talked before about good testing, and, "Let's test this and see if it works," is an example of poor test planning. A good idea, backed up with evidence (data, or usability testing, or other valid evidence) is more likely to lead to a good result. This is the basis of a hypothesis, and a good hypothesis is the basis of a good test.

What makes a good hypothesis? What, and why.

According to Wiki Answers, a hypothesis is, "An educated guess about the cause of some observed (seen, noticed, or otherwise perceived) phenomena, and what seems most likely to happen and why. It is a more scientific method of guessing or assuming what is going to happen."

In simple, testing terms, a hypothesis states what you are going to test (or change) on a page, what the effect of the change will be, and why the effect will occur. To put it another way, a hypothesis is an "If ... then... because..." statement. "If I eat lots of chocolate, then I will run more slowly because I will put on weight." Or, alternatively, "If I eat lots of chocolate, then I will run faster because I will have more energy." (I wish).

However, not all online tests are born equal, and you could probably place the majority of them into one of three groups, based on the strength of the original theory. These are tests with a hypothesis, tests with a HIPPOthesis and tests with a hippiethesis.

Tests with a hypothesis

These are arguably the hardest tests to set up. A good hypothesis will rely on the test analyst sitting down with data, evidence and experience (or two out of three) and working out what the data is saying. For example, the 'what' could be that you're seeing a 93% drop-off between the cart and the first checkout page. Why? Well, the data shows that people are going back to the home page, or the product description page. Why? Well, because the call-to-action button to start checkout is probably not clear enough. Or we aren't confirming the total cost to the customer. Or the button is below the fold.

So, you need to change the page - and let's take the button issue as an example for our hypothesis. People are not progressing from cart to checkout very well (only 7% proceed). [We believe that] if we make the call to action button from cart to checkout bigger and move it above the fold, then more people will click it because it will be more visible.

There are many benefits of having a good hypothesis, and the first one is that it will tell you what to measure as the outcome of the test. Here, it is clear that we will be measuring how many people move from cart to checkout. The hypothesis says so. "More people will click it" - the CTA button - so you know you're going to measure clicks and traffic moving from cart to checkout. A good hypothesis will state after the word 'then' what the measurable outcome should be.

In my chocolate example above, it's clear that eating choclate will make me either run faster or slower, so I'll be measuring my running speed. Neither hypothesis (the cart or the chocolate) has specified how big the change is. If I knew how big the change was going to be, I wouldn't test. Also, I haven't said how much more chocolate I'm going to eat, or how much faster I'll run, or how much bigger the CTA buttons should be, or how much more traffic I'll convert. That's the next step - the test execution. For now, the hypothesis is general enough to allow for the details to be decided later, but it frames the idea clearly and supports it with a reason why. Of course, the hypothesis may give some indication of the detailed measurements - I might be looking at increasing my consumption of chocolate by 100 g (about 4 oz) per day, and measuring my running speed over 100 metres (about 100 yds) every week.

Tests with a HIPPOthesis

The HIPPO, for reference, is the HIghest Paid Person's Opinion (or sometimes just the HIghest Paid PersOn). The boss. The management. Those who hold the budget control, who decide what's actionable, and who say what gets done. And sometimes, what they say is that, "You will test this". There's virtually no rationale, no data, no evidence or anything. Just a hunch (or even a whim) from the boss, who has a new idea that he likes. Perhaps he saw it on Amazon, or read about it in a blog, or his golf partner mentioned it on the course over the weekend. Whatever - here's the idea, and it's your job to go and test it.

These tests are likely to be completely variable in their design. They could be good ideas, bad ideas, mixed-up ideas or even amazing ideas. If you're going to run the test, however, you'll have to work out (or define for yourself) what the underlying hypothesis is. You'll also need to ask the HIPPO - very carefully - what the success metrics are. Be prepared to pitch this question somewhere between, "So, what are you trying to test?" and "Are you sure this is a productive use of the highly skilled people that you have working for you?" Any which way, you'll need the HIPPO to determine the success criteria, or agree to yours - in advance. If you don't, you'll end up with a disastrous recipe being declared a technical winner because it (1) increased time on page, (2) increased time on site or (3) drove more traffic to the Contact Us page, none of which were the intended success criteria for the test, or were agreed up-front, and which may not be good things anyway.

If you have to have to run a test with a HIPPOthesis, then write your own hypothesis and identify the metrics you're going to examine. You may also want to try and add one of your own recipes which you think will solve the apparent problem. But at the very least, nail down the metrics...

Tests with a hippiethesis
Hippie: noun
a person, especially of the late 1960s, who rejected established institutions and values and sought spontaneity, etc., etc. Also hippy

The final type of test idea is a hippiethesis - laid back, not too concerned with details, spontaneous and putting forward an idea it because it looks good on paper. "Let's test this because it's probably a good idea that will help improve site performance." Not as bad as the 'Test this!" that drives a HIPPOthesis, but not fully-formed as a hypothesis, the hippiethesis is probably (and I'm guessing) the most common type of test.

Some examples of hippietheses:

"If we make the product images better, then we'll improve conversion."
"The data shows we need to fix our conversion funnel - let's make the buttons blue instead of yellow."
"Let's copy Amazon because everybody knows they're the best online."

There's the basis of a good idea somewhere in there, but it's not quite finished. A hippiethesis will tell you that the lack of a good idea is not a problem, buddy, let's just test it - testing is cool (groovy?), man! The results will be awesome.

There's a laid-back approach to the test (either deliberate or accidental), where the idea has not been thought through - either because "You don't need all that science stuff", or because the evidence to support a test is very flimsy or even non-existent. Perhaps the test analyst didn't look for the evidence; perhaps he couldn't find any. Maybe the evidence is mostly there somewhere because everybody knows about it, but isn't actually documented. The danger here is that when you (or somebody else) start to analyse the results, you won't recall what you were testing for, what the main idea was or which metrics to look at. You'll end up analysing without purpose, trying to prove that the test was a good idea (and you'll have to do that before you can work out what it was that you were actually trying to prove in the first place).The main difference between a hypothesis and hippiethesis is the WHY. Online testing is a science, and scientists are curious people who ask why. Web analyst Avinash Kaushik calls it the three levels of so what test. If you can't get to something meaningful and useful, or in this case, testable and measureable, within three iterations of "Why?" then you're on the wrong track. Hippies don't bother with 'why' - that's too organised, formal and part of the system; instead, they'll test because they can, and because - as I said, testing is groovy.

A good hypothesis: IF, THEN, BECAUSE.

To wrap up: a good hypothesis needs three things: If (I make this change to the site) Then (I will expect this metric to improve) because (of a change in visitor behaviour that is linked to the change I made, based on evidence).

When there's no if: you aren't making a change to the site, you're just expecting things to happen by themselves. Crazy! If you reconsider my chocolate hypothesis, without the if, you're left with, "I will run faster and I will have more energy". Alternatively, "More people will click and we'll sell more." Not a very common attitude in testing, and more likely to be found in over-optimistic entrepreneurs :-)

When there's no then: If I eat more chocolate, I will have more energy. So what? And how will I measure this increased energy? There are no metrics here. Am I going to measure my heart rate, blood pressure, blood sugar level or body temperature?? In an online environment: will this improve conversion, revenue, bounce rate, exit rate, time on page, time on site or average number of pages per visit? I could measure any one of these and 'prove' the hypothesis. At its worst, a hypothesis without a 'then' would read as badly as, "If we make the CTA bigger, [then we will move more people to cart], [because] more people will click." which becomes "If we make the CTA bigger, more people will click." That's not a hypothesis, that's starting to state the absurdly obvious.

When there's no because: If I eat more chocolate, then I will run faster. Why? Why will I run faster? Will I run slower? How can I run even faster? There are metrics here (speed) but there's no reason why. The science is missing, and there's no way I can actually learn anything from this and improve. I will execute a one-off experiment and get a result, but I will be none the wiser about how it happened. Was it the sugar in the chocolate? Or the caffeine?

And finally, I should reiterate that an idea for a test doesn't have to be detailed, but it must be backed up by data (some, even if it's not great). The more evidence the better: think of a sliding scale from no evidence (could be a terrible idea), through to some evidence (a usability review, or a survey response, prior test result or some click-path analysis), through to multiple sources of evidence all pointing the same way - not just one or two data points, but a comprehensive case for change. You might even have enough evidence to make a go-do recommendation (and remember, it's a successful outcome if your evidence is strong enough to prompt the business to make a change without testing).

Web Optimisation, Maths and Puzzles

Header tag