Web Optimisation, Maths and Puzzles: conference

Thursday, 6 November 2014

Building Momentum in Online Testing - Key Takeaways

As I mentioned in my previous post, I was recently invited to speak at the eMetrics Summit in London, and based on discussions afterwards, the content was really useful to the attendees. I'm glad that people were able to find it useful, and here, I'd like to share some of the key points that I raised (and some that I forgot to mention).

Image Credit: eMetrics Summit official photography

There are a large number of obstacles to building momentum with an optimisation program, but most of them can be grouped into one of these categories:

A. Lack of development resource (HTML and JavaScript developers)
B. Lack of management buy-in and access to resource
C. Tests take too long to develop, run, or call a winner
D. Tests keep losing (or, perversely, tests keep winning and the view is that "testing is completed")
E. Lack of design resource (UXers or designers)

These issues can be addressed in a number of ways, and the general ideas I outlined were:

1. If you need to improve your win rate, or if you don't have much development resource, re-use your existing mboxes and iterate. You won't need to wait for IT deployments or for a developer to code new mboxes; you can use the existing ones again, then test, learn and test again.

2. If you need to improve the impact of your tests (i.e. your tests are producing flat results, or the wins are very small) then make more dramatic changes to your test recipes: create, rather than iterate. I commented that, generally speaking, the more differences there are between control and the test recipe, the greater the difference in performance (which may be positive or negative). If you keep iterating and making small changes, you'll probably see smaller lifts or falls; if you take a leap into the unknown, you'll either fly or crash.

Remember not to throw out your analytics just because you're being creative - you'll need to look at the analytics carefully, as always, and any and all VOC data you have. The key difference is that you're testing bigger changes, more changes, or both - you shouldn't be trying new ideas just because they seem good (you'll still need some reason for the recipe).

3. If you need to get tests moving more quickly, then reduce the number of recipes per test. More recipes means more time to develop; more time to run (less traffic per recipe per day) and more time to analyse the results afterwards. Be selective - each recipe should address the original test hypothesis in a different way, you shouldn't need to add on recipe after recipe just because it looks like a good idea. Also, only test on high-traffic or critical pages, where there's plenty of volume of traffic, or where it's mission-critical (for example, cart pages, or key landing pages). As a bonus, if you work on optimising conversion or bounce rate for your PPC or display marketing traffic, you'll have an automatic champion in your online marketing department.
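As a rough illustration of why extra recipes slow a test down, here is a minimal Python sketch. It assumes traffic is split evenly between recipes (control included) and that each recipe needs a fixed number of visitors before you call a result; the function name and the figures (10,000 visitors per recipe, 4,000 visitors a day) are made up for the example.

```python
import math


def days_to_run(recipes: int, visitors_per_recipe: int, daily_visitors: int) -> int:
    """Days needed for every recipe to reach its sample size.

    Assumes an even traffic split between all recipes, control included.
    """
    total_visitors_needed = recipes * visitors_per_recipe
    return math.ceil(total_visitors_needed / daily_visitors)


# Hypothetical figures: same page, same traffic, different numbers of recipes.
for n in (2, 3, 5):
    days = days_to_run(n, visitors_per_recipe=10_000, daily_visitors=4_000)
    print(f"{n} recipes: {days} days")
```

With these invented numbers, going from two recipes to five takes the run time from 5 days to 13, before counting the extra development and analysis time.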

Extra: If you do decide to run with a large number of recipes, then monitor the recipes' performance more frequently. As soon as you can identify a recipe which is significantly and definitely underperforming vs control, switch it off. This has two benefits: a) you drive a larger share of traffic through the remaining recipes, and b) you're saving the business money because you've stopped traffic going through a low-converting (or low-performing) recipe - which was costing money.
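One way to decide that a recipe is "significantly and definitely underperforming" is a one-sided two-proportion z-test against control. The sketch below is only an illustration using the Python standard library; the function name, the 5% threshold and the order counts are assumptions, and your testing tool may well calculate significance differently. Bear in mind that checking repeatedly against a fixed threshold raises the chance of a false alarm, so a stricter threshold is sensible if you monitor often.

```python
import math


def underperforms(control_orders, control_visitors,
                  recipe_orders, recipe_visitors, alpha=0.05):
    """Return True if the recipe converts significantly worse than control.

    One-sided two-proportion z-test using a pooled standard error.
    """
    p_control = control_orders / control_visitors
    p_recipe = recipe_orders / recipe_visitors
    pooled = (control_orders + recipe_orders) / (control_visitors + recipe_visitors)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / recipe_visitors))
    if se == 0:
        return False  # no orders (or all orders) everywhere: nothing to compare
    z = (p_recipe - p_control) / se
    # Probability of a z-score this low or lower if there were no real difference.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return p_value < alpha


# Hypothetical counts: control converts at 5.0%.
print(underperforms(500, 10_000, 400, 10_000))  # recipe at 4.0%
print(underperforms(500, 10_000, 490, 10_000))  # recipe at 4.9%
```

In this made-up example the 4.0% recipe is flagged for switching off, while the 4.9% recipe is not yet distinguishable from control.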

4. Getting management buy-in and support on an ongoing basis: this is not easy, especially when analysts are, stereotypically, numbers-people rather than people-people. We find it easier to work with numbers than to work with people, since numbers are clear-cut and well-defined, and people can be... well... messy and unpredictable. Brooks Bell have recently released a blog post about five ways to manage up, which I recommend. The main recommendation is to get out there and share. Share your winners (pleasant) and your losers (unpleasant), but also explain why you think a test is winning or losing. This kind of discussion will lead naturally on to, "Well, it lost because this component was too big/too small/in the wrong place." and starts to inform your next test.

I also talked through my ideas on what makes a good test idea, and what makes for a bad test idea; here's the diagram I shared on 'good test ideas'.

In this diagram, the top circle defines what your customers want, based on your analysis; the lower left circle defines your coding capabilities and the lower right defines ideas that are aligned with your company brand and which are supported by your management team.

So where are the good test ideas? You might think that they are in segment D. In fact, these are recommendations for immediate action. The best test ideas are close to segment D, but not actually in it; the areas around segment D are the best places - where two of the three circles intersect, but where the third is nearly aligned too. For example, in segment F, we have ideas that the developers can produce, and which management are aligned with, but where there is doubt about whether they will help the customer experience. Here, the idea may be a new way of customising or personalising your product in your order process - upgrading the warranty or guarantee; adding a larger battery or a special waterproof coating (whatever your product may be). This may work well on your site, but it may also be too complex. Your customer experience data may show that users want more options for customising and configuring their purchase - but is this the best way to do it? Let's test!

I also briefly covered bad test ideas - things that should not be tested. There's a short list:

Don't test straightforward fixes: bug fixes, repairs to broken links or broken image links, and corrections to spelling and grammar mistakes. There's no point - the fix is a clear winner.

Don't test fixes for historic bugs in your page templates - for example where you're integrating newer designs or elements (product videos, for example) that weren't catered for when the layout was originally built. The alignment of the elements on the page is a little off, and things don't fit or line up vertically or horizontally - these can be improved with a test, but really, this isn't fixing the main issue, which is that the page needs fixing. The test will show the financial upside of making the fix (and this would be the only valid case for running the test) but the bottom line is that a test will only prove what you already know.

I wrapped up my keynote by mentioning the need to select your KPIs for the test, and for that, I have to confess that I borrowed from a blog post I wrote earlier this year, which was a sporting example of metrics.

Presenting the "metrics in sport" slide, Image Credit: Aurelie Pols

I'm already looking forward to the next conference, which will probably be in 2015!

Tuesday, 4 November 2014

Building momentum in your online optimisation program (eMetrics UK)

At the end of October, I spoke at eMetrics London. I was invited by Peter O'Neill to present at the conference, and I anticipated that I would be speaking as part of a track on optimisation or testing. However, Peter put me on the agenda with the keynote at the start of the second day, a slot I feel very honoured to have been given.

Jim Sterne, my Web Analytics hero, presenting

Selfie: a quick last-minute practice

Peter O'Neill, eMetrics UK organiser

I thoroughly enjoyed presenting - and I'm still learning how to make formal web analytics presentations (and probably always will be) - but for me the highlight of the Summit was meeting and talking with Jim Sterne, the Founding President and current Chairman of the Digital Analytics Association, and the Founder of the eMetrics Marketing Optimization Summit. I've been following him since before Twitter and Facebook, through his email newsletter "Sterne Measures" - and, as he kindly pointed out to me when I mentioned this, "Oh, you're old!" Jim gave a great keynote presentation on going from "Bits and Bytes to Insights", which has to be one of the clearest and most comprehensive presentations on the history and future of web analytics that I've ever heard.

My topic for the keynote was "Building momentum in your online optimisation program." From my discussions at various other conferences, I've noted that people are no longer so concerned with getting an online testing program started and overcoming the initial obstacles; many analysts are now struggling to keep it running. I've previously blogged on getting a testing program off the ground, and this topic is more about keeping it up in the air.

While putting the final parts of the presentation together I determined not to re-use the material from my blog - as much as possible. The emphasis in my presentation was on how to take the first few tests and move towards a critical mass as quickly as possible - where test ideas and test velocity will increase sufficiently that there will be continuous ongoing interest in your tests - winners and losers, so that you'll be able to make a significant, consistent improvement to your company's website.

I'm just getting resettled back into the routine of normal work, but I'll share the key points (including some parts I missed) from my presentation in a future blog post as soon as I can.

Friday, 11 July 2014

Is Multi-Variate Testing Really That Good?

The second discussion that I led at the Digital Analytics Hub in Berlin in June was entitled, "Is Multi Variate Testing Really That Good?" Although only a few delegates attended, there was good participation from people across a range of analytical and digital roles, and in this post I'll cover some of the key points.

- The number of companies using MVT is starting to increase, although it's a very slow increase and it still has only low adoption rates. It's not as widespread as perhaps the tool vendors would suggest.

- The main barriers (real or perceived) to MVT are complexity (in design and analysis) and traffic volumes (multiple recipes require large volumes of traffic in order to get meaningful results in a useful timeframe).

There is an inherent level of complexity in MVT, as I've mentioned before (and one day soon I will explain how to analyse the results) and the tool vendors seem to imply that the test design must also be complicated. It needn't be. I've mentioned in a previous post on MVT that the visual design of a multi-variate test does not have to be complicated; it can just involve a large number of small changes run simultaneously.

The general view during the discussion was that MVT would have to involve a complicated design with a large number of variations per element (e.g. testing a call-to-action button in red, green, yellow, orange and blue, with five different wordings). In my opinion, this would be complicated as an A/B/n test, so as an MVT it would be extremely complex, and, to be honest, totally unsuitable for an entry-level test.

We spent a lot of our time discussing various pages and scenarios where MVT is totally unsuitable, such as site navigation. A number of online sites have issues with large catalogues and navigation hierarchies, and it's difficult to decide how best to display the whole range of products. MVT isn't the tool to use here; instead of A/B testing, we discussed card-sorting, brainstorming and visualisations. This was one of the key lessons for me - MVT is a powerful tool, but sometimes, you don't need a powerful tool, you just need the basic one. A power drill is never going to be good at cutting wood - a basic handsaw is the way to go. It's all about selecting the right tool for the job.

Looking at MVT, as with all online optimisation programs, the best plan is to build up to a full MVT in stages, with initial MVT trials run as pilot experiments. Start with something where the basic concept for testing is easy to grasp, even if the hypothesis isn't great. The problem statement or hypothesis could be, "We believe MVT is a valuable tool and in order to use it, we're going to start with a simple pilot as a proof of concept." And why not? :-)

Banners are a great place to start - after all, the marketing team spend a lot of money on them, and there's nothing quite as eye-catching as a screenshot of a banner in your test report documents and presentations. They're also very easy to explain... let's try an example. Three variables that can be considered are gender of the model (man or woman), wording of the banner text ("Buy now" vs "On Sale") and the colour of the text (black or red).

There are eight possible combinations in total; here are a few potential recipes:

Recipe A	Recipe B
Recipe C	Recipe D

Note that I've tried to keep the pictures similar - model is facing camera, smiling, with a blurred background. This may be a multi-variate test, but I'm not planning to change everything, and I'm keeping track of what I'm changing and what's staying the same!!
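The eight combinations come from two options for each of the three variables (2 x 2 x 2). As a small sketch, here is one way to list them in Python; the dictionary keys are just labels I've chosen for this example, not anything a testing tool requires.

```python
from itertools import product

# The three variables from the banner example, each with two options.
factors = {
    "model": ["man", "woman"],
    "wording": ["Buy now", "On Sale"],
    "text_colour": ["black", "red"],
}

# Every combination of one option per variable is one recipe.
recipes = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for number, recipe in enumerate(recipes, start=1):
    print(number, recipe)

print("Total recipes:", len(recipes))
```

The same calculation shows why the five-colours-by-five-wordings button mentioned earlier is a poor first test: that design would need 5 x 5 = 25 recipes.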

Designing a test like this has considerable benefits:
- it's easy to see what's being tested (no need to play 'spot the difference')
- you can re-use the same images for different recipes
- copywriters and merchandisers only need to come up with two lots of copy (which will be less than in an A/B/C/D test with multiple recipes).
- it's not going to take large numbers of recipes, and therefore is NOT going to require a large volume of traffic.

Some time soon, I'll explain how to analyse and understand the results from a multi-variate test, hopefully debunking the myths around how complicated it is.

Here's my series on Multi Variate Testing

Preview of Multi Variate testing
Web Analytics: Multi Variate testing
Explaining complex interactions between variables in multi-variate testing
Is Multi Variate Testing an Online Panacea - or is it just very good?
Is Multi Variate Testing Really That Good - (that's this article)
Hands on: How to set up a multi-variate test
And then: Three Factor Multi Variate Testing - three areas of content, three options for each!

Image credits:
man - http://www.findresumetemplates.com/job-interview
woman - http://www.sheknows.com/living

Tuesday, 24 June 2014

Why Does Average Order Value Change in Checkout Tests?

The first discussion huddle I led at the Digital Analytics Hub in 2014 looked at why average order value changes in checkout tests, and was an interesting discussion. With such a specific title, it was not surprising that we wandered around the wider topics of checkout testing and online optimisation, and we covered a range of issues, tips, troubles and pitfalls of online testing.

But first: the original question - why does average order value (AOV) change during a checkout test? After all, users have completed their purchase selection, they've added all their desired items to the cart, and they're now going through the process of paying for their order. Assuming we aren't offering upsells at this late stage, and we aren't encouraging users to continue shopping, or offering discounts, then we are only looking at whether users complete their purchase or not. Surely any effect on order value should be just noise?

For example, if we change the wording for a call to action from 'Continue' to 'Proceed' or 'Go to payment details', then would we really expect average order value to go up or down? Perhaps not. But, in the light of checkout test results that show AOV differences, we need to revisit our assumptions.

After all, it's an oversimplification to say that all users are affected equally, irrespective of how much they're intending to spend. More analysis is needed to look at conversion by basket value (cart value) to see how our test recipe has affected different users based on their cart value. If conversion is affected equally across all price bands, then we won't see a change in AOV. However, how likely is that?

Other alternatives: perhaps there's no real pattern in conversion changes: low-price-band, mid-price-band, high-price-band and ultra-high-price-band users show a mix of increases and decreases. Any overall AOV change is just noise, and the statistical significance of the change is low.

But let's suppose that the higher price-band users don't like the test recipe, and for whatever reason, they decide to abandon. The AOV for the test recipe will go down - the spread of orders for the test recipe is skewed to the lower price bands. Why could this be? We discussed various test scenarios:

- maybe the test recipe missed a security logo? Maybe the security logo was moved to make way for a new design addition - a call to action, or a CTA for online chat - a small change but one that has had significant consequences.

- maybe the test recipe was too pushy, and users with high ticket items felt unnecessarily pressured or rushed? Maybe we made the checkout process feel like express checkout, and we inadvertently moved users to the final page too quickly. For low-ticket items, this isn't a problem - users want to move through with minimum fuss and feel as if they're making rapid progress. Conversely, users who are spending a larger amount may want to feel reassured by a steady checkout process which allows them to take time on each page without feeling rushed.

- sometimes we deliberately look to influence average order value - to get users to spend more, add another item to their order (perhaps it's batteries, or a bag, or the matching ear-rings, or a warranty). No surprises there then, that average order value is influenced; sometimes it may go down, because users felt we were being too pushy.

Here's how those changes might look as conversion rates per price band, with four different scenarios:

Scenario 1: Conversion (vertical axis) is improved uniformly across all price bands (low - very high), so we see a conversion lift and average order value is unchanged.

Scenario 2: Conversion is decreased uniformly across all price bands; we see a conversion drop with no change in order value.

Scenario 3: Conversion is decreased for low and medium price bands, but improved for high and very-high price bands. Assuming equal order volumes in the baseline, this means that conversion is flat (the average is unchanged) but average order value goes up.

Scenario 4: Conversion is improved selectively for the lowest price band, but decreases for the higher price bands. Again, assuming there are similar order volumes (in the baseline) for each price band, this means that conversion is flat, but that average order value goes down.

There are various combinations that show conversion up/down with AOV up/down, but this is the mathematical and logical reason for the change.
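To put some numbers on the four scenarios, here is a small worked example in Python. Everything in it is invented for illustration: four price bands with a single typical order value each, 1,000 visitors per band, and a 10% baseline conversion rate in every band (so the baseline order volumes are equal, as assumed above).

```python
# Hypothetical typical order value for each price band.
ORDER_VALUE = {"low": 20, "medium": 50, "high": 100, "very high": 200}
VISITORS_PER_BAND = 1_000  # assumed equal traffic in every band


def summarise(conversion_by_band):
    """Return (overall conversion rate, average order value) for one recipe."""
    orders = {band: VISITORS_PER_BAND * rate
              for band, rate in conversion_by_band.items()}
    total_orders = sum(orders.values())
    revenue = sum(orders[band] * ORDER_VALUE[band] for band in orders)
    conversion = total_orders / (VISITORS_PER_BAND * len(orders))
    return conversion, revenue / total_orders


scenarios = {
    "baseline":   {"low": 0.10, "medium": 0.10, "high": 0.10, "very high": 0.10},
    "scenario 1": {"low": 0.12, "medium": 0.12, "high": 0.12, "very high": 0.12},
    "scenario 2": {"low": 0.08, "medium": 0.08, "high": 0.08, "very high": 0.08},
    "scenario 3": {"low": 0.08, "medium": 0.08, "high": 0.12, "very high": 0.12},
    "scenario 4": {"low": 0.16, "medium": 0.08, "high": 0.08, "very high": 0.08},
}

for name, rates in scenarios.items():
    conversion, aov = summarise(rates)
    print(f"{name:10s}  conversion {conversion:.1%}  AOV {aov:.2f}")
```

With these made-up figures, scenarios 1 and 2 move conversion to 12% and 8% while AOV stays at 92.50; scenarios 3 and 4 leave conversion at 10% but move AOV to 104.00 and 78.00 respectively.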

Explaining why this has happened, on the other hand, is a whole different story! :-)

Wednesday, 19 June 2013

Why is yesterday's test winner today's loser?

This post comes out of the xChange Berlin huddle which I led on 11 June 2013. xChange is very different from most web analytics conferences - most conferences have speakers and presentations, but xChange is focused around web analytics professionals meeting and discussing in small workshop groups. As the xChange website describes it:
"Expressly designed for enterprise analytics managers and digital marketing and measurement practitioners, X Change brings together top professionals in the field in a no-sales, all business, peer-to-peer environment for deep-dives into cutting edge online measurement topics."

At xChange Berlin 2013, I led two huddle groups - this was the first, entitled, "Why is yesterday's test winner today's loser?". I haven't attributed the content here to any particular participant - this is just a summary of our discussions. I should say now that the discussion was not even close to what I'd anticipated, but was even more interesting as a result!

The discussion kicked off with a review of a test win. Let's suppose that you have run your A/B test, and you have a winner. You ran it for long enough to achieve statistical significance and even achieved consistent trend lines. But somehow, when you implemented it, your financial metrics didn't show the same level of improvement as your test results. And now, the boss has come to your desk to ask if your test was really valid. "What happened? Why is yesterday's test winner today's loser?"

There are a number of reasons for this - let's take a look.

External factors
Yes, A/B tests split your traffic evenly between the test recipes, so that most external factors are accounted for. But what happens if your test was running while you had a large-scale TV campaign, or display or PPC campaign? Yes, that traffic would have been split between your test recipes, so the effect is - apparently - mitigated. But what if the advertising campaign resonated with your test recipe, which went on to win? During the non-campaign period, the control recipe might have been better, or perhaps the results would have been more similar. Consequently, the uplift that you saw during the test would not be achieved in normal conditions.

Customer Experience Changes
When we start a test, there is quite often a dip in performance for the test recipe. It's new. It's unfamiliar and users have to become accustomed to it. It often takes a week or so for visitors to get used to it, and for accurate, meaningful and useful test results to develop. In particular, frequent repeat visitors will take some time to adjust to the changes (how often repeat visitors return will depend on your site). The same issue applies when you implement a winner - now, the whole population is seeing a new design, and it will take some time for them to adjust.

Visitor Segments
Perhaps the test recipe worked especially well with a particular visitor segment? Maybe new visitors, or search visitors, or visitors from social media, and that was responsible for the uplift. You have assumed (one way or another) that your population profile is fairly constant. But if you identify that your test recipe won because one or two segments really engaged with it, then you may not see the uplift if your population profile changes. What should you do instead? Set up a targeting implementation: target specific visitors, based on your test results, who engaged more (or converted better) with the test recipe. Show everybody else the same version of your site as usual, but for visitors who fit into a specific segment - show them the test recipe. I'll discuss targeting again at a later date, but here's a post I wrote a few months ago about online personalisation.
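The effect of a changing population profile can be shown with a simple weighted average. In this Python sketch the two segments, their conversion rates and the traffic mixes are all hypothetical: the test recipe helps new visitors only, and the share of new visitors falls after the test ends.

```python
def overall_rate(mix, rate_by_segment):
    """Site-wide conversion rate: segment rates weighted by traffic share."""
    return sum(mix[segment] * rate_by_segment[segment] for segment in mix)


def relative_lift(mix, control, test):
    """Relative uplift of the test recipe over control for a given traffic mix."""
    control_rate = overall_rate(mix, control)
    test_rate = overall_rate(mix, test)
    return (test_rate - control_rate) / control_rate


# Hypothetical conversion rates: the test recipe only helps new visitors.
control = {"new": 0.02, "returning": 0.05}
test = {"new": 0.03, "returning": 0.05}

mix_during_test = {"new": 0.6, "returning": 0.4}
mix_after_launch = {"new": 0.2, "returning": 0.8}

print(f"Lift during test:  {relative_lift(mix_during_test, control, test):.1%}")
print(f"Lift after launch: {relative_lift(mix_after_launch, control, test):.1%}")
```

In this invented case the same recipe shows a lift of roughly 19% during the test but only around 5% once the traffic mix shifts towards returning visitors, even though nothing about the recipe has changed.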

Time lapse between test win and implementation
This varied among the members of the group - where a company has a test plan, and there's a need to get a test up and running, it may not be possible to implement straight away. It also depends on what's being tested - can the test recipe be implemented immediately through the site team or CMS, or will it require IT roadmap work? Most of the group would use the testing software (for example, Test and Target, or Visual Website Optimiser) to immediately set a winning recipe to 100% traffic (or 95%) until the change could be made permanently. Setting a winning recipe to 95% instead of 100% in effect enables the test to run for longer - you can continue to show that the test recipe is winning. It also means that visitors who were in the control group during the test (i.e. saw "Recipe A") will continue to see that recipe until the implementation is complete - better customer experience for that group? Something to think about!

My next post will be about the second huddle that I led, which was based on iterating vs creating. The title came from my recent blog post on iterative testing, but the discussion went in a very different direction, and again, was better for it!
