It's not always easy to articulate why testing is important - especially if your company is making small, iterative, data-backed changes to the site and your tests consistently win (or, worse still, go flat). The IT team is testing carefully and cautiously, but the time taken to build the test and run it is slowing down everybody's pipelines. You work with the IT team to build the test (which takes time), it runs (which takes even more time), you analyze the test (why?) and you show that their good idea was indeed a good idea. Who knew?
However, if your IT team is building and deploying something to your website - a new way of identifying a user's delivery address, or a new way of helping users decide which sparkplugs or ink cartridges or running shoes they need - something new, innovative and very different, then I would strongly recommend that you test it with them, even if there is strong evidence for its effectiveness. Yes, they have carried out user testing and it has done well. Yes, their panel loved it. Even the Head of Global Synergies liked it, and she's a tough one to impress. Their top designers have spent months in collaboration with the project manager, and their developers have gone through the agile process so many times that they're as flexible as ballet dancers. They've only just made the deadline for pre-Christmas implementation, and now is the time to implement it. It is ready. However, the Global Integration Leader has said that they must test before they launch. That's okay: they have allocated just enough time for a pre-launch A/B test, and they'll go live as soon as it's complete.
Everything hinges on the test launching on time, which it does. Everybody in the IT team is very excited to see how users engage with the new sparkplug selection tool and - more importantly for everybody else - how much it adds to overall revenue. (For more on this, remember that clicks aren't really KPIs).
But the test results come back, and you have to report that the test recipe is underperforming: conversion is down by 6.3%. Engagement looks healthy at 11.7%, but those engaged users are dragging down overall performance. The page exit rate is lower, yet fewer users are going through checkout and completing a purchase. Even after two full weeks, the data is looking negative.
Can you really recommend implementing the new feature? No - but that's not the end of the story. It's now your job to unpick the data and turn analysis into insights: why didn't it win?!
The IT team, understandably, want to implement. After all, they've spent months building this new selector and the pre-launch data was all positive. The Head of Global Synergies is asking them why it isn't on the site yet. Their timeline allowed three weeks for testing and you've spent three weeks testing. Their unspoken assumption was that testing was a validation of the new design, not a step that might turn out to be a roadblock, and they had not anticipated any need for post-test changes. It was challenging enough to fit in the test, and besides, the request was to test it.
Moreover, they have identified some positive data points. It's time to interrogate the data:
* Engagement is an impressive 11.7%. Therefore, users love it.
* The page exit rate is lower, so more people are moving forwards. That's all that matters for this page: get users to move forwards towards checkout.
* The drop in conversion is coming from the pages in the checkout process. That can't be related to the test, which is in the selector pages. It must be a checkout problem.
They question the accuracy of the test data, which contradicts all their other data.
* The sample size is too small.
* The test ran for too long / did not run for long enough.
* The test was switched off before it had a chance to recover its 6.3% drop in conversion.
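The first two objections can be answered with a quick power calculation against the traffic the test actually received. Here's a minimal sketch in Python, assuming a hypothetical 3% baseline conversion rate (take the real figure from your analytics) and the conventional 5% significance / 80% power settings; it estimates how many visitors each recipe needs before a 6.3% relative drop becomes reliably detectable.

```python
from statistics import NormalDist

def visitors_per_arm(baseline_cr, relative_change, alpha=0.05, power=0.80):
    """Visitors needed in each recipe for a two-sided two-proportion z-test."""
    p1 = baseline_cr                                # control conversion rate
    p2 = baseline_cr * (1 + relative_change)        # test recipe conversion rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for 5% significance
    z_power = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2

# Hypothetical inputs: 3% baseline conversion and the observed 6.3% relative drop.
needed = visitors_per_arm(baseline_cr=0.03, relative_change=-0.063)
print(f"Visitors needed per recipe: {needed:,.0f}")
```

If three weeks of traffic comfortably exceeds that figure in both recipes, 'the sample size is too small' and 'it didn't run for long enough' stop being credible objections; if it doesn't, the honest answer is to keep testing, not to launch.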
They suggest that the whole A/B testing methodology is inaccurate.
* A/B testing is outdated and unreliable.
* The split between the two groups wasn't 50-50. There are 2.2% more visitors in A than B.
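That last point is the easiest to settle: a 50-50 split never comes back as exactly 50-50, and a chi-squared goodness-of-fit test (a standard sample ratio mismatch check) will tell you whether a 2.2% gap is normal random variation or a sign of a broken split. A minimal sketch, using hypothetical visitor counts in place of your real ones:

```python
from scipy.stats import chisquare

# Hypothetical visitor counts: recipe A has roughly 2.2% more visitors than B.
visitors_a = 10_110
visitors_b = 9_890
expected = (visitors_a + visitors_b) / 2   # what a true 50-50 split would give

result = chisquare(f_obs=[visitors_a, visitors_b], f_exp=[expected, expected])
print(f"chi-squared = {result.statistic:.2f}, p-value = {result.pvalue:.3f}")

# A very small p-value (a common threshold for this check is 0.001) points to a
# genuine sample ratio mismatch worth investigating; anything larger means the
# imbalance is consistent with random assignment.
```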
Maybe they'll comment that the data wasn't analyzed or segmented correctly, and make some points about this:
* The test data includes users buying other items with their sparkplugs. These should be filtered out.
* The test data must have included users who didn't see the test experience.
* The data shows that users who browsed on mobile phones only performed at -5.8% on conversion, so they're doing better than desktop users.
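Some of these points are fair, and all of them are checkable against the raw test data. A minimal sketch, assuming you can export visitor-level records with hypothetical columns 'variant', 'saw_experience', 'device' and 'converted' (your testing tool's export will look different); it filters out visitors who never saw the test experience and breaks conversion down by device:

```python
import pandas as pd

# Hypothetical visitor-level export from the testing tool.
df = pd.read_csv("sparkplug_selector_test.csv")

# 1. Keep only visitors who actually saw the test experience.
seen = df[df["saw_experience"]]

# 2. Conversion rate per recipe, overall and split by device.
overall = seen.groupby("variant")["converted"].mean()
by_device = seen.groupby(["device", "variant"])["converted"].mean().unstack("variant")
by_device["relative_diff"] = (by_device["B"] - by_device["A"]) / by_device["A"]

print("Overall conversion rate by recipe:")
print(overall)
print("\nConversion rate and relative difference by device:")
print(by_device)
```

Bear in mind that the mobile argument cuts the other way: a -5.8% conversion drop on mobile is still a drop; it just means the new selector is losing slightly less badly there than elsewhere.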
Remember: none of this is personal. You are, despite your best efforts, criticising a project that they've spent weeks or even months polishing and producing. Nobody until this point has criticised their work; in fact, everybody has said how good it is. It's not your fault: your job is to present the data and to provide insights based on it. As a testing professional, you are there to run and analyse tests, not to be swayed into showing the data in a particular way.
They ran the test at the request of the Global Integration Leader, and burnt three weeks waiting for the test to complete. The deadline for implementing the new sparkplug selector is Tuesday, and they can't stop the whole IT roadmap (which is dependent on this first deployment) just because one test showed some negative data. They would have preferred not to test it at all, but it remains your responsibility to share the test data with other stakeholders in the business, marketing and merchandizing teams, who have a vested interest in the site's financial performance. It's not easy, but it's still part of your role to present the unbiased, impartial data that makes up your test analysis, along with the data-driven recommendations for improvements.
It's not your responsibility to make the go/no-go decision, but it is up to you to ensure that the relevant stakeholders and decision-makers have the full data set in front of them when they make the decision. They may choose to implement the new feature anyway, taking into account that it will need to be fixed with follow-up changes and tweaks once it's gone live. It's a healthy compromise, providing that they can pull two developers and a designer away from the next item on their roadmap to do retrospective fixes on the new selector. Alternatively, they may postpone the deployment and use your test data to address the conversion drops that you've shared. How are the conversion drop and the engagement data connected? Is the selector providing valid and accurate recommendations to users? Does the data show that they enter their car colour and their driving style, but then go to the search function when they reach a question about their engine size? Is the sequence of questions optimal? Make sure that you can present these kinds of recommendations - it shows the value of testing, as your stakeholders would not be able to identify these insights from an immediate implementation.
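One way to connect the engagement figure to the conversion drop is a step-by-step funnel over the selector's questions, showing exactly where engaged users bail out. A minimal sketch, assuming a hypothetical event log with one row per visitor per completed step (columns 'visitor_id' and 'step'); the step names below are invented for illustration:

```python
import pandas as pd

# Hypothetical event log: one row per visitor per selector step they completed.
events = pd.read_csv("selector_step_events.csv")

# The selector's question sequence, in the order users see it (invented names).
steps = ["car_colour", "driving_style", "engine_size", "recommendation", "add_to_basket"]

# Distinct visitors reaching each step, and the share retained from the previous step.
reached = events.groupby("step")["visitor_id"].nunique().reindex(steps)
funnel = pd.DataFrame({
    "visitors": reached,
    "retained_from_previous": (reached / reached.shift(1)).round(3),
})
print(funnel)

# A sharp fall at 'engine_size' would support the theory that users abandon the
# selector (and head for site search) when they hit a question they can't answer.
```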
So - why not just switch it on? Here are four good reasons to share with your stakeholders:
* Test data will give you a comparison of whole-site behaviour - not just 'how many people engaged with the new feature?' but also 'what happens to those people who clicked?' and 'how do they compare with users who don't have the feature?'
* Testing will also tell you about the financial impact of the new feature (good for return-on-investment calculations, which are tricky with seasonality and other factors to consider - a rough version of this calculation is sketched after this list)
* Testing has the key benefit that you can switch it off - at short notice, and at any time. If the data shows that the test recipe is badly losing money, you can identify this and, after a discussion with key stakeholders, pull the plug within minutes. You don't have to wait until the next IT deployment window to undeploy the new feature.
* Testing will give you useful data quickly - within days you'll see how it's performing; within weeks you'll have a clear picture.
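On the financial point (the second reason above), the simplest read-out is revenue per visitor in each recipe: the gap between them, scaled up to normal traffic levels, gives a rough estimate of what launching the feature is worth - or what it would cost. A minimal sketch with entirely hypothetical figures; a full analysis would also account for statistical significance and seasonality:

```python
# Hypothetical test read-out: visitors and revenue per recipe over the test period.
visitors_a, revenue_a = 100_000, 452_000.00   # A: current site
visitors_b, revenue_b = 100_000, 424_000.00   # B: new sparkplug selector

rpv_a = revenue_a / visitors_a                # revenue per visitor, control
rpv_b = revenue_b / visitors_b                # revenue per visitor, test recipe
difference_per_visitor = rpv_b - rpv_a

# Scale to a year of traffic (hypothetical volume) for a rough annual impact figure.
annual_visitors = 5_000_000
estimated_annual_impact = difference_per_visitor * annual_visitors

print(f"Revenue per visitor - A: {rpv_a:.2f}, B: {rpv_b:.2f}")
print(f"Estimated annual impact of launching B: {estimated_annual_impact:,.0f}")
```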