
Sunday, 24 November 2024

Testing versus Implementing - why not just switch it on?

"Why can't we just make a change and see what happens? Why do we have to build an A/B test - it takes too long!  We have a roadmap, a pipeline and a backlog, and we haven't got time."

It's not always easy to articulate why testing is important - especially if your company is making small, iterative, data-backed changes to the site and your tests consistently win (or, worse still, go flat).  The IT team is testing carefully and cautiously, but the time taken to build the test and run it is slowing down everybody's pipelines.  You work with the IT team to build the test (which takes time), it runs (which takes even more time), you analyze the test (why?) and you show that their good idea was indeed a good idea.  Who knew?


Ask an AI what a global IT roadmap looks like...

However, if your IT team is building and deploying something to your website - a new way of identifying a user's delivery address, or a new way of helping users decide which sparkplugs or ink cartridges or running shoes they need - something new, innovative and very different, then I would strongly recommend that you test it with them, even if there is strong evidence for its effectiveness.  Yes, they have carried out user testing and it's done well.  Yes, their panel loved it.  Even the Head of Global Synergies liked it, and she's a tough one to impress.  Their top designers have spent months in collaboration with the project manager, and their developers have gone through the agile process so many times that they're as flexible as ballet dancers.  They've only just made the deadline for pre-Christmas implementation, and now is the time to implement it.  It is ready.  However, the Global Integration Leader has said that they must test before they launch.  That's okay: they have allocated just enough time for a pre-launch A/B test, and they'll go live as soon as the test is complete.


Sarah Harries, Head of Global Synergies

Everything hinges on the test launching on time, which it does.  Everybody in the IT team is very excited to see how users engage with the new sparkplug selection tool and - more importantly for everybody else - how much it adds to overall revenue.  (For more on this, remember that clicks aren't really KPIs). 

But the test results come back: you have to report that the test recipe is underperforming, with conversion down 6.3%.  Engagement looks healthy at 11.7%, but those engaged users are dragging down overall performance.  The page exit rate is lower, but fewer users are going through checkout and completing a purchase.  Even after two full weeks, the data is looking negative.

Can you really recommend implementing the new feature?  No; but that's not the end of the story.  It's now your job to unpick the data and turn analysis into insights:  why didn't it win?!

The IT team, understandably, want to implement.  After all, they've spent months building this new selector and the pre-launch data was all positive.  The Head of Global Synergies is asking them why it isn't on the site yet.  Their timeline allowed three weeks for testing and you've spent three weeks testing.  Their unspoken assumption was that testing was a validation of the new design, not a step that might turn out to be a roadblock, and they had not anticipated any need for post-test changes.  It was challenging enough to fit in the test, and besides, the request was to test it.

It's time to interrogate the data.

Meanwhile, the IT team have identified some positive data points of their own:

*  Engagement is an impressive 11.7%.  Therefore, users love it.
*  The page exit rate is lower, so more people are moving forwards.  That's all that matters for this page:  get users to move forwards towards checkout.
*  The drop in conversion is coming from the pages in the checkout process.  That can't be related to the test, which is in the selector pages.  It must be a checkout problem.
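
Before taking those points at face value, it's worth walking through the funnel arithmetic.  The sketch below is a minimal example in Python using entirely made-up rates (they are not taken from the test in this story); it shows how a lower exit rate on the selector page and a drop in overall conversion can both be true at once - if the selector pushes more, but less well-qualified, visitors into checkout, the damage shows up on the checkout pages even though the cause is the selector.

```python
# Hypothetical funnel rates for the selector page (control vs test recipe).
# All numbers are made up to illustrate the arithmetic, not taken from a real test.
control_proceed, control_checkout_cvr = 0.40, 0.0750   # 40% move forwards; 7.5% of them buy
test_proceed, test_checkout_cvr       = 0.44, 0.0639   # lower exit rate, but weaker checkout

control_overall = control_proceed * control_checkout_cvr   # 3.00% site conversion
test_overall    = test_proceed * test_checkout_cvr          # ~2.81% site conversion

relative_change = test_overall / control_overall - 1
print(f"control {control_overall:.2%}, test {test_overall:.2%}, change {relative_change:+.1%}")
```

In other words, "the drop is on the checkout pages" doesn't rule out the selector as the cause; it may simply be sending more, but less committed, visitors towards checkout.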

They question the accuracy of the test data, which contradicts all their other data.

* The sample size is too small.
* The test was switched off before it had a chance to recover its 6.3% drop in conversion.

They suggest that the whole A/B testing methodology is inaccurate.

* A/B testing is outdated and unreliable.  
* The split between the two groups wasn't 50-50.  There are 2.2% more visitors in A than B.
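
Both the sample-size objection and the split objection are answerable with a few lines of code rather than opinion.  The sketch below is a minimal example in Python with made-up visitor and order counts (roughly 2.2% more visitors in A and a recipe converting about 6.3% worse - not figures from a real test): a two-proportion z-test says whether the sample was big enough for the conversion drop to be statistically meaningful, and a chi-square sample-ratio check says whether the traffic imbalance is within what random 50/50 assignment would produce at your volumes or is a genuine assignment or tracking problem worth investigating.

```python
import math


def two_proportion_z_test(orders_a, visitors_a, orders_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates."""
    rate_a = orders_a / visitors_a
    rate_b = orders_b / visitors_b
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


def sample_ratio_mismatch(visitors_a, visitors_b):
    """Chi-square test (1 d.f.) of the traffic split against an expected 50/50."""
    expected = (visitors_a + visitors_b) / 2
    chi2 = ((visitors_a - expected) ** 2 + (visitors_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2) / math.sqrt(2))  # chi-square with 1 d.f. is z squared
    return chi2, p_value


# Made-up numbers: ~2.2% more visitors in A, and a recipe converting ~6.3% worse.
visitors_a, orders_a = 511_000, 15_330   # control: 3.00% conversion
visitors_b, orders_b = 500_000, 14_050   # test recipe: ~2.81% conversion

print(two_proportion_z_test(orders_a, visitors_a, orders_b, visitors_b))
print(sample_ratio_mismatch(visitors_a, visitors_b))
```

If the z-test comes back significant, the "sample size is too small" objection falls away; if the mismatch check flags the split, that's a tracking or assignment issue to investigate before anyone argues about the result.  Either way, it's a data question with a data answer.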

Maybe they'll argue that the data wasn't analyzed or segmented correctly, and make some points along these lines:

* The test data includes users buying other items with their sparkplugs.  These should be filtered out.
* The test data must have included users who didn't see the test experience.
* The data shows that users who browsed on mobile phones were only at -5.8% on conversion, so they're doing better than desktop users.
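
The segmentation questions are also answerable rather than arguable, provided the test data can be exported at visitor level.  The sketch below is a hypothetical example in Python with pandas - the file name and column names (variant, device_type, saw_test_experience and so on) are assumptions for illustration, not a real export - showing how you might restrict the data to visitors who actually saw the test experience and then compare conversion by variant and device, so the "mobile is only down 5.8%" claim can be read alongside the equivalent desktop figure.

```python
import pandas as pd

# Hypothetical visitor-level export; file and column names are assumptions.
df = pd.read_csv("sparkplug_test_visitors.csv")

# 1. Keep only visitors who were actually served the test experience.
exposed = df[df["saw_test_experience"]]

# 2. If stakeholders want it, exclude baskets containing non-sparkplug items
#    (worth reporting both ways rather than silently filtering).
sparkplug_only = exposed[~exposed["basket_contains_other_items"]]

# 3. Conversion by variant and device type.
summary = (
    sparkplug_only
    .groupby(["variant", "device_type"])["converted"]
    .agg(visitors="count", orders="sum")
)
summary["conversion_rate"] = summary["orders"] / summary["visitors"]
print(summary)
```

Whatever the segments show, they belong alongside the blended, whole-site result rather than in place of it - a healthier mobile number doesn't cancel out an overall conversion drop.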

Remember:  none of this is personal.  You are, despite your best efforts, criticising a project that they've spent weeks or even months polishing and producing.  Nobody until this point has criticised their work, and in fact everybody has said how good it is.  It's not your fault: your job is to present the data and to provide insights based on it.  As a testing professional, your job is to run and analyse tests, not to be swayed into showing the data in a particular way.

They ran the test at the request of the Global Integration Leader, and burnt three weeks waiting for the test to complete.  The deadline for implementing the new sparkplug selector is Tuesday, and they can't stop the whole IT roadmap (which is dependent on this first deployment) just because one test showed some negative data.  They would have preferred not to test it at all, but it remains your responsibility to share the test data with other stakeholders in the business, marketing and merchandizing teams, who have a vested interest in the site's financial performance.  It's not easy, but it's still part of your role to present the unbiased, impartial data that makes up your test analysis, along with the data-driven recommendations for improvements.

It's not your responsibility to make the go/no-go decision, but it is up to you to ensure that the relevant stakeholders and decision-makers have the full data set in front of them when they make the decision.  They may choose to implement the new feature anyway, taking into account that it will need to be fixed with follow-up changes and tweaks once it's gone live.  It's a healthy compromise, providing that they can pull two developers and a designer away from the next item on their roadmap to do retrospective fixes on the new selector.

Alternatively, they may postpone the deployment and use your test data to address the conversion drops that you've shared.  How are the conversion drop and the engagement data connected?  Is the selector providing valid and accurate recommendations to users?  Does the data show that they enter their car colour and their driving style, but then go to the search function when they reach a question about their engine size?  Is the sequence of questions optimal?  Make sure that you can present these kinds of recommendations - it shows the value of testing, as your stakeholders would not be able to identify these insights from an immediate implementation.

So - why not just switch it on?  Here are four good reasons to share with your stakeholders:

* Test data will give you a comparison of whole-site behaviour - not just 'how many people engaged with the new feature?' but also 'what happens to those people who clicked?' and 'how do they compare with users who don't have the feature?'
* Testing will also tell you about the financial impact of the new feature (good for return-on-investment calculations, which are tricky with seasonality and other factors to consider)
*  Testing has the key benefit that you can switch it off - at short notice, and at any time.  If the data shows that the test recipe is badly losing money, you can flag it and, after a discussion with key stakeholders, pull the plug within minutes - no waiting for the next IT deployment window to undeploy the new feature.
* Testing will give you useful data quickly - within days you'll see how it's performing; within weeks you'll have a clear picture.





Monday, 18 November 2024

Designing Personas for Design Prototypes

Part of my job is validating (i.e. testing and confirming) new designs for the website I work on.  We A/B test the current page against a new page, and confirm (or otherwise) that the new version is indeed better than what we have now.  It's often a last-stop measure before the new design is implemented globally, although it's not always a go/no-go decision.

The new design has gone through various other testing and validation first - a team of qualified user experience (UX) and user interface (UI) designers will have decided how they want to improve the current experience.  They will have undertaken various trials with their designs, and built prototypes to show to user researchers; one of the key parts of the design process, somewhere near the beginning, is the development of user personas.

A persona in this context is a character that represents a 'typical user', whom designers and product teams can keep in mind while they're discussing their new design.  They can point to Jane Doe and say, "Jane would like this," or, "Jane would probably click on this, because Jane is an expert user."

I sometimes play Chess in a similar way, when I play solo Chess or when I'm trying to analyze a game I'm playing.  I make a move, and then decide what my opponent would play.  I did this a lot when I was a beginner, learning to play (about 40 years ago) - if I move this piece, then he'll move that piece, and I'll move this piece, and I'll checkmate him in two moves!  This was exactly the thought process I would go through - making the best moves for me, and then guessing my opponent's next move.


It rarely worked out that way, though, when I played a real game.  Instead, my actual opponent would see my plans, make a clever move of his own and capture my key piece before I got a chance to move it within range of his King.


Underestimating (or, to quote a phrase, misunderestimating) my opponent's thoughts and plans is a problem that's inherent in playing skill and strategy games like Chess.  In my head, my opponent can only play as well as I can.

However, when I play solo, I can play out as many moves as I like, both sides make whatever moves I choose, and I win because I've constructed an opponent who follows the perfect sequence of moves to let me win.  And I can even fool myself into believing that I won because I had the better ideas and the best strategy.

And this is a common pitfall among Persona Designers. 

"Jane Doe is clever enough to scroll through the product specifications to find the compelling content that will answer all her questions."

"Joe Bloggs is a novice in buying jewellery for his wife, so he'll like all these pretty pictures of diamonds."

"John Doe is a novice buyer who wants a new phone and needs to read all this wonderful content that we've spent months writing and crafting."

This is similar to the Texas Sharpshooter Fallacy (shooting bullets at the side of a barn, then painting the target around them to make the bullet holes look like they hit it).  That's all well and good, until you realize that the real customers who will spend real money purchasing items from our websites have a very real target that's not determined by where we shoot our bullets.  We might know the demographics of our customers, but even that doesn't mean we know what (or how) they think.  We certainly shouldn't imbue our personas with characters and then cling to them in the face of actual customer buying data that shows a different picture.  So what do we do?



"When the facts change, I change my mind. What do you do, sir?"
Paul Samuelson, Economist, 1915-2009