My second huddle at XChange 2013 Berlin looked at what to test, how to set up a testing program and how to get management buy-in. We talked about how to achieve critical mass and how to build momentum for an online testing program.
I was intending to revisit some of the topics from my earlier post on creative versus iterative testing, but the discussion (as with my first huddle on yesterday's winner, today's loser) very quickly went off on a tangent and never looked back!
There are a number of issues in either starting or building a testing program - here are a few that we discussed:
Lack of management buy-in
Selling web analytics and reporting is not always easy, especially if you're working in (or with) a company that's largely focused on a high-street bricks-and-mortar presence, or if the company is historically telephone- or catalogue-based. Trying to sell the idea of online testing can be very tricky indeed. "Why should we test - we know what's best anyway!" is a common response, but the truth is that intuition is rarely right 100% of the time; here are a few counter-arguments that you may (or may not) want to try:
"Would you like to submit your own design to include in the test?"
"Could you suggest some other ideas for improving this banner/button/page?"
"Do you think there is a different way we could improve the page and reach/exceed our sales target?"
Another way of getting management (and other staff, colleagues and stakeholders) to engage with the test is to ask them to guess which recipe or design will win - and put their names to it. If you can market this well, then very quickly, people will start asking how the test is going and whether their design is winning. Better still, if their design is losing, they'll probably want to know why, and might even start (1) interrogating the data and (2) designing a follow-up test.
As we commented during our discussion, it's worth separating the recipe from the person: a losing design does not mean a bad manager. "Yes, you are still a good analyst or manager or designer - it's just that people didn't like your design."
Lack of resource
This could be a lack of IT support, design support or JavaScript developer time. Almost all tests depend on some sort of IT and design support (although I have heard of analysts and testers testing their own Photoshop creative). It's difficult - as we'll see below - because without design support you are restricted in what you can test. However, there are a number of test areas you can work on which are light on both design and code maintenance, and which could still show useful (and even positive) test results:
- banner imagery - for example, people or no people; a picture of the product or no product
- banner wording - buy-one-get-one-free, or two-for-one, or 50% off? Or maybe even 'Half price'? Wording will probably require even less design work than imagery, and you (as the tester, or analyst) may even be able to set this one up yourself.
- calls to action - Continue? Add to cart? Add to basket? Select? Make payment? This site has a huge gallery of 'continue shopping' buttons (shown once a customer has added an item to the basket and you want to persuade them to keep shopping). There are some suggestions on which may work best - and they don't even change the wording. There are many other things to try - colour; arrow or no arrow; CAPITALS or Initial Capitals?
The advantage of these tests is that they can be carried out on the same area of the same page - once the test code has been inserted by the IT or JavaScript teams, you can set up a series of tests just by changing the creative that is being tried. Many of those in the huddle said that once they had obtained a winner, they would then push that to 100% traffic through the testing software until the next test was ready - further reducing the dependency on IT support.
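To make the 'reusable test slot' idea concrete, here's a minimal sketch in TypeScript of the kind of snippet the IT or JavaScript team might insert once; the element id, recipe names and weights are illustrative assumptions rather than any particular testing tool's API. After that, a new wording test only needs the recipe list changing, and a winner can be pushed to 100% of traffic simply by adjusting the weights.

```typescript
// A minimal sketch (not tied to any particular testing tool) of a reusable
// test slot: the recipe list is the only thing that changes between tests.
type Recipe = { name: string; label: string; weight: number };

const recipes: Recipe[] = [
  { name: "control",    label: "Continue",      weight: 50 },
  { name: "challenger", label: "Add to basket", weight: 50 },
  // To push a winner to 100% of traffic until the next test is ready,
  // set its weight to 100 and the others to 0.
];

// Pick a recipe at random, in proportion to the weights.
function pickRecipe(list: Recipe[]): Recipe {
  const total = list.reduce((sum, r) => sum + r.weight, 0);
  let roll = Math.random() * total;
  for (const r of list) {
    if ((roll -= r.weight) <= 0) return r;
  }
  return list[0];
}

const chosen = pickRecipe(recipes);
const button = document.querySelector<HTMLButtonElement>("#continue-button"); // hypothetical element id
if (button) {
  button.textContent = chosen.label; // swap the creative for this visitor
}
// The chosen recipe name would then be recorded against the visitor (for
// example in a cookie or an analytics variable) so the results can be compared.
```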
How to sell flat results
There is nothing worse for an analyst or tester than finding out that the test results are flat (there's no significant difference in the performance of the test recipes - all the results were the same). The test has taken months to sell, weeks to design and code, and a few weeks to run, and the results say that there's no difference between the original version (which may have had management backing) and your new analytics-backed version. And what do you get? "You said that online testing would improve our performance by 2%, 5%, 7.5%..."
Actually, the results only appear to say there's no difference... so it's time to do some digging!
Firstly, was the difference between the two test recipes large enough and distinct enough? One member of the huddle quoted the Eisenberg brothers: "If you ask people if they prefer green apples or red apples, you're unlikely to get a difference. If you ask them if they prefer apples or chocolate, you'll see a result."
This is something to consider before the test - are the recipes different enough? It's not always easy to say in advance (!), and the more different the design, the greater the risk of the test recipe losing - but that's the point: iterating is 'safer' than creating, yet it carries a higher chance that the result will be flat. How much risk you're prepared to take may depend on external factors, such as how much design resource you can obtain and how important it is to get a non-zero result.
Secondly, analysing flat results will require some concerted data analysis. Overall, the number of orders and the average order value for the two recipes were the same...
But how many people clicked on the new banner? Or how many people bounced or exited from the test page?
Did you get more people to click on your new call-to-action button - and then those people left at the next page? Why?
Did the banner work better for higher-value customers, who then left on the next page because the item they were actually looking for wasn't featured? Did all visitor segments behave in the same way?
Was there a disconnect between the call to action and the next page? Was the next page really what people would have expected?
Did you offer a 50%-off deal but then not make it clear in the checkout process?

It's human nature to study and review a test loss, to accept a win without too much scrutiny, and to write off a flat result completely - but by applying the same level of rigour to a 'flat' result as to a loss, it's still possible to learn something valuable.
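To show what that digging might look like, here's a minimal sketch of a segment-level check using a two-proportion z-test; the segment names and numbers are invented for illustration, and your testing or analytics tool may well report this for you. The point is that an overall 'flat' result can hide a win in one segment offset by a loss in another.

```typescript
// A minimal sketch of digging into a "flat" result by segment.
// The data shape, segment names and figures are hypothetical.
type SegmentResult = { segment: string; recipe: string; visitors: number; orders: number };

const results: SegmentResult[] = [
  { segment: "new visitors",       recipe: "control",    visitors: 12000, orders: 240 },
  { segment: "new visitors",       recipe: "challenger", visitors: 12100, orders: 300 },
  { segment: "returning visitors", recipe: "control",    visitors: 8000,  orders: 400 },
  { segment: "returning visitors", recipe: "challenger", visitors: 7900,  orders: 340 },
];
// Overall, both recipes deliver 640 orders from roughly 20,000 visitors (flat)...
// but the challenger wins with new visitors and loses with returning visitors.

// Two-proportion z-score: how far apart are the conversion rates,
// relative to the variation we'd expect by chance?
function zScore(ordersA: number, visitorsA: number, ordersB: number, visitorsB: number): number {
  const pA = ordersA / visitorsA;
  const pB = ordersB / visitorsB;
  const pooled = (ordersA + ordersB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  return (pB - pA) / se;
}

for (const segment of new Set(results.map(r => r.segment))) {
  const control    = results.find(r => r.segment === segment && r.recipe === "control")!;
  const challenger = results.find(r => r.segment === segment && r.recipe === "challenger")!;
  const z = zScore(control.orders, control.visitors, challenger.orders, challenger.visitors);
  console.log(`${segment}: z = ${z.toFixed(2)}`); // |z| > 1.96 is roughly 95% confidence
}
```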
How do you set up a testing program?
We discussed how managers and clients generally prefer to start a testing program in the checkout process - it's a nice, easy, linear funnel with plenty of potential for optimisation, and it's very close to the money. If you improve a checkout page, then the financial metrics will automatically improve as a result.
But how do you test in the product description pages, where visitors browse around before selecting an item? We talked about page purpose: what is the purpose of the page? What's the main action that you want a user to take after they have seen this page? Is it to complete a lead-generation form? Is it to call the sales telephone line? Is it to 'add to cart'? The success metric for the page should be the key success metric for the test. You'll need to keep an eye on the end-of-funnel metrics (conversion, order value, and so on), but providing those are flat or trending positively, you can use the page-purpose metrics to measure the success of your test. If you're tracking an offline conversion (calling the sales line, for example) then you'll need to do some extra preparatory work - for example, setting up one telephone line per recipe and then arranging to track the volumes of telephone calls - but it'll make the test result more useful.
Tracking page-purpose success metrics will also enable you to run tests more quickly. If you can see a definite, confident lift in a page-purpose metric, while the overall financial metrics are flat or positive, then you can call a winner before you reach confidence in the overall metrics. The further you are from the checkout process (and the final order page), the longer it is likely to take - in testing time - for an uplift in page performance to filter through to the financial results, but you can be happy that you are improving your customers' experience.
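As a rough illustration of instrumenting a page-purpose metric alongside the end-of-funnel metrics, here's a small sketch; trackEvent, the element id and the event names are hypothetical stand-ins for whatever your analytics or testing tool actually provides.

```typescript
// Hypothetical event-tracking helper - in practice this would call your
// analytics or testing tool's own API.
type TestEvent = { test: string; recipe: string; event: string };

function trackEvent(e: TestEvent): void {
  console.log(`test=${e.test} recipe=${e.recipe} event=${e.event}`);
}

const testName = "product-page-banner"; // hypothetical test name
const recipeName = "challenger";        // whichever recipe this visitor was served

// Page-purpose metric for a product page: did the visitor add to cart?
document.querySelector("#add-to-cart")?.addEventListener("click", () => {
  trackEvent({ test: testName, recipe: recipeName, event: "page-purpose: add-to-cart" });
});

// End-of-funnel metric, fired on the order confirmation page, so you can check
// that conversion and order value stay flat or positive while you read the
// page-purpose metric for the winner.
function trackOrderConfirmation(orderValue: number): void {
  trackEvent({ test: testName, recipe: recipeName, event: `order-confirmed: ${orderValue}` });
}
```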
Documentation
Another valuable way of helping to build a testing program, and enabling it to develop, is to document your tests. When a test is completed, you'll probably be presenting to the management and the stakeholders - this is also a great opportunity to present to the people who contributed to your test: the designers, the developers, IT and so on. This applies especially if the test is a winner!
When the presentation is completed, file the results deck on a network drive, or somewhere that is widely accessible, and start to build up a list of test recipes, results and findings. We discussed whether this is a worthwhile exercise - it's time-consuming and laborious, and if there's only one analyst working on the test program it can seem unnecessary.
However, this has a number of benefits:
- you can start to iterate on previous tests (winners, losers and flat results), which means that future tests are more likely to be successful ("We did this three weeks ago and the results were good - let's try to make them even better")
- you can avoid repeating tests, which is a waste of time, resource and energy ("We did this two months ago and the results were negative")
- you can start to understand your customers' behaviour and design new tests (based on the data) which are more likely to win ("This test showed our visitors preferred this... therefore I suspect they will also prefer this...").
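As a rough idea of what each entry in such a test log might contain, here's a minimal sketch; the fields and the example values are assumptions about what's worth capturing, not a prescribed format.

```typescript
// A minimal sketch of one entry in a test log. All fields and values here
// are illustrative assumptions - capture whatever your team finds useful.
interface TestRecord {
  name: string;
  page: string;
  hypothesis: string;
  recipes: string[];
  startDate: string;     // ISO date
  endDate: string;
  primaryMetric: string; // the page-purpose metric for the page under test
  outcome: "winner" | "loser" | "flat";
  liftPercent: number;   // vs. the control recipe on the primary metric
  learnings: string;     // what this tells us about visitor behaviour
  followUpIdeas: string[]; // iterations suggested by the result
}

const exampleEntry: TestRecord = {
  name: "Basket page: continue-shopping button wording",
  page: "/basket",
  hypothesis: "'Continue shopping' will keep more visitors browsing than plain 'Continue'",
  recipes: ["control: Continue", "challenger: Continue shopping"],
  startDate: "2013-05-01",
  endDate: "2013-05-21",
  primaryMetric: "click-through to product pages",
  outcome: "winner",
  liftPercent: 3,
  learnings: "Visitors respond better to wording that names the action.",
  followUpIdeas: ["Test button colour", "Test arrow vs. no arrow"],
};
```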
It's also useful when and if the team starts to grow (itself a positive sign of a growing testing program), as you can share all the previous learnings.
These benefits will help the testing program gain momentum, so that you can start iterating and spend less time repeating yourself. Hopefully, you'll find that you have fewer meetings where you have to sell the idea of testing - you can point back at prior wins and say to the management, "Look, this worked and achieved 3% lift," and, if you're feeling brave, "And look, you said this recipe would win and it was 5% below the control recipe!"
The discussion ran for 90 minutes, and we discussed even more than this... I just wish I'd been able to write it all down. I'd like to thank all the huddle participants, who made this a very interesting and enjoyable huddle!