Experimenting to Test a Hypothesis
After my previous post on reporting, analysing, forecasting and testing, I thought I'd look in more detail at testing. Not the how-to-do-it, although I'll probably cover that in a later post, but how to take a test and a set of test results and use them to drive recommendations for action. The action might be 'do this to improve results' or it might be 'test this next'.
As I've mentioned before, I have a scientific background, so I have a strong desire to do tests scientifically, logically and in an ordered way. This is how science develops - with repeatable tests that drive theories, hypotheses and understanding. However, in science (by which I mean physics, chemistry and biology), most of the experiments are with quantitative measurements, while in an online environment (on a website, for example), most of the variables are qualitative. This may make it harder to develop theories and conclusions, but it's not impossible - it just requires more thought before the testing begins!
Quantitative data is data that comes in quantities - 100 grams, 30 centimetres, 25 degrees Celsius, 23 seconds, 100 visitors, 23 page views, and so on. Qualitative data is data that describes the quality of a variable - what colour is it, what shape is it, is it a picture of a man or a woman, is the text in capitals, is the text bold? Qualitative data is usually described with words, instead of numbers. This doesn't make the tests any less scientific (by which I mean testable and repeatable); it just means that interpreting the data and developing theories and conclusions is a little trickier.
For example, experiments with a simple pendulum will produce a series of results. Varying the length of the pendulum string leads to a change in the time it takes to complete a full swing. One conclusion from this test would be: "As the string gets longer, the pendulum takes longer to complete a swing." And a hypothesis would add, "Because the pendulum has to travel further per swing."
Online, however, test results are more likely to be qualitative. In my previous post, I explained how my test results were as follows:
Red Triangle = 500 points per day
Green Circle = 300 points per day
Blue Square = 200 points per day
There's no trending possible here - circles don't have a quantity connected to them, nor a measurable quantity that can be compared to squares or triangles. This doesn't mean that they can't be compared - they certainly can. As I said, though, they do need to be compared with care! In my test, I've combined two qualitative variables - colour and shape - and this has clouded the results completely and made it very difficult to draw any useful conclusions. I need to be more methodical in my tests, and start to isolate one of the variables (either shape or colour) to determine which combination is better. Then I can develop a hypothesis - why is this better than that - and move from testing to optimising and improving performance.
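To make the idea of isolating one variable concrete, here's a minimal sketch in Python (my own illustration, not from any particular testing tool), using the three results above. Holding colour fixed tells you which shape tests still need to be run:

```python
# Sketch: isolating one variable at a time.
# Illustrative results from the post - colour and shape were varied
# together, so these three numbers can't be compared directly.
results = {
    ("Red", "Triangle"): 500,
    ("Green", "Circle"): 300,
    ("Blue", "Square"): 200,
}

# To isolate shape, hold colour fixed and find the shapes not yet tested.
colour = "Red"
tested_shapes = {shape for (c, shape) in results if c == colour}
all_shapes = {"Triangle", "Circle", "Square"}
next_tests = [(colour, s) for s in sorted(all_shapes - tested_shapes)]
print(next_tests)  # the Red tests still to run, shape varying alone
```

Once those two extra tests are run, any difference in points per day can only be down to shape, because colour was held constant.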
Producing a table of the results from the online experiments shows the gaps that need to be filled by testing - not every gap will need to be filled in, but certainly more of them do!
Numerical results are points per day

SHAPE \ COLOUR | Red | Green | Blue | Yellow
Triangle       | 500 |       |      |
Circle         |     | 300   |      |
Square         |     |       | 200  |
Now there's currently no trend, but by carrying out tests to fill in some of the gaps, it becomes possible to identify trends, and then theories.
Numerical results are points per day

SHAPE \ COLOUR | Red | Green | Blue | Yellow
Triangle       | 500 | 399   |      |
Circle         | 409 | 300   |      | 553
Square         | 204 |       | 200  |
Having carried out four further tests, it now becomes possible to draw the following conclusions:
1. Triangle is the best shape for Red and Green, and based on the results it appears that Triangle is better than Circle is better than Square.
2. For the colours, it looks as if Red and Yellow are the best.
3. The results show that for Circle, Yellow did better than Red and Green, and further testing with Yellow triangles is recommended.
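Conclusions like these can be read straight off the grid. Here's a small Python sketch of my own (the helper name `best_shape_for` is just for illustration), using the filled-in numbers from the table above:

```python
# Sketch: reading conclusions off the partially filled results grid.
# Numbers are the illustrative points-per-day results from the post.
results = {
    ("Triangle", "Red"): 500, ("Triangle", "Green"): 399,
    ("Circle", "Red"): 409, ("Circle", "Green"): 300, ("Circle", "Yellow"): 553,
    ("Square", "Red"): 204, ("Square", "Blue"): 200,
}

def best_shape_for(colour):
    """Return the best-performing shape among those tested in this colour."""
    cells = {shape: pts for (shape, c), pts in results.items() if c == colour}
    return max(cells, key=cells.get) if cells else None

print(best_shape_for("Red"))    # Triangle
print(best_shape_for("Green"))  # Triangle

# Average points per shape, across the colours actually tested so far:
for shape in ("Triangle", "Circle", "Square"):
    pts = [p for (s, _), p in results.items() if s == shape]
    print(shape, sum(pts) / len(pts))
```

The averages come out in the order Triangle > Circle > Square, matching conclusion 1 - though with this few data points that ordering is a trend to test further, not a proven theory.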
I know this is extremely over-simplified, but it demonstrates how results and theories can be obtained from qualitative testing. Put another way, it is possible to compare apples and oranges, providing you test them in a logical and ordered way. The trickier bit comes from developing theories as to why the results are the way they are. For example, do Triangles do better because visitors like the pointed shape? Does it match the website's general branding? Why does the Square perform worse than the other shapes? Does its shape fit into the page too comfortably and not stand out? You'll have to translate this into the language of your website, and this translation into real life will be trickier too. You'll really need to take care to make sure that your tests aim to fill gaps in the results tables, rather than just being random guesses. Better still, look at the results and target the areas which are most likely to give improvements.
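One simple way to target the likely areas (a sketch of my own, not a standard method - the scoring rule is an assumption) is to rank the empty cells by the average of their row and column results so far:

```python
# Sketch: ranking untested gaps by how promising they look, scoring each
# gap as the average of its shape's mean and its colour's mean so far.
results = {
    ("Triangle", "Red"): 500, ("Triangle", "Green"): 399,
    ("Circle", "Red"): 409, ("Circle", "Green"): 300, ("Circle", "Yellow"): 553,
    ("Square", "Red"): 204, ("Square", "Blue"): 200,
}
shapes = ["Triangle", "Circle", "Square"]
colours = ["Red", "Green", "Blue", "Yellow"]

def mean(xs):
    return sum(xs) / len(xs) if xs else 0

def promise(shape, colour):
    row = [p for (s, _), p in results.items() if s == shape]
    col = [p for (_, c), p in results.items() if c == colour]
    return (mean(row) + mean(col)) / 2

gaps = [(s, c) for s in shapes for c in colours if (s, c) not in results]
gaps.sort(key=lambda sc: promise(*sc), reverse=True)
print(gaps[0])  # the most promising untested combination
```

On these numbers the top-ranked gap is the Yellow Triangle - the same next test the conclusions above pointed to by eye.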
It's hard, for sure: with quantitative data, if the results show that making the pendulum longer increases the time it takes for one swing, then yes, making the pendulum even longer will make the time for one swing even longer too. However, changing from Green to Red might increase the results by 100 points per day, but that doesn't lead to any immediate recommendation, unless you include, "Make it more red."
If you started with a hypothesis, "Colours that contrast with our general background colours will do better" and your results support this, then yes, an even more clashing colour might do even better, and that's an avenue for further testing. This is where testing becomes optimising - not just 'what were the results?', but 'what do the results tell us about what was best, and how can we improve even further?'.
Hi David, I really enjoyed reading your post and totally agree with your approach.
Your earlier statements summed up my thoughts. "I have a scientific background, so I have a strong desire to do tests scientifically, logically and in an ordered way. This is how science develops - with repeatable tests that drive theories, hypotheses and understanding...This may make it harder to develop theories and conclusions, but it's not impossible - it just requires more thought before the testing begins!"
I have a BSc and MSc in Psychology and was therefore trained to utilise the scientific method to arrive at my conclusions. Applying this to web analytics isn't always welcomed, and often I feel like I'm the only person who is terrified by the methodologies being used and the quality of the data that we're 'analysing' to later inform others so that they can make the right decisions.
I'm hoping that we can chat some more and share ideas as it pertains to web analytics as a science. Apparently, we like the same music too, ha!
Hi Renaldo
Thanks for your comment - I agree it's not easy to use scientific method, especially when there are others who take a much more hit-and-miss approach to "testing". Keep reading, I'm sure there's plenty more to talk about with applying scientific method to web analytics!