Web Optimisation, Maths and Puzzles: web analytics


Showing posts with label web analytics.

Monday, 18 November 2024

Designing Personas for Design Prototypes

Part of my job is validating (i.e. testing and confirming) new designs for the website I work on.  We A/B test the current page against a new page, and confirm (or otherwise) that the new version is indeed better than what we have now.  It's often the final check before the new design is implemented globally, although it's not always a go/no-go decision.

The new design has gone through various other testing and validation first - a team of qualified user experience (UX) designers and user interface (UI) designers will have decided how they want to improve the current experience.  They will have undertaken various trials with their designs, and will have built prototypes that will have been shown to user researchers; one of the key parts of the design process, somewhere near the beginning, is the development of user personas.

A persona in this context is a character that forms a 'typical user', who designers and product teams can keep in mind while they're discussing their new design.  They can point to Jane Doe and say, "Jane would like this," or, "Jane would probably click on this, because Jane is an expert user."

I sometimes play Chess in a similar way, when I play solo Chess or when I'm trying to analyze a game I'm playing.  I make a move, and then decide what my opponent would play.  I did this a lot when I was a beginner, learning to play (about 40 years ago) - if I move this piece, then he'll move that piece, and I'll move this piece, and I'll checkmate him in two moves!  This was exactly the thought process I would go through - making the best moves for me, and then guessing my opponent's next move.


It rarely worked out that way, though, when I played a real game.  Instead, my actual opponent would see my plans, make a clever move of his own and capture my key piece before I got the chance to move it within range of his King.


Underestimating (or, to quote a phrase, misunderestimating) my opponent's thoughts and plans is a problem that's inherent in playing skill and strategy games like Chess.  In my head, my opponent can only play as well as I can.

However, when I play solo, I can make as many moves as I like, and both sides do whatever I like, so I can win because I constructed my opponent to follow the perfect sequence of moves to let me win.  And I can even fool myself into believing that I won because I had the better ideas and the best strategy.

And this is a common pitfall among Persona Designers (I've written a whole series on the pitfalls of A/B testing).  They impose too much of their own character onto their persona, and suddenly they don't have a persona, they have a puppet.

"Jane Doe is clever enough to scroll through the product specifications to find the compelling content that will answer all her questions."

"Joe Bloggs is a novice in buying jewellery for his wife, so he'll like all these pretty pictures of diamonds."

"John Doe is a novice buyer who wants a new phone and needs to read all this wonderful content that we've spent months writing and crafting."

This is similar to the Texas Sharpshooter Fallacy (shooting bullets at the side of a barn, then painting the target around them to make it look like they hit it).  That's all well and good, until you realize that the real customers who will spend real money purchasing items from our websites have a very real target that's not determined by where we shoot our bullets.  We might even know the demographics of our customers, but even that doesn't mean we know what (or how) they think.  We certainly can't imbue our personas with characters and cling to them in the face of actual customer buying data that shows a different picture.  So what do we do?



"When the facts change, I change my mind. What do you do, sir?"
Paul Samuelson, Economist, 1915-2009


Friday, 17 May 2024

Multi-Armed Bandit Testing

 I have worked in A/B testing for over 12 years, and blogged about it extensively.  I've covered how to set up a hypothesis, how to test iteratively and even summarized the basics of A/B testing.  I ran my first A/B test on my own website (long since deleted and now only in pieces on a local hard-drive) about 14 years ago.  However, it has taken me this long to actually look into other ways of running online A/B tests apart from the equal 50-50 split that we all know and love.

My recent research led me to discover multi-armed bandit testing, which sounds amazing, confusing and possibly risky (don't bandits wear black eye-masks and operate outside the law??). 

What is multi-armed bandit testing?

The term multi-armed bandit comes from a mathematical problem, which can be phrased like this:

A gambler must choose between multiple slot machines, or "one-armed bandits", each of which has a different, unknown likelihood of winning.  The aim is to find the best or most profitable outcome through a series of choices.  At the beginning of the experiment, when odds and payouts are unknown, the gambler must try each one-armed bandit to measure its payout rate, and then find a strategy to maximize winnings.


Over time, this will mean putting more money into the machine(s) which provide the best return.

Hence, the multiple one-armed bandits make this the “multi-armed bandit problem,” from which we derive multi-armed bandit testing.

The solution - to put more money into the machine which returns the best prizes most often - translates to online testing: the testing platform dynamically changes the allocation of new test visitors towards the recipes which are showing the best performance so far.  Normally, traffic is allocated randomly between the recipes, but with multi-armed bandit testing traffic is skewed towards the winning recipe(s).  Instead of the normal 50-50 split (or 25-25-25-25, or whichever), the traffic allocation is adjusted on a daily (or per-visit) basis.

We see two phases of traffic distribution while the test is running:  initially, we have the 'exploration' phase, where the platform tests and learns, measuring which recipe(s) are providing the best performance (insert your KPI here).  After a potential winner becomes apparent, the percentage of traffic to that recipe starts to increase, while the losers see less and less traffic.  Eventually, the winner will see the vast majority of traffic - although the platform will continue to send a very small proportion of traffic to the losers, to continue to validate its measurements, and this is the 'exploitation' phase.
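
To make the exploration and exploitation phases concrete, here is a minimal sketch of one common bandit strategy, Thompson sampling (your testing platform may well use a different algorithm under the hood; the recipe names and conversion rates below are invented purely for illustration):

import random

# Minimal Thompson sampling sketch: each recipe keeps a Beta(wins+1, losses+1)
# belief about its conversion rate, and every new visitor is routed to the
# recipe whose sampled rate is highest.
recipes = {"A": {"wins": 0, "losses": 0},
           "B": {"wins": 0, "losses": 0}}

def choose_recipe():
    # Sample a plausible conversion rate for each recipe; pick the best.
    samples = {name: random.betavariate(s["wins"] + 1, s["losses"] + 1)
               for name, s in recipes.items()}
    return max(samples, key=samples.get)

def record_outcome(name, converted):
    # Update the chosen recipe's tally with the observed outcome.
    recipes[name]["wins" if converted else "losses"] += 1

# Simulated traffic: recipe B converts slightly more often, so over time it
# receives the majority of visitors (exploration giving way to exploitation).
true_rates = {"A": 0.05, "B": 0.08}   # made-up rates, for the simulation only
for _ in range(10000):
    chosen = choose_recipe()
    record_outcome(chosen, random.random() < true_rates[chosen])

print({name: s["wins"] + s["losses"] for name, s in recipes.items()})

In a typical run, recipe B ends up receiving the large majority of the 10,000 simulated visitors - exactly the traffic curve described above.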

The graph for the traffic distribution over time may look something like this:


...where Recipe B is the winner.

So, why do a multi-armed bandit test instead of a normal A/B test?

If you need to test, learn and implement in a short period of time, then multi-armed bandit may be the way forward.  For example, if marketing want to know which of two or three banners should accompany the current sales campaign (back to school; Labour Day; holiday weekend), you aren't going to have time to run the test, analyze the results and push the winner - the campaign will have ended while you were tinkering with your spreadsheets.  With multi-armed bandit, the platform identifies the best recipe while the test is running, and implements it while the campaign is still active.  When the campaign has ended, you will have maximized your sales performance by showing the winner while the campaign was active.

Friday, 30 June 2023

Goals (and why I haven't posted recently)

I'll probably blog sometime soon about goals, objectives, strategies and measures.  They're important in business, and useful to have in life generally.  For now, though, I'll have to explain why I haven't blogged much recently at all:  I've found a new (old) hobby:  constructing Airfix models.  I started with Airfix models when I was about 10 or 11 years old - old enough to wait patiently for the glue to dry, and careful enough to plan how to construct each model.  I had a second wave of interest in my late teens and early 20s, and a third earlier this year (courtesy of my 11-year-old, now 12-year-old, son).

So this is what's been filling my time - building with my son.

Here's my first solo-ish project for 25 years:


  

The set is the Airfix 25 pdr Field Gun with Quad - one that I bought and built during my time at university.  I enjoyed it mostly because of the various figures that come with the set.  I've not painted any of my sets in snow camouflage before - and you'll soon see that I'm not a stickler for historical accuracy: I paint what I like!

   

I identified this figure as a troop commander (his flat cap contrasts with the helmets that the rest of the troop are wearing).  The Quad truck has a gap in the roof, and I decided I was going to stand the commander in the vehicle, peering through the roof.  Yes, he's a very easy target standing there like that, but I figure - why not?

  

I drew out the overall scene on a piece of wooden board, sketching the position of the vehicle, trailer and gun, and the key figures.  We also have some injured casualties in our collection, and they featured too.  The trees were obtained cheaply from Amazon, and they are low-quality and quite small for 1/72 scale.  I used scenic roll (green) and white spray paint (generic matt paint) to create the snow.  The spray paint was cheap and it didn't spray evenly, but that worked to my advantage, giving patchy but heavy coverage.

The final diorama included a Metcalfe model pillbox (the square version), some additional bushes (not shown here) and a good complement of trees.  I added some crater marks (but no depth to the scene) to explain the casualties, and then added some medics too (they were trickier).

Next?  A village scene, with a pair of Tigers ploughing through the remains of a continental village (somewhere).  As ever, it's all about the modelling, and has very little to do with historical accuracy!

Sunday, 30 April 2023

Personalization, Segmentation or Targeting

Following all my recent posts on targeting (or personalization), I was discussing website content changes with a colleague.  I was explaining how we could test some form of interactive, real-time changes on our site.  His comments were that this wasn't real 1-to-1 personalization, and that what I was actually doing was just segmentation and content retargeting.  This started me thinking, and so I'd like to share my thoughts on whether 1-to-1 targeting is possible, easy and worth the effort.  Or should we be satisfied with segmentation and retargeting?


1-to-1 targeting requires the ability to show any content to any user.  It probably needs a huge repository of content that can be accessed to show a given user material that isn't shown to other users, but which is deemed optimal for them.  This raises three questions:

1. How do you decide which type of user this particular user should be classed as?
2. How do you determine which content to show this particular user (or type of user)?
3. When the targeting doesn't give great results, how can you tell if the problem is with 1. or 2.?

And, as a follow-up question, why is "targeted" content drawn from a library held in higher esteem than retargeting existing content? Is it better because it's so difficult to set up?

Content retargeting - moving existing content on the page - does not require new content, but "isn't real 1-to-1 targeting."  This is true, but I would argue that the difference - mathematically at least - is negligible.  The huge library of targeted content isn't going to be able to match the potential combinations of content that can be achieved just by flipping page content around to promote a particular group of products.

In previous examples on targeting, I've looked at having four product categories that can be targeted.

How many ways are there to order the four products A, B, C, D?

4 * 3 * 2 * 1 = 24

There are four options for the first placement, leaving three for the second placement, two options for the third and only one left for the final place.

This is a relatively simple example - most websites have more than just four products or product categories in their catalogue (even Apple, with its limited product range, has more than four).

Let's jump up to six products:
6 * 5 * 4 * 3 * 2 * 1 = 720.
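
If you want to check these numbers, or see how quickly they grow, Python's standard library will do it for you - a quick sketch:

from math import factorial

# The number of ways to order n product categories grows factorially.
for n in (4, 6, 8, 10):
    print(n, factorial(n))   # 4 -> 24, 6 -> 720, 8 -> 40320, 10 -> 3628800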

At this point, retargeting is going to start scaling far more easily than 1-to-1 personalization. 

Admittedly, it's highly unlikely that all 720 combinations are going to be used and shown with equal probability - we will probably see maybe 6-10 combinations that are shown most often, as users visit just one or two product categories and identify themselves as menswear, casual clothes, or womenswear customers.  The remaining three or four categories aren't relevant to these customers, and so we don't retarget that content.  I mean: if a user is visiting menswear and men's shoes, then they aren't going to be interested in womenswear and casual clothing, so the sequence of those categories is going to be irrelevant and unchanged.

So, we can group users into one of 720 "segments", not based on how we segment them, but how they segment themselves.  This leads to a pseudo-bespoke browsing experience (it isn't 1-to-1, but the numbers are high enough for it to be indistinguishable) that doesn't require the overhead of a huge library of product content waiting to be accessed.

When does the difference between true personalization and segmented retargeting become indistinguishable?  Are we chasing true 1-to-1 personalization when it isn't even beneficial to the customers' experience?

I would say that it's when the number of combinations of retargeted content becomes so large that users are seeing a targeted experience each time they come to the page.  Or, when the number of combinations is greater than the number of users who visit the page.  Personalization is usually perceived - and presented - as the holy grail of Web experience, but in my view it's unnecessary, unattainable and frequently unlikely to actually get off the drawing board. Why not try something that could give actual results, provide improved customer experience and could be set up this side of Christmas?




Tuesday, 28 March 2023

Non-zero Traffic

Barely a month ago, I wrote about a challenge I was having tracking the traffic on this blog.  I had identified that I wasn't seeing any visitors on mobile phones.  Not one.

I'd taken some steps to fix this, and I had been able to track myself using my own phone, but only from organic search.  I was still not tracking actual visitor traffic.

Then I found an article that explains how to connect Blogger to Google Analytics 4 (and this has become justification for me to move to GA4).  

All I had to do was to enter my GA4 ID number into the Blogger setting for tracking... and that was it.  It took me weeks to track down the solution, but since then, I've been tracking mobile traffic:

At the time of writing, I've had the fix live for three weeks, and this is how it's looking compared to the three weeks prior to the fix.

I've carried out a number of checks to make sure the data is valid: 

- are the mobile and desktop numbers different (or am I double-counting users)?
- are the desktop numbers all even (another indication of double counting)?
- are the mobile numbers cannibalising the desktop numbers, or are they genuinely additional?

In all cases, the data look good, showing clear and distinct differences between the mobile and desktop traffic, but I'm still glad I was able to validate the data accuracy. 

Other articles I've written on Website Analytics that you may find relevant:

Web Analytics - Gathering Requirements from Stakeholders

Tuesday, 21 March 2023

Why Personalization Programs Struggle

So why aren’t we living in a world of perfect personalization? We've been hearing for a while that it'll be the next big thing, so why isn't it happening?

Because it’s hard.  There's just too much to consider, especially if you're after the ultimate goal of 1-to-1 personalization.


In my experience, there are three areas where personalization strategies come completely unstuck.  The first is in the data capture, the second is the classification and design of ‘personas’, and the third is in the visual design.

1. Data capture:  what data can you access?

Search keywords?
PPC campaign information?
Marketing campaign engagement?
Browsing history?
Purchase history?
Can you get geographic or demographic information?
Surely you can’t form a 1x1 relationship between each individual user and their experience? 
Previous purchaser?  And are you going to try and sell them another one of what they just bought?
Traffic source:  search/display/social?
What products are they looking at?
What have they added to basket?

2. Classification:  how are you going to decide how to aggregate and categorise all this data?  

Is it a new user?  Return user?

And the biggest crunch:  how are you going to then transfer these classifications to your Content Management System, or to your targeting engine, so that it knows which category to place User #12345 into?  And that's just where the fun begins.

And how do you choose the right data?  I'm personally becoming bored of seeing recommendations based on items I've bought:  "You bought this printer... how about this printer?" and "You recently purchased a new pair of shoes... would you like to buy a pair of shoes?" As an industry we seem to lack the sophistication that says, "You bought this printer - would you like to buy some ink for it?" or "You bought these shoes, would you like to buy this polish, or these laces?"

3. Visual Design

For each category or persona that you identify, you will need to have a corresponding version of your site.  For example, you’ll need to have a banner that promotes a particular product category (a holiday in France, the Caribbean, the Mediterranean, the USA); or you may need to have links to content about men’s shoes; women’s shoes; slippers or sports shoes. 

And your site merchandising team now needs to multiply its efforts for its campaigns. 

Previously, they needed one banner for the pre-Christmas campaign; now, they need to produce four, five or more instead.  This comes as they are approaching their busiest period (because that’s when you’ll get more traffic in and want to maximise its performance) and haven’t got time to generate duplicated content just for one banner.

Fortunately, there are ways of minimizing the headaches that you can encounter when you’re trying to get personalization up and running (or keeping it going).

Why not take the existing content, and show it to users in a different order?  Years ago, there was a mantra (with a meme, probably) going around that told us to 'Remember: There is no fold' but I've never subscribed to that view.  Analytics regularly shows us that most users don't scroll down to see our wonderful content lying just below the edge of their monitor (or their phone screen).  So, if you can identify a customer as someone looking for men's shoes, or women's sports shoes, or a 4x4, or a hatchback, or a plasma TV, then why not show that particular product category first (i.e. above the fold, or at least the first thing below it)?
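
As a sketch of what that looks like in practice, the targeting logic can be as simple as promoting the matching block to the top of the page (the category names and default order here are my own hypothetical examples, not anyone's real catalogue):

# Hypothetical sketch: promote the category the visitor has shown interest in
# to the top of the page, and leave everything else in its existing sequence.
DEFAULT_ORDER = ["womens_shoes", "mens_shoes", "sports_shoes", "slippers"]

def reorder_for(interest, order=DEFAULT_ORDER):
    if interest not in order:
        return list(order)          # unknown visitor: show the default page
    return [interest] + [c for c in order if c != interest]

print(reorder_for("mens_shoes"))
# ['mens_shoes', 'womens_shoes', 'sports_shoes', 'slippers']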



4. Solutions

The flavour du jour in our house is Airfix modelling - building 1/72 or 1/48 scale vehicles and aircraft, so let's use that as an example, and visit  one of the largest online modelling stores in the UK, Wonderland Models.

Their homepage has a very large leading banner, which rotates like a carousel around five different images: a branding image; radio controlled cars; toy animals and figures; toys and playsets; and plastic model kits.  The opportunity here is to target users (either return visitors, which is easier, or new users, which is trickier) and show them the banner which is most relevant to them. 

The Wonderland Models homepage.  The black line is the fold on my desktop.

How do you select which banner?  By using the data that users are sharing with you - their previous visits, items they've browsed (or added to cart), or what they're looking for in your site search... and so on.  Here, the question of targeted content is simpler - show them the existing banner which most closely matches their needs - but the data is trickier.  However, the banners and categories will help you determine the data categorization that you need - you'll probably find it reflected in your site architecture.
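
A sketch of that selection logic, using made-up category names loosely based on the banners above (in reality the decision would sit inside your targeting platform, not a standalone script):

from collections import Counter

BANNERS = {"radio_control", "animals_and_figures", "toys_and_playsets",
           "plastic_model_kits", "branding"}

def pick_banner(browsed_categories):
    # Fall back to the generic branding banner when we know nothing useful.
    if not browsed_categories:
        return "branding"
    most_common, _ = Counter(browsed_categories).most_common(1)[0]
    return most_common if most_common in BANNERS else "branding"

print(pick_banner(["plastic_model_kits", "plastic_model_kits", "radio_control"]))
# plastic_model_kits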

However, here's the bonus:  when you've classified (or segmented) your user, you can use this content again... lower down on the same page.  Most sites duplicate their links, or have multiple links around similar themes, and Wonderland Models is no exception. Here, the secondary categories are Radio Control; Models and Kits; Toys and Collectables; Paints, Tools and Materials; Model Railways and Sale.  These overlap with the banner categories, and with a bit of tweaking, the same data source could be used to drive targeting in both segments. 

As I covered in a previous blog about targeting the sequence of online banners, the win here is that with six categories (and a large part of the web page being targeted), there are thirty different ordered arrangements for just the first two slots, with six options for the first position and five for the second.  This will be useful, as the content is long and requires considerable scrolling.
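
The arithmetic, for anyone who wants to check it (math.perm counts ordered arrangements of k items drawn from n):

from math import perm

# Six categories, two targeted slots: 6 options for the first, 5 for the second.
print(perm(6, 2))   # 30 ordered pairs for the first two slots
print(perm(6, 6))   # 720 full orderings of all six categories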

The second and third folds on Wonderland Models.  The black lines show the folds.

Most analytics packages have an integration with CMSs or targeting platforms.  Adobe Analytics, for example, integrates with Adobe Target, its testing and targeting tool.  It's possible to connect the data from Analytics into Target (and I suspect your Adobe support team would be happy to help) and then use this to make an educated guess on which content to show to your visitors.  At the very least, you could run an A/B test.

5. The Challenge

The main reason personalization programs struggle to get going is (and I hate to use this expression, but here goes) that they aren't agile enough.  At a time when ecommerce is starting to use the product model and forming agile teams, it seems like personalization is often stuck in a waterfall approach.  There's no plan to form a minimum viable product, and try small steps - instead, it's wholesale all-in build-the-monolith, which takes months, then suffers a "funding reprioritization" since the program has nothing to show for its money so far...  this makes it even harder to gain traction (and funding) next time around.

6. The Start

So, don't be afraid to start small.  If you're resequencing the existing content on your home page, and you have three pieces of content, then there are six different ways that the content can be shown.  Without getting into the maths, there's ABC, ACB, BAC, BCA, CAB and CBA.  And you've already created six segments for six personas.  Or at least you've started, and that's what matters.  I've mentioned in a previous article about personalization and sequencing that if you can add in more content into your 'content bank' then the number of variations you can show increases exponentially.  So if you can show the value of resequencing what you already have, then you are in a stronger position to ask for additional content.  Engaging with an already-overloaded merchandising team is going to slow you down and frustrate them, so only work with them when you have something up-and-running to demonstrate.
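
If you'd rather enumerate the orderings than trust the maths, a quick, purely illustrative check:

from itertools import permutations

# Three pieces of content -> six possible orderings, i.e. six starting segments.
print(["".join(p) for p in permutations("ABC")])
# ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']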

Remember - start small, build up your MVP and only bring in stakeholders when you need to.  If you want to travel far, travel together, but if you want to travel quickly, travel light!





Saturday, 25 February 2023

Zero Traffic

I've mentioned before that it's always a concern when any one of your success metrics is showing as zero.  It suggests that some part of your tracking, calculation or reporting is flawed, and there's no diagnostic information to tell you why.

I have had an ongoing tracking problem with this blog, but hadn't realised until several weeks ago.  I use Google Analytics to track traffic, and the tag is included in one of the right-column elements, after the article list and some of the smaller images.  All good, lots of traffic coming in on a weekly and monthly basis.

Except, as I realised last November, none of my posts from 2021 or 2022 were showing any traffic.  Zero.

I was getting plenty of traffic for my older posts (some of them even rank on the first page of Google for the right search terms) and this was disguising the issue.  Overall traffic was flat year-on-year, despite me keeping up a steady flow of new articles each month.  And then I discovered two gaps in my data:

1. Zero traffic on mobile phones
2.  Zero traffic from social media

Which is obvious in retrospect, since I share most of my posts on Facebook, and my friends comment on my shares.  

In order to tackle these two issues, I've taken a number of steps (not all of which have helped).

a. I've moved the tag from an element in the right column to the middle column, under the content of the post.  The right column doesn't load on mobile devices, due to the responsive nature of Blogger, so the tag never loaded.  And hence I never saw any of my mobile traffic.

This has worked: I've tested the tag on my own mobile phone and I can see my own visit.  Yay! An increase from zero to one is an infinite increase, and it means the tag is working.

b. It turns out that Facebook's own in-app browser doesn't fire the tracking tag.  At all.  I am in the process of adding the Facebook user agent to my code, and in order to do this, have upgraded to Google Tag Manager.  I'm still not seeing Facebook-referred traffic, but it's an improvement.

And I'm looking at moving to a different platform for my blog.  I've had this one for over 10 years, and it pre-dates my Facebook account.  Maybe it's time for a change?

Other articles I've written on Website Analytics that you may find relevant:

Web Analytics - Gathering Requirements from Stakeholders

Monday, 14 November 2022

How many of your tests win?

As November heads towards December, and the end of the calendar year approaches, we start the season of Annual Reviews.  It's time to identify, classify and quantify our successes and failures (sorry: opportunities) from 2022, and to look forward to 2023.  For a testing program, this usually involves the number of tests we've run, and how many recipes were involved; how much money we made and how many of our tests were winners.

If I ask you, I don't imagine you'd tell me, but consider for a moment:  how many of your tests typically win?  How many won this year?  Was it 50%?  Was it 75%?  Was it 90%?  And how does this reflect on your team's performance?

50% or less

It's probably best to frame this as 'avoiding revenue loss'.  Your company tested a new idea, and you prevented them from implementing it, thereby saving your company from losing a (potentially quantifiable) sum of money.  You were, I guess, trying some new ideas, and hopefully pushed the envelope - in the wrong direction, but it was probably worth a try.  Or maybe this shows that your business instincts are usually correct - you're only testing the edge cases.

Around 75%

If 75% of your tests are winning, then you're in a good position and probably able to start picking and choosing the tests that are implemented by your company.  You'll have happy stakeholders who can see the clear incremental revenue that you're providing, and who can see that they're having good ideas.

90% or more

If you're in this apparently enviable position, you are quite probably running tests that you shouldn't be.  You're probably providing an insurance policy for some very solid changes to your website; you're running tests that have such strong analytical support, clear user research or customer feedback behind them that they're just straightforward changes that should be made.  Either that, or your stakeholders are very lucky, or have very good intuition about the website.  No, seriously ;-)

Your win rate will be determined by the level of risk or innovation that your company are prepared to put into their tests.  Are you testing small changes, well-backed by clear analytics?  Should you be?  Or are you testing off-the-wall, game-changing, future-state, cutting edge designs that could revolutionise the online experience? 

I've said before that your test recipes should be significantly different from the current state - different enough to be easy to distinguish from control, and to give you a meaningful delta.  That's not to say that small changes are 'bad', but if you get a winner, it will probably take longer to see it.

Another thought:  the win rate is determined by the quality of the test ideas, and how adventurous the ideas are, and therefore the win rate is a measure of the teams who are driving the test ideas.  If your testing team is focused on test ideas and has strengths in web analytics and customer experience metrics, then your team will probably have a high win rate.  Conversely, if your team is responsible for the execution of test ideas which are produced by other teams, then a measure of test quality will be on execution, test timing, and quantity of the tests you run.  You can't attribute the test win rate (high or low) to a team who develop tests; in fact, the quality of the code is a much better KPI.

What is the optimal test win rate?  I'm not sure that there is one, but it will certainly reflect the character of your test program more than its performance. 

Is there a better metric to look at?   I would suggest "learning rate":  how many of your tests taught you something? How many of them had a strong, clearly-stated hypothesis that was able to drive your analysis of your test (winner or loser) and lead you to learn something about your website, your visitors, or both?  Did you learn something that you couldn't have identified through web analytics and path analysis?  Or did you just say, "It won", or "It lost" and leave it there?  Was the test recipe so complicated, or contain so many changes, that isolating variables and learning something was almost completely impossible?

Whatever you choose, make sure (as we do with our test analysis) that the metric matches the purpose, because 'what gets measured gets done'.

Similar posts I've written about online testing

Getting an online testing program off the ground
Building Momentum in Online testing
Testing vs Implementing Directly


Thursday, 24 June 2021

How long should I run my test for?

 A question I've been facing more frequently recently is "How long can you run this test for?", and its close neighbour "Could you have run it for longer?"

Different testing programs have different requirements:  in fact, different tests have different requirements.  The test flight of the helicopter Ingenuity on Mars lasted 39.1 seconds, straight up and down.  The Wright Brothers' first flight lasted 12 seconds, and covered 120 feet.  Which was the more informative test?  Which should have run longer?

There are various ideas around testing, but the main principle is this:  test for long enough to get enough data to prove or disprove your hypothesis.  If your hypothesis is weak, you may never get enough data.  If you're looking for a straightforward winner/loser, then make sure you understand the concept of confidence and significance.

What is enough data?  It could be 100 orders.  It could be clicks on a banner: the first test recipe to reach 100 clicks - or 1,000, or 10,000 - is the winner (assuming it has a large enough lead over the other recipes).
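
For the "confidence and significance" part, a minimal two-proportion z-test sketch shows the kind of check involved (the visitor and conversion counts below are invented; in practice your testing platform reports this for you):

from math import sqrt
from statistics import NormalDist

def z_test(conv_a, n_a, conv_b, n_b):
    # Two-proportion z-test: is recipe B's conversion rate really different
    # from recipe A's, or could the gap just be noise?
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    return z, p_value

z, p = z_test(conv_a=100, n_a=5000, conv_b=130, n_b=5000)
print(f"z = {z:.2f}, p = {p:.3f}")   # significant at the 95% level if p < 0.05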

An important limitation to consider is this:  what happens if your test recipe is losing?  Losing money; losing leads; losing quotes; losing video views.  Can you keep running a test just to get enough data to show why it's losing?  Testing suddenly becomes an expensive business, when each extra day is costing you revenue.   One of the key advantages of testing over 'launch it and see' is the ability to switch the test off if it loses; how much of that advantage do you want to give up just to get more data on your test recipe?

Maybe your test recipe started badly.  After all, many do:  the change of experience from the normal site design to your new, all-improved, management-funded, executive-endorsed design is going to come as a shock to your loyal customers, and it's no surprise when your test recipe takes a nose-dive in performance for a few days.  Or weeks.  But how long can you give your design before you have to admit that it's not just the shock of the new design (sometimes called 'confidence sickness'), but that there are aspects of the new design that need to be changed before it will reach parity with your current site?  A week?  Two weeks?  A month?  Looking at data over time will help here.  How was performance in week 1?  Week 2?  Week 3?  It's possible for a test to recover, but if the initial drop was severe then you may never recover the overall picture.  However, if you find that the fourth week was actually flat (for new and return visitors), then you've found the point where users have adjusted to your new design.

If, however, the weekly gaps are widening, or staying the same, then it's time to pack up and call it a day.

Let's not forget that you probably have other tests in your pipeline which are waiting for the traffic that you're using on your test.  How long can they wait until launch?

So, how long should you run your test for?  As long as it takes to get the data you need, and maybe longer if you can.  The exceptions:
- if it's suffering from confidence sickness, keep it running
- if it's losing badly and consistently, switch it off (unless you're prepared to pay for your test data)
- if it's losing and holding up your testing pipeline, switch it off and move on

Similar posts I've written about online testing

Getting an online testing program off the ground
Building Momentum in Online testing
How many of your tests win?

Wright Brothers Picture:

"Released to Public: Wilber and Orville Wright with Flyer II at Huffman Prairie, 1904 (NASA GPN-2002-000126)" by pingnews.com is marked with CC PDM 1.0

Sunday, 29 November 2020

Combinations and Permutations


After mentioning permutations and combinations in my previous blog post on targeting, I thought it was time to provide a more mathematical treatment of them.  Everybody talks about them as a pair (in the same way as people tend to say 'look and feel', or 'design and technology').  

Let's start with an example:  three banners are to be shown on a website homepage. If we simplify and call the banners A, B and C, then one order in which they can be shown is A, B, C and another is A, C, B.

Each of these arrangements is called a permutation of the three banners (and there are further possible permutations), i.e. a permutation is an ordered arrangement of a number of items.

Suppose, however, that seven banners are available for presenting on the website, and only three of them can be displayed. This time a choice has first to be made. If we call the seven banners A, B, C, D, E, F and G, one possible choice of three banners for display is A, B and C - ignoring the sequence. Regardless of the order in which they are then displayed, this group of three is just one choice and is called a combination.

A, B, C
A, C, B
B, A, C
B, C, A
C, A, B
C, B, A

are six different permutations; but only one combination - thus:  a combination is an unordered selection of a number of items from a given set.

In this post,  I will discuss methods for finding the total number of ways of arranging items (permutations) or choosing groups of items (combinations) from a given set. But before we do so it is critical that we're able to distinguish between permutations and combinations.  They are not the same, and the terms shouldn't be used interchangeably.

For example:  a news website has ten news articles on its site, but the home page layout means that only five can be shown, in a vertical column. Since they cannot display all ten of the articles, they must choose a group of five. The order in which the site selects the five articles is irrelevant (in this case); the set of five is only one combination. Once they have made the choice, they are then able to place the five articles in various different orders in the column. Now the site team are arranging them, and each arrangement is a permutation, i.e. a particular set of five articles is one combination, but that one combination can be arranged to give several different permutations (the counts are checked in the short calculation after the headline list below).

1.  The King's Health is Failing
2.  Peace Treaty Signed!
3.  Life found on Mars!
4. Bungled Theft on the Railway
5. Jack the Ripper
6. Reports of My Death Greatly Exaggerated
7. Lottery Winner Buys Football Team
8.  New 007 is a Woman
9. Crop Circles - The Answer
10. Price of Eggs falls 10%
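
Checking the news-site example with Python's standard library (math.comb counts unordered choices, math.perm counts ordered arrangements):

from math import comb, factorial, perm

print(comb(10, 5))    # 252 combinations: sets of five articles, order ignored
print(factorial(5))   # 120 permutations of any one chosen set of five
print(perm(10, 5))    # 30240 = 252 * 120 ordered arrangements overall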


In each of these examples, decide if the question is asking for a number of permutations, or a number of combinations.

How many arrangements of the letters A, B, C are there?
Arrangements means the sequence is important, so this means permutations.

A team of six members is chosen from a group of eight. How many different
teams can be selected?
The sequence is not important, so this means combinations.

A person can take eight records to a desert island, chosen from his own
selection of one hundred records. How many different sets of records could he choose?
Different sets, again the sequence is not critical, so these are combinations.

The first, second and third prizes for a raffle are awarded by drawing tickets
from a box of five hundred. In how many ways can the prizes be won?
Here, the order matters - first, second and third prizes are different - so we're looking at permutations.

Combinations:  the sequence is not important.
Permutations:  the sequence is important.

Other reading you may find interesting:

If you're interested in how to use this to improve your website, I can recommend this article on personalisation and targeting and this one on why personalisation programs struggle (hint: they don't make good use of maths).

I've also written a more practical article on how to use combinations and permutations, looking at Targeting Website Banners.

Alternatively, if you like the maths of combinations and permutations, I can suggest Multiplications Puzzles


Monday, 18 November 2019

Web Analytics: Requirements Gathering

Everybody knows why your company has a website, and everybody tracks the site's KPIs.

Except that this is a modern retelling of the story of the blind men who tried to describe an elephant by touch alone, and everyone has a limited and specific view of your website.  Are you tracking orders? Are you tracking revenue? Are you tracking traffic? Organic? Paid? Banner? Affiliate? Or, dare I ask, are you just tracking hits?

This siloed approach can actually work, with each person - or, more likely, each team - working towards a larger common goal which can be connected to one of the site's actual KPIs.  After all, more traffic should lead to more orders, in theory.  The real problem arises when people from one team start talking to another about the success of a joint project.  Suddenly we have an unexpected culture clash, and two teams working within the same business are speaking virtually different languages: the words are the same, but the definitions are different, so everybody is using the same vocabulary while actually discussing very different concepts.

At this stage, it becomes essential to take a step back and take time to understand what everyone means when they use phrases like,  "KPIs","success metrics", or even "conversion". I mean, everyone knows there's one agreed definition of conversion, right? No?  Add to cart; complete order; complete a quote, or a lead-generation activity - I have seen and heard all of these called 'conversion'.

When it comes to testing, this situation can become amplified, as recipes are typically being proposed or supported by different teams with different aims.  One team's KPIs may be very different from another's.  As the testing lead, it's your responsibility to determine what the aims of the test are, and from them - and nothing else - what the KPIs are.  Yes, you can have more than one KPI, but you must then determine which KPI is actually the most important (or dare I say, "key"), and negotiate these with your stakeholders.

A range of my previous pieces of advice on testing become more critical here, as you'll need to ensure that your test recipes really do test your hypothesis, and that your metrics will actually measure it.  And, to avoid any doubt, make sure you define your success criteria in terms of basic metrics (visits, visitors, orders, revenue, page views, file downloads), so that everybody is on the same page (literally and metaphorically).


Keep everybody updated on your plans, and keep asking the obvious questions - assume as little as possible and make sure you gather all your stakeholders' ideas and requirements.  What do you want to test? Why? What do you want to measure? Why?

Yes, you might sound like an insistent three-year-old, but it will be worth it in the end!


Wednesday, 6 March 2019

Analysis is Easy, Interpretation Less So

Every time we open a spreadsheet, or start tapping a calculator (yes, I still do), or plot a graph, we start analysing data.  As analysts, it is probably most of what we do all day.  It's not necessarily difficult - we just need to know which data points to analyse, which metrics we divide by each other (do you count exit rate per page view, or per visit?) and we then churn out columns and columns of spreadsheet data.  As online or website analysts, we plot the trends over time, or we compare pages A, B and C, and we write the result (so we do some reporting at the end as well).
Analysis. Apparently.

As business analysts, it's not even like we have complicated formulae for our metrics - we typically divide X by Y to give Z, expressed to two decimal places, or possibly as a percentage.  We're not calculating acceleration due to gravity by measuring the period of a pendulum (although it can be done), with square roots, fractions, and square roots of fractions.

Analysis - dare I say it - is easy.

What follows is the interpretation of the data, and this can be a potential minefield, especially when you're presenting to stakeholders. If analysis is easy, then sometimes interpretation can really be difficult.

For example, let's suppose revenue per visit went up by 3.75% in the last month.  This is almost certainly a good thing - unless it went up by 4% in the previous month, and 5% in the same month last year.  And what about the other metrics that we track?  Just because revenue per visit went up, there are other metrics to consider as well.  In fact, in the world of online analysis, we have so many metrics that it's scary - and so accurate interpretation becomes even more important.


Okay, so the average-time-spent-on-page went up by 30 seconds (up from 50 seconds to 1 minute 20).  Is that good?  Is that a lot?  Well, more people scrolled further down the page (is that a good thing - is it content consumption or is it people getting well and truly lost trying to find the 'Next page' button?) and the exit rate went down.  

Are people going back and forth trying to find something you're unintentionally hiding?  Or are they happily consuming your content and reading multiple pages of product blurb (or news articles, or whatever)?  Are you facilitating multiple page consumption (page views per visit is up), or are you sending your website visitors on an online wild goose chase (page views per visit is up)?  Whichever metrics you look at, there's almost always a negative and positive interpretation that you can introduce.


This comes back, in part, to the article I wrote last month - sometimes two KPIs is one too many.  It's unlikely that everything on your site will improve during a test.  If it does, pat yourself on the back, learn and make it even better!  But sometimes - usually - there will be a slight tension between metrics that "improved" (revenue went up), metrics that "worsened" (bounce rate went up) and metrics that are just open to anybody's interpretation (time on page; scroll rate; pages viewed per visit; usage of search; the list goes on).  In these situations, the metrics which are open to interpretation need to be viewed together, so that they tell the same story, viewed from the perspective of the main KPIs.  For example, if your overall revenue figures went down, while time on page went up, and scroll rate went up, then you would propose a causal relationship between the page-level metrics and the revenue data:  people had to search harder for the content, but many couldn't find it so gave up.


On the other hand, if your overall revenue figures went up, and time on page increased and exit rate increased (for example), then you would conclude that a smaller group of people were spending more time on the page, consuming content and then completing their purchase - so the increased time on page is a good thing, although the exit rate needs to be remedied in some way.  The interpretation of the page level data has to be in the light of the overall picture - or certainly with reference to multiple data points.  


I've discussed average time on page before.  A note that I will have to expand on sometime:  we can't track time on page for people who exit the page.  It's just not possible with standard tags. It comes up a lot, and unless we state it, our stakeholders assume that we can track it:  we simply can't.  


So:  analysis is easy, but interpretation is hard and is open to subjective viewpoints.  Our task as experienced, professional analysts is to make sure that our interpretation is in line with the analysis, and is as close to all the data points as possible, so that we tell the right story.

In my next posts in this series, I go on to write about how long to run a test for and explain statistical significance, confidence and when to call a test winner.

Other articles I've written on Website Analytics that you may find relevant:

Web Analytics - Gathering Requirements from Stakeholders