Web Optimisation, Maths and Puzzles: KPI


Tuesday, 23 February 2021

Knowing Your KPI is Key

I've written in the past about KPIs, and today I find myself sitting at my computer about to re-tell a story about KPIs - with another twist.

Two years ago, almost to the day, I introduced you all to Albert, Britney and Charles, my three fictitious car salespeople.  Back in 2019, they were selling hybrid cars, and we had enough KPIs to make sure that each of them was a winner in some way (except Albert.  He was our 'control', and he was only there to make the others look good.  Sorry, Albert).

Well, two years on, selling cars has gone online.  Covid-19 and all that means that sales of cars are now handled remotely - with video views, emails, and Zoom calls - and targets have been realigned as a result.  The management team have realised that KPIs need to change in line with the new targets (which makes sense), and there are now a number of performance indicators being tracked.

Here are the results from January 2021 for our three long-standing (or long-suffering) salespeople.







Metric                         Albert     Britney    Charles
Zoom sessions                  411        225        510
Calls answered                 320        243        366
Leads generated                127        77         198
Cars sold                      40         59         60
Revenue (£)                    201,000    285,000    203,500
Average car value (£)          5,025      4,830      3,391
Conversion (contact to lead)   17.4%      16.5%      22.6%
Conversion (lead to sale)      31.5%      76.6%      30.3%
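
As a side note, everything below the revenue row is derived from the rows above it. Here's a minimal sketch of those calculations (Python purely for illustration; the numbers are copied from the table, the names and structure are my own):

```python
# A minimal sketch: deriving the ratio "KPIs" from the base metrics in the table.
# The raw numbers are copied from the table; the names and structure are my own.

salespeople = {
    "Albert":  {"zoom": 411, "calls": 320, "leads": 127, "sold": 40, "revenue": 201_000},
    "Britney": {"zoom": 225, "calls": 243, "leads": 77,  "sold": 59, "revenue": 285_000},
    "Charles": {"zoom": 510, "calls": 366, "leads": 198, "sold": 60, "revenue": 203_500},
}

for name, m in salespeople.items():
    avg_car_value = m["revenue"] / m["sold"]                 # Albert: 201,000 / 40 = 5,025
    contact_to_lead = m["leads"] / (m["zoom"] + m["calls"])  # Albert: 127 / 731 = 17.4%
    lead_to_sale = m["sold"] / m["leads"]                    # Albert: 40 / 127 = 31.5%
    print(f"{name}: avg car value £{avg_car_value:,.0f}, "
          f"contact to lead {contact_to_lead:.1%}, lead to sale {lead_to_sale:.1%}")
```

Everything below the revenue row is a calculated ratio of the rows above it - worth keeping in mind when we decide which number is actually 'key'.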

And again we ask ourselves: who was the best salesperson? And, more importantly, which of the KPIs is actually the KEY performance indicator?

Albert: had the highest average car value.

Britney: had the highest revenue (40% more than Albert or Charles) and by far the highest conversion from lead to sale.

Charles: had the most Zoom sessions, calls answered, leads generated and cars sold, plus the highest conversion from contact to lead.

Surely Charles won?  Except that wages, overheads and shareholder dividends aren't paid with Zoom sessions; bonuses aren't paid in phone calls and pensions aren't paid with actual cars.

The KPI of most businesses (and certainly this one) is revenue - or, more specifically, profit margin.  It's very nice to be able to talk about other metrics and to use these to improve the business, but if you're a business and your KPI isn't something related to money, then you're probably not aiming for the right target.  

Yes, you can certainly use other metrics to improve the business: for example, Charles desperately needs to learn how to sell higher-value cars. He's extremely productive - even prolific - with the customer contacts, but he's £1,400 down per car compared to Britney, and £1,600 down per car compared to Albert. Additionally, if Britney learned to make her sales conversations and Zoom technique faster and more efficient, her sales volumes would increase. This is using data to drive action - and that's what makes your analysis actionable.

So:  metrics and KPIs aren't the same thing.  Select the KPI that actually matches the business aim (typically margin and revenue) and don't get distracted by lesser KPIs that are actually just calculated ratios.  Use all the metrics to improve business performance, but pick your winner based on what really matters to your company.

I have looked at KPIs in some of my other articles:

The Importance of Being Earnest with your KPIs
Why Test Recipe KPIs are Vital
Web Analytics and Testing - A summary so far



Monday, 18 November 2019

Web Analytics: Requirements Gathering

Everybody knows why your company has a website, and everybody tracks the site's KPIs.

Except that this is a modern retelling of the story of the blind men who tried to describe an elephant by touch alone: everyone has a limited and specific view of your website. Are you tracking orders? Are you tracking revenue? Are you tracking traffic? Organic? Paid? Banner? Affiliate? Or, dare I ask, are you just tracking hits?

This siloed approach can actually work, with each person - or more likely, each team - working towards a larger common goal which can be connected to one of the site's actual KPIs. After all, more traffic should lead to more orders, in theory. The real problem arises when people from one team start talking to another about the success of a joint project. Suddenly, we have an unexpected culture clash, and two teams working within the same business are speaking virtually different languages. The words are the same, but the definitions are different, so everybody is actually discussing very different concepts.

At this stage, it becomes essential to take a step back and take time to understand what everyone means when they use phrases like "KPIs", "success metrics", or even "conversion". I mean, everyone knows there's one agreed definition of conversion, right? No? Add to cart; complete an order; complete a quote or a lead-generation activity - I have seen and heard all of these called 'conversion'.

When it comes to testing, this situation can become amplified, as recipes are typically being proposed or supported by different teams with different aims.  One team's KPIs may be very different from another's.  As the testing lead, it's your responsibility to determine what the aims of the test are, and from them - and nothing else - what the KPIs are.  Yes, you can have more than one KPI, but you must then determine which KPI is actually the most important (or dare I say, "key"), and negotiate these with your stakeholders.

A range of my previous pieces of advice on testing become more critical here, as you'll need to ensure that your test recipes really do test your hypothesis, and that the metrics you track will actually measure it. And, to avoid any doubt, make sure you define your success criteria in terms of basic metrics (visits, visitors, orders, revenue, page views, file downloads), so that everybody is on the same page (literally and metaphorically).
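
As an illustration of what 'defined in basic metrics' might look like in practice - everything here (the hypothesis, metric names and threshold) is hypothetical, just a sketch of the idea:

```python
# Hypothetical sketch: pinning a test's success criteria to basic, unambiguous metrics
# before launch. All names, metrics and thresholds are invented for illustration.

test_plan = {
    "hypothesis": "Simplifying the quote form will increase completed quotes",
    "primary_kpi": {
        "metric": "completed quotes per visitor",  # built from basic metrics: quotes, visitors
        "success": "recipe B beats control at 95% confidence",
    },
    "secondary_metrics": [
        "form starts per visitor",
        "form abandonment rate",
        "revenue per visitor",
    ],
}

def describe(plan: dict) -> None:
    """Print the agreed definitions so every stakeholder reads the same words."""
    print("Hypothesis:", plan["hypothesis"])
    print("Primary KPI:", plan["primary_kpi"]["metric"], "-", plan["primary_kpi"]["success"])
    for metric in plan["secondary_metrics"]:
        print("Secondary metric:", metric)

describe(test_plan)
```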


Keep everybody updated on your plans, and keep asking the obvious questions - assume as little as possible and make sure you gather all your stakeholders' ideas and requirements.  What do you want to test? Why? What do you want to measure? Why?

Yes, you might sound like an insistent three-year-old, but it will be worth it in the end!


Thursday, 21 February 2019

One KPI too many


Three hypothetical car sales representatives are asked to focus on increasing their sales of hybrid cars for a month. They are a good cross-section of the whole sales team (which is almost 40 sales reps), and they each have their own approach. The sales advisor with the best sales figures for hybrid cars at the end of the month will receive a bonus, so there's a clear incentive to sell well.  At the end of the month, the sales representatives get together with management to compare their results and confirm the winner.


Albert


Albert made no real changes to his sales style, confident that his normal sales techniques would be enough to get him the top sales spot.

Albert is, basically, our "control", which the others will be compared against. Albert is a fairly steady member of the team, and his performance is ideal for judging the performance of the other individuals.  Albert sold 100 cars, of which 20 were hybrids.


Britney 

Britney embraces change well, and when this incentive was introduced, she immediately made significant changes to her sales tactics.  Throughout the incentive period, she went to great lengths to highlight the features and benefits of the hybrid cars.  In some cases, she missed out on sales because she was pushing the hybrids so enthusiastically.

While she doesn't sell as many cars as Albert, she achieves 90 sales, of which 30 are hybrids.


Charles

Finally, Charles is the team's strongest salesman, and throughout the sales incentive month, he just sells more cars.  He does this by generally pushing, chasing and selling harder to all customers, using his experience and sales skills.  He doesn't really focus on selling the hybrids in particular.

Consequently, he achieves an enormous 145 sales, which includes 35 hybrid sales.   


Let's summarise, and add some more metrics and KPIs (because you can never have too many, apparently...).

Metric             Albert     Britney    Charles
Total car sales    100        90         145
Hybrid car sales   20         30         35
% Hybrid           20%        33.3%      24.1%
Total revenue      $915,000   $911,700   $913,500
Revenue per car    $9,150     $10,130    $6,300



Who did best?

1. Albert achieved the highest revenue, but only sold 20% hybrid cars.
2. Britney achieved 33% hybrid sales, but only sold 90 cars in total.  She did, however, achieve the highest revenue per car (largely due to sales of the new, more expensive hybrids).
3. Charles sold 35 hybrids - the most - but only at a rate of 24.1%. He also sold generally cheaper cars (he sold 110 non-hybrid cars, and many of them were either discounted or used cars).

So which Key Performance Indicator is actually Key?

This one is often a commercial decision, based on what's more important to the business targets. Is it the volume of hybrid cars, or the percentage of them? How far could Britney's drop in overall sales be accepted before it is detrimental to overall performance? And how far could Charles's increase in overall sales be overlooked?

Sometimes, your recommendation for implementing an optimisation recipe will run into a similar dilemma. In situations like these, it pays to know which KPI is actually Key! Is it conversion? Is it volumes of PDF downloads, telephone calls, chat sessions, pages viewed per visit - or is it revenue? And how much latitude is there in calling a winner? In some situations, you won't know until you suddenly realise that your considered recommendation is not getting the warm reception you expected (but you'll start to get a feel for the key KPIs, even if they're never actually spelled out by your partners).


Monday, 18 June 2018

The Importance of Being Earnest with Your KPIs


It's World Cup time once again, and a prime opportunity to revisit the importance of having the right KPIs to measure your performance (football team, website, marketing campaign, or whatever). Take a look at these facts and apparent KPIs, taken from a recent World Cup match, and notice how easy it is to miss what your data is actually telling you.

*  One goalkeeper made nine saves during the match, which is three more than any other goalkeeper in the World Cup so far.

* One team had 26 shots in the game – without scoring – which is the most so far in this World Cup, and equals Portugal in their game against England in 2006.  The other team had just 13 shots in the game, and only four on target.

*  One team had just 33% possession: they had the ball for only 30 minutes of the 90-minute game.

* One team had eight corners; the other managed just one.

A graph may help convey some additional data, and give you a clue as to the game (and the result).



If you look closely, you'll note that the team in green had four shots on target, while the other team's goalkeeper only managed three saves.

Hence the most important result in the game – the number of goals scored – gets buried (if you’re not careful) and you have to carry out additional analysis to identify that Mexico won 1-0, scoring in the first half and then holding onto their lead with only 33% possession.



Thursday, 16 March 2017

Average Time Spent on Page

The history of Web analytics tools has left a legacy of metrics that we can obtain "out of the box" even if they are of no practical use, and I would argue that a prime candidate for this category is time spent on page, and its troublesome partner average time spent on page. It's available because it's easy to obtain from tag-fires (or server log files) - it's just the time taken between consecutive page loads.  Is it useful? Not by itself, no. 
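
To make the mechanics concrete, here's a minimal sketch of how the metric falls out of consecutive tag-fires (hypothetical timestamps, and I'm assuming pandas just for convenience):

```python
import pandas as pd

# Hypothetical tag-fire log for a single visit: one row per page view.
views = pd.DataFrame({
    "page": ["home", "category", "product", "checkout"],
    "timestamp": pd.to_datetime([
        "2017-03-16 10:00:00",
        "2017-03-16 10:00:40",
        "2017-03-16 10:03:10",
        "2017-03-16 10:05:00",
    ]),
})

# Time on page = the gap until the next tag-fire in the same visit.
views["time_on_page"] = views["timestamp"].shift(-1) - views["timestamp"]
print(views)
# The final row ("checkout") comes out as NaT: the visitor left, no further tag
# fired, and their time on that last page is simply unknown.
```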

For example, it can't be measured if the visitor exits from the page. If a user doesn't load another page on your site, then there are no further tag-fires, and you don't get a time on page. This means that you have a self-selecting group of people who stayed on your site for at least one more page. It entirely excludes visitors who can immediately tell they have the wrong page and then leave. It also, sadly, excludes people who consume all the content and then leave. No net benefit there, then.

Worse still, visitors who immediately realise that they have the wrong page and hit the back button are included. So, is there any value to the metric at all? In most cases, I would argue not, although there can be if it's handled carefully. For example, there is some potential benefit in monitoring pages which require data entry, such as checkout pages or other forms. In these circumstances, faster is definitely better, and slower suggests the page is unnecessarily complicated or lengthy. For most shopping pages, though, you will need a much clearer view of whether more time is better or worse. In an international journey, four hours on an airliner is very different from three hours in an airport.

I mentioned that time on page is not helpful by itself: it can be more informative in conjunction with other metrics such as exit rate, bounce rate or revenue participation. For example, if a page has a high exit rate and high time on page, then it suggests that only a few people are finding the content helpful and are prepared to work through the page to get what they want - and to move forwards. Remember that you can't draw any conclusions about the people who left - either they found everything they needed and then left, or they gave up quickly (or anything in between).

So, if you use and quote average time on page, then I suggest that you make sure you know what it's telling you and what's missing; that you quote it in conjunction with other relevant metrics, and you have decided in advance if longer = better or longer = worse.

Monday, 9 February 2015

Reviewing Manchester United Performance - Real Life KPIs Part 2

A few weeks have passed since my last review of Manchester United's performance in this year's Premier League. An overview of the season so far reveals some interesting facts:

Southampton went to third position in mid-January, following their win at Old Trafford. Southampton finished eighth last season, and 14th in the season before that. This is their first season with new manager Ronald Koeman. Perhaps some analysis of his performance is needed - another time, maybe. :-)

Southampton enjoyed their first win in 27 years in the league at Old Trafford on 11 January.  Their fifteen previous visits were two draws (1999, 2013) and thirteen wins for Manchester United. Conversely, United had won their last five at home and missed out on the chance for a ninth win in the league – which was their total for home wins in the whole of last season.

So let's take a look at Louis Van Gaal's performance, as at 9 February 2015, and compare it, as usual, with David Moyes (the 'chosen one'), Alex Ferguson (2012-13) and Alex Ferguson (1986-87, his first season).


Horizontal axis - games played
Vertical axis - cumulative points
Red - AF 2012-13
Pink - AF 1986-87
Blue - DM 2013-2014
Green - LVG 2014-15 (ongoing)

The first thing to note is that LVG has improved his performance recently, and is now back above the blue danger line (David Moyes' performance in 2013-14, which is the benchmark for 'this will get you fired').

However, LVG's performance is still a long way below the red line left by Alex Ferguson in his final season, so let's briefly investigate why.


Under LVG, Manchester United have drawn 33% of their league games this season, compared to just 13% for Alex Ferguson's 2012-13 season.  This doesn't include the goal-less draw against Cambridge United in the FA Cup, which is a great example of Man Utd not pressing home their apparent advantage (Man Utd won the rematch 3-0 at Old Trafford). Yesterday (as I write), Manchester United scraped a draw against West Ham by playing the 'long-ball game', criticised after the match by West Ham's manager, Sam Allardyce.  West Ham are currently eighth in the table, four places behind Man Utd.

Interestingly, Moyes and Van Gaal have an identical win rate of 50%.  It might be suggested that Van Gaal's issue is not converting enough draws into wins; this is a slightly better problem to have compared to Moyes' problem, which was not holding on to enough draws and subsequently losing.  In football terms, Van Gaal needs to teach his team to more effectively 'park the bus'.

Is Louis Van Gaal safe?  According to the statistics alone, yes, he is, for now.  He's securing enough draws to keep him above the David Moyes danger line, and he's achieving more wins than Alex Ferguson did in his first season.  However, his primary focus must be to start converting draws into wins.  I haven't done the full match analysis to determine whether that means scoring more or holding on to the lead once he has it - perhaps that will come later.

Is Louis Van Gaal totally safe?  That depends on whether the staff at Man United think that a marginal improvement on last season's performance is worth the £59.7m spent on Angel Di Maria, £29m on Ander Herrera, and £27m on Luke Shaw (plus others).  £120m for a few more draws in the season is probably not seen as good value for money.

Tuesday, 27 January 2015

Pitfalls of Online Optimisation and Testing 1: Are your results really flat?

I've previously covered the trials of starting and maintaining an online optimisation program, and once you've reached a critical mass it seems as if the difficulties are over and it's plain sailing. Each test provides valuable business insights, yields a conversion lift (or points to a future opportunity) and you've reached a virtuous cycle of testing and learning. Except when it doesn't. There are some key pitfalls to avoid, or, having hit them, to conquer.

1. Obtaining flat results (a draw)
2. Too little meaningful data
3. Misunderstanding discrete versus continuous testing

The highest-scoring draw in Premier League history was the 5-5 draw between West Bromwich Albion and Manchester United in May 2013.  Just last weekend, the same mighty Manchester United were held to a goalless draw by Cambridge United, a team two divisions below them in the English league, in an FA Cup match.  Are these games the same? Are the two sides really equal? In both games, both teams performed equally, so at face value you would think they are (and perhaps they are; Manchester United are really not having a great season).  It's time to consider the underlying data to really extract an unbiased and fuller story of what happened (the Cambridge press recorded it as a great draw; one Manchester-based website saw it slightly differently).

Let's look at the recent match between Cambridge and Manchester United, borrowing a diagram from Cambridge United's official website.

One thing is immediately clear:  Cambridge didn't score any goals because they didn't get a single shot on target.  Manchester United, on the other hand, had five shots on target but a further ten that missed - only 33% of their shots were heading for the goal.  Analysis of the game would probably indicate that these were long-range shots, as Cambridge kept Manchester at a 'safe' distance from their goal.  Although this game was a goalless draw, it's clear that the two sides have different issues to address if they are to score in the replay next week.

Now let's look at the high-scoring draw between West Brom and Man Utd.  Which team was better and which was lucky to get a single point from the game? In each case, it would also be beneficial to analyse how each of the ten goals was scored - that's ten goals (one every nine minutes on average) which is invaluable data compared to the goalless draw.

The image on the right is borrowed from the Guardian's website, and shows the key metrics for the game (I've discussed key metrics in football matches before).  What can we conclude?

- it was a close match, with both teams seeing similar levels of ball possession.

- West Brom achieved 15 shots in total, compared to just 12 for Man Utd.

- If West Brom had been able to improve the quality and accuracy of their goal attempts, they may have won the game. 

- For Man Utd, the problem was not the quality of their goal attempts (they had 66% accuracy, compared to just over 50% for West Bromwich) but the quantity of them.  Their focus should be creating more shooting opportunities.


- As a secondary metric, West Brom should probably look at the causes for all those fouls.  I didn't see the game directly, but further analysis and study would indicate what happened there, and how the situation could be improved.


There is a tendency to analyse our losing tests to find out why they lost (if only so we can explain it to our managers), and with thorough planning and a solid hypothesis we should be able to identify why a test did not do well.  It's also human nature to briefly review our winners to see if we can do even better in future.  But draws? They get ignored and forgotten - the test recipe had no impact and is not worth pursuing. Additionally, it didn't lose, so we don't apply the same level of scrutiny that we would if it had suffered a disastrous defeat. If wins are green and losers are red, then somehow the draws just fade to grey.  However, this shouldn't be the case.

So what should we look for in our test data?  Firstly - revisit the hypothesis.  You expected to see an overall improvement in a particular metric, but that didn't happen: was this because something happened in the pages between the test page and the success page?  For example, did you reduce a page's exit rate by apparently improving the relevance of the page's banners, only to lose all the clickers on the very next page instead - the net result is that order conversion is flat, but the story needs to be told more thoroughly. Think about how Manchester United and Cambridge United need different strategies to improve their performance in the next match.

But what if absolutely all the metrics are flat?  There's no real change in exit rate, bounce rate, click-through rate, time on page... or any other page metric or sales figure you care to mention?  It is quite likely that the test you've run was not significant enough. The change in wording, colour, design or banner that you made just wasn't dramatic enough to affect your visitors' perceptions and intentions. There may still be something useful to learn from this: your visitors aren't bothered whether your banners feature pictures of your product or a family photo, a single person or a group of people... or whichever it may be.
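
Before declaring a result flat, it's worth checking what size of difference your traffic could realistically have detected. A minimal sketch, using standard two-proportion arithmetic and entirely made-up traffic and order numbers:

```python
import math

# Entirely hypothetical A/B numbers - substitute your own test data.
visitors_a, orders_a = 20_000, 600   # control: 3.00% conversion
visitors_b, orders_b = 20_000, 612   # recipe:  3.06% conversion

p_a, p_b = orders_a / visitors_a, orders_b / visitors_b
diff = p_b - p_a

# Approximate 95% confidence interval for the difference in conversion rates.
se = math.sqrt(p_a * (1 - p_a) / visitors_a + p_b * (1 - p_b) / visitors_b)
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"Observed lift: {diff:.2%} (95% CI: {low:.2%} to {high:.2%})")
# If the interval comfortably spans zero and is wide compared to the lift you were
# hoping for, the result isn't so much "flat" as inconclusive: the change (or the
# sample) was too small for the test to detect anything.
```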

FA Cup matches have the advantage of a replay before there's extra time and penalties (the first may be an option for a flat test, the second sounds interesting!), so we're guaranteed a re-test, more data and a definite result in the end - something we can all look for in our tests.

The articles in the Pitfalls of Online Optimisation and Testing series:

Article 1:  Are your results really flat?
Article 2: So your results really are flat - why?  
Article 3: Discontinuous Testing

Monday, 24 November 2014

Real-Life Testing and Measuring KPIs - Manchester United

I enjoy analytics and testing, and applying them to online customer experience - using data to inform ways of improving a website.  Occasionally, it occurs to me that life would be great if we could do 'real life' testing - which is the quickest way home; which is the best meal to order; which coat should I wear today (is it going to rain)?  Instead, we have to be content with before/after analysis - make a decision, make a change, and see the difference.

One area which I also like to look at periodically is sport - in particular, football (soccer).  I've used football as an example in the past, to show the importance of picking the right KPIs.  In football, there's no A/B testing - which players should a manager select, which formation should they play in - it's all about making a decision and seeing what happens.

One of my least favourite football teams is Manchester United.  As a child, my friends all supported Liverpool, and so I did too, having no strong feelings on the subject at the time.  I soon learned, however, that as Liverpool fans, it was traditional to dislike Manchester United, due to their long-standing (and ongoing) rivalry.  So I have to confess to a slight feeling of superiority whenever Manchester United perform badly.  Since the departure of their long-serving manager, Alex Ferguson, they've seen a considerable drop in performance, and much criticism has been made of his two successors, first David Moyes, and now Louis van Gaal.  David Moyes had a poor season (by Man Utd's standards) and was fired before the end of it.  His replacement, Louis van Gaal, has not had a much better season thus far.  Here's a comparison of their performance, measured in cumulative points won after each game [3 points for a win, 1 for a draw, 0 for a loss].
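
(For anyone who wants to reproduce this kind of chart, here's a minimal sketch of the cumulative points calculation - the results sequence is invented, not the actual fixture list.)

```python
# Minimal sketch of the cumulative-points line: 3 points for a win, 1 for a draw,
# 0 for a loss. The example results sequence is invented, purely for illustration.

POINTS = {"W": 3, "D": 1, "L": 0}

def cumulative_points(results: str) -> list[int]:
    """Turn a sequence like 'WWDL' into a running points total after each game."""
    totals, running = [], 0
    for result in results:
        running += POINTS[result]
        totals.append(running)
    return totals

print(cumulative_points("WDLWWDL"))  # [3, 4, 4, 7, 10, 11, 11]
```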
So, how bad is it?

Well, we can see that performance in the current season (thick green line) is lower than last season (the blue line).  Indeed, after game 10 in early November 2014, the UK media identified that this was the worst start to the season since 1986.  But since then, there's been an upturn in performance and at the time of writing, Manchester United have won their last two matches.  So perhaps things aren't too bad for Louis van Gaal.  However, the situation looks slightly different if we overlay the line for the previous season, 2012-2013, which was Sir Alex Ferguson's final season in charge.

You can see the red line indicating the stronger performance that Manchester United achieved with Sir Alex Ferguson, and how the comparison between the two newer managers pales into insignificance when you look at how they've performed against him.  There's a message here about comparing two test recipes when they've both performed badly against the control recipe, but we'll move on.

There have been some interesting results for Manchester United already this season, in particular, a defeat by Leicester City (a much smaller team who had just been promoted into the Premier League, and were generally regarded as underdogs in this match).  The 5-3 defeat by Leicester rewrote the history books.  Among other things...

- It was the first time Leicester had scored five or more goals in 14 years
- It was the first time Man Utd have ever conceded four or more goals in a Premier League game against a newly-promoted team
- It was the first time Leicester City have scored four or more goals against Manchester Utd in the league since April 1963

But apart from the anecdotal evidence, what statistical evidence is there that we could point to that would highlight the reason for the recent decline in performance?  Where should the new manager focus his efforts for improvement, based on the data alone (I haven't watched any of the matches in question)?

Let's compare three useful metrics that show Manchester United's performance over the first 10 games of the season:  goals scored, goals conceded and clean sheets (i.e. matches where they conceded no goals).  Same colour-scheme as before:


This graph highlights (in a way I was not expecting) the clear way that Sir Alex Ferguson's successors need to improve:  their teams need to score more goals.  I know that seems obvious, but we've identified that the team's defence is adequate, conceding the same number of goals as in Alex Ferguson's season, or fewer.  However, this data is a little oversimplified, since it also hides the 5-3 defeat I gave as an example above, where the press analysis after the match showed 'defensive frailties' in the Manchester United team.  Clearly more digging would be required to identify the true root cause - but I'd still start with 'How can we score more goals?'


Disclaimers:
- The first ten games for each season are not against the same teams, so the 2012-13 season may have been 'easier' than the subsequent seasons (in fact, David Moyes made this complaint before the 2013-14 season had even started).
- Ten games is not a representative sample of a 38-game season, but we're not looking at the season, we're just comparing how they start.  We aren't giving ourselves the benefit of hindsight.
- I am a Liverpool fan, and at the time of writing, the Liverpool manager has had a run of four straight defeats.  Perhaps I should have analysed his performance instead.  No football manager is perfect (and I hear that Arsenal are also having a bad season).

So:  should Manchester United sack Louis van Gaal?  Well, they didn't sack David Moyes until there were only about six matches left until the end of the season; it seems harsh to fire Louis van Gaal just yet (it seems that the main reason for sacking David Moyes was actually the Manchester United share price, which also recovered after he'd been fired).  I shall keep on reviewing Manchester United's performance and see how the team performs, and how the share price tracks it.

I whole-heartedly endorse making data-supported decisions, but only if you have the full context.  Here, it's hard to call (I haven't got enough data), especially since you're only looking at a before/after analysis compared to an A/B test (which would be a luxury here, and probably involve time travel).  And that, I guess, is the fun (?) of sport.

More articles on data analysis in football:

Should Chelsea Sack Jose Mourinho? (it was relevant at the time I wrote it)
How exciting is the English Premier League?  (quantifying a qualitative metric)
The Rollarama World Football Dice Game (a study in probability)


Thursday, 28 August 2014

Telling a Story with Web Analytics Data

Management demands actionable insights - not just numbers, but KPIs, words, sentences and recommendations.  It's therefore essential that we, as web analysts and optimisers, are able to transform data into words - and better still, stories.  Consider a report with too much data and too little information - it reads like a science report, not a business readout:

Consider a report concerning four main characters:
Character A: female, aged 7 years old.  Approximately 1.3 metres tall.
Character B:  male, aged 5 years old.
Character C: female, aged 4 years old.
Character D:  male, aged 1 year old.

The main items in the report are a small cottage, a 1.2 kw electric cooker, 4 pints of water, 200 grams of dried cereal and a number of assorted iron and copper vessels, weighing 50-60 grams each.

After carrying out a combination of most of the water and dried cereal, and in conjunction with the largest of the copper vessels, Character B prepared a mixture which reached around 70 degrees Celsius.  He dispensed this unevenly into three of the smaller vessels in order to enable thermal equilibrium to be formed between the mixture and its surroundings.  Characters B, C and D then walked 1.25 miles in 30 minutes, averaging just over 4 km/h.  In the interim, Character A took some empirical measurements of the chemical mixture, finding Vessel 1 to still be at a temperature close to 60 degrees Celsius, Vessel 2 to be at 70 degrees Fahrenheit and Vessel 3 to be at 315 Kelvin, which she declares to be optimal.

The report continues with Character A consuming all of the mixture in Vessel 3, then single-handedly testing (in some cases destruction testing) much of the furniture in the small cottage.

The problem is:  there's too much data and not enough information. 

The information is presented in various formats - lists, sentences and narrative.


Some of the data is completely irrelevant (the height of Character A, for example);
Some of it is misleading (the ages of the other characters lack context);
Some of it is presented in a mish-mash of units (temperatures are stated four times, with three different units) - see the sketch after this list;
The calculation of the speed of the walking characters is not clear - the distance is given in miles, the time in minutes, and the speed in kilometres per hour (if you are familiar with the abbreviation km/h).
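
A minimal sketch of the fix for the units problem flagged above: normalise everything to one scale before comparing. The temperatures are the ones quoted in the report; the conversions are the standard formulas.

```python
# Normalising the report's temperatures to a single unit (Celsius) before comparing.

def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9

def kelvin_to_celsius(k: float) -> float:
    return k - 273.15

vessels_celsius = {
    "Vessel 1": 60.0,                         # already quoted in Celsius
    "Vessel 2": fahrenheit_to_celsius(70.0),  # 70 F  -> about 21 C
    "Vessel 3": kelvin_to_celsius(315.0),     # 315 K -> about 42 C
}

for vessel, temp in vessels_celsius.items():
    print(f"{vessel}: {temp:.0f} C")
# Once everything is in one unit it's obvious why Character A declared Vessel 3
# optimal: too hot, too cold, and just right, respectively.
```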

Of course, this is an exaggeration, and as web analytics professionals, we wouldn't do this kind of thing in our reporting. 

Visitors are called visitors, and we consistently refer to them as visitors (and we ensure that this definition is understood among our readers)
Conversion rates are based on visitors, even though this may require extra calculation since our tools provide figures based on visits (or sessions)
Percentage of traffic coming from search is quoted as visitors (not called users), and not visits (whether you use visitors or visits is up to you, but be consistent)
Would you include number of users who use search?  And the conversion rate for users of search?
And when you say 'Conversion', are you consistently talking about 'user added an item to cart', or 'user completed a purchase and saw the thank-you page'?
Are you talking about the most important metrics?
 
So - make sure, for starters, that your units, data and KPIs are consistent, contextual, or at least make sense. And then: add the words to the numbers.  It's only a start to say: "We attracted 500 visitors with paid search, at a total cost of £1,200."  Go on to talk about the cost per visitor, and break it down into key details by talking about the most expensive keywords and the ones that drove the most traffic.  But then tell the story:  there's a sequence of events between a user seeing your search term, clicking on your ad, visiting your site, and [hopefully] converting.  Break it down into chronological steps and tell the story!
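
To show what that next step looks like with the figures above (the order count is invented, purely to illustrate the move from cost per visitor to cost per order):

```python
# The figures quoted above: 500 paid-search visitors at a total cost of £1,200.
paid_visitors = 500
paid_cost = 1_200

cost_per_visitor = paid_cost / paid_visitors
print(f"Cost per visitor: £{cost_per_visitor:.2f}")  # £2.40

# Hypothetical next step in the story: how many of those visitors converted,
# and therefore what each paid-search order actually cost.
paid_orders = 15  # invented figure, purely for illustration
print(f"Cost per order: £{paid_cost / paid_orders:.2f}")  # £80.00
```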

There are various ways to ensure that you're telling the story; my favourites are to answer these types of questions:
"You say that metric X has increased by 5%.  Is that a lot?  Is that good?"
 "WHY has this metric gone up?"
"What happened to our key site performance indicators (profit, revenue, conversion) as a result?"
and my favourite:
"What should we do about it?"

There are, of course, various ways to hide the story, or to disguise results that are not good (i.e. do not meet sales or revenue targets) - I did this in my anecdote at the start. However, management tend to spot incomplete, obscure or irrelevant data, and go on to ask about the data that's "missing" - the truth will out. It's better to show the data, tell the whole story, and highlight why things are below par.

It's our role to highlight when performance is down - we should be presenting the issues (nobody else has the tools to do so) and then going on to explain what needs to be done - this is where actionable insights become invaluable.  In the end, we present the results and the recommendations and then let the management make the decision - I blogged about this some time ago - web analytics: who holds the steering wheel?

In the case of Characters A, B, C and D, I suggest that Characters B and C buy a microwave oven, and improve their security to prevent Character A from breaking into their house and stealing their breakfast.  In the case of your site, you'll need to use the data to tell the story.

Other articles I've written on Website Analytics that you may find relevant:

Web Analytics - Gathering Requirements from Stakeholders

Wednesday, 9 July 2014

Why Test Recipe KPIs are Vital

Imagine a straightforward A/B test, between a "red" recipe and a "yellow" recipe.  There are different nuances and aspects to the test recipes, but for the sake of simplicity the design team and the testing team just codenamed them "red" and "yellow".  The two test recipes were run against each other, and the results came back.  The data was partially analysed, and a long list of metrics was produced.  Which one is the most important?  Was it bounce rate? Exit rate? Time on page?  Does it really matter?

Let's take a look at the data, comparing the "yellow" recipe (on the left) and the "red" recipe (on the right).

  

As I said, there's a large number of metrics.  And if you consider most of them, it looks like it's a fairly close-run affair.  

The yellow team on the left had
28% more shots
8.3% more shots on target
22% fewer fouls (a good result)
Similar possession (4% more, probably with moderate statistical confidence)
40% more corners
less than half the number of saves (it's debatable whether more or fewer saves is better, especially if you look at the alternative to a save)
More offsides and more yellow cards (1 vs 0).

So, by most of these metrics, the yellow team (or the yellow recipe) had a good result.  You might even say they did better.

However, the main KPI for this test is not how many shots, or shots on target.  The main KPI is goals scored, and if you look at this one metric, you'll see a different picture.  The 'red' team (or recipe) achieved seven goals, compared to just one for the yellow team.

In A/B testing, it's absolutely vital to understand in advance what the KPI is.  Key Performance Indicators are exactly that:  key.  Critical.  Imperative. There should be no more than two or three KPIs, and they should match closely to the test plan, which, in turn, should come from the original hypothesis.  If your test recipe is designed to reduce bounce rate, there is little point in measuring successful leads generated.  If you're aiming for improved conversion, why would you look at time on page?  These other metrics are not-key performance indicators for your test.

Sadly, Brazil's data on the night was not sufficient for them to win - even though many of their metrics from the game were good, they weren't the key metrics.  Maybe a different recipe is needed.

Friday, 5 August 2011

Web Analytics - A Medical Emergency

One of my favourite TV programmes at the moment is Casualty.  Or perhaps it's Holby City (they're both the same, really).  A typical episode unfolds with all the drama and angst between the main characters, which is suddenly broken up by the paramedics unloading a patient from an ambulance.  Perhaps the patient is the victim of a fire, or a road traffic accident, or another emergency.  Whatever it is, the paramedics come in, wheeling the patient along, giving a brief description of who they've got, the main symptoms, and start rattling off a list of numbers.  "Patient is RTA victim, aged 56, BP is 100 over 50, pulse is 58 and weak, 100 mls of adrenaline given..." the list goes on.  The senior consultant who is receiving the patient hasn't really got time to be asking questions like, "Is that bad?" and certainly not, "Is this important?"  The questions he's already asking himself are, "What can we do to help this patient?" and "What's been done already?"

Regular readers will already know where I'm going with this analogy, so I'll try to keep it brief.  In a life-or-death situation (and no, web analysts are hardly ever going to have that degree of responsibility) there isn't really time to start asking and answering the trivial questions.  The executive dashboard, the report or the update needs to state what the results are at the moment, how this looks against target, normal or threshold, and what action needs to be taken.  The executive, in a similar way to the Formula 1 driver I mentioned last time, hasn't got time to look through all the data and decide what's important, what isn't, and what needs to be looked at.

As an aside, I should comment that reporting dying figures to an executive is likely to lead to a series of questions back to the analyst, so be ready to answer them.  Better still, include a commentary that states the reasons for a change in the figures and the action that's being taken to address them.  Otherwise, all you'll achieve is an unfortunate way of generating action from the content team, who probably won't be too pleased to receive a call from a member of the executive team asking why their figures are dying - and who will want to know why you didn't tell them first.

Another skill comes in determining the key figures to report - the vital statistics.  The paramedics know that time is of the essence and keep it deliberately brief and to the point.  No waffle.  Clear.  The thresholds for each KPI are already understood - after all, they have the advantage that all medical staff know what typical temperature, pulse, blood pressure and blood sugar levels are.  As a web analyst (or a business analyst), you'll need to gain agreement from your stakeholders on what these are.  Otherwise you may find yourself reporting the height and weight of a patient who has severe blood loss, where the metrics are meaningless and don't reflect the current situation.  If you give a number for a KPI, and the reply is, "Is that a lot?" then you have some work to do - and I have some answers for you too.


Now, all I've covered so far is the reporting - the paramedics' role.  If we were (or are) web reporters, then that would be the sum of our role: to look at the site, take the measurements, blurt out all the relevant figures and then go back to our desks.  However, as web analysts, we now need to take on the role of the medical consultant, and start looking at the stats - the raw data - and working out why they're too high (or too low), and most importantly, what to do about them.  Could you imagine the situation where the consultant identifies the cause of the problem - say an infection in the lungs - and goes over to the patient, saying, "That's fine Mr Smith, we have found the cause of your breathlessness.  It's just a bacterial infection in your left lung."  There would then be a hesitant pause, until the patient says something like, "Can you treat it?" or "What can you do for me?".  

Good web analysts go beyond the reporting, through to identifying the cause of any problems (or, if your patient is in good health, the potential for improvements) and then working out what can be done to improve them.  This takes time, and skill, and a good grasp of the web analytics tool you're using.  You may have to look at your website too - actually look at the pages and see what's going on.  Look at the link text; the calls to action; read the copy, and study the images.  Compare this with the data you've obtained from your analytics tools.  This may not provide all the answers, so you may have to persevere.  Go on to look at traffic sources - the referrers, the keywords, the campaign codes.  Track down the source of the problem - or the likely causes - and follow the data to its conclusion, even if it takes you outside your site to a search engine and you start trying various keywords in Google to see how your site ranks, and what your PPC actually looks like.


Checking pages on a site is just the equivalent of a doctor actually looking at his patient.  He may study the screens and take a pulse and measure blood pressure or take the patient's temperature, but unless he actually looks at the patient - the patient's general appearance, any wounds, scars, marks, rashes or whatever else - he'll be guessing in the dark.  This isn't House (another medical drama that I never really took to), this is real medicine.  Similarly, doctors may consider environmental factors - what has the patient eaten, drunk, inhaled, come into contact with?  What's going on outside the body that might affect something inside it?

There's plenty of debate about the difference between reporting and analysis - in fact I've commented on this before - but I think the easiest answer I could give now is the difference between the paramedic and the doctor.  What do you think?