
Tuesday, 19 June 2018

When Should You Switch A Test Off? (Tunisia 1 - England 2)

Another day yields another interesting and data-rich football game from the World Cup.  In this post, I'd like to look at answering the question, "When should I switch a test off?" and use the Tunisia vs England match as the basis for the discussion.


Now, I'll admit I didn't watch the whole match, although I caught a lot of it on the radio and by following online updates. Even without watching it, though, it's possible to get a picture of the game from the data, and it's an intriguing one.  Let's kick off with the usual stats:



The result after 90 minutes was 1-1, but it's clear from the data that this was a very one-sided draw, with England having most of the possession, shots and corners.  It also appears that England squandered their chances - the Tunisian goalkeeper made no saves, but England could only get 44% of their 18 shots on target (which raises the question of what happened to the others - and the answer is that they were blocked by defenders).  There were three minutes of stoppage time, and that's when England got their second goal.

[This example also shows the unsuitability of the horizontal bar graph as a way of representing sports data - you can't compare shot accuracy (44% vs 20% doesn't add up to 100%) and when one team has zero (bookings or saves) the bar disappears completely.  I'll fix that next time.]

So, if the game had been stopped at 90 minutes as a 1-1 draw, it's fair to say that the data indicates that England were the better team on the night and would have been unlucky not to win.  They had more possession and did more with it.

Comparison to A/B testing

If this were a test result and your overall KPI was flat (i.e. no winner, as in the football game), then you could look at a range of supporting metrics and determine if one of the test recipes was actually better, or if it was flat.  If you were able to do this while the test was still running, you could also take a decision on whether or not to continue with the test.

For example, if you're testing a landing page, and you determine that overall order conversion and revenue metrics are flat - no improvement for the test recipe - then you could start to look at other metrics to determine if the test recipe really has identical performance to the control recipe.  These could include bounce rate, exit rate, click-through rate, add-to-cart performance and so on.  These kinds of metrics give us an indication of what might happen if we kept the test running, by answering the question: "Given time, are there any data points that would eventually trickle through to actual improvements in financial metrics?"
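To make that concrete, here's a rough sketch in Python of checking a flat primary KPI alongside a couple of supporting metrics.  It uses a simple two-proportion z-test, which is just one common choice and not the method of any particular testing tool.  All the visitor and event counts are invented for illustration, and the metric names are my own labels.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF computed from the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - normal_cdf)


def summarise(metrics, visitors, alpha=0.05):
    """Run the z-test for each metric; counts are (control, test) pairs."""
    results = {}
    for name, (control, test) in metrics.items():
        z, p = two_proportion_z_test(control, visitors, test, visitors)
        results[name] = {"z": z, "p": p, "significant": p < alpha}
    return results


# Hypothetical data: 10,000 visitors per recipe, counts as (control, test).
VISITORS = 10_000
metrics = {
    "orders": (300, 305),            # primary KPI: looks flat
    "add_to_cart": (1200, 1340),     # supporting metric
    "click_through": (4000, 4210),   # supporting metric
}

results = summarise(metrics, VISITORS)
for name, r in results.items():
    verdict = "moving" if r["significant"] else "flat"
    print(f"{name}: z={r['z']:.2f}, p={r['p']:.3f} -> {verdict}")
```

With these made-up numbers, orders come out flat while add-to-cart and click-through are both moving in the test recipe's favour, which is the A/B equivalent of a 1-1 scoreline with one team having all the shots.  Bear in mind that the more supporting metrics you check, and the more often you peek at a running test, the more likely one of them is to look significant by chance.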

Let's look again at the soccer match for some comparable and relevant data points:

*  Tunisia are win-less in their last 12 World Cup matches (D4 L8).  Historic data indicates that they were unlikely to win this match.

*  England had six shots on target in the first half, their most in the opening 45 minutes of a World Cup match since the 1966 semi-final against Portugal.  In this "test", England were trending positively in micro-metrics (shots on target) from the start.

*  Tunisia scored with their only shot on target in this match, their 35th-minute penalty.  On that evidence, Tunisia were unlikely to score any more goals in this game.

*  England's Kieran Trippier created six goalscoring opportunities in this match, more than any other player has managed so far in the 2018 World Cup.  "Creating goalscoring opportunities" is not the same as an "assist" (which only counts when the chance is converted into a goal) and doesn't usually appear in the headline stats, but it shows a very positive result for England again.

As an interesting comparison - would the Germany versus Mexico game have been different if the referee had allowed more time?  Recall that Mexico won 1-0 in a very surprising result, and the data shows a much less one-sided game.  While Mexico were outshot by Germany, they put up a much better set of stats than Tunisia (compare Mexico with 13 shots vs Tunisia with just one on target - which was their penalty).  So Mexico's result, while surprising, does show that they played an attacking game and deserved at least a draw, while Tunisia were overwhelmed by England (who, like Germany, should have done even better with their number of shots).

It's true that Germany were dominating the game, but they weren't able to get a decent proportion of shots on target (just 33%, compared to 44% for England) and could neither shut Mexico out nor score themselves.  Additionally, the Mexico goalkeeper was having a good game and, according to the data, was almost unbeatable - this wasn't likely to change with a few extra minutes.


Upcoming games which could be very data-rich:  Russia vs Egypt; Portugal vs Morocco.


