
Monday, 25 June 2018

Data in Context (England 6 - Panama 1)

There's no denying it, England have made a remarkable and unprecedented start to their World Cup campaign.  6-1 is their best ever score in a World Cup competition, exceeding their previous record of 3-0 against Paraguay and against Poland (both achieved in the Mexico '86 competition).  A look at a few data points emphasises the scale of the win:


* The highest ever England win (any competition) is 13-0 against Ireland in February 1882.
* England now share the record for most goals in the first half of a World Cup game (five, a joint record with Germany, who won 7-1 against Brazil in 2014).
* The last time England scored four or more goals in a World Cup game was in the final of 1966.
* Harry Kane joins Ron Flowers (1962) as the only players to score in England's first two games at a World Cup tournament.

However, England are not usually this prolific - they scored as many goals against Panama on Sunday as they had in their previous seven World Cup matches in total.  This makes the Panama game an outlier; an unusual result; you could even call it a freak result... Let's give the data a little more context:

- Panama are playing in their first ever World Cup, and they scored their first ever World Cup goal against England.
- Panama's qualification relied on a highly dubious (and non-existent) "ghost goal".
- Panama's world ranking is 55th (just behind Jamaica), down from a peak of 38th in 2013.  England's world ranking is 12th.
- Panama's total population is around 4 million people.  England's is over 50 million.  London alone has 8 million.  (Tunisia has around 11 million people.)

Sometimes we do get freak results.  You probably aren't going to convince an England fan about this today, but as data analysts, we have to acknowledge that sometimes the data is just anomalous (or even erroneous).  At the very least, it's not representative.

When we don't run our A/B tests for long enough, or we don't collect a large enough sample of data, or we take a particularly small segment, we leave ourselves open to anomalous results.  We have to remember that in A/B testing, some visitors will always complete a purchase (or successfully achieve a site goal) on our website, no matter how bad the experience is.  Some people will never, ever buy from us, no matter how slick and seamless our website is.  And some people will have carried out days or weeks of research on our site before we launched the test; shortly after the test starts, they decide to purchase a top-of-the-range product with all the add-ons, bolt-ons, upgrades and so on.  And there we have it: a large, high-value order which is entirely unrelated to our test, but which sits in Recipe B's tally and gives us an almost-immediate "winner".
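To see how little it takes, here is a minimal sketch of that scenario.  All the numbers (visitor counts, conversion rate, order values) are hypothetical, chosen only to illustrate how a single pre-decided, high-value purchase can dominate a small sample:

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def simulate_orders(n_visitors, conversion_rate, order_value):
    """Return the list of order values from n_visitors, each of whom
    converts independently with the given probability."""
    return [order_value
            for _ in range(n_visitors)
            if random.random() < conversion_rate]

# Two identical recipes: same 2% conversion rate, same £50 average order.
orders_a = simulate_orders(500, 0.02, 50)
orders_b = simulate_orders(500, 0.02, 50)

# One visitor who had already decided to buy (days of research before the
# test launched) happens to land in Recipe B with a £2,000 order.
orders_b.append(2000)

print(f"Recipe A: {len(orders_a)} orders, £{sum(orders_a)} revenue")
print(f"Recipe B: {len(orders_b)} orders, £{sum(orders_b)} revenue")
# With only a handful of orders per recipe, the single outlier swamps
# Recipe B's total revenue, making it look like a clear "winner" for
# reasons that have nothing to do with the test itself.
```

Run the test for longer, or compare order counts (and medians) as well as total revenue, and the outlier's influence shrinks back towards noise.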

The aim of a test is to nudge people from the 'probably won't buy' category into the 'probably will buy' category, and into the 'yes, I will buy' category.  Testing is about finding the borderline cases, working out what's stopping them from buying, and then fixing that blocker.  It's not about scoring the most wins, it's about getting accurate data and putting that data into context.


Rest assured that if Panama had put half a dozen goals past England, it would widely and immediately be regarded as a freak result (that's called bias, and that's a whole other problem).
