
Friday, 23 September 2016

Premier League Excitement - Further Analysis

In my last post I looked at 'How exciting is the Premier League' and produced the interesting data point that less than 10% of Premier League games are goal-less.  This may be interesting, and it might even count as insight, but it's not very actionable.  We can't do anything with it, or make any decisions from it.  I suppose the question is, "Is that a lot?" and I'll be looking at that question in more detail in future.

So, my next step is to look at how the different teams in the Premier League compare on some of the key metrics that I discussed - goals per game (total conceded plus scored), percentage of goalless games and so on.

Number of goals per game (conceded plus scored)

Firstly, I segmented the data per team:  how many goals were there per game for each team in the Premier League.  This is time-consuming, but worthwhile, and a sample of the data is shown below.  I have data as far back as the 2004-5 season, but the width wouldn't fit on this page: 
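| Club | Y2010 | Y2011 | Y2012 | Y2013 | Y2014 | Y2015 | Y2016 |
|---|---|---|---|---|---|---|---|
| Arsenal | 2.58 | 3.03 | 3.24 | 2.87 | 2.87 | 2.82 | 2.66 |
| Aston Villa | 2.21 | 2.82 | 2.37 | 3.05 | 2.63 | 2.32 | 2.71 |
| Birmingham | | 2.50 | | | | | |
| Blackburn | 2.79 | 2.76 | 3.32 | | | | |
| Bolton | 2.61 | 2.84 | 3.24 | | | | |
| Charlton | 2.47 | | | | | | |
| Chelsea | 2.32 | 2.68 | 2.92 | 3.00 | 2.58 | 2.76 | 2.95 |
| Crystal Palace | | | | | 2.13 | 2.58 | 2.37 |
| Everton | 2.32 | 2.53 | 2.37 | 2.50 | 2.63 | 2.58 | 3.00 |
| Fulham | 2.58 | 2.42 | 2.61 | 2.89 | 3.29 | | |
| Liverpool | 2.21 | 2.71 | 2.29 | 3.00 | 3.97 | 2.63 | 2.97 |
| Man City | 1.92 | 2.45 | 3.21 | 2.63 | 3.66 | 3.18 | 2.95 |
| Man United | 2.89 | 3.03 | 3.21 | 3.39 | 2.82 | 2.61 | 2.21 |
| Middlesbrough | 2.45 | | | | | | |
| Newcastle | 2.24 | 2.97 | 2.82 | 2.97 | 2.68 | 2.71 | 2.87 |
| Norwich | | | 3.11 | 2.61 | 2.37 | | 2.79 |
| Portsmouth | 2.29 | | | | | | |
| Southampton | | | | 2.87 | 2.63 | 2.29 | 2.63 |
| Tottenham | 2.92 | 2.66 | 2.82 | 2.95 | 2.79 | 2.92 | 2.74 |
| West Brom | | 3.34 | 2.55 | 2.89 | 2.68 | 2.34 | 2.16 |
| Wigan | 2.53 | 2.66 | 2.74 | 3.16 | | | |
| Season Average | 2.77 | 2.80 | 2.81 | 2.80 | 2.77 | 2.57 | 2.70 |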

Blank columns indicate a season where a team was not in the Premier League.  
Bold figures show where a team achieved over 3 goals per game for the season.
Y2008 indicates the season 2007-2008.
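
For anyone wanting to repeat the per-team segmentation, here is a minimal Python sketch of how it could be done.  It assumes each result is stored as (home team, away team, home goals, away goals); that layout, the function name goals_per_game and the three sample results are illustrative assumptions, not the real data or the method used to build the table above.

```python
from collections import defaultdict

# Hypothetical results: (home team, away team, home goals, away goals).
# These rows are made up for illustration; they are not real fixture data.
matches = [
    ("Arsenal", "Everton", 2, 1),
    ("Everton", "Liverpool", 0, 0),
    ("Liverpool", "Arsenal", 3, 3),
]


def goals_per_game(matches):
    """Return {team: total goals (scored plus conceded) per game played}."""
    goals = defaultdict(int)
    games = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        total = home_goals + away_goals
        # Every goal in a match counts for both teams involved.
        for team in (home, away):
            goals[team] += total
            games[team] += 1
    return {team: goals[team] / games[team] for team in goals}


per_team = goals_per_game(matches)
```

Running the same function over one season's results at a time would give one column of the table.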
Firstly:  sorting alphabetically makes sense from a listing perspective, but for comparison the data is best sorted numerically (from highest to lowest). 

Secondly:  There's a lot of data here, and clearly a visualisation is needed:  I'm going with a line graph.  And to avoid spaghetti, I'm going to highlight some of the key teams - the team with the highest average number of goals per game; the team with the lowest, and the average.

Thirdly:  to identify the overall highest- and lowest-goal teams, I'm just going to take the average of each team's season figures for the last 12 seasons, and sort the list by that average.  Teams that were not in the Premier League for one or more seasons are included based on their performance while they were in the Premier League.

Premier League Teams:  Average number of goals per game over the last 12 seasons:

| Club | Average |
|---|---|
| Arsenal | 2.842 |
| Tottenham | 2.833 |
| Man City | 2.825 |
| Blackburn | 2.816 |
| Man United | 2.807 |
| Liverpool | 2.781 |
| Newcastle | 2.751 |
| Norwich | 2.717 |
| Bolton | 2.705 |
| Overall Average | 2.702 |
| Birmingham | 2.671 |
| Chelsea | 2.670 |
| West Brom | 2.669 |
| Aston Villa | 2.667 |
| Fulham | 2.613 |
| Southampton | 2.605 |
| Wigan | 2.566 |
| Everton | 2.518 |
| Charlton | 2.474 |
| Middlesbrough | 2.404 |
| Portsmouth | 2.368 |
| Crystal Palace | 2.360 |
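
As a rough sketch of the averaging step, the Python below averages each club over only the seasons in which it has a figure, then sorts from highest to lowest.  The season figures are copied from the seven-season sample table earlier in this post; the names sample, average_when_present and ranking are my own for illustration.  Crystal Palace and Southampton come out at 2.360 and 2.605, matching the ranking, while Arsenal's figure differs from 2.842 because the sample covers only seven of the 12 seasons.

```python
# Season figures from the sample table (7 of the 12 seasons);
# None marks a season with no figure (club not in the Premier League).
sample = {
    "Crystal Palace": [None, None, None, None, 2.13, 2.58, 2.37],
    "Southampton": [None, None, None, 2.87, 2.63, 2.29, 2.63],
    "Arsenal": [2.58, 3.03, 3.24, 2.87, 2.87, 2.82, 2.66],
}


def average_when_present(values):
    """Average only the seasons where the club has a figure."""
    played = [v for v in values if v is not None]
    return sum(played) / len(played)


# Sort from highest to lowest, as in the ranking.
ranking = sorted(
    ((club, round(average_when_present(vals), 3)) for club, vals in sample.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```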

Key takeaways:  
- Arsenal have had the most total goals per game over the last 12 seasons (2.842 goals per game)
- Everton have the lowest average number of goals per game for teams which have been present in all 12 seasons (2.518 goals per game).
- Put another way:  Arsenal fans have seen 1296 league goals in the last 12 seasons, compared to 1148 for Everton fans (148 fewer).
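
The goal totals in the last takeaway follow directly from the averages.  A quick check in Python, assuming 38 league games per club per season (the figure for a 20-team league); the function name total_goals is just for illustration:

```python
GAMES_PER_SEASON = 38  # assumption: a 20-team league, each club plays 38 games
SEASONS = 12


def total_goals(goals_per_game, seasons=SEASONS, games=GAMES_PER_SEASON):
    """Convert a goals-per-game average back into a whole number of goals."""
    return round(goals_per_game * seasons * games)


arsenal_total = total_goals(2.842)
everton_total = total_goals(2.518)
difference = arsenal_total - everton_total
```

That is 456 games each, giving 1296 goals for Arsenal and 1148 for Everton.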


Theo Walcott, celebrating during Arsenal's win over Hull, Sept 2016

Time for some graphs!

Firstly, average goals per game in each of the last 12 seasons, for Arsenal, Everton, the league average, Liverpool (who achieved an average of 3.97 in 2013-14) and Man United (because they're always worth comparing).



This shows clearly that Arsenal (green line) have consistently exceeded the league average, falling below it only twice in the last 12 seasons.  Everton (blue) have only once exceeded the average, and that was in the most recent season.  Liverpool have exceeded the average over the last four seasons, but prior to that were consistently below (and similar to Everton).

Connecting this to 'real life' events:

- Everton moving from David Moyes to Roberto Martinez in August 2013 did not make any difference to their 'excitement' factor until the 2015-16 season.

- Arsenal, and Arsene Wenger, could not be called 'boring' based on their goals per game. 

- Brendan Rodgers had an interesting time at Liverpool, when they hit the highest goals-per-game for the season for any club in the last 12 years (3.97).  Note that this does not discriminate between goals scored or conceded.

Secondly, adjusting the data to show the difference between each team and the overall average (so that the data shows a delta versus the average).



To give you an indication of Liverpool's remarkable 2013-14 season:  their games averaged more than one goal more than the season average (3.97 against 2.77).  Brendan Rodgers had an eventful time at Liverpool.

Fulham also had an 'exciting' season in 2013-4, achieving 3.29 goals per game (average was 2.77) - but were subsequently relegated.
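
The delta calculation itself is just a subtraction.  As a small illustration in Python, using the 2013-14 (Y2014) figures from the sample table; the variable names are my own:

```python
# 2013-14 (Y2014) figures taken from the sample table in this post.
season_average = 2.77
goals_per_game = {
    "Liverpool": 3.97,
    "Fulham": 3.29,
    "Arsenal": 2.87,
    "Everton": 2.63,
}

# Delta versus the season average: positive means more goals than average.
deltas = {
    club: round(value - season_average, 2)
    for club, value in goals_per_game.items()
}
```

Liverpool come out 1.20 goals per game above the average and Fulham 0.52 above, while Everton are 0.14 below.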

In summary:

- Arsenal have had the highest average goals per game over the last 12 seasons (2.842 goals per game), while Everton have the lowest of the teams present in all 12 seasons, at 2.518 goals per game.
- Arsenal have exceeded the league average goals per game in 10 out of the last 12 seasons, and have the highest average overall.
- Man United have achieved above-average goals per game in nine of the last 12 seasons; however the 2015-16 season was the least 'exciting' they've recorded in that period.

Review

Segmenting the data by team is proving more useful.  It's now possible to make predictions about the 2016-17 season:

- Arsenal to remain most 'exciting', closely followed by Tottenham and Man City.
- Everton to remain the least 'exciting', with 1-1, 2-1 and 2-0 results dominating.
- Man United are extremely unpredictable, especially as they have a new manager this season (although nobody could have predicted the dreadful start they've made to the current season).

The raw data used in this analysis is available from the football data website, among others.

More articles on data analysis in football:

Reviewing Manchester United's Performance
Should Chelsea Sack Jose Mourinho? (it was relevant at the time I wrote it)
How exciting is the English Premier League?  (quantifying a qualitative metric)
The Rollarama World Football Dice Game (a study in probability)

Monday, 15 August 2016

Data, Analysis, Insight and Wisdom

Good web analysts love producing 'actionable insights' - it's the way we add value to the business we're in; it makes our managers happy - it's like finding hidden treasure.  But what are actionable insights (five years ago I asked who makes them actionable - the analyst or the manager) and how can we get better at finding them and sharing them?

Web analytics starts with data - this could be various  kinds of data depending on the business model you're following.  So, in order to keep things industry-neutral, I'm going to focus on an unrelated area, and see what we can learn from it.  Yes, I'm going back to my old favourite:  reporting and analysing the weather.


In meteorology, scientists gather all kinds of data from the atmosphere.  They are interested in collecting multiple types of data - or "data points" - from multiple sources in various ways.  And the good news is that this data is quantitative (it can be given a number).

A thermometer will tell you the air and ground temperature - how comfortable things are at the moment
A barometer will tell you the air pressure where you are at the moment
A hygrometer will measure humidity (rainfall needs a separate rain gauge)
An anemometer is used to measure wind speed and direction, and will tell you which way things are going to change and where to look for what's coming next.

Each instrument will tell you different things about how things are at the moment.  The anemometer can tell you which way things are going to change, and to some extent, which way to look to see what's going to happen next.  The data that these devices will give you will almost always be numerical (or partly numerical), and certainly abstract.  Each one individually will give you a partial picture of the current situation.  None of them by themselves will actually tell you anything meaningful:  is it raining?  Yes, but has it started, has it stopped, is it getting heavier or lighter?  The temperature may be 16°C but what time of day is it; what time of year is it and is the temperature going up or down?

What is needed here is some analysis.

Analysis is the art of combining the data sources to tell you something more meaningful, with a wider view, and painting a better picture.

One of the easiest forms of analysis is comparison.  It's hot today, but is it hotter than yesterday, this time last week, or this time last year?  Meteorologists typically compare year on year - there's little benefit in comparing sunny May with rainy April (in the UK, in theory).  But comparing May 2016 with May 2015 will tell you if we're having a good spring season.
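
A year-on-year comparison is simple enough to sketch in a few lines of Python.  The rainfall figures below are invented purely for illustration (they are not real observations), and the function name year_on_year_change is my own:

```python
# Hypothetical monthly rainfall totals in millimetres (not real observations).
rainfall_mm = {
    (2015, "May"): 58.0,
    (2016, "May"): 43.5,
}


def year_on_year_change(data, month, year):
    """Percentage change for a month compared with the same month a year earlier."""
    current = data[(year, month)]
    previous = data[(year - 1, month)]
    return (current - previous) / previous * 100


change = year_on_year_change(rainfall_mm, "May", 2016)
```

With these made-up numbers, May 2016 would be 25% drier than May 2015, which is a more useful statement than either total by itself.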

And comparison leads naturally to trending.  It might be raining more today than it was yesterday, but how does that pattern compare over a longer time period?  And if you want to present this data to a wider audience, you'll either compile a table of data, or produce a graph of your data.  And the analysis is already starting - comparing two forms of data (typically time and another measurement) and producing comparable data (and possibly even trends).

Another form of analysis is statistical analysis - comparing averages, ranges, populations and so on.  Providing you and your audience are agreed on which average you're taking, what it means and what its potential drawbacks are, this can be a very useful form of analysis.
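
To show why it matters which average you agree on, here is a small Python example with an invented week of daily rainfall (the figures are hypothetical): one very wet day pulls the mean well away from the median.

```python
from statistics import mean, median

# Hypothetical daily rainfall for one week, in millimetres.
daily_rainfall_mm = [0, 0, 0, 1, 2, 2, 30]

mean_rainfall = mean(daily_rainfall_mm)      # pulled up by the single wet day
median_rainfall = median(daily_rainfall_mm)  # the "typical" day
```

The mean is 5 mm a day, but the median is only 1 mm: both are valid averages, and they tell quite different stories about the same week.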


A note:  analysis is not just plotting graphs.  No, really, it isn't.  A spreadsheet can plot graphs, but analysis requires brainpower.  Therefore, plotting graphs (by itself) isn't analysis.  It can help to direct your analysis, and tell you what the data is saying, but a graph is just a set of lines on a page.  Plotting good, meaningful graphs is an exercise by itself, and data visualization is a whole subject of its own.  And a sidenote to this note: there are times when a simple, basic bar chart will be more informative and drive more action than any trendy visualisation with arrows, flowcharts and nodes.  Good doesn't mean visually impressive.

Good analysis breaks down the data into meaningful and relevant sections that will start to tell you more than just the individual data points.  Analysis will combine data points:  for example, imagine rainfall data combined with geographic data, and compared to average data:


This data has been presented in a very accessible way, and you can see at a glance that the southern half of the UK had a slightly-wetter-than-average January, whereas the northern half of the UK, and especially Scotland, was much drier than average.  

This is analysis clearly presented.  However, it isn't insight:  I haven't explained why the rainfall varied so much.  And if you're looking to explain why the rainfall in January was less than in June (for example), then you can easily point to annual trends:  the rain in January is always less than in June.

Insight

Insight is the next step from analysis, and insight will often show you WHY something is happening.  Yes, I know you won't fully answer "why" visitors behave in the way they do just by consulting quantitative data, but it's a start - and additionally, you'll be able to answer why a number went up, down or sideways.  You'll know you're beginning to show insight when you've stopped drawing graphs and tables, and started writing in sentences.  And not just describing what the data is saying, either, but explaining what's actually happening and the underlying causes.  "Total sales this week fell from 100k to 74k" isn't insight.  This is:  "Total sales fell from 100k to 74k due to the conclusion of the summer sale and a drop in men's shoe sales; last year we continued the sale for an extra month and consistently achieved 100k+ sales for an additional three weeks with no loss of profit."

Or, to keep within the weather theme: "Rainfall in the south and east was above average throughout June due to a series of Atlantic storms which passed over continental Europe; in previous years these storms have tracked much further south."

Insight is about using the data to tell you about something that's happened before, and what happened next.  For example, we don't watch the weather channel to see what the weather was like yesterday or earlier this morning.  We may watch the weather channel to see what the weather's like now in another part of the country (or the world), but more often we want to know what the weather's going to be like tomorrow.  Good analysis will enable you to generate insights, extrapolate data and forecast future performance.


The regions of the UK Shipping Forecast, for which the BBC produces regular weather forecasts.

Should I buy (or pack) an umbrella or sunblock?  Which way do I point my windmill?  How do I trim my sails?  Do I go fishing tonight or wait until dawn?  When do I gather my crops?  How you use the data and then generate the insight depends on the audience.  This is life or death for some people.


Online, there's a clearer connection between actions and consequences - if you increase your online advertising spend, you should see more traffic coming to your site (and if you don't, start analysing and find out why, and what you should do about it).  With the weather - you can't make it rain, but you can work out why it rained (or didn't), when it's going to rain again (because you know it will), and what steps to take in order to make the best of the weather.  If you work in a team or a situation where the brand, marketing and advertising decisions for the online channel are made by an offline team with TV, radio and press expertise, you may find yourself in this kind of situation:  do not despair!

Wisdom

Some insights can be demonstrated repeatedly, and described succinctly so that they eventually become gathered wisdom:

"The north wind doth blow, and we shall have snow."
"Red sky at night, shepherd's delight; red sky at morning, shepherd's warning"

In online marketing, it could be something like, "Always show the discounted price in red", (honestly, I wrote that before I discovered that somebody genuinely thought it was a good idea) or "Never show a banner with two different products" (I'm making this stuff up).

No amount of data will automatically produce wisdom.  Big data (however big that might mean) will not spontaneously transform into insight and wisdom when it reaches a critical mass, in the same way that no amount of charcoal will produce diamonds even though they're made of the same stuff.

Actionable

Data, analysis and insight are useful tools and worthwhile aims - providing that they are actionable.  In my examples, I've been talking about using weather data to inform decisions, such as whether to wear a sunhat or a raincoat.  In this case, the data on temperature and rainfall are critical.  In online analysis (or in any kind of data analysis) it's vital that the analysis and insight are focused on the key performance indicators - that's what will make it actionable.  Talking about traffic to the landing page or the product information page will be trivial unless you can connect that data point to a key data point which drives the business - such as conversion, margin or revenue.  When you gather the data which enables you to tie your analysis and insight to a KPI, then your insight is far more likely to be actionable (I say this as your recommendation may be profitable but not feasible).

"My analysis shows that if we direct traffic from the landing page to page B instead of page A, then we will see an increase in conversion because 65% of people who see page B add an item to cart, compared to 43% for page A."  You can almost hear the sound of the checkout ringing.


"If we change our call to action from 'Buy Now' to 'Find out more', then the click through rate will go up."  Yes... and then what?  The click-through rate is probably a good data point to start with, but how much will it go up by, and what will the impact be on the website's KPIs?

Conclusion 

  If data analysis (sales revenue, time, banner description and click-through rate) indicates that sales revenue drops when you mix your products in your banners, because people ignore the higher priced product and only buy the cheaper one, then this can move from data to analysis to insight to wisdom.  It may take repeated observations to get there (was it a one-off, does it apply to all products, does it only happen in summer?), but it shows how you can move from data to analysis to actionable insights.

Other articles I've written on Website Analytics that you may find relevant:

Web Analytics - Gathering Requirements from Stakeholders

Wednesday, 13 July 2016

My Favourite Chess Game

I've covered a range of my Chess games in the past - some wins, some losses - but in this post I'd like to review my favourite Chess game, against the strongest opponent I've beaten.  This was within the Kidsgrove Chess Club's own internal league, and all six players played each other twice (once as White, once as Black).

This was my game against Jules H, the strongest player I've scored a win against.  I was White, and played my standard Queen's Gambit.

1. d4 d5
2. c4 e6
3. cxd5 exd5




Now I'm sure 3. cxd5 is regarded as a poor move, reducing the tension in the centre, but I thought it made sense to trade my c-pawn for my opponent's central e-pawn.

4. Nc3 Bb4
5. Bd2 Bxc3
6. Bxc3 Nf6
7. e3 0-0
8. Bd3 Re8
9. Ne2 Qd6
10. Ng3 Bg4

I can't play 11. f3 as I would immediately lose to Rxe3+ and have ongoing trouble on the now-open e-file.  However, I identified that Bg4 left b7 unprotected, so I decided to play Qb3 and look to castle as soon as possible.


11. Qb3 Qb6
12. Qxb6 axb6

No, I wasn't initially intending to trade queens, but since I could leave a scar on my opponent's pawn structure, with b7 as a fixed weakness, I decided to go for it... and then finally castled!

13. 0-0 g6
14. Bd2
Supporting the e-pawn, so that I can finally play f3, which will kick the bishop and in future enable me to play e4.




















14. ... c5  My opponent looks to straighten out his queenside.
15. a3 c4
16. Bc2 Bd7
17. f3 Nc6
18. Rae1
Completing my development. Putting the rooks behind the e- and f-pawns should enable me, with the support of the minor pieces, to push them forwards and make significant gains in space.




18.  ... b5
My opponent is grabbing space on the queenside, but I'm slowly and steadily preparing to advance my central pawns.

19.Bc3 Re6
20.e4 h5
21.Ne2? dxe4?
22.fxe4

After getting lucky with Ne2, I've now developed a triple threat:

I can advance the d-pawn, forking knight and rook, and if the rook retreats I can then capture the knight on f6.  If the rook moves to d6, I can then also advance the e-pawn, forking the rook and the other knight (the pawn on e5 would be supported by the bishop on c3).  The bishop pair on c2 and c3 are beginning to see their diagonals open up, and they're pointing towards Black's king.





22. ... Re7?
A blunder, since I can immediately play Rxf6.  I guess the complications of the position got to my opponent, and he figured he could save his pieces by retreating the rook.  The software I've consulted suggests ...Nxe4 as a better continuation for Black.

23.Rxf6 Nd8
24.Nf4 Ra6
25.d5 b6?

I'm not sure what my opponent was doing with these moves.  After threatening my rook on f6, he's now locked his rook out of the game, and is continuing to play on the wings, while I move through the centre.

26.e5 Nb7

Earlier that week, I'd been reading about 'clearing the barriers' towards your opponent's king, and I considered this very carefully.  I'm a knight ahead (after the capture on move 23), so I have extra material to play with.  Also, I have both of my bishops pointing towards Black's king, so I anticipated that after Nxg6 I would force Black's king into the corner, and potentially get a discovered check to capture more material.  After 28. Rxg6+ Kh8 I can play e6 winning a bishop, or after 
28. Rxg6+ Kh7 I can play Rxb6+ winning a rook.

So I went for it, trading my knight for two pawns and a direct attack.
27.Nxg6 fxg6
28.Rxg6+ 


After 28 Rxg6+ there are possibilities for me to win material through discovered checks

28.  ... Kf8

Avoiding both of the discovered checks, but enabling me to bring my other rook into play with a tempo.

29.Rf1+

I was expecting 29...Rf7 30.Rxf7+ Ke8 31.e6 Bxe6 32.dxe6 which lengthens the game but enables me to win more material.  Instead...

29. ... Ke8
30.Rg8# 1-0











A surprisingly quick finish to a very pleasing game.  I appreciate that my opponent made a few suspect moves, but I'm pleased with the way I handled the game, the tactics and strategies I used (placing my rooks and bishops on squares that would maximise their range and usefulness) and as I said, this is probably one of my favourite games.

Here are a few of my other Chess games:

My earliest online Chess game
My very earliest Chess game (it was even earlier than I thought)
 
The strangest game of Chess I ever played - 1. d4 d5 2. c4 b5
I was not sure what I was supposed to do with that; apparently I was supposed to play 3. c4xb5, but played 3. c4xd5 and immediately and unintentionally took my opponent out of his prep.

The Chess game I'm least proud of
I got greedy, tried to hold onto a pawn that I should have given back, and expended a lot of time and effort on it, instead of protecting my King (on the other side of the board)

Thursday, 30 June 2016

Revisiting Fibonacci Constants

In a previous post (four years ago), I explained the Fibonacci Series - where it comes from, how it originates, and how it can appear in nature.  I also talked about the golden ratio, which is the value that the ratio between successive terms in the Fibonacci Series approaches.

I've been doing some further reading and research on the Fibonacci Series, and have been doing some of my own calculations and investigations.

Firstly:  what happens if we extend the series, so that instead of just summing the two previous terms, we sum the three previous terms, or the four or five previous?

This has been done before (I wasn't too surprised), and these are known as the following:
Fibonacci - 2 terms
Tribonacci - 3 terms
Tetranacci (or quadranacci) - 4 terms
Quintanacci (or pentanacci) - 5 terms
Hexanacci - 6 terms


I struggled to find names for the higher-number terms, so I'm going to submit my own.

Heptanacci - 7 terms
Octanacci - 8 terms
Nonanacci - 9 terms
Decanacci - 10 terms

I stopped at 10, as I found that the data I'd accumulated was enough to draw some interesting conclusions from.  Here are the first few terms of each of the series:

Fib:  0,1,1,2,3,5,8,13,21,34,55,89,144,233,377
Trib:  0,0,1,1,2,4,7,13,24,44,81,149,274,504,927
Tetra: 0,0,0,1,1,2,4,8,15,29,56,108,208,401,773
Quint: 0,0,0,0,1,1,2,4,8,16,31,61,120,236,464
Hex: ...0,0,1,1,2,4,8,16,32,63,125,248
Hept:  ...0,0,1,1,2,4,8,16,32,64,127
Oct: ...0,1,1,2,4,8,16,32,64,128,255,509,1016,2028,4048,8080,16128
Non:  ...0,1,1,2,4,8,16,32,64,128,256,511,1021,2040,4076,8144
Dec: ...0,1,1,2,4,8,16,32,64,128,256,512,1023,2045,4088
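
If you'd like to generate these series yourself, here is a minimal Python sketch.  It assumes the same starting convention as the lists above (N-1 zeros followed by a single 1); the function name n_nacci is my own.

```python
def n_nacci(n, count):
    """Return the first `count` terms of the series that sums the previous n terms.

    The series is seeded with n-1 zeros followed by a single 1.
    """
    terms = [0] * (n - 1) + [1]
    while len(terms) < count:
        terms.append(sum(terms[-n:]))
    return terms[:count]


fib = n_nacci(2, 15)
trib = n_nacci(3, 15)
dec = n_nacci(10, 23)
```

Here fib and trib reproduce the first two rows above, and the last 15 terms of dec match the Dec row.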

Taking this raw data, I then started to look at the ratios between subsequent terms.  We've seen previously that the Fibonacci series has the golden ratio 1.61803... or (1+ sqrt(5))/2 but what about the other series?

Fib 1.61803
Trib 1.83929
Tetra 1.92756
Quint 1.965948
Hex 1.983583
Hept 1.991964
Oct 1.996031
Non 1.998029
Dec 1.999019
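
These ratios can be reproduced numerically.  The Python below is one way to do it, not necessarily how I produced the table: it keeps a sliding window of the last N terms, runs the series forward a fixed number of steps (200 is an arbitrary choice that is comfortably enough for N up to 10) and divides the last term by the one before.  The function name n_nacci_ratio is my own.

```python
def n_nacci_ratio(n, steps=200):
    """Estimate the limiting ratio of successive terms for the n-term series."""
    window = [0] * (n - 1) + [1]  # the last n terms of the series so far
    for _ in range(steps):
        # Drop the oldest term and append the sum of the current window.
        window = window[1:] + [sum(window)]
    return window[-1] / window[-2]


ratios = {n: n_nacci_ratio(n) for n in range(2, 11)}
```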

I haven't identified the expressions for each of these but have found from research that the tetranacci constant satisfies x + x^{-4} = 2.
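
That equation can be checked numerically.  The sketch below uses simple bisection between 1.5 and 2, a bracket I've chosen to avoid the trivial solution x = 1; the function name n_nacci_constant and the general exponent n are my own additions for illustration.

```python
def n_nacci_constant(n, tolerance=1e-12):
    """Solve x + x**(-n) = 2 for the root between 1.5 and 2 by bisection."""

    def f(x):
        return x + x ** (-n) - 2

    low, high = 1.5, 2.0  # f(low) < 0 and f(high) > 0 for n >= 2
    while high - low > tolerance:
        mid = (low + high) / 2
        if f(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


tetranacci = n_nacci_constant(4)
golden = n_nacci_constant(2)
```

With n = 4 this gives 1.92756, the tetranacci ratio in the table, and with n = 2 it gives the golden ratio.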

Plotting the number of terms being summed (or the n-number for the series) against the ratio gives this graph.
The question is:  will the ratio ever reach 2?  It looks like the line will head towards 2 as an asymptote, but will it reach 2 if N increases?

The answer is no, and my proof is as follows:


Take the N=10 series as an example:
0,1,1,2,4,8,16,32,64,128,256,512,1023,2045,4088

We can see after the initial 0 and 1, that each of the next few terms is exactly double the previous one.  This is because each term is the sum of all the previous non-zero terms, including both of the 1s.  1+1 = 2, 1+1+2 = 4, 1+1+2+4=8 etc.  In this case, the ratio of each term to its previous term is 2:  each term is exactly double the previous one.

However, this doubling of terms eventually ends:  when the sum no longer includes all the previous terms, that is to say, when the sum no longer includes both of the 1s and all subsequent terms, then the ratio falls below 2.  In the N=10 example above, the ratio falls below 2 when we reach 1023.


1+1+2+4+8+16+32+64+128+256 = 512
1+2+4+8+16+32+64+128+256+512 = 1023

At this point, the ratio falls below 2 (1023/512 = 1.998).  This fall will occur for any and all series which follow the "sum of previous terms" pattern; as N increases, it just takes more terms, and the final ratio will get closer to 2, but will remain below it.  As an aside, Wikipedia states that the ratio for an n-nacci series tends to the solution, x, of the equation x + x^{-n} = 2 (no proof given, although my data confirms it).
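
The same argument can be written out algebraically.  This is my own sketch rather than a formal proof; it writes a_k for the k-th term of the series that sums the previous N terms, and assumes that the ratio of successive terms does settle to a limit x.

```latex
\begin{align*}
a_{k+1} &= a_k + a_{k-1} + \dots + a_{k-N+1} \\
a_k     &= a_{k-1} + a_{k-2} + \dots + a_{k-N} \\
\text{subtracting:}\quad a_{k+1} - a_k &= a_k - a_{k-N}
  \quad\Longrightarrow\quad a_{k+1} = 2a_k - a_{k-N} \\
\frac{a_{k+1}}{a_k} &= 2 - \frac{a_{k-N}}{a_k} < 2
  \quad\text{whenever } a_{k-N} > 0 \\
\text{in the limit:}\quad x &= 2 - x^{-N}
  \quad\Longleftrightarrow\quad x + x^{-N} = 2
\end{align*}
```

In the N=10 example, 1023 = 2 x 512 - 1 and 4088 = 2 x 2045 - 2:  each term is double the previous one, less the term that has just dropped out of the sum.  With N = 4 the last line is the tetranacci relation quoted earlier.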

Next:  I will look at what happens when we sum the previous term and half of the term before that... e.g.  N = 1.5, N = 2.5, N = 3.5 etc.