
Thursday, 25 September 2025

Port Vale 0 Arsenal 2 Match Report

Port Vale vs Arsenal, 24 September 2025

There are only two teams in the football league which aren’t named after the places where they’re located, and yesterday they met for only the second time in 30 years.  Port Vale (Stoke-on-Trent) and Arsenal (London) are currently 61 league places apart, and to be fair it showed: Arsenal just weren’t as good as they should have been on paper.  The wage bill for one of Arsenal’s players is more than that of the entire Port Vale team; after last night’s performance, I wonder if Arteta is getting value for money?

It was a disappointing performance from the London side, who seemed to have trouble doing anything with their overwhelming levels of possession.  After grabbing an early goal, they struggled to do anything positive or meaningful with the ball, despite the professional encouragement from their fans, and the consistent support of the officials.  The Arsenal goalkeeper, Kepa Arrizabalaga, passed a pre-game fitness check, which was more than could be said for the linesman monitoring the Port Vale defensive line.  He continued to show signs of an ongoing shoulder injury, which prevented him from lifting his flag anywhere above horizontal for the entire first half, while the Arsenal forwards frequently found themselves receiving the ball with nobody but the goalkeeper to beat.  You think I’m kidding?  There were zero Arsenal offsides in the whole game.

The linesman, whose shoulder shows signs of improving

The referee too seemed in awe of the Premiership team’s visit to Vale Park, admiring their ‘strength on the ball’ and ‘technique’ which frequently left the Vale players getting a close-up of the turf as they were ‘tackled’ off the ball.  Strangely, this was not a symmetrical arrangement; whenever an Arsenal player was dispossessed, this was seen as a sign of rough treatment and was typically identified as an illegal challenge.

The referee reminds the Port Vale strikers to be kind to the Arsenal goalkeeper

He's not offside, ok?

One thing I must confess, though, is that Arsenal didn’t employ cynical and overly defensive strategies after obtaining their opening goal.  They kept moving the ball with good technique – they didn’t do anything particularly productive with it (they achieved over four times the number of passes that Vale did), and most of their second-half corners were passed all the way back to the halfway line – and wore Vale down with efficiency and energy.  They saved their timewasting for more subtle tactics, and one of the most egregious was the substitutions.  I’m no expert at football, but watching one of the Arsenal players (and an England player too) dawdle his way off the pitch when he was substituted was more gamesmanship than sportsmanship.  Maybe he wasn’t looking forward to the long drive home?  Maybe he just wanted to stay and play a bit more?


It's a long walk to the touchline.  So long, that even I could get my camera out, focus it, and take a photo.

He wasn’t the only one in no hurry to leave the field of play, as other substitutions took longer than was probably fair. There were delays taking throw-ins, there were in-team discussions about whose turn it was to take this particular corner, and maybe you’d like to try it this time?
Here's an Arsenal corner.  Coming soon.

Port Vale, for their part, showed considerable effort but also looked to be in awe of their visitors.  During the second half, the Vale front line got a hold of the ball a couple of times – at one point in a very promising position facing goal on the edge of the penalty area, only to flounder at the last minute.  In fact, the data shows that Vale didn’t even achieve a single shot on target.  It was going to be one of those nights.

While the game was a significant disappointment for Arsenal, who only scraped an early goal and a late one, it was certainly entertaining.  One of the funniest parts of the match came in the second half, when, after a Vale substitution, Arsenal were expected to restart play with a throw-in.  There was a breakdown in communication between the referee and the Arsenal player: the referee pointed persistently to where the throw-in should be taken (approximately 5 metres ahead of the halfway line, on the Arsenal left), while the Arsenal player was standing around 10-15 metres further forward of that point.  There then followed a confused discussion between the Arsenal player, trying to take the throw-in, and the referee, vehemently pointing 10 metres further back.  This happened in front of where we were sitting, and with the crowd around us (did I mention this was virtually a sell-out?) we did our best to point out the miscommunication.

Taking a throw-in.  It's supposed to be from where the ball went out.  Who knew?

Miscommunication like this is frequent in football matches, and it was certainly noted among the fans.  Arsenal brought almost 3,000 fans to Vale Park, and they stood, sang and shouted with a high degree of organization and professionalism.  There was genuinely no unpleasantness between the two sets of fans, none of the jeering or rude gesturing I have observed at other grounds, and everybody got on with shouting for their team.  At least I think that’s what we were doing – in some cases I struggled to turn the chanted syllables into phrases, or even specific words, and on a couple of occasions I managed it, then regretted it.  Football fans can certainly employ some colourful metaphors.

Speaking of organization and professionalism: the Arsenal players certainly showed this, at a completely different level to the Vale players.  At one point in the first half, Vale gained possession (legally and everything), and in order to hold possession, passed it back from the midfield to the defenders, where it was carefully passed along the line.  But not for very long: with alarming efficiency, the Arsenal players deployed a 10-man press, with the defenders moving up to the halfway line and the forward players squeezing possession.  Vale almost crumbled in the face of this threat, and did well to keep the ball away from their goal:  Arsenal in possession were interesting; Arsenal chasing possession were terrifying.



The size of Arsenal’s squad was clear to see, with players at the match wearing numbers like 41, 49 and 56.  This was probably the Arsenal B-team.  I hope so, for Arteta's sake.  On the other hand, the Port Vale shirts didn't even show a sponsor.

Football is an 11-a-side sport, with shirts numbered 1-56.

The stats tell the story fairly well:  Arsenal dominated all the main numbers, and have to be disappointed with the output from their efforts.  A lucky early goal and one at the end made the difference for them.  The game was billed as a David vs Goliath clash - except David forgot his slingshot and Goliath turned up wearing Crocs.  Arsenal, sitting proudly at 2nd in the Premier League, took on Port Vale, languishing at 19th in League One, a full 61 league places below.  The result?  A narrow and nervy 2-0 win for the North London giants.  Yes, really.  They say that you can only beat the team in front of you, and that’s all that Arsenal managed, when a much more impressive scoreline was expected.  Sad times for all.

Possession
Arsenal 81%
Port Vale 19%

Passes
Arsenal 789  (731 completed, 93%)
Port Vale 183 (115 completed, 63%)

Shots
Arsenal 11  (7 inside box , 4 outside)
Port Vale 3 (2 inside box, 1 outside)

Shots on Target
Arsenal 4
Port Vale 0

Corners
Arsenal 6
Port Vale 1

Offsides
Arsenal 0
Port Vale 2

Wednesday, 10 July 2024

How not to Segment Test Data

Segmenting Test Data Intelligently

Sometimes, a simple 'did it win?' will provide your testing stakeholders with the answer they need. Yes, conversion was up by 5% and we sold more products than usual, so the test recipe was clearly the winner.  However, I have noticed that this simple summary is rarely enough to draw a test analysis to a close.  There are questions about 'did more people click on the new feature?' and 'did we see better performance from people who saw the new banner?'.  There are questions about pathing ('why did more people go to the search bar instead of going to checkout?') and there are questions about who these users are.  Then there are all the in-built data segments from the testing tool itself.  Whichever tool you use, I am confident it will have new vs return users; users by geographic region; users by traffic source; by landing page; by search term... any way of segmenting your normal website traffic data can be unleashed onto your test data and fill up those slides with pie charts and tables.

After all, segmentation is key, right?  All those out-of-the-box segments are there in the tool because they're useful and can provide insight.

Well, I would argue that while they can provide more analysis, I'm not sure about more insights (as I wrote several years ago).  And I strongly suspect that the out-of-the-box segments are there because they were easy to define and apply back when website analytics was new.  Nowadays, they're there because they've always been there, and because managers who were there at the dawn of the World Wide Web have come to know and love them (even if they're useless - the metrics, not the managers).

Does it really help to know that users who came to your site from Bing performed better in Recipe B versus Recipe A?  Well, it might - if the traffic profile during the test run was typical for your site.  If it is, then go ahead and target Recipe B for users who came from Bing.  And please ask your data why the traffic from Bing so clearly preferred Recipe B (don't just leave it at that).

Visitors from Bing performed better in Recipe B?  So what?

Is it useful to know that return users performed better in Recipe C compared to Recipe A?

Not if most of your users make a purchase on their first visit:  they browse the comparison sites, the expert review sites and they even look on eBay, and then they come to your site and buy on their first visit.  So what if Recipe C was better for return users?  Most of your users purchase on their first visit, and what you're seeing is a long-tail effect with a law of diminishing returns.  And don't let the argument that 'All new users become return users eventually' sway you.  Some new users just don't come back - they give up and don't try again.  In a competitive marketplace where speed, efficiency and ease-of-use are now basic requirements instead of luxuries, if your site doesn't work on the first visit, then very few users will come back - they'll find somewhere easier instead.  

And, and, and:  if return users perform better, then why?  Is it because they've had to adjust to your new and unwieldy design?  Did they give up on their first visit, but decide to persevere with it and come back for more punishment because the offer was better and worth the extra effort?  This is hardly a compelling argument for implementing Recipe C.  (Alternatively, if you operate a subscription model, and your whole website is designed and built for regular return visitors, you might be on to something).  It depends on the size of the segments.  If a tiny fraction of your traffic performed better, then that's not really helpful.  If a large section of your traffic - a consistent, steady source of traffic - performed better, then that's worth looking at.

So - how do we segment the data intelligently?

It comes back to those questions that our stakeholders ask us: "How many people clicked?" and "What happened to the people who clicked, and those who didn't?"  These are the questions that are rarely answered with out-of-the-box segments.  "Show me what happened to the people who clicked and those who didn't" leads to answers like, "We should make this feature more visible because people who clicked it converted at a 5% higher rate." You might get the answer that, "This feature gained a very high click rate, but made no impact [or had a negative effect] on conversion." This isn't a feature: it's a distraction, or worse, a roadblock.

The best result is, "People who clicked on this feature spent 10% more than those who didn't."
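To make this concrete, here's a minimal sketch of that kind of click-based segmentation in Python/pandas.  The file and column names (visitor_id, recipe, clicked_feature, converted, revenue) are made up for illustration - substitute whatever your testing tool actually exports.

import pandas as pd

# Hypothetical visitor-level export from the testing tool: one row per visitor,
# with the recipe they saw, whether they clicked the new feature, whether they
# converted, and the revenue they generated.
visitors = pd.read_csv("test_visitors.csv")

# Conversion rate and revenue per visitor, split by recipe and by
# clicked-the-feature vs didn't-click-the-feature.
summary = (
    visitors
    .groupby(["recipe", "clicked_feature"])
    .agg(
        visitors=("visitor_id", "count"),
        conversion_rate=("converted", "mean"),
        revenue_per_visitor=("revenue", "mean"),
    )
    .round(3)
)
print(summary)

The interesting comparison is within each recipe: did the people who clicked the feature convert (or spend) more than the people who didn't, and how big is each group?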

And - this is more challenging but also more insightful - what about people who SAW the new feature, but didn't click?  We get so hung up on measuring clicks (because clicks are the currency of online commerce) that we forget that people don't read with their mouse button.  Just because somebody didn't click on the message doesn't mean they didn't see it: they saw it and thought, "Not interesting", "Not relevant" or "Okay, that's good to know but I don't need to learn more".  The message that says, "10% off with coupon code SAVETEN - Click here for more" doesn't NEED to be clicked.  And ask yourself "Why?" - why are they clicking, and why aren't they?  Does your message convey sufficient information without further clicking, or is it just a headline that introduces further important content?  People will rarely click Terms and Conditions links, after all, but they will have seen the link.

We forget that people don't read with their mouse button.

So we're going to need to have a better understanding of impressions (views) - and not just at a page level, but at an element level.  Yes, we all love to have our messages, features and widgets at the top of the page, in what my high school Maths teacher called "Flashing Red Ink".  However, we also have to understand that it may have to be below the fold, and there, we will need to get a better measure of how many people actually scrolled far enough to see the message - and then determine performance for those people.  Fortunately, there's an abundance of tools that do this; unfortunately, we may have to do some extra work to get our numerators and denominators to align.  Clicks may be currency, but they don't pay the bills.
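As a rough illustration of aligning those numerators and denominators, here's a sketch (pandas again, with invented column names) that uses element impressions - not page views - as the denominator.

import pandas as pd

# Hypothetical visitor-level flags built from analytics events (names invented):
#   saw_banner     - an element-visibility event fired when the banner scrolled into view
#   clicked_banner - a click on the banner
#   converted      - an order placed in the same visit
events = pd.read_csv("banner_events.csv")

# Only people who actually saw the banner can tell us anything about it,
# so use impressions as the denominator.
saw = events[events["saw_banner"].astype(bool)]

segments = saw.groupby("clicked_banner")["converted"].agg(["count", "mean"])
segments.columns = ["visitors", "conversion_rate"]
print(segments)

# Click-through rate among those who saw the banner:
print("CTR:", round(saw["clicked_banner"].mean(), 3))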

So:  segmentation - yes.  Lazy segmentation - no.

Other articles I've written on Website Analytics that you may find relevant:

Web Analytics - Gathering Requirements from Stakeholders
Analysis is Easy, Interpretation Less So - when to segment, and how.
Telling a Story with Web Analytics Data - how to explain your data in a clear way
Reporting, Analysing, Testing and Forecasting - the differences, and how to do them well
Pages with Zero Traffic - identifying which pages you're wasting effort on.

Wednesday, 21 September 2022

A Quick Checklist for Good Data Visualisation

One thing I've observed during the recent pandemic is that people are now much more interested in data visualisation.  Line graphs (or equivalent bar charts) have become commonplace and are being scrutinised by people who haven't looked at them since they were at school.  We're seeing heatmaps more frequently, and tables of data are being shared more often than usual.  This was prevalent during the pandemic, and people have generally retained their interest in data presentation (although they wouldn't call it that).

This made me consider:  as data analysts and website optimisers, are we doing our best to convey our data as accurately and clearly as possible in order to make our insights actionable?  We want to share information in a way that is easy to understand and easy to base decisions on, and there are some simple ways to do this (even with 'simple' data), even without glamorous new visualisation techniques.

Here's the shortlist of data visualisation rules

- Tables of data should be presented consistently, either vertically or horizontally; don't mix them up
- Graphs should be either vertical bars or horizontal bars; be consistent
- If you're transferring from vertical to horizontal, then make sure that top-to-bottom matches left-to-right
- If you use colour, use it consistently and intuitively.

For example, let's consider the basic table of data:  here's one from a sporting context:  the English Premiership's Teams in Form:  results from a series of six games.

Pos | Team      | P | Pts | F  | A | GD | Sequence
1   | Liverpool | 6 | 16  | 13 | 2 | 11 | W W W W W D
2   | Tottenham | 6 | 15  | 10 | 4 | 6  | W L W W W W
3   | West Ham  | 6 | 14  | 17 | 7 | 10 | D W W W W D

The actual data itself isn't important (unless you're a Liverpool fan), but the layout is what I'm looking at here.  Let's look at the raw data layout:

Pos | Category  | Metric 1 | Metric 2 | Metric 3 | Metric 4 | Derived metric | Sequence
1   | Liverpool | 6        | 16       | 13       | 2        | 11             | W W W W W D
2   | Tottenham | 6        | 15       | 10       | 4        | 6              | W L W W W W
3   | West Ham  | 6        | 14       | 17       | 7        | 10             | D W W W W D


The derived metric "GD" is Goal Difference, the total For minus the total Against (e.g. 13-2=11).

Here, the categories are in a column, sorted by rank, and different metrics are arranged in subsequent columns - it's standard for a league table to be shown like this, and we grasp it intuitively.  Here's an example from the US, for comparison:

Player          | Pass Yds | Yds/Att | Att | Cmp | Cmp % | TD | INT | Rate  | 1st | 1st%  | 20+
Deshaun Watson  | 4823     | 8.9     | 544 | 382 | 0.702 | 33 | 7   | 112.4 | 221 | 0.406 | 69
Patrick Mahomes | 4740     | 8.1     | 588 | 390 | 0.663 | 38 | 6   | 108.2 | 238 | 0.405 | 67
Tom Brady       | 4633     | 7.6     | 610 | 401 | 0.657 | 40 | 12  | 102.2 | 233 | 0.382 | 63


You have to understand American Football to grasp all the nuances of the data, but the principle is the same.   For example, Yds/Att is yards per attempt, which is Pass Yds divided by Att.  Columns of metrics, ranked vertically - in this case, by player.

A real-life example of good data visualisation

Here's another example; this is taken from Next Green Car comparison tools:


The first thing you notice is that the categories are arranged in the top row, and the metrics are listed in the first column, because here we're comparing items instead of ranking them.  The actual website is worth a look; it compares dozens of car performance metrics in a page that scrolls on and on.  It's vertical.

When comparing data, it helps to arrange the categories like this, with the metrics in a vertical list - for a start, we're able to 'scroll' in our minds better vertically than horizontally (most books are in a portrait layout, rather than landscape).

The challenge (or the cognitive challenge) comes when we ask our readers to compare data in long rows, instead of columns... and it gets more challenging if we start mixing the two layouts within the same document/presentation.  In fact, challenging isn't the word. The word is confusing.

The same applies for bar charts - we generally learn to draw and interpret vertical bars in graphs, and then to do the same for horizontal bars.

Either is fine. A mixture is confusing, especially if the sequence of categories is reversed as well. We read left-to-right and top-to-bottom, and a mixture here is going to be misunderstood almost immediately, and irreversibly.

For example, this table of data (from above)

Pos | Category  | Metric 1 | Metric 2 | Metric 3 | Metric 4 | Derived metric | Sequence
1   | Liverpool | 6        | 16       | 13       | 2        | 11             | W W W W W D
2   | Tottenham | 6        | 15       | 10       | 4        | 6              | W L W W W W
3   | West Ham  | 6        | 14       | 17       | 7        | 10             | D W W W W D


Should not be graphed like this, where the horizontal data has been converted to a vertical layout:
And it should certainly not be graphed like this:  yes, the data is arranged in rows and that's remained consistent, but the sequence has been reversed!  For some strange reason, this is the default layout in Excel, and it's difficult to fix.


The best way to present the tabular data in a graphical form - i.e. turning the table into a graph - is to match the layout and the sequence.
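If you're building the chart in code rather than fighting Excel's defaults, the fix is small.  Here's a minimal matplotlib sketch using the league table above - horizontal bars are drawn bottom-up by default, so the y-axis needs inverting to keep the table's top-to-bottom order.

import matplotlib.pyplot as plt

# The league table from above, in table order (1st place first).
teams = ["Liverpool", "Tottenham", "West Ham"]
points = [16, 15, 14]

fig, ax = plt.subplots()
ax.barh(teams, points)

# barh puts the first category at the BOTTOM by default, which reverses the
# table's top-to-bottom order; invert the y-axis so the first row of the table
# is also the first bar the reader sees.
ax.invert_yaxis()

ax.set_xlabel("Points (last six games)")
plt.show()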

And keep this consistent across all the data points on all the slides in your presentation.  You don't want your audience performing mental gymnastics to make sense of your data.  It would be like reading a book, then having to turn the page by 90 degrees after a few pages, then going back again on the next page, then turning it the other way after a few more pages.  

You want your audience to spend their mental power analysing and considering how to take action on your insights, and not to spend it trying to read your data.

Other articles with a data theme

Monday, 6 September 2021

It's Not Zero!

 I started this blog many years ago.  It pre-dates at least two of my children, and possibly all three - back in the days when I had time to spare, time to write and time to think of interesting topics to write about.  Nowadays, it's a very different story, and I discovered that my last blog post was back in June.  I used to aim for one blog article per month, so that's two full months with no digital output here (I have another blog and a YouTube channel, and they keep me busy too).

I remember those first few months, though, trying to generate some traffic for the blog (and for another one I've started more recently, and which has seen a traffic jump in the last few days).  

Was my tracking code working?  Was I going to be able to see which pages were getting any traffic, and where they were coming from?  What was the search term (yes, this goes back to those wonderful days when Google would actually tell you your visitors' search keywords)?

I had weeks and weeks of zero traffic, except for me checking my pages.  Then I discovered my first genuine user - who wasn't me - actually visiting my website.  Yes, it was a hard-coded HTML website and I had dutifully copied and pasted my tag code into each page...  did it work?  Yes, and I could prove it:  traffic wasn't zero.

So, if you're at the point (and some people are) of building out a blog, website or other online presence - or if you can remember the days when you did - remember the day that traffic wasn't zero.  We all implemented the tag code at some point, or sent the first marketing email, and it's always a moment of relief when that traffic starts to appear.

Small beginnings:  this is the session graph for the first ten months of 2010, for this blog.  It's not filtered, and it suggests that I was visiting it occasionally to check that posts had uploaded correctly!  Sometimes, it's okay to celebrate that something isn't zero any more.

And, although you didn't ask, here's the same period January-October 2020, which quietly proves that my traffic increases (through September) when I don't write new articles.  Who knew?

Thursday, 24 June 2021

How long should I run my test for?

 A question I've been facing more frequently recently is "How long can you run this test for?", and its close neighbour "Could you have run it for longer?"

Different testing programs have different requirements:  in fact, different tests have different requirements.  The test flight of the helicopter Ingenuity on Mars lasted 39.1 seconds, straight up and down.  The Wright Brothers' first flight lasted 12 seconds, and covered 120 feet.  Which was the more informative test?  Which should have run longer?

There are various ideas around testing, but the main principle is this:  test for long enough to get enough data to prove or disprove your hypothesis.  If your hypothesis is weak, you may never get enough data.  If you're looking for a straightforward winner/loser, then make sure you understand the concept of confidence and significance.

What is enough data?  It could be 100 orders.  It could be clicks on a banner: the first test recipe to reach 100 clicks - or 1,000, or 10,000 - is the winner (assuming it has a large enough lead over the other recipes).
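To put a rough number on 'enough data' for a simple conversion-rate test, here's a hedged sketch using the standard two-proportion sample-size formula (the 3% baseline and 10% uplift are invented figures - plug in your own).

from scipy.stats import norm

def sample_size_per_recipe(baseline, uplift, alpha=0.05, power=0.8):
    """Approximate visitors needed per recipe to detect a relative uplift
    in conversion rate (standard two-proportion formula)."""
    p1 = baseline
    p2 = baseline * (1 + uplift)
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_beta = norm.ppf(power)            # statistical power
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p2 - p1) ** 2

# Example: 3% baseline conversion, hoping to detect a 10% relative uplift.
print(f"{sample_size_per_recipe(0.03, 0.10):,.0f} visitors per recipe")

Numbers like that make it clear why 'just run it a bit longer' isn't a free choice: the smaller the uplift you want to detect, the more traffic (and time) you need.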

An important limitation to consider is this:  what happens if your test recipe is losing?  Losing money; losing leads; losing quotes; losing video views.  Can you keep running a test just to get enough data to show why it's losing?  Testing suddenly becomes an expensive business, when each extra day is costing you revenue.   One of the key advantages of testing over 'launch it and see' is the ability to switch the test off if it loses; how much of that advantage do you want to give up just to get more data on your test recipe?

Maybe your test recipe started badly.  After all, many do:  the change of experience from the normal site design to your new, all-improved, management-funded, executive-endorsed design is going to come as a shock to your loyal customers, and it's no surprise when your test recipe takes a nose-dive in performance for a few days.  Or weeks.  But how long can you give your design before you have to admit that it's not just the shock of the new design (sometimes called 'confidence sickness'), but that there are aspects of the new design that need to be changed before it will reach parity with your current site?  A week?  Two weeks?  A month?  Looking at data over time will help here.  How was performance in week 1?  Week 2?  Week 3?  It's possible for a test to recover, but if the initial drop was severe then you may never recover the overall picture.  However, if you can see that the fourth week was actually flat (for new and return visitors), then you've found the point where users have adjusted to your new design.

If, however, the weekly gaps are widening, or staying the same, then it's time to pack up and call it a day.
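A quick way to see whether the gap is closing or widening is to lay the results out week by week.  Here's a minimal pandas sketch, assuming a daily export with date, recipe ('control' or 'test'), visitors and orders (invented names again).

import pandas as pd

# Hypothetical daily export of the running test.
daily = pd.read_csv("daily_test_results.csv", parse_dates=["date"])

# Sum visitors and orders per recipe per ISO week, then compute conversion.
weekly = (
    daily
    .assign(week=daily["date"].dt.isocalendar().week)
    .groupby(["week", "recipe"])[["orders", "visitors"]]
    .sum()
)
weekly["conversion"] = weekly["orders"] / weekly["visitors"]

# One column per recipe, plus the week-by-week gap between test and control.
rates = weekly["conversion"].unstack("recipe")
rates["gap"] = rates["test"] - rates["control"]
print(rates)

If that gap shrinks towards zero week by week, users may simply be adjusting to the new design; if it stays flat or widens, that's your cue to call it.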

Let's not forget that you probably have other tests in your pipeline which are waiting for the traffic that you're using on your test.  How long can they wait until launch?

So, how long should you run your test for?  As long as it takes to get the data you need, and maybe longer if you can, unless it's:
- suffering from confidence sickness (in which case, keep it running)
- losing badly, and consistently (unless you're prepared to pay for your test data)
- losing and holding up your testing pipeline

Similar posts I've written about online testing

Getting an online testing program off the ground
Building Momentum in Online testing
How many of your tests win?

Wright Brothers Picture:

"Released to Public: Wilber and Orville Wright with Flyer II at Huffman Prairie, 1904 (NASA GPN-2002-000126)" by pingnews.com is marked with CC PDM 1.0

Friday, 6 March 2020

Analysis versus Interpretation

We have had a disappointingly mild winter.

It snowed on two days...


You will easily notice the bias in that sentence. Friends and long-time readers will know that I love snow, for many reasons. The data from the Meteorological Office puts the winter (1 December - 29 February) into context, using a technique that I've mentioned before - ranking the specific period against the rest of the data set.


So, by any measure, it was a wet and mild winter. Far more rain than usual (across the country), and temperatures were above average.

This was posted on Facebook, a website renowned for its lack of intelligent and considered discussion, and known for its sharp-shooting debates.  Was it really wetter than usual? Is global warming to blame? Is this an upward trend (there is insufficient data here) or a fluke?

And then there's the series of distraction questions - how long have records been held? Have the temperature and rainfall data been recorded since the same original date? Is any of that relevant? No.

In my experience, analysis is hard, but anybody, it seems, can carry out the interpretation.  However, interpretation is wide open to personal bias, and the real skill is in treating the data impartially and without bias, and interpreting it from that viewpoint. It requires additional data research - for example, is February's data an anomaly or is it a trend? Time to go and look in the archive and support your interpretation with more data.


Thursday, 21 February 2019

One KPI too many


Three hypothetical car sales representatives are asked to focus on increasing their sales of hybrid cars for a month. They are a good cross-section of the whole sales team (which is almost 40 sales reps), and they each have their own approach. The sales advisor with the best sales figures for hybrid cars at the end of the month will receive a bonus, so there's a clear incentive to sell well.  At the end of the month, the sales representatives get together with management to compare their results and confirm the winner.


Albert


Albert made no real changes to his sales style, confident that his normal sales techniques would be enough to get him to the top sales spot.

Albert is, basically, our "control", which the others will be compared against. Albert is a fairly steady member of the team, and his performance is ideal for judging the performance of the other individuals.  Albert sold 100 cars, of which 20 were hybrids.


Britney 

Britney embraces change well, and when this incentive was introduced, she immediately made significant changes to her sales tactics.  Throughout the incentive period, she went to great lengths to highlight the features and benefits of the hybrid cars.  In some cases, she missed out on sales because she was pushing the hybrids so enthusiastically.

While she doesn't sell as many cars as Albert, she achieves 90 sales, of which 30 are hybrids.


Charles

Finally, Charles is the team's strongest salesman, and throughout the sales incentive month, he just sells more cars.  He does this by generally pushing, chasing and selling harder to all customers, using his experience and sales skills.  He doesn't really focus on selling the hybrids in particular.

Consequently, he achieves an enormous 145 sales, which includes 35 hybrid sales.   


Let's summarise, and add some more metrics and KPIs (because you can never have too many, apparently...).

Metric           | Albert   | Britney  | Charles
Total car sales  | 100      | 90       | 145
Hybrid car sales | 20       | 30       | 35
% Hybrid         | 20%      | 33.3%    | 24.1%
Total revenue    | $915,000 | $911,700 | $913,500
Revenue per car  | $9,150   | $10,130  | $6,300



Who did best?

1. Albert achieved the highest revenue, but only sold 20% hybrid cars.
2. Britney achieved 33% hybrid sales, but only sold 90 cars in total.  She did, however, achieve the highest revenue per car (largely due to sales of the new, more expensive hybrids).
3. Charles sold 35 hybrids - the most - but only at a rate of 24.1%.  He also sold generally cheaper cars (he sold 110 non-hybrid cars, and many of them were either discounted or used cars).
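For what it's worth, the derived figures above fall straight out of the raw numbers - a quick Python check:

reps = {
    "Albert":  {"sales": 100, "hybrids": 20, "revenue": 915_000},
    "Britney": {"sales": 90,  "hybrids": 30, "revenue": 911_700},
    "Charles": {"sales": 145, "hybrids": 35, "revenue": 913_500},
}

for name, r in reps.items():
    pct_hybrid = 100 * r["hybrids"] / r["sales"]   # hybrid share of total sales
    rev_per_car = r["revenue"] / r["sales"]        # average revenue per car sold
    print(f"{name}: {pct_hybrid:.1f}% hybrid, ${rev_per_car:,.0f} per car")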

So which Key Performance Indicator is actually Key?

This one is often a commercial decision, based on what's more important to the business targets. Is it the volume of hybrid cars, or the percentage of them? How far could Britney's drop in overall sales be accepted before it is detrimental to overall performance? And how far could Charles's increase in overall sales be overlooked?

Sometimes, your recommendation for implementing an optimisation recipe will run into a similar dilemma. In situations like these, it pays to know which KPI is actually Key! Is it conversion? Is it volumes of PDF downloads, or is it telephone calls, chat sessions, number of pages viewed per visit or is it revenue? And how much latitude is there in calling a winner? In some situations, you won't know until you suddenly realise that your considered recommendation is not getting the warm reception you expected (but you'll start to get a feel for the Key KPIs, even if they're never actually provided by your partners).


Thursday, 22 June 2017

The General Election (Inferences from Quantitative Data)

The Election

The UK has just had a general election: all the government representatives who sit in the House of Commons have been selected by regional votes.  The UK is split into 650 areas, called constituencies, each of which has an elected Member of Parliament (MP). Each MP has been elected by voting in their constituency, and the candidate with the highest number of votes represents that constituency in the House of Commons.


There are two main political parties in the UK - the Conservative party (pursuing centre-right capitalist policies, and represented by a blue colour), and the Labour party (which pursues more socialist policies, and is represented by a red colour).  I'll skip the political history, and move directly to the data:  the Conservative party achieved 318 MPs in the election; the Labour party achieved 262; the rest were spread between smaller parties. With 650 MPs in total, the Conservative party did not achieve a majority and have had to reach out to one of the smaller parties to obtain a working majority.

Anyway:  once the results for most of the constituencies had been announced, the news reporters started their job of interviewing famous politicians of the past and present.  They asked questions about what this meant for each political party, what this said about the political feeling in the country, and so on.

And the Conservative politicians put a brave face on the loss of so many seats.  And the Labour politicians contained their delight at gaining so many seats and preventing a Conservative majority.

The pressing issue of the day is Brexit (the UK's departure from the European Union).  Some politicians said, "This tells us that the electorate don't want a 'hard' Brexit [i.e. to cut all ties completely with the EU], and that they want a softer approach." - views that they held personally, and which they thought they could infer from the election result.  Others said, "This shows a vote against austerity," or "This vote shows dissatisfaction with immigration," and so on.

The problem is:  the question on election day is not, "Which of these policies do you like/dislike?" The question is, "Which of these people do you want to represent you in government?"   Anything beyond that is guesswork and supposition - whether that's educated, informed, biased, or speculative.


Website Data

There's a danger in reading too much into quantitative data, and especially bringing your own bias (intentionally or unintentionally) to bear on it.  Imagine on a website that 50% of people who reach your checkout don't complete their purchase.  Can you say why?

- They found out how much you charge for shipping, and balked at it.
- They discovered that you do a three-for-two deal and went back to find another item, which they found much later (or not at all)
- They got called away from their computer and didn't get chance to complete the purchase
- Their mobile phone battery ran out
- They had trouble entering their credit card number

You can view the data, you can look at the other pages they viewed during their visit.  You can even look at the items they had in their basket.  You may be able to write hypotheses about why visitors left, but you can't say for sure.  If you can design a test to study these questions, you may be able to improve your website's performance.  For example, can you devise a way to show visitors your shipping costs before they reach checkout?  Can you provide more contextual links to special offers such as three-for-two deals to make it easier for users to spend more money with you?  Is your credit card validation working correctly?  No amount of quantitative data will truly give you qualitative answers.

A word of warning:  it doesn't always work out as you'd expect.

The UK, in its national referendum in June 2016, voted to leave the EU.  The count was taken for each constituency, and then the total number of votes was counted; the overall result was that "leave" won by 52% to 48%.


However, this varied by region, and the highest leave percentage was in Stoke-on-Trent Central, where 69% of voters opted to leave.  This was identified by the United Kingdom Independence Party (UKIP) and their leader, Paul Nuttall, took the opportunity to stand as a candidate for election as an MP in the Stoke-on-Trent Central constituency in February 2017.  His working hypothesis was (I assume) that voters who wanted to leave the EU would also vote for him and his party, which puts forward policies such as zero-immigration, reduced or no funding for overseas aid, and so on - very UK-centric policies that you might imagine would be consistent with people who want to leave a multi-national group.  However, his hypothesis was disproved when the election results came in:

Labour Party - 7853
UKIP (Paul Nuttall) - 5233
Conservative Party - 5154
Liberal Democrat Party - 2083

He repeated his attempt in a different constituency in the General Election in June; he took 3,308 votes in Boston and Skegness - more than 10,000 fewer votes than the party's result in 2015.  Shortly afterwards, he stood down as the leader of UKIP.

So, beware: inferring too much from quantitative data - especially if you have a personal bias - can leave you high and dry, in politics and in website analysis.