Wednesday 21 September 2022

A Quick Checklist for Good Data Visualisation

One thing I've observed during the recent pandemic is that people are now much more interested in data visualisation.  Line graphs (or equivalent bar charts) have become commonplace and are being scrutinised by people who haven't looked at them since they were at school.  We're seeing heatmaps more frequently, and tables of data are being shared more often than usual.  This was prevalent during the pandemic, and people have generally retained their interest in data presentation (although they wouldn't call it that).

This made me consider: as data analysts and website optimisers, are we doing our best to convey our data as accurately and clearly as possible? We want to share information in a way that is easy to understand and easy to base decisions on, and there are some simple ways to do this, even with 'simple' data and without glamorous new visualisation techniques.

Here's the shortlist:

- Tables of data should be presented consistently, either vertically or horizontally; don't mix them up.
- Graphs should use either vertical bars or horizontal bars; be consistent.
- If you're transferring from vertical to horizontal, make sure that top-to-bottom matches left-to-right.
- If you use colour, use it consistently and intuitively.

For example, let's consider a basic table of data. Here's one from a sporting context: the English Premiership's Teams in Form, showing results from a series of six games.

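| Pos | Team | P | Pts | F | A | GD | Sequence |
|---|---|---|---|---|---|---|---|
| 1 | Liverpool | 6 | 16 | 13 | 2 | 11 | W W W W W D |
| 2 | Tottenham | 6 | 15 | 10 | 4 | 6 | W L W W W W |
| 3 | West Ham | 6 | 14 | 17 | 7 | 10 | D W W W W D |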

The actual data itself isn't important (unless you're a Liverpool fan), but the layout is what I'm looking at here.  Let's look at the raw data layout:

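| Pos | Category | Metric 1 | Metric 2 | Metric 3 | Metric 4 | Derived metric | Sequence |
|---|---|---|---|---|---|---|---|
| 1 | Liverpool | 6 | 16 | 13 | 2 | 11 | W W W W W D |
| 2 | Tottenham | 6 | 15 | 10 | 4 | 6 | W L W W W W |
| 3 | West Ham | 6 | 14 | 17 | 7 | 10 | D W W W W D |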


The derived metric "GD" is Goal Difference, the total For minus the total Against (e.g. 13-2=11).
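As a minimal sketch of that calculation in Python, the derived column can be recomputed from the two metrics it depends on. The names `form_table` and `goal_difference` are invented here for illustration; the figures are the ones from the form table above.

```python
# Rows from the form table above: (team, goals for, goals against, published GD).
form_table = [
    ("Liverpool", 13, 2, 11),
    ("Tottenham", 10, 4, 6),
    ("West Ham", 17, 7, 10),
]


def goal_difference(goals_for, goals_against):
    """Goal Difference is the total For minus the total Against."""
    return goals_for - goals_against


# Recompute the derived metric for each row, in table order.
computed = [goal_difference(f, a) for _, f, a, _ in form_table]

# Print each calculation next to the value published in the table.
for (team, f, a, published), gd in zip(form_table, computed):
    print(f"{team}: {f} - {a} = {gd} (table shows {published})")
```

Keeping the derived metric next to the metrics it is built from, as the league table does, makes this kind of check easy for a reader to do by eye.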

Here, the categories are in a column, sorted by rank, and different metrics are arranged in subsequent columns - it's standard for a league table to be shown like this, and we grasp it intuitively.  Here's an example from the US, for comparison:

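| Player | Pass Yds | Yds/Att | Att | Cmp | Cmp % | TD | INT | Rate | 1st | 1st% | 20+ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Deshaun Watson | 4823 | 8.9 | 544 | 382 | 0.702 | 33 | 7 | 112.4 | 221 | 0.406 | 69 |
| Patrick Mahomes | 4740 | 8.1 | 588 | 390 | 0.663 | 38 | 6 | 108.2 | 238 | 0.405 | 67 |
| Tom Brady | 4633 | 7.6 | 610 | 401 | 0.657 | 40 | 12 | 102.2 | 233 | 0.382 | 63 |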


You have to understand American Football to grasp all the nuances of the data, but the principle is the same.   For example, Yds/Att is yards per attempt, which is Pass Yds divided by Att.  Columns of metrics, ranked vertically - in this case, by player.
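The same idea can be sketched in Python for the derived columns of this table. The function names are hypothetical, and treating Cmp % as Cmp divided by Att is an assumption: it matches the figures shown, but the text above only defines Yds/Att.

```python
# Rows from the passing table above: (player, passing yards, attempts, completions).
passers = [
    ("Deshaun Watson", 4823, 544, 382),
    ("Patrick Mahomes", 4740, 588, 390),
    ("Tom Brady", 4633, 610, 401),
]


def yards_per_attempt(pass_yards, attempts):
    """Yds/Att: passing yards divided by attempts, to one decimal place."""
    return round(pass_yards / attempts, 1)


def completion_rate(completions, attempts):
    """Cmp % (assumed definition): completions divided by attempts, to three places."""
    return round(completions / attempts, 3)


# Build the derived columns in the same player order as the table.
derived = [
    (name, yards_per_attempt(yards, att), completion_rate(cmp_, att))
    for name, yards, att, cmp_ in passers
]

for name, ypa, rate in derived:
    print(f"{name}: Yds/Att = {ypa}, Cmp % = {rate}")
```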

Here's another example; this is taken from Next Green Car comparison tools:


The first thing you notice is that the categories are arranged in the top row, and the metrics are listed in the first column, because here we're comparing data instead of ranking them.  The actual website is worth a look; it compares dozens of car performance metrics in a page that scrolls on and on.  It's vertical.

When comparing data, it helps to arrange the categories like this, with the metrics in a vertical list - for a start, we're able to 'scroll' in our minds better vertically than horizontally (most books are in a portrait layout, rather than landscape).
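Moving from a ranking layout (categories down the first column) to a comparison layout (categories across the top row) is a transpose. Here is a rough Python sketch using the form table's numbers; `to_comparison_layout` is a made-up helper name, not part of any library.

```python
# Ranking layout: one row per category (team), one column per metric.
header = ["Team", "P", "Pts", "F", "A", "GD"]
rows = [
    ["Liverpool", 6, 16, 13, 2, 11],
    ["Tottenham", 6, 15, 10, 4, 6],
    ["West Ham", 6, 14, 17, 7, 10],
]


def to_comparison_layout(header, rows):
    """Transpose the table so categories run across the top row
    and each metric gets its own row, labelled in the first column."""
    return [list(column) for column in zip(header, *rows)]


comparison = to_comparison_layout(header, rows)

# Print the comparison layout with fixed-width cells.
for line in comparison:
    print("  ".join(f"{cell!s:<10}" for cell in line))
```

Because `zip` keeps the rows in their original order, the teams that read top-to-bottom in the ranking layout now read left-to-right in the comparison layout, which is the matching sequence the checklist asks for.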

The challenge (or rather, the cognitive challenge) comes when we ask our readers to compare data in long rows, instead of columns... and it gets more challenging if we start mixing the two layouts within the same document/presentation.  In fact, challenging isn't the word. The word is confusing.

The same applies for bar charts - we generally learn to draw and interpret vertical bars in graphs, and then to do the same for horizontal bars.

Either is fine. A mixture is confusing, especially if the sequence of categories is reversed as well. We read left-to-right and top-to-bottom, and a mixture here is going to be misunderstood almost immediately, and irreversibly.

For example, this table of data (from above)

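| Pos | Category | Metric 1 | Metric 2 | Metric 3 | Metric 4 | Derived metric | Sequence |
|---|---|---|---|---|---|---|---|
| 1 | Liverpool | 6 | 16 | 13 | 2 | 11 | W W W W W D |
| 2 | Tottenham | 6 | 15 | 10 | 4 | 6 | W L W W W W |
| 3 | West Ham | 6 | 14 | 17 | 7 | 10 | D W W W W D |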


Should not be graphed like this, where the horizontal data has been converted to a vertical layout:

And it should certainly not be graphed like this:  yes, the data is arranged in rows and that's remained consistent, but the sequence has been reversed!  For some strange reason, this is the default layout in Excel, and it's difficult to fix.


The best way to present the tabular data in a graphical form - i.e. putting the graph into a table - is to match the layout and the sequence.
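To illustrate matching both layout and sequence, here is a small Python sketch that draws horizontal text bars in the same top-to-bottom order as the table rows. Both function names are hypothetical, and the second helper assumes a charting tool that places the first category at the bottom of a horizontal bar chart.

```python
# Points from the form table, in table order (rank 1 first).
points = [("Liverpool", 16), ("Tottenham", 15), ("West Ham", 14)]


def horizontal_bars(data):
    """Render one text bar per row, top-to-bottom in the same order as the table."""
    label_width = max(len(label) for label, _ in data)
    return [f"{label:<{label_width}} | {'#' * value} {value}" for label, value in data]


def order_for_bottom_up_axis(data):
    """Assumption: the charting tool draws the first category at the bottom.
    Passing the rows in reverse keeps the top bar aligned with the top table row."""
    return list(reversed(data))


# Bars printed in table order: Liverpool at the top, West Ham at the bottom.
for line in horizontal_bars(points):
    print(line)
```

The design choice is to treat the table's row order as the single source of truth and adapt the input to the charting tool, rather than letting the tool's default decide the sequence.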

And keep this consistent across all the data points on all the slides in your presentation.  You don't want your audience performing mental gymnastics to make sense of your data.  It would be like reading a book, then having to turn the page by 90 degrees after a few pages, then going back again on the next page, then turning it the other way after a few more pages.  

You want your audience to spend their mental power analysing and considering how to take action on your insights, and not to spend it trying to read your data.