Saturday, 26 November 2011

Too Big Data?

Apparently, we are in the information age.  The Stone Age has passed, so has the Industrial Age and all that went with it.  Information is the new tool du jour, with vast quantities being produced, recorded, stored, analysed, picked apart, reconstituted and reworked.  According to various internet sources (which are, as a type, notoriously unreliable), the current information age is unlike anything that came before it, with the potential to change the world (if it hasn't already).

But who's to say that any of this data is actually useful?  We may well be producing unprecedented volumes of data now, but that's only because anybody with an internet connection and a text editor can produce a blog (look at me).  Courtesy of this wonderful information age, anybody can produce a poorly spelt, badly punctuated and grammatically incorrect blog.

Unfortunately, no storage system, whether it's a 5.25" floppy disk and drive, a magneto-optical drive, a CD-ROM, a USB memory pen or a web server, can tell the difference between quality data and meaningless drivel.  It's all stored, counted, analysed and so on.  All that we've done is give anybody and everybody the opportunity to record the data that they had in their heads, have it stored, and then have it displayed.  It's easy.  In fact, it's too easy.  I would venture that if Shakespeare had had access to a blog, he would never have written to the high standard that he achieved with paper and quill.  The very act of getting ink onto paper (two substances that, despite our information age, are still no closer to obsolescence) required time and thought, and his words were crafted.  Consider the time taken to create a cave painting.


Or how about the labour-intensive process of hammering characters into stone tablets?  Now, I can sit here and hammer my fingers on an iPad with no real plan, producing sentence after sentence of data that will be stored, recorded and so on.  No wonder the latest craze is 'big data'.  Even if we separate the meaningful from the meaningless, the meaningful - and even the borderline cases - will require vast amounts of storage.  Do we really want to know what the girl next door had for breakfast?  Do her status updates on Facebook count as data?  Yes, they do, so it's hardly surprising that we're producing more data than ever before... we're setting a pretty low bar for 'data', after all.  So of course we've got big data - and it becomes too big data if we aren't going to be discriminating, or even selective.


As an aside, I do try to produce quality material in my blog (the web analytics, maths and science posts especially; the film reviews less so, and the X Factor rants less so again).  I figure there's plenty of data out there already, so I'm also trying to keep things fairly concise.


So, from this standpoint, I hope you'll forgive my cynicism when I hear that we are now producing more data in two days (or a year, or whatever) than was produced in the previous 4000 years.  We are also producing more waste, releasing more carbon dioxide, and launching more television channels than ever before.  Volume is not everything.  Quantity alone is a meaningless metric - as many in the web analytics field have pointed out before, traffic by itself is not a valuable KPI.  Which would you rather have: 10 tonnes of coal, or 10 grams of diamond?