
Friday, 27 January 2012

Probabilities and Free Toys: Part Three

In my previous posts, I've been looking at a practical problem in probabilities.  Namely, if a breakfast cereal manufacturer gives away free toys with each cereal packet, how many cereal packets do I have to buy in order to be fairly sure (90% probability) of obtaining each toy in a set?  This, of course, depends on how many toys there are in the set, and I've been crunching through the maths as far as possible for smaller numbers of toys.


Having hit a bit of a brick wall with the algebra, I decided to turn to spreadsheet modelling, to simulate buying n boxes of cereal and seeing how many different toys t I obtained.  I did this with a macro which randomly selects a letter between A and D (four toys in a set), or between A and F (six toys) or between A and H (eight toys) and so on, and then builds up a string of letters based on how many boxes I was buying.  For example, with five toys and six boxes, I might obtain:


ABBCAE


My spreadsheet would then check this result, to see if it contains A, B, C, D and E (in this case, there is no D).  However, that's only a sample size of one attempt, so I looped the macro to run for 1000 attempts, and measured the number of successes in the 1000, to get a reasonable estimate of the probability of success.


The spreadsheet is available for download here:  Probability Spreadsheet  (file-sharing website opens in new window).


And the macro, which may not make it successfully to you due to Microsoft Office's security settings, is reproduced here in full:

Sub DoctorWho()


' DoctorWho Macro
' Doctor Who toy probability calculator
' By David Leese


' Define number of toys in the full set = ntoys
' Define number of turns or boxes of cereal = nturns
ntoys = 10
nturns = 50




successcont = 0
' measure of success reset to zero (not used further in this macro)


toys = ""
' toys is a text string which will list the letters which have been obtained
newtoy = ""
' newtoy is the randomly-generated toy to add to the list, reset here
   
   
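Randomize
' seed the random number generator, so that a new session does not replay the same sequence
For model = 1 To 1000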
    For cont = 1 To nturns
    
        ' cont is a loop counter based on nturns
        picklett = Int((ntoys) * Rnd + 1)
        ' picklett is randomly generated value between 1 and ntoys
        
        newtoy = Chr(picklett + 64)
        ' newtoy is the letter which corresponds to picklett
        toys = toys & newtoy
        ' append the new toy to the list of existing toys


    Next cont
    ActiveCell.Value = toys
    ActiveCell.Offset(1, 0).Activate
    ' Insert the value of toys (the full selection) into the active cell, then move down for the next run.
    toys = ""
    ' reset the list of toys ready for the next run
Next model


ActiveCell.Offset(-1000, 0).Activate
' Go back to the top of the spreadsheet


End Sub
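
The macro above only writes out the 1000 strings; checking each one for a complete set is left to the spreadsheet.  As a rough sketch of how that check and the success count could be done in VBA instead, here are two helpers.  Both names (HasFullSet and EstimateSuccess) are hypothetical and are not part of the original macro or spreadsheet; they assume, as the macro does, that the toys are labelled A, B, C... and are all equally likely.

```vba
' Hypothetical helper, not part of the original macro.
' Returns True if toys contains every letter from "A" up to the ntoys-th letter.
Function HasFullSet(ByVal toys As String, ByVal ntoys As Long) As Boolean
    Dim i As Long
    For i = 1 To ntoys
        ' InStr returns 0 when the letter does not occur in the string
        If InStr(toys, Chr(64 + i)) = 0 Then
            HasFullSet = False
            Exit Function
        End If
    Next i
    HasFullSet = True
End Function

' Hypothetical driver, not part of the original macro.
' Simulates nmodels runs of nturns boxes and returns the fraction of runs
' that produced a complete set of ntoys toys.
Function EstimateSuccess(ByVal ntoys As Long, ByVal nturns As Long, _
                         Optional ByVal nmodels As Long = 1000) As Double
    Dim model As Long, cont As Long, successes As Long
    Dim toys As String
    Randomize
    For model = 1 To nmodels
        toys = ""
        For cont = 1 To nturns
            ' same random letter generation as in the macro above
            toys = toys & Chr(64 + Int(ntoys * Rnd + 1))
        Next cont
        If HasFullSet(toys, ntoys) Then successes = successes + 1
    Next model
    EstimateSuccess = successes / nmodels
End Function
```

For the example string earlier, HasFullSet("ABBCAE", 5) returns False, because there is no D.  Calling EstimateSuccess(10, 50) from the Immediate window gives one estimate of the probability for ten toys and fifty boxes; being a random simulation, the figure will vary a little from run to run.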


Why is the macro named after Doctor Who?  Well, apart from working for cereal packets with toys, this also works for the current (and previous) series of Character Building Doctor Who toys, and this is where I got my first inspiration for this post (and which reminded me of the cereal packet question which I was asked at school, all those years ago).
 

 

 It also applies to Lego's Minifigures ranges...


... and to Megabloks' Marvel Superheroes figures, which are shown below.


Anyway, after that brief diversion into the various applications of this spreadsheet and these results, let's take a look at the results and explain what we're seeing.




Key features of the results:


The likelihood of obtaining a complete set grows slowly initially, where n (number of turns) is only slightly larger than t (number of toys in the set).  This feature is particularly evident for larger values of t.  For small (t < 5) numbers of toys, the increase is sharp, but as t increases, it takes longer for us to observe an increase in p.


As an example, take the results for t=10, the right-most orange line on our graph.  Even after 20 tries, the probability of getting a full set is only 20%.  Compare this with t=4 where, after 2t tries (8 tries) the probability of getting the full set was over 60%.
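
As a cross-check on readings like these, there is a standard inclusion-exclusion formula for the probability of a full set.  It isn't derived in this post and isn't what the spreadsheet uses, so treat it as a sketch for comparison; it assumes every toy is equally likely in every pack, and P(n, t) is my own notation for the probability of holding all t toys after n packs.

```latex
P(n, t) = \sum_{k=0}^{t} (-1)^{k} \binom{t}{k} \left(1 - \frac{k}{t}\right)^{n}

% Worked check for t = 4, n = 8 (the k = 4 term is zero):
P(8, 4) = 1 - 4(0.75)^{8} + 6(0.5)^{8} - 4(0.25)^{8} \approx 0.62

% Worked check for t = 10, n = 20:
P(20, 10) \approx 0.21
```

Both values sit close to what the simulation shows: just over 60% for four toys after eight tries, and roughly 20% for ten toys after twenty tries.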


A second feature of the graph comes after the slow initial rise: there is a region where the gradient steepens, and the probability of getting a complete set increases quickly with increasing n.  This makes sense - as you buy more and more packs, you are increasingly likely to find the toys that you're missing.  This feature continues until you reach the third phase.


In the third phase, which again only becomes evident for larger values of t, you reach the point where there's only one toy left to find, and it becomes harder and harder to become 100% certain of gaining a complete set.  At this point, the probability of obtaining a complete set gets closer and closer to 100%, but never actually reaches it.  The p=100% line is an asymptote which our results approach but never reach.  Or, to put it another way, if you haven't completed the set of 10 toys after buying 80 bags (or boxes), then buying the 81st isn't going to improve your chances by very much!
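
To put a rough number on that last point, here is a simple bound, offered as a sketch rather than something taken from the spreadsheet.  Writing P(n, t) for the probability of a full set of t toys after n packs (my notation), and assuming every toy is equally likely in every pack, the chance of still missing at least one toy is no more than the sum of the chances of missing each individual toy:

```latex
1 - P(n, t) \le t \left(1 - \frac{1}{t}\right)^{n}

% For t = 10:
1 - P(80, 10) \le 10 \times 0.9^{80} \approx 0.0022
\qquad
1 - P(81, 10) \le 10 \times 0.9^{81} \approx 0.0020
```

For n this large the bound is very nearly exact, so the 81st pack moves the probability of a full set from about 99.78% to about 99.80%, a gain of roughly 0.02 percentage points.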


That's why there are so many websites devoted to finding, and providing, ways of identifying the toy in the bag before you buy it.  For example, an online search for "Lego minifigures codes" will point to sites that show how certain bump markings on the bags indicate the toy inside; for "Megabloks Marvel Minifigures" it's a code printed on the edge of the bag... for Doctor Who, it seems to be a case of feeling for the shapes of the figures inside.  All because the real probability of getting a complete set is extremely small - and I haven't even looked at the collections which have 'rare' or 'super rare' figures...  that's when it's time to visit eBay!

The Probabilities and Free Toys Series

Part 1:  Solving for "What's the probability of getting 10 toys in just 10 packs?"
Part 2:  Solving for "How many packs do I need to buy to be 50%, 70%, 90% sure of getting all the toys?" 



Tuesday, 10 January 2012

Web Analytics: Personalisation

Last Friday night, I had to transfer some money from my savings account to my current account, and in the process encountered an interesting case of personalisation.


Withdrawing the cash from the savings at the building society was a typically anonymous matter, even though I had to provide my account passbook and photo ID, but this only became apparent when I paid the money into my bank, just across the road.  I only had to provide the money and the debit card for my bank account, but as soon as my card had been scanned, the bank clerk began addressing me as David, and just by doing that, provided a much more personal service.


Earlier in the evening, I had phoned the local take-away restaurant, and on the way back from the bank, I called in to pick up my order. I'd called them from my home landline, but hadn't provided a name or address.  However, I've ordered from the take-away before, and they'd evidently stored my data: at the top of the receipt for my order were my full name and address.  As I mentioned, I hadn't provided any information at all when I phoned the order through.  Was it surprising to see my name and address on the receipt?  Absolutely. Was it unnerving?  Perhaps, but it's more a reflection of a local business using data and information to their advantage.  I don't know if they're going to use my purchase preferences to offer me particular choices or offers next time I order... I'll let you know.


Online, I'm not surprised when Amazon, or eBay, or any other e-commerce site, uses my login details and my activity on their site to try to provide me with relevant content or advertising.  So I've been searching for a particular author, or a particular album, movie or laptop - should I really be surprised that they've noticed, and now they're using the promotional space on their sites to show me advertising of similar products?  Is this scary new technology?  Or is it something that's been around for many years, and this is just its newest incarnation?


Back when I was at high school, I had a part time job as a sales assistant at the local shoe store.  It was easy enough - serve the customers, keep the shop floor well-stocked, tidy away surplus stock into the storage room.  Part of the sales training (it wasn't extensive) was to try to cross-sell - shoelaces, polish, all that stuff, and to sell to customers when we didn't have what they wanted.  For example - "Do you have this shoe in my size?"  A quick trip to the stock room would reveal that we didn't, but a check around the shelves would show that we had it in blue, or brown instead.  Or perhaps, if it was a shoe that looked like it was for the office, did we have a similar style?  Was it good customer service?  Was it personalisation?  I would certainly hope so, as it led to me selling many pairs of shoes (and frequent declines, but that was part of the job).  Did customers question how I'd managed to come up with potential alternatives?  Did they marvel at the apparent depths of the stock room, or think it was freaky or scary that I'd been able to anticipate their needs, based on just one query?


Perhaps, then, we shouldn't be surprised, or alarmed, when a computer algorithm looks at our on-site browsing habits and tries to provide us with what we appear to be searching for.

Thursday, 5 January 2012

Film Review: Tron

"User requests are what computers are for."
"Doing our business is what computers are for!"
Walter, the voice of reason, and Dillinger, the megalomaniac's voice of capitalism.




Tron could probably be described as the predecessor of, or at least an influence on, many films that we've seen since.  However, I hadn't seen it until now.  For a so-called sci-fi fan, that's quite a confession, but it's true.  Courtesy of Lovefilm, however, that oversight has now been rectified, and I'm quite pleased with the result!




Upon first inspection, Tron is dated, and shows its age; however, the storyline and the plot have managed to remain current - in fact, any 'over-powered computer gains sentience and takes control' story probably owes its existence to Tron, and Terminator's Skynet is a prime example.  Other derivatives include the Matrix, The Net, and Hackers, to name a few.


Tron is also a great film if, like me, you like to play "What have they been in?" with the actors.  Apart from Jeff Bridges (who went on to feature in Starman, among others), Tron also features Bruce Boxleitner (Babylon 5's John Sheridan), a very young-looking Peter Jurasik (Londo Mollari from Babylon 5, already with that unmistakeable voice), and David Warner (I recognised him as Chancellor Gorkon from Star Trek 6, The Undiscovered Country, but according to IMDB he was also in Babylon 5).


My misunderstanding of Tron led me far enough to believe that the grid-based vehicle 'game' that the occupants are forced to play was Tron; in fact, the title goes to Bruce Boxleitner's character, a rogue program introduced to cause trouble in the mainframe computer.  Yes, it's 1980s computer-speak all the way.  Otherwise, it's a CGI-fest covering a fairly straightforward adventure story... kinda reminds me of the Matrix, or Star Wars Episode 1.  It is genre-defining, it's fresh and new (for its day) and makes much of the recent stuff look derivative.  Somebody - I wish I could recall who - said that watching the later instalments of the Matrix trilogy was a lot like watching somebody else play a computer game.  There are occasional moments of that here with Tron, but these are fairly infrequent.


Overall, I liked Tron.  Yes, it's a lot of CGI and pretty graphics, but there is a story - two in fact - to be told, and I have to say that the 'real world' story was at least as interesting as the virtual one... it certainly had the more three-dimensional characters!

Saturday, 26 November 2011

Too Big Data?

Apparently, we are in the information age.  The Stone Age has passed, so has the Industrial Age and all that went with it.  Information is the new tool du jour, with vast quantities being produced, recorded, stored, analysed and picked apart, reconstituted and reworked.  According to various internet sources (which are, as a type, notoriously unreliable), the current information age is unlike anything previously, with the potential to change the world (if it hasn't already).





But who's to say that any of this data is actually useful?  We may well be producing unprecedented volumes of data now, but that's only because anybody with an internet connection and a text editor can produce a blog (look at me).  Courtesy of this wonderful information age, anybody can produce a poorly-spelt, badly-punctuated and grammatically incorrect blog. 

Unfortunately, no storage system, whether it's a 5.25" floppy disk and drive, or a magneto-optical drive, or a CD-ROM or a USB memory pen or a web server, can determine the difference between quality data and meaningless drivel.  It's all stored, counted, analysed and so on.  All that we've done is provide anybody and everybody the opportunity to record the data that they had in their heads, and have it stored, and then displayed.  It's easy.  In fact, it's too easy.  I would venture that if Shakespeare had access to a blog, he would never have written to the high quality that he achieved with paper and quill.  The very act of getting ink onto paper (two substances that, despite our information age, are still no closer to obsolescence) required time and thought, and his words were crafted.  Consider the time taken to create a cave painting.


Or how about the labour intensive process of hammering characters into stone tablets?  Now, I can sit here and hammer my fingers on an iPad with no real plan, producing sentence after sentence of data that will become stored, recorded and so on.  No wonder the latest craze is 'big data'.  Even if we separate the meaningful from the meaningless, the meaningful - and even the borderline cases - will require vast amounts of storage.  Do we really want to know what the girl next door had for breakfast?  Do her status updates on Facebook count as data?  Yes, they do, so no wonder we're producing more data than ever before... we're setting a pretty low bar on 'data', after all.  So, no wonder we've got big data - it's too big data if we aren't going to be discriminatory, or even selective.


As an aside, I do try to produce quality material in my blog (the web analytics, maths and science stuff especially; the film reviews less so, and the X-factor rants less so again).  I figure there's plenty of data out there, so I'm also trying to keep things fairly concise.


So, from this standpoint, I hope you'll forgive my cynicism when I hear that we are now producing more data in two days (or a year, or whatever) than was ever produced in the previous 4000 years.  We are also producing more waste, releasing more carbon dioxide, and more and more television channels than ever before.  Volume is not everything.  Quantity alone is a meaningless metric - as many in the web analytics area have pointed out before, traffic by itself is not a valuable KPI.  Which would you rather have, 10 tonnes of coal, or 10 grams of diamond?

Wednesday, 26 October 2011

Chess game: Sicilian Blunders

How not to play the Sicilian as White... I made a complete waste of my white-squared bishop, got into all sorts of trouble with reckless pawn advances, and finished off by not protecting my king.


Worse still, I isolated my king behind a doubled-pawn position and could not fight off black's direct attack.  The game ended very quickly after that.
Let's take a look at my biggest mistakes in this game (I'm sure they're meant to be called learning points).
By move 15, I've completely isolated my white-squared bishop.  I should surely have moved it back to c2 on move 13, to give it some hope of remaining in the game.


16. Be3 shows what a wasted move 9. Bd2 was.  I should have been more decisive earlier in the game.
18. d4 was a vast mistake.  I should have left the pawns blocked up in the middle.  As it was, I then decided to ditch my bishop (another mistake) and by move 23 my opponent has mobilised his pieces and is already hitting all the weaknesses in my pawns (and there are plenty to aim for).


24. Rc3 was a mistake.  Yes, it protects the pawns (although rooks should probably never have this duty at this point in the game), but it would have been better for me to play Rfd1 and provide my king with a way out.


From this point on, my pawns on f2 and f3 block my pieces from defending my king, and it's just a matter of time...!

Friday, 21 October 2011

Film review: The Green Hornet

THE GREEN HORNET (2011)

I can't honestly say why I put this on our Lovefilm list... I think it was a 'recent release' and, since I like superhero movies as a genre, I thought I'd put it on the list and give it a try.  Partly that, and partly that the list was getting a bit low and needing topping up.  Consequently, I had no idea of the Green Hornet's back story, and part way through the film, came to the conclusion that the whole thing was a parody of Batman (rich young bachelor with more money decides to fight crime with the aid of his trusty sidekick and some incredible gadgets).  It is only today, a couple of days after seeing the film, that I've just seen an episode of the 1960s Batman TV series and realised that the Green Hornet really is a genuine 'superhero' character.




In his appearance in the TV series, the Green Hornet appears to act as a supervillain, while acting secretly as a crime-fighter.  They've managed to carry this into the movie adaptation:  he's pretending to be a villain, while actually working to fight crime.  It's easier to see than to explain, but the Green Hornet decides to take over Los Angeles' criminal operations, with the aim of bringing them down in an illegal manner:  cue lots of shooting, explosions and so on, all done in true comic-book style.  Consequently, as his partner points out, they have the police AND the criminals chasing after them.


As complete novices in the criminal and crime-fighting worlds, the Green Hornet and Kato realise that they need help, advice and basically to be told what to do.  This comes in the form of Britt Reid's newly-hired personal assistant, Lenore Case, played by Cameron Diaz, an expert criminologist.  The would-be love triangle between Britt, Kato and Lenore is played for some very amusing scenes, and becomes a point of conflict between the would-be heroes.


Starting with the comical concept of pretending to be criminals, but working to fight crime, this film has some extremely amusing points, interspersed with some very funny scenes, and there were various points that had me laughing out loud.  There are plenty of gadgets - I'm sure many of these are based on the Green Hornet's history, so I apologise that I've no idea how relevant they are - there's the heroes' ineptitude played for laughs (in fact, a lengthy fight scene between Kato and Britt is filmed in madcap slapstick style - there's no missing what the directors were going for); and there's an extremely long car chase, involving a car getting stuck in a printing press and subsequently being driven around an office... minus its rear wheels.


There's the obligatory scene where Britt realises that his workaholic, distant father was actually a good man, working to expose a devious plot between criminals and politicians, and subsequently acts to restore his father's good name (and put the head back on his statue, but that's a whole other story), but it's deliberately played down as a serious emotional scene and is kept in line with the pacey comedy of the rest of the film.  To be honest, the whole film really does play as a parody of Batman, so I can't comment on how accurate it is to the original TV or radio series.


I'd like to discuss the final scene in the film, but I can't in too much detail (it would truly spoil the ending) but it involves Britt requiring treatment for a gunshot wound... except he's in too much pain to think rationally.  The ending is entirely in keeping with the story, and also opens the way for a possible sequel.


There are plenty of high-profile actors in supporting roles, which works well as I can't say I've ever seen the lead actors Seth Rogen (Britt Reid) and Jay Chou (Kato) in anything before.  James Franco (Harry Osborn in the Spiderman films) has a short role as a would-be crime boss, Edward James Olmos (Admiral Adama from Battlestar Galactica) features as the editor of Reid's newspaper, the Sentinel, and Tom Wilkinson (Batman Begins, Duplicity, The Full Monty) plays Reid's father.  As I mentioned, Cameron Diaz stars as Reid's personal assistant, and her experience playing comedy definitely helps here.  Everything is comic-book larger-than-life, but somehow it avoids being excessive and while completely unrealistic, manages to carry enough realism (just) to be very funny and engaging as a story.  I like.

Friday, 14 October 2011

X Factor Predictions Revisited and Updated

A few weeks ago, I listed a number of predictions about the X-Factor 2011, and here they are (so far) listed in bold with my comments.

*  At least one finalist to have estranged parent or sibling - I appreciate I'm late with this, given that immediately after the first episode, one of the judges discovered a brother she never knew she had.
This hasn't been uncovered yet, but give it time.  Let's not forget that a few weeks ago, Tulisa's long lost brother told us all about her childhood and upbringing.

*  Gary Barlow to have one of his Take That mates at the judges' house stage (and it won't be Robbie)
So that's score +1 for the Take That prediction, but -1 for suggesting it wouldn't be Robbie.  You win some, you lose some (a philosophy that might be of use to all those 'this is life or death for me' contestants).

*  One of finalists to have been bullied at school
We'll see...

*  There will be the formation of a boy group and girl group, made up of the boy dregs and girl dregs at the end of the boot camp stage.  "We want to put you together into a group [because we haven't got enough groups already]."
Yes.  It was an easy one, but it was worth mentioning.

*  These synthetic dregs-groups to go through to the live shows (you didn't think the judges would put them together and not let them go through, did you?).
And again, I was correct.  Too easy, really, but hey, some points are worth getting.

*  These synthetic groups to get eliminated in first two weeks - first the girls (who will dress inappropriately) and then the boys (who can't sing as a team)
I should qualify that I was expecting the public vote to start in week one, so give me another week here, folks!

*  Simon Cowell to make a guest appearance, to much fanfare and flashing lights
Still pending.

*  Last year's winner (whoever that was) to release album just in time for Christmas
Oh goodness me, was that really Matt Cardle hawking his new single last Sunday?  Really?  Imagine that.

*  Louis Walsh to pick a wildcard act (or just a wild act) which is no good, but which secures the votes of those who deliberately vote for the worst (Jedward, Wagner).
And this year, he's called Johnny.  It would have been Goldie too but she had the sense to leave.

*  There will be extensive media coverage of an apparent spat between two of the judges, probably the two ladies, but possibly the two blokes
Still waiting for this one, although the drama over "head judge" on The Xtra Factor was a parody in and of itself.  Nice one, ITV2.

*  One of the acts to suffer with a cough/cold/laryngitis/glandular fever part way through the TV shows
Still pending... just give them a few weeks.

*  Two of the acts to form a 'secret' relationship, again with much media coverage
I haven't worked out who this will be yet, but give me a week or so and I'll be able to suggest names.

Now for my additional predictions, having seen week one.

I wish I'd mentioned the excessive use of Orff's Carmina Burana classical piece every time something interesting (or dull) happens... like the judges walk on stage... or walk off the stage...  otherwise, I'd predict that they'd use it.

The synthetic girl group (which I've mentioned before) to wear excessively revealing clothes, and then to draw criticism for it, and then, in the same week, to get voted off.

Michael Jackson week.

Some disastrous cover versions of Take That and Westlife songs.


The judges to criticise each others' acts' "song selections" and "fashion sense" instead of the singing.

Movie Soundtrack Week.

One of the judges, probably Louis, to bend the rules on the allowed songs for "Movie Soundtrack Week" and pick a popular song that featured in an obscure movie.


One of the judges to say, "I think you could really be at risk this week," as a transparent ploy to get people to ring up and vote.  Seriously?  Do you think that the sales of the CDs and downloads come close to the total phone revenue for the X-factor?  It's all about persuading, cajoling and manipulating people into voting.  I might start a whole other post on the manipulation of the public (and the public vote) by the judges' comments - "People really need to pick up the phone and vote for you this week" being a less subtle one, and "I think you're at risk" being slightly less obvious.


Deadlock.  Every week that it's possible, the judges will deploy deadlock instead of actually kicking off the weaker act.  Remember Jedward, hmm, and their excessively long stay on the show at the expense of acts who could sing better but didn't generate the same interest or phone votes?

I also wish I'd remembered that in the first few weeks, one of the judges usually makes a disastrous song choice and immediately dooms their act.  In 2007, for Daniel DeBourg, it was "Build Me Up Buttercup"; this year it was Jonjo with "You Really Got Me."  Talk about a rabbit in the headlights (and in order to discuss rabbit in the headlights, I'd like to talk more about the ever-pale-faced Leon Jackson).

I predict that they'll release "The Charity Single" ... the money spinner now available with 'extra' goodwill.

That's all for this time, so, keep watching (but not voting) until next time, when we'll probably discuss the allegations of phone-lines being rigged, of judges deliberately throwing their acts to the lions, and of not eliminating the right acts.