Friday, 27 January 2012

Probabilities and Free Toys: Part Three

In my previous posts, I've been looking at a practical problem in probabilities.  Namely, if a breakfast cereal manufacturer gives away free toys with each cereal packet, how many cereal packets do I have to buy in order to be fairly sure (90% probability) of obtaining each toy in a set?  This, of course, depends on how many toys there are in the set, and I've been crunching through the maths as far as possible for smaller numbers of toys.

Having hit a bit of a brick wall with the algebra, I decided to turn to spreadsheet modelling, to simulate buying n boxes of cereal and seeing how many different toys t I obtained.  I did this with a macro which randomly selects a letter between A and D (four toys in a set), or between A and F (six toys) or between A and H (eight toys) and so on, and then builds up a string of letters based on how many boxes I was buying.  For example, with five toys and six boxes, I might obtain:


My spreadsheet would then check this result, to see if it contains A, B, C, D and E (in this case, there is no D).  However, that's only a sample size of one attempt, so I looped the macro to run for 1000 attempts, and measured the number of successes in the 1000, to get a reasonable estimate of the probability of success.

The spreadsheet is available for download here:  Probability Spreadsheet  (file-sharing website opens in new window).

And the macro, which may not make it successfully to you due to Microsoft Office's security settings, is reproduced here in full:

Sub DoctorWho()

' DoctorWho Macro
' Doctor Who toy probability calculator
' By David Leese

' Define number of toys in the full set = ntoys
' Define number of turns or boxes of cereal = nturns
ntoys = 10
nturns = 50

successcont = 0
'measure of success reset to zero

toys = ""
' toys is a text string which will list the letters which have been obtained
newtoy = ""
' newtoy is the randomly-generated toy to add to the list, reset here
For model = 1 To 1000
    For cont = 1 To nturns
        ' cont is a loop counter based on nturns
        picklett = Int((ntoys) * Rnd + 1)
        ' picklett is randomly generated value between 1 and ntoys
        newtoy = Chr(picklett + 64)
        ' newtoy is the letter which corresponds to picklett
        toys = toys & newtoy
        ' append the new toy to the list of existing toys

    Next cont
    ActiveCell.Value = toys
    ' insert the value of toys (the full selection) into the active cell
    ActiveCell.Offset(1, 0).Activate
    ' move down one row, ready for the next attempt
    toys = ""
    ' reset the list of toys before the next attempt
Next model

ActiveCell.Offset(-1000, 0).Activate
' Go back to the top of the spreadsheet

End Sub
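
The macro above only writes the 1000 strings into the sheet; the check for a full set isn't part of it (successcont is reset but never used).  As a rough sketch of how that check could be done in VBA, something like the following would work.  HasFullSet and SuccessRate are names made up for this illustration and are not taken from the original spreadsheet, which may well do the check differently.

```vba
' Hypothetical helper (not from the original spreadsheet):
' returns True if toys contains every letter from "A" up to the
' ntoys-th letter of the alphabet.
Function HasFullSet(ByVal toys As String, ByVal ntoys As Long) As Boolean
    Dim i As Long
    HasFullSet = True
    For i = 1 To ntoys
        ' InStr returns 0 when the letter is not found in the string
        If InStr(1, toys, Chr(64 + i), vbBinaryCompare) = 0 Then
            HasFullSet = False
            Exit Function
        End If
    Next i
End Function

' Hypothetical helper: fraction of cells in a range whose string
' contains a full set, i.e. the estimated probability of success.
Function SuccessRate(ByVal results As Range, ByVal ntoys As Long) As Double
    Dim cell As Range
    Dim successes As Long
    For Each cell In results.Cells
        If HasFullSet(CStr(cell.Value), ntoys) Then successes = successes + 1
    Next cell
    SuccessRate = successes / results.Cells.Count
End Function
```

If the macro had been started in cell A1 with ntoys = 10, then =SuccessRate(A1:A1000, 10) in a spare cell would give the estimated probability for that run.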

Why is the macro named after Doctor Who?  Well, apart from working for cereal packets with toys, this also works for the current (and previous) series of Character Building Doctor Who toys, and this is where I got my first inspiration for this post (and which reminded me of the cereal packet question which I was asked at school, all those years ago).


It also applies to Lego's Minifigures ranges...

... and to Megabloks' Marvel Superheroes figures, which are shown below.

Anyway, after that brief diversion into the various applications of this spreadsheet and these results, let's take a look at the results and explain what we're seeing.

Key features of the results:

The likelihood of obtaining a complete set grows slowly initially, where n (number of turns) is only slightly larger than t (number of toys in the set).  This feature is particularly evident for larger values of t.  For small (t < 5) numbers of toys, the increase is sharp, but as t increases, it takes longer for us to observe an increase in p.

As an example, take the results for t=10, the right-most orange line on our graph.  Even after 20 tries, the probability of getting a full set is only 20%.  Compare this with t=4 where, after 2t tries (8 tries) the probability of getting the full set was over 60%.
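
For comparison with the simulated figures, the exact probability can be written down using the standard inclusion-exclusion argument.  This is a sketch which assumes that every toy is equally likely to be in any given box and that boxes are independent of each other; P(n, t) is notation introduced here for the probability of holding all t toys after n boxes.

```latex
P(n, t) = \sum_{k=0}^{t} (-1)^{k} \binom{t}{k} \left(1 - \frac{k}{t}\right)^{n}
```

For t=4 and n=8 this gives 1 - 4(0.75)^8 + 6(0.5)^8 - 4(0.25)^8, which is about 0.623, and for t=10 and n=20 it gives about 0.215, in line with the 'over 60%' and roughly 20% read from the simulation.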

A second feature of the graph comes after the slow initial rise: there is a region where the gradient increases, and the probability of getting a complete set rises quickly with increasing n.  This makes sense - as you buy more and more packs, you are increasingly likely to find the toys that you're missing.  This feature continues until you reach the third phase.

In the third phase, which again only becomes evident for larger values of t, you reach the point where there's only one toy left to find, and it becomes harder and harder to become 100% certain of gaining a complete set.  At this point, the probability of obtaining a complete set gets closer and closer to 100%, but never actually reaches it.  The p=100% line is an asymptote which our results approach but never reach.  Or, to put it another way, if you haven't completed the set of 10 toys after buying 80 bags (or boxes), then buying the 81st isn't going to improve your chances by very much!
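
To put numbers on this without running the simulation, the exact probability can be computed directly from the standard inclusion-exclusion formula.  The following is a minimal VBA sketch, assuming every toy is equally likely in every box; FullSetProbability and BoxesNeeded are hypothetical names chosen for this example, not functions from the original spreadsheet.

```vba
' Hypothetical helper: exact probability of holding all t toys after
' n boxes, assuming each toy is equally likely (inclusion-exclusion).
Function FullSetProbability(ByVal n As Long, ByVal t As Long) As Double
    Dim k As Long
    Dim coeff As Double     ' holds (-1)^k * C(t, k), built up step by step
    Dim total As Double
    coeff = 1               ' C(t, 0) = 1
    For k = 0 To t
        total = total + coeff * (1 - k / t) ^ n
        ' next coefficient: C(t, k+1) = C(t, k) * (t - k) / (k + 1), sign flips
        coeff = -coeff * (t - k) / (k + 1)
    Next k
    FullSetProbability = total
End Function

' Hypothetical helper: smallest number of boxes for which the
' probability of a full set reaches target (target must be below 1,
' because the probability never actually reaches 100%).
Function BoxesNeeded(ByVal t As Long, ByVal target As Double) As Long
    Dim n As Long
    n = t                   ' a full set needs at least t boxes
    Do While FullSetProbability(n, t) < target
        n = n + 1
    Loop
    BoxesNeeded = n
End Function
```

With these in a module, =BoxesNeeded(4, 0.9) in a cell should return 13: for four toys the exact probability is about 0.875 after 12 boxes and about 0.906 after 13.  Comparing FullSetProbability(80, 10) with FullSetProbability(81, 10) shows how little the 81st box adds.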

That's why there are so many websites devoted to finding, and providing, ways of identifying the toy in the bag before you buy it.  For example, an online search for "Lego minifigures codes" will point to sites that show how certain bump markings on the bags indicate the toy inside; for "Megabloks Marvel Minifigures" it's a code printed on the edge of the bag... for Doctor Who, it seems to be a case of feeling for the shapes of the figures inside.  All because the real probability of getting a complete set is extremely small - and I haven't even looked at the collections which have 'rare' or 'super rare' figures...  that's when it's time to visit eBay!

Tuesday, 10 January 2012

Web Analytics: Personalisation

Last Friday night, I had to transfer some money from my savings account to my current account, and in the process encountered an interesting case of personalisation.

Withdrawing the cash from the savings at the building society was a typically anonymous matter, even though I had to provide my account passbook and photo ID, but this only became apparent when I paid the money into my bank, just across the road.  I only had to provide the money and the debit card for my bank account, but as soon as my card had been scanned, the bank clerk began addressing me as David, and just by doing that, provided a much more personal service.

Earlier in the evening, I phoned the local take-away restaurant, and on the way back from the bank, I called in to pick up my order. I'd called them from my home landline, but hadn't provided a name or address.  However, I've ordered from the take-away before, and they'd evidently stored my data: at the top of the receipt for my order were my full name and address.  As I mentioned, I hadn't provided any information at all when I phoned the order through.  Was it surprising to see my name and address on the receipt?  Absolutely. Was it unnerving?  Perhaps, but it's more a reflection of a local business using data and information to their advantage.  I don't know if they're going to use my purchase preferences to offer me particular choices or offers next time I order... I'll let you know.

Online, I'm not surprised when Amazon, or eBay, or any other e-commerce site, uses my login details and my activity on their site to try to provide me with relevant content or advertising.  So I've been searching for a particular author, or a particular album, movie or laptop - should I really be surprised that they've noticed, and now they're using the promotional space on their sites to show me advertising of similar products?  Is this scary new technology?  Or is it something that's been around for many years, and this is just its newest incarnation?

Back when I was at high school, I had a part time job as a sales assistant at the local shoe store.  It was easy enough - serve the customers, keep the shop floor well-stocked, tidy away surplus stock into the storage room.  Part of the sales training (it wasn't extensive) was to try to cross-sell - shoelaces, polish, all that stuff, and to sell to customers when we didn't have what they wanted.  For example - "Do you have this shoe in my size?"  A quick trip to the stock room would reveal that we didn't, but a check around the shelves would show that we had it in blue, or brown instead.  Or perhaps, if it was a shoe that looked like it was for the office, we had a similar style.  Was it good customer service?  Was it personalisation?  I would certainly hope so, as it led to me selling many pairs of shoes (and frequent declines, but that was part of the job).  Did customers question how I'd managed to come up with potential alternatives?  Did they marvel at the apparent depths of the stock room, or think it was freaky or scary that I'd been able to anticipate their needs, based on just one query?

Perhaps, then, we shouldn't be surprised, or alarmed, when a computer algorithm looks at our on-site browsing habits and tries to provide us with what we appear to be searching for.

Thursday, 5 January 2012

Film Review: Tron

"User requests are what computers are for."
"Doing our business is what computers are for!"
Walter, the voice of reason, and Dillinger, the megalomaniac's voice of capitalism.

Tron could probably be described as the predecessor of, or at least an influence on, many films that we've seen since.  However, I hadn't seen it until now.  For a so-called sci-fi fan, that's quite a confession, but it's true.  Courtesy of Lovefilm, however, that oversight has now been rectified, and I'm quite pleased with the result!

Upon first inspection, Tron is dated, and shows its age; however, the storyline and the plot have managed to remain current - in fact, any 'over-powered computer gains sentience and takes control' story probably owes its existence to Tron, and Terminator's Skynet is a prime example.  Other derivatives include the Matrix, The Net, and Hackers, to name a few.

Tron is also a great film if, like me, you like to play "What have they been in?" with the actors.  Apart from Jeff Bridges (who went on to feature in Starman, among others), Tron also features Bruce Boxleitner (Babylon 5's John Sheridan), a very young-looking Peter Jurasik (Londo Mollari from Babylon 5, already with that unmistakeable voice), and David Warner (I recognised him as Chancellor Gorkon from Star Trek 6, The Undiscovered Country, but according to IMDB he was in Babylon 5 as well).

My misunderstanding of Tron led me far enough to believe that the grid-based vehicle 'game' that the occupants are forced to play was Tron; in fact, the title goes to Bruce Boxleitner's character, a rogue program introduced to cause trouble in the mainframe computer.  Yes, it's 1980s computer-speak all the way.  Otherwise, it's a CGI-fest covering a fairly straightforward adventure story... kinda reminds me of the Matrix, or Star Wars Episode 1.  It is genre-defining, it's fresh and new (for its day) and makes much of the recent stuff look derivative.  Somebody - I wish I could recall who - said that watching the later instalments of the Matrix trilogy was a lot like watching somebody else play a computer game.  There are occasional moments of that here with Tron, but these are fairly infrequent.

Overall, I liked Tron.  Yes, it's a lot of CGI and pretty graphics, but there is a story - two in fact - to be told, and I have to say that the 'real world' story was at least as interesting as the virtual one... it certainly had the more three-dimensional characters!