
Friday, 27 January 2012

Probabilities and Free Toys: Part Three

In my previous posts, I've been looking at a practical problem in probabilities: if a breakfast cereal manufacturer gives away a free toy with each cereal packet, how many packets do I have to buy in order to be fairly sure (90% probability) of obtaining every toy in a set?  This, of course, depends on how many toys there are in the set, and I've been crunching through the maths as far as possible for smaller numbers of toys.


Having hit a bit of a brick wall with the algebra, I decided to turn to spreadsheet modelling, to simulate buying n boxes of cereal and see how many of the t different toys in the set I obtained.  I did this with a macro which randomly selects a letter between A and D (four toys in a set), or between A and F (six toys), or between A and H (eight toys) and so on, and then builds up a string of letters, one letter for each box I buy.  For example, with five toys and six boxes, I might obtain:


ABBCAE


My spreadsheet would then check this result, to see if it contains A, B, C, D and E (in this case, there is no D).  However, that's only a sample size of one attempt, so I looped the macro to run for 1000 attempts, and measured the number of successes in the 1000, to get a reasonable estimate of the probability of success.
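For anyone without Excel to hand, the same experiment can be sketched in a few lines of Python. This is only a rough equivalent of the approach described above, not the spreadsheet itself: the function name simulate_complete_set and its seed parameter are my own inventions, and it assumes every toy is equally likely and that there are no more than 26 toys (one letter each).

```python
import random
import string


def simulate_complete_set(ntoys, nturns, attempts=1000, seed=None):
    """Estimate the chance of a full set from repeated random attempts.

    Hypothetical helper: assumes equally likely toys and ntoys <= 26.
    """
    rng = random.Random(seed)
    letters = string.ascii_uppercase[:ntoys]  # e.g. "ABCDE" for five toys
    successes = 0
    for _ in range(attempts):
        # One attempt: buy nturns boxes, each holding one random toy.
        toys = "".join(rng.choice(letters) for _ in range(nturns))
        # Success means every letter of the set appears at least once.
        if set(toys) == set(letters):
            successes += 1
    return successes / attempts


if __name__ == "__main__":
    # Five toys, six boxes, as in the ABBCAE example.
    print(simulate_complete_set(ntoys=5, nturns=6))
```

With 1000 attempts the estimate will wobble by a couple of percentage points from run to run, so treat the result as a guide rather than an exact figure.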


The spreadsheet is available for download here:  Probability Spreadsheet  (file-sharing website opens in new window).


And the macro, which may not make it successfully to you due to Microsoft Office's security settings, is reproduced here in full:

Sub DoctorWho()


' DoctorWho Macro
' Doctor Who toy probability calculator
' By David Leese


' Define number of toys in the full set = ntoys
' Define number of turns or boxes of cereal = nturns
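ntoys = 10
nturns = 50

Randomize
' seed the random number generator, so that Rnd does not repeat the same sequence each time the workbook is opened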




successcont = 0
' measure of success reset to zero (not used again in this macro)


toys = ""
' toys is a text string which will list the letters which have been obtained
newtoy = ""
' newtoy is the randomly-generated toy to add to the list, reset here
   
   
For model = 1 To 1000
    For cont = 1 To nturns
    
        ' cont is a loop counter based on nturns
        picklett = Int((ntoys) * Rnd + 1)
        ' picklett is randomly generated value between 1 and ntoys
        
        newtoy = Chr(picklett + 64)
        ' newtoy is the letter which corresponds to picklett
        toys = toys & newtoy
        ' append the new toy to the list of existing toys


    Next cont
ActiveCell.Value = toys
ActiveCell.Offset(1, 0).Activate
' Insert the value of toys (the full selection) into the active cell, then move down a row for the next attempt.
toys = ""
' reset toys to an empty string, ready for the next attempt
Next model


ActiveCell.Offset(-1000, 0).Activate
' Go back to the top of the spreadsheet


End Sub

Why is the macro named after Doctor Who?  Well, apart from working for cereal packets with toys, this also works for the current (and previous) series of Character Building Doctor Who toys, and this is where I got my first inspiration for this post (and which reminded me of the cereal packet question which I was asked at school, all those years ago).

It also applies to Lego's Minifigures ranges...


... and to Megabloks' Marvel Superheroes figures, which are shown below.


Anyway, after that brief diversion into the various applications of this spreadsheet and these results, let's take a look at the results and explain what we're seeing.




Key features of the results:


The likelihood p of obtaining a complete set grows slowly at first, while n (the number of turns) is only slightly larger than t (the number of toys in the set).  This slow start is particularly evident for larger values of t.  For small sets (t < 5), the increase is sharp, but as t increases, it takes more and more turns before we observe any real increase in p.


As an example, take the results for t=10, the right-most orange line on our graph.  Even after 20 tries, the probability of getting a full set is only about 20%.  Compare this with t=4 where, after 2t tries (8 tries), the probability of getting the full set is over 60%.
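These simulated figures can be cross-checked against an exact calculation. The sketch below uses the standard inclusion-exclusion result for this kind of problem, P(n, t) = sum over k from 0 to t of (-1)^k C(t, k) (1 - k/t)^n, which assumes every toy is equally likely. The function name prob_complete_set is my own choice, and this is a check on the spreadsheet rather than part of it.

```python
from math import comb


def prob_complete_set(t, n):
    """Exact probability that n boxes contain all t equally likely toys.

    Inclusion-exclusion over k, the number of toys assumed to be missing.
    """
    return sum((-1) ** k * comb(t, k) * (1 - k / t) ** n for k in range(t + 1))


if __name__ == "__main__":
    print(round(prob_complete_set(10, 20), 3))  # ten toys, twenty boxes
    print(round(prob_complete_set(4, 8), 3))    # four toys, eight boxes
```

This gives roughly 0.21 for ten toys and twenty boxes, and roughly 0.62 for four toys and eight boxes, in line with the values read off the graph.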


The second feature of the graph comes after the slow initial rise: there is a region where the gradient steepens, and the probability of getting a complete set increases quickly with increasing n.  This makes sense - as you buy more and more packs, you are increasingly likely to find the toys that you're missing.  This feature continues until you reach the third phase.
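To tie the curve back to the original question (how many boxes for a 90% chance?), one option is to step along n until the probability first reaches the target. The sketch below does this with the exact inclusion-exclusion formula rather than the simulation, again assuming equally likely toys. The names prob_complete_set and boxes_needed are hypothetical, and the target must be below 1 or the loop would never finish.

```python
from math import comb


def prob_complete_set(t, n):
    """Exact probability that n boxes contain all t equally likely toys."""
    return sum((-1) ** k * comb(t, k) * (1 - k / t) ** n for k in range(t + 1))


def boxes_needed(t, target=0.9):
    """Smallest number of boxes giving at least `target` chance of a full set.

    Assumes 0 < target < 1; a target of exactly 1 is never reached.
    """
    n = t  # fewer than t boxes can never contain all t toys
    while prob_complete_set(t, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for t in (4, 6, 8, 10):
        print(t, boxes_needed(t))
```

For a set of four toys this gives 13 boxes for the 90% target, which is more than three times the size of the set.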


In the third phase, which again only becomes evident for larger values of t, you have usually reached the point where there's only one toy left to find, and it becomes harder and harder to become 100% certain of gaining a complete set.  At this point, the probability of obtaining a complete set gets closer and closer to 100%, but never actually reaches it.  The p=100% line is an asymptote which our results approach but never reach.  Or, to put it another way, by the time you've bought 80 bags (or boxes) for a set of 10 toys, the curve is so flat that buying the 81st isn't going to improve your overall chances by very much!
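The flattening can be put into numbers by looking at how much the overall probability rises with each extra box. This is a minimal sketch using the exact inclusion-exclusion formula with equally likely toys; gain_from_one_more is a name I've made up for the illustration.

```python
from math import comb


def prob_complete_set(t, n):
    """Exact probability that n boxes contain all t equally likely toys."""
    return sum((-1) ** k * comb(t, k) * (1 - k / t) ** n for k in range(t + 1))


def gain_from_one_more(t, n):
    """Rise in the overall probability from buying box number n + 1."""
    return prob_complete_set(t, n + 1) - prob_complete_set(t, n)


if __name__ == "__main__":
    t = 10
    for n in (25, 40, 80):
        # Columns: boxes bought, probability so far, gain from one more box.
        print(n, round(prob_complete_set(t, n), 4), round(gain_from_one_more(t, n), 5))
```

Note that this measures the change in the curve, i.e. the chance judged before any boxes are opened. If you already know you are missing exactly one toy out of ten, the next box still has a 1 in 10 chance of containing it.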


That's why there are so many websites devoted to finding, and providing, ways of identifying the toy in the bag before you buy it.  For example, an online search for "Lego minifigures codes" will point to sites that show how certain bump markings on the bags indicate the toy inside; for "Megabloks Marvel Minifigures" it's a code printed on the edge of the bag... for Doctor Who, it seems to be a case of feeling for the shapes of the figures inside.  All because the probability of getting a complete set from a sensible number of blind purchases is so small - and I haven't even looked at the collections which have 'rare' or 'super rare' figures...  that's when it's time to visit eBay!
