Thursday, 2 February 2012

Probabilities and Free Toys: Part Four

Well, after all the work I've done on probabilities and free toys in 'blind' packaging, such as cereal packets or toy action figures, it seems I've missed a neat little trick. A few days ago I was reading a book of maths puzzles and came across this exact problem! The book in question is Math-E-Magic, by Raymond Blum, Adam Hart-Davis, Bob Longe and Derrick Niederman.

I couldn't quite believe my luck... and then felt slightly deflated when I reflected on all the number-crunching I've done so far, and on how I had missed this alternative approach (although it doesn't give the same level of detail in its solution).

The puzzle is posed thus: "Cereal Serial: A cereal company places prizes in its cereal boxes. There are four different prizes distributed evenly over all the boxes the company produces. On average, how many boxes of cereal would you need to buy before you collected a complete set?"

This is a slightly different question from the one I've been asking (how many boxes are needed to reach a 90% probability of getting a full set). It makes the maths easier, but the solution, a single 'average' figure, gives no indication of the spread of results, which my previous approach did show.

The answer, however, is elegantly simple.

To obtain the first toy, we need to buy just one box, as we are guaranteed to get a toy we haven't had before.  

For the second toy, we have a probability of 3/4 of getting a new toy in the next box. Therefore (and this is a big therefore, which will need explaining separately), it'll take on average 4/3 boxes to obtain a new toy.
For the third toy, the probability of getting a new toy is 2/4, or 1/2, and therefore it will take an average of 2 boxes to get a toy we haven't had before.
And finally, for the fourth toy, the probability of getting a new toy is 1/4, so it will take on average four more boxes to get the final toy.

So, the average number of boxes it will take to get the four toys is:
1 + 4/3 + 2 + 4 = 8 1/3
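The box-by-box reasoning above generalises to a set of any size t: with k different toys already collected, a new toy turns up with probability (t - k)/t, so on average the next one takes t/(t - k) boxes. A minimal Python sketch, where the function name expected_boxes is just an illustrative choice, adds up these waits using exact fractions so that answers like 8 1/3 come out exactly:

```python
from fractions import Fraction


def expected_boxes(t: int) -> Fraction:
    """Average number of boxes needed to collect all t equally likely toys.

    With k toys already collected, each box holds a new toy with probability
    (t - k) / t, so on average it takes t / (t - k) boxes to get the next one.
    Adding these waits for k = 0 .. t-1 gives the total.
    """
    return sum((Fraction(t, t - k) for k in range(t)), Fraction(0))


for t in (2, 3, 4, 5):
    e = expected_boxes(t)
    print(t, e, float(e))
```

For t = 4 this gives 25/3, the same 8 1/3 as above, and for t = 5 it gives 137/12.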

To quote the answers section of Math-E-Magic: "In real life, of course, you can't buy 1/3 of a box, but that is still the average number of boxes you'd have to purchase."

As I said, I'm not familiar with the approach of taking reciprocals to obtain "number of boxes" from probabilities, although it seems intuitive.  I must have missed that lesson at school!  Let's take a look at it a little more closely.

If an event has a 1/4 probability of success, how many times do I have to repeat it, on average, to achieve one success? It seems overly simple to say "four times", but that is indeed the answer, and the answer given in Math-E-Magic works on this principle. Logically, it makes sense, too. If we tried to measure the probability of success (without knowing it in advance), we'd divide the number of successes by the number of attempts. If we were successful once in four tries, then p(success) = 1/4. Working backwards to the answer we're looking for, we'd expect, on average, one success in four attempts. I will bear all this in mind next time I find a free toy inside a cereal packet... especially if it's one of a set of 16!
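The 'big therefore' from earlier can be made a little more precise. Assuming each box is independent and gives a success (a new toy) with the same probability p, the number of boxes X up to and including the first success follows a geometric distribution, and its mean comes out to exactly 1/p. The middle step uses the standard series for the sum of n x^(n-1), valid for |x| < 1:

```latex
P(X = n) = (1-p)^{n-1}\,p, \qquad n = 1, 2, 3, \dots

\sum_{n=1}^{\infty} n\,x^{n-1} = \frac{1}{(1-x)^{2}}, \qquad |x| < 1

E[X] = \sum_{n=1}^{\infty} n\,(1-p)^{n-1}\,p
     = p \cdot \frac{1}{\bigl(1-(1-p)\bigr)^{2}}
     = \frac{1}{p}
```

With p = 1/4 this gives E[X] = 4. It is only an average, though: sometimes the new toy turns up in the very next box, and sometimes it takes many more than four.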

Let's take a look at the results from last time, and see if they can be used to confirm or support this calculation.

Firstly, here are the results, with the number of tries along the x-axis and the probability of success (collecting the full set) on the y-axis. Each line represents a different number of toys in the set, from two toys on the left to ten toys on the right.

Let's take a look at the slopes of each graph. To make things easier to see, let's just look at the cases t = 2, 3, 4 and 5, where t is the number of toys in the set.

Here are the graphs of dp/dn, that is, how the gradients of the graphs above change with the number of tries. As I pointed out last time, the graphs have a region of maximum slope, where the probability of getting the full set of toys rises most sharply. This is the region where we would expect to find the average number of boxes we need to buy, so we can locate it by finding the slope of each graph. Since I haven't got the actual function to differentiate, I've simply subtracted adjacent figures to find the change in probability from one box to the next.
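If we assume the toys are equally likely and each box is independent, there is also an exact formula for the probability of holding the full set after n boxes, from the inclusion-exclusion principle: P(full set after n boxes) = sum over k = 0..t of (-1)^k C(t, k) (1 - k/t)^n. The sketch below (the function names are hypothetical) uses it to compute the same 'subtract adjacent figures' slopes exactly rather than from simulated results. Each difference is simply the probability that the set is completed by exactly the n-th box.

```python
from fractions import Fraction
from math import comb


def prob_full_set(t: int, n: int) -> Fraction:
    """Exact probability of holding all t toys after buying n boxes.

    Inclusion-exclusion: start from 1, subtract the chance that some toy is
    missing, add back the chance that two are missing, and so on.
    """
    return sum(
        (Fraction((-1) ** k * comb(t, k)) * Fraction(t - k, t) ** n
         for k in range(t + 1)),
        Fraction(0),
    )


def slope(t: int, n: int) -> Fraction:
    """Change in probability between n-1 and n boxes (the 'dp/dn' values).

    This equals the probability that the set is completed with exactly
    the n-th box.
    """
    return prob_full_set(t, n) - prob_full_set(t, n - 1)


for t in (2, 3, 4, 5):
    slopes = [round(float(slope(t, n)), 3) for n in range(1, 16)]
    print(f"t={t}:", slopes)
```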

What does this show us? For t=2, the maximum slope is at n=2 (where the probability goes from zero to 50%), but the more meaningful result is at n=3, where the probability goes from 50% to 75%. Beyond this, the slope tails off very quickly.

This result also matches the formulaic approach shown above:
Probability of a 'new' toy in the first box = 1, so 1/1 = 1 box to be bought.
Probability of a 'new' toy in each box after that = 1/2, so 1/(1/2) = 2 more boxes to be bought.
1 + 2 = 3
So the average number of boxes for a two-toy set is three.

Let's re-examine t=4, which we discussed at the start of this post. My results graph shows a peak around n=7-8, which is in good agreement with the arithmetic figure of 8 1/3. I'm not sure whether this validates my method or the solution :-) but it's good to see some agreement between practical experiment and calculated figures.
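For anyone wanting to repeat the 'practical experiment', here is a minimal Monte Carlo sketch in Python. It assumes each box contains one of t toys chosen uniformly at random, which may differ from how the earlier results in this series were produced; the seed and the number of trials are arbitrary choices.

```python
import random
from statistics import mean


def boxes_to_complete(t: int, rng: random.Random) -> int:
    """Buy boxes, each holding one of t equally likely toys, until all t are collected."""
    collected = set()
    boxes = 0
    while len(collected) < t:
        collected.add(rng.randrange(t))  # toy number 0 .. t-1 in this box
        boxes += 1
    return boxes


def simulate(t: int, trials: int = 100_000, seed: int = 2012) -> list[int]:
    """Repeat the collection many times and return the box count for each run."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [boxes_to_complete(t, rng) for _ in range(trials)]


results = simulate(4)
print("average boxes for t=4:", mean(results))
```

With 100,000 trials the average should land close to 8 1/3 for t = 4. Calling collections.Counter(results).most_common(1) would show the most frequent number of boxes instead, which corresponds to the peak of the slope graph rather than to the average.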

Finally, let's look at t=5, the purple line on my graph.
Mathematically, the average number of boxes will be:

1 + 1 1/4 + 1 2/3 + 2 1/2 + 5 = 11 5/12 or just under 11.5 boxes (11.41667).

However, the peak for t=5 on my graph is at n=9 boxes, and this (along with the results for t=4) seems to suggest an oversimplification in the calculated method that I've been using. I just can't quite put my finger on it, or explain it clearly and concisely!

Perhaps I've made a mistake in the analysis of my data.  Recalling my maths lessons at school, I suspect that the 'average' number of boxes is obtained when the probability of collecting the full set of toys is equal to (or slightly greater than) 50%.

Reviewing my data, this gives:

This still shows an over-estimation by the arithmetic method (or an over-achievement by the modelling approach), so I still feel a little uneasy about the arithmetic method. I think it's related to the probability of getting a new toy once you've already collected half the set and are finding fewer 'new' toys, but I can't quite identify whether there's a flaw in the model or in my results. I would put the question like this: when you come to find the final toy in a set of five, having already bought six boxes for the first four, is it really going to take FIVE more boxes? Somehow, that doesn't feel right (although, as I've read previously, human beings are very poor judges of probability).
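One way to probe this is to compute three different 'averages' for the exact distribution: the most likely number of boxes (the peak of the slope graph, i.e. the mode), the number of boxes at which the probability of a full set first reaches 50% (the median), and the mean that the arithmetic method gives. The sketch below assumes the inclusion-exclusion formula for P(full set after n boxes) with equally likely toys; the function names are hypothetical. Because the distribution has a long tail to the right (occasionally the last toy takes a very long time to turn up), these three need not coincide.

```python
from fractions import Fraction
from math import comb


def prob_full_set(t: int, n: int) -> Fraction:
    """Exact P(all t toys collected within n boxes), by inclusion-exclusion."""
    return sum(
        (Fraction((-1) ** k * comb(t, k)) * Fraction(t - k, t) ** n
         for k in range(t + 1)),
        Fraction(0),
    )


def summary(t: int, max_boxes: int = 100) -> tuple[int, int, Fraction]:
    """Return (mode, median, mean) of the number of boxes needed for t toys."""
    cdf = [prob_full_set(t, n) for n in range(max_boxes + 1)]
    # P(set completed by exactly the n-th box), i.e. the slope values
    pmf = {n: cdf[n] - cdf[n - 1] for n in range(1, max_boxes + 1)}
    mode = max(pmf, key=pmf.get)  # peak of the slope graph
    # first n at which the probability of a full set reaches 50%
    median = next(n for n in range(max_boxes + 1) if cdf[n] >= Fraction(1, 2))
    # arithmetic method: t/t + t/(t-1) + ... + t/1
    mean = sum((Fraction(t, t - k) for k in range(t)), Fraction(0))
    return mode, median, mean


for t in (2, 3, 4, 5):
    mode, median, mean = summary(t)
    print(f"t={t}: most likely n={mode}, 50% reached at n={median}, mean={float(mean):.2f}")
```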

I'd really like to reach closure with this puzzle... I think I may take the mathematical approach for now, and re-examine my data at a later time!!
