Thursday, 8 September 2011

Probabilities and Free Toys, Part 2

Last time, I looked at solving a probability question:  for a set of 3, 4, 5 or n toys which are given away free inside a cereal packet, what's the probability of obtaining the full set of toys after buying the same number of packets (e.g. for 5 toys, getting all 5 after buying 5 packs).  

This time, having solved the easier question, let's take a look at the harder question:  with 10 toys (or a number larger than three or four), how many packs do I have to buy to be 50%, 70% or 90% sure of having the full set?

Once again, let's start with two toys (start small!):

After buying two packs, the probability of success is 0.5.  The two successful combinations are AB and BA, and the two unsuccessful are AA and BB (i.e. I get the same toy twice).

After buying three packs, the probability of success rises to 6/8, which is 0.75 or 75%.
There are eight total combinations (from AAA, AAB through to BBA and BBB), but only two are unsuccessful (AAA and BBB are unsuccessful), leaving six successful combinations.

After buying four packs, the probability of success rises to 7/8.  There are 16 total combinations, but the number of unsuccessful combinations remains at two, and the number of successful combinations rises to 14.

Generally, for the two-toy problem, the probability of getting a successful combination after t turns is equal to (2^t - 2) / 2^t  or, in words, the total number of combinations minus unsuccessful, divided by the total number of combinations.
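
As a quick sanity check, the two-toy formula can be compared against a brute-force count of every possible sequence of packs. This is only a rough sketch: the function name two_toy_success is my own invention for illustration, and it assumes both toys are equally likely in every pack.

```python
from itertools import product

def two_toy_success(t):
    """Count length-t sequences of toys A/B that contain both toys.

    Returns (number of successful sequences, total number of sequences).
    """
    sequences = list(product("AB", repeat=t))
    successes = sum(1 for seq in sequences if set(seq) == {"A", "B"})
    return successes, len(sequences)

for t in range(2, 7):
    successes, total = two_toy_success(t)
    # The brute-force count should agree with the closed form 2^t - 2.
    assert successes == 2**t - 2
    print(t, successes, total, successes / total)
```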

This, however, is where it gets tricky.  With three toys and three packs, there are just six successful combinations (the exact permutations, in alphabetical order, are ABC, ACB, BAC, BCA, CAB and CBA) and 27 total combinations (which confirms what I proved in Part 1 for the three toys and three packs case).  With three toys and four packs, the number of successes rises to 36, and the total number of combinations is 81.  The success probability is 36/81 which is 4/9, coincidentally double the probability after three packs.  After five packs, the number of combinations rises to 243.

Looking at the total number of combinations I've seen so far for the three-toy problem, it's risen from 27 to 81 to 243, multiplying by 3 each time, and that factor of 3 is n, the number of toys. In fact the denominator, the total number of permutations, is simply n^t (number of toys raised to the power of the number of turns or packs).  This is also known as the formula for 'permutations with repetition'.

Now I need to identify the number of successful permutations - those that contain one of each of the toys.
For n=3 toys and t=3 turns, there are 6 successful permutations (as listed above).

For n=3 toys and t=4 turns, there are 36 successful permutations

Now, the question becomes - how many will there be after five turns?

Let's go back to four turns and look closely at each of the permutations we have.  We have the 36 successful ones, ABCA, ABCB, AABC and so on.  For these 36 successes, it doesn't matter what we pick next, we'll still have a successful combination; there are three options for each of these, so that's 36x3=108 successes in total.

There are also the 3 combinations AAAA, BBBB and CCCC which will not produce a success, irrespective of what we pick next, so that's 3x3=9 fails.

This leaves the rest, which logically must contain two different toys.  We've covered the ones which already contain three different toys, and we've looked at the ones which contain only one toy.  Since there were 81 combinations in total after four goes, and only 3 of them contain a single toy, there must be 81-(36+3) = 42 combinations which contain two different toys.  Each of these has three possible next picks, and exactly one of the three (a probability of 1/3) completes the set, so they contribute 42 x 3 x 1/3 = 42 new successes.

So, after five turns, we have 108 + 42 = 150 successful permutations.  
Let's review:

After three turns: 6 successful permutations
After four turns: 36 successful permutations
After five turns:  150 successful permutations

Let's take a look at six turns, based on the process we used for five turns.

After five turns, we have 150 successes which will each yield three more successes, so 3x150 = 450
We also still have the combinations which have just one toy in - AAAAA, BBBBB and CCCCC, and these will each produce three more unsuccessful combinations, irrespective of the next choice.  3x3=9 unsuccessful combinations.

There are 3^5 total combinations after five turns, 243 in total, which means that we have 243-(150+3) = 90 other combinations which have two toys.  Each of these has exactly one next pick (out of three) that makes it successful, which gives 90 new successes.

So, after six turns, we have 450 + 90 = 540 successful permutations.  I've deduced the formula for working out successful permutations in an iterative manner, but I can't determine the 10th, 15th or 115th term without working out each one before it.  Furthermore, this method won't easily expand to cover five, six, or ten toys. It's all about knowing how many successes you've had previously, and how many certainly won't become successful on the next go (because they have n-2 or fewer different toys and will require at least two more goes to become successful).
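
The iteration for three toys can be written out in a few lines of Python and cross-checked against a brute-force enumeration. This is a rough sketch rather than anything definitive: the function names three_toy_successes and brute_force are made up for illustration, and it assumes each toy is equally likely in every pack.

```python
from itertools import product

def three_toy_successes(max_turns):
    """Iterate the step-by-step argument for n = 3 toys.

    successes[t] is the number of length-t sequences containing all
    three toys, starting from the six permutations of ABC at t = 3.
    """
    successes = {3: 6}
    for t in range(3, max_turns):
        one_toy = 3  # AAA..., BBB..., CCC...
        two_toy = 3**t - successes[t] - one_toy
        # Every success stays a success (three extensions each), and each
        # two-toy sequence has exactly one extension that completes the set.
        successes[t + 1] = 3 * successes[t] + two_toy
    return successes

def brute_force(t):
    """Count length-t sequences over A, B, C that contain all three toys."""
    return sum(1 for seq in product("ABC", repeat=t) if len(set(seq)) == 3)

counts = three_toy_successes(8)
for t, count in counts.items():
    # The iterative count should match the exhaustive enumeration.
    assert count == brute_force(t)
    print(t, count, 3**t, count / 3**t)
```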

Three toys is an easy case - you either have a successful combination, a combination with only one toy repeated, or a two-toy combination with a 1/3 chance of becoming successful.  With four toys, you may already have one, two or three different toys, or be successful, and I don't quite see how to sort all that out.
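
One possible way to keep track of the "one, two or three different toys" cases is to record, after each pack, the probability of holding exactly k different toys, and update that for every new pack. The sketch below is only one way of organising the bookkeeping, not a finished solution: the names success_probabilities and packs_needed are hypothetical, and it assumes every toy is equally likely in every pack.

```python
from fractions import Fraction

def success_probabilities(n, max_packs):
    """Return P(all n toys collected after t packs) for t = 0..max_packs.

    state[k] is the probability of holding exactly k different toys.
    Assumes every toy is equally likely in every pack.
    """
    state = [Fraction(0)] * (n + 1)
    state[0] = Fraction(1)  # before buying anything, we hold zero toys
    results = [state[n]]
    for _ in range(max_packs):
        nxt = [Fraction(0)] * (n + 1)
        for k, p in enumerate(state):
            if p == 0:
                continue
            nxt[k] += p * Fraction(k, n)              # next pack is a duplicate
            if k < n:
                nxt[k + 1] += p * Fraction(n - k, n)  # next pack is a new toy
        state = nxt
        results.append(state[n])
    return results

def packs_needed(n, target, max_packs=500):
    """Smallest number of packs giving at least `target` chance of a full set.

    Returns None if the target is not reached within max_packs.
    """
    for t, p in enumerate(success_probabilities(n, max_packs)):
        if p >= target:
            return t
    return None

for target in (Fraction(1, 2), Fraction(7, 10), Fraction(9, 10)):
    print(float(target), packs_needed(10, target))
```

Exact fractions are used so that the three-toy values can be compared directly with the hand counts (for example, five packs of three toys should give 150/243).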

So, next time, it's on to spreadsheet modelling.  I'm going to write a macro that simulates buying the cereal packets and examining the toys, and determining whether or not the combination is a success.  If maths fails, use sampling!

Tuesday, 6 September 2011

A Beginner's Social Media Strategy

I say 'for beginners' as I don't feel particularly qualified to discuss it in much detail (as you'll soon see), and this is more an explanation of my background and experience so far.

A few years ago, I set up my own website; you may have seen me refer to it previously. It was set up entirely as an exercise in website-building and tagging - it's all hard-coded HTML. It's tagged with Google Analytics, and I've been monitoring traffic to it since then, doing a little SEO and making changes (and hopefully improvements) to the content here and there. As analysts, we're usually charged with analysing, understanding and reporting stats, and then generating recommendations from them; we're not usually given a logon to a CMS and given free run of a website. By building my own website, I got to play both sides (and realised how time-consuming content generation and site maintenance can be). Better still, I've even been running A/B/C tests on it, and finding out how easy it is to set up (providing you've got multiple content ready to serve).

Anyway, with time, I moved from focussing on the website to a blog. Blogs are, for this JavaScript beginner, much easier than HTML websites, especially when I can't use server-side includes. So, even with a WYSIWYG HTML editor, blog posts are much easier than HTML pages. Once again I found out how to tag my blog by putting JavaScript includes and GA tags in my posts, and I've monitored the traffic. I also discovered how many analysts there are out there who are reading this blog (it's not a huge number, but compared to the single digits I was experiencing before I started writing about web analytics, it's a significant uplift). The blog has an 'about me' page which includes my Facebook and Twitter details, so that people can follow me, and I post updates about my blog on Twitter or Facebook or both, and occasionally on the Yahoo Web Analytics forum.

All of which means that I've built up a cyclical path between my social media accounts (where you can find links to my blog) and my blog (which explains how to follow me on social media).

That's not a strategy, that's just a circle. Which brings me to one key question: What am I actually trying to achieve with all this online presence?

Am I trying to get Twitter followers? Am I doing this for my ego, or for PR, or something like that? Maybe, but probably not.  Am I trying to get Facebook friends? No - I've got enough friends (and I've met 99% of them in person) and I've successfully tracked down my best friends from primary school, high school and university. Am I trying to get more people to read my blog? Now then - that seems more likely. What I'm trying to do is to share what I know about various subjects (maths, chemistry, chess, web analytics) and hopefully build up an online reputation as a reliable source of useful, accurate information - to be regarded as a specialist in my field (and possibly even, one day, an expert).

People aren't going to get that level of knowledge about me from my Twitter feed (which, even with my best intentions, is very clouded up with links to miscellaneous stuff I find interesting). They are most certainly not going to get that from my Facebook updates, which are much more personal and include family updates, photos from days out and the like, and are very much about my views and opinions and general chatter. Hopefully, though, my peers and friends will read my blog, where I write my more considered opinions and views, and share what I hope will be useful insights into areas that specifically interest me - as I said, chess, maths, chemistry and web analytics. The blog also has a Google Analytics goal set up - visitor views the About Me page - and this means that my social media strategy not only has an aim (to get people to read the blog) but a specific goal (find out more about me and my professional skills and experience).

I could go on and build KPIs about blog traffic levels and so on, and on to Twitter followers (excluding the spam accounts) but these will be secondary to getting people to view my profile page on my blog. I can THEN use analytics to tell me which blog post they read before reading my profile, and also where they came from... and then write more blogs on similar topics and post links on similar sources.

And that, in a nutshell, is my social media strategy. I can't say that it's scalable to a company level, but I think the main points (which are probably covered elsewhere) are:

What am I actually trying to achieve with all this online presence (answer in English words)?
What is my actual aim?
What does this look like as an online event (in page terms)? Make this an online 'goal' or 'event' in your analytics package.
What type of visitor carries out a success event? Which social media site did they come from?
How do I get more of them? What sort of content do they look at?

I'm not sure if this is a social media strategy, or just a reiteration of a normal online strategy. Like I said, I'm a beginner on social media strategies (despite having a blog, Facebook and Twitter accounts for years) so I'm open to other suggestions!

Probabilities and Free Toys, Part 1

Once upon a time, a long time ago, my high-school maths teacher set an extension problem: I never got the chance to tackle it, and I've never tried to since.  I've remembered it through the years, as a friend of mine was able to solve it elegantly, and I never saw his answer.  So, it's time for some closure again.

Here's the question:  

A certain breakfast cereal manufacturer is giving away a free toy inside each pack.  There are ten toys in the series, and I'd like to collect them all.  However, I can't tell which toy I'm going to get when I buy the pack.  The original question was:  what's the probability of getting all ten toys after opening ten packs?  And the follow-up question I'd like to look at is:  assuming that each toy is distributed equally, how many packs would I have to buy to be 50%, 70% or 90% sure of having all ten toys?

Now, bearing in mind that this is a high-school maths problem, it shouldn't take any advanced maths to solve the first question.  The follow-up question is one of my own, and could take me anywhere.

So, let's look at the first question, and let's start with two toys and build up to ten.

After buying two packs, the probability of success is 0.5.  The two successful combinations are AB and BA, and the two unsuccessful are AA and BB (i.e. I get the same toy twice).  But let's look at that as a step-by-step process.  When I buy the first pack, I am certain of getting a toy I haven't got before.  There are two alternatives, A and B, and two successes (either of them).  So the probability is 2/2.  The probability of getting a successful toy with my second pack is 1/2.  There's now only one successful toy (the one I haven't got), but there are two toys available.  So the probability of getting both the first toy and the second toy in two packs is 2/2 x 1/2 = 1/2.

This can be expanded to three toys, A, B and C:
Probability of success with first pack is 3/3  (any of the toys is a success)
Probability of success with the second pack is 2/3 (I need to avoid getting a duplicate, so there are now only two successes.  If I have A, then I only need B or C).
Probability of success with the third pack is 1/3 (I now need one specific toy as I have the other two).

So, the probability of success after three packs = 3/3 x 2/3 x 1/3 = 6/27 = 2/9 = 22%

I'll do the case for four toys, before moving to a general expression:
p(success with first pack) = 4/4 as any of the four toys is a success
p(success with second pack) = 3/4 as I already have one toy, and need one of the other three
p(success with third pack) = 2/4 as I have two toys and only two are now successes
p(success with fourth pack) = 1/4 as I only need one of the four toys to complete my set.

So, probability of success with four toys and four packs =
4/4 x 3/4 x 2/4 x 1/4 = 24/256 = 3/32 = 9.375%

There's a clear pattern developing.  For five toys, the numerators will be 5, 4, 3, 2, 1 and the denominators will be 5, 5, 5, 5, 5.  The numerators are multiplied together, 5x4x3x2x1 which is called 5 factorial, and written 5! while the denominators are 5x5x5x5x5 which is 5^5.  Looking back, the same rule applies to four toys, three toys and two toys, and will apply going upwards.

So, the probability of getting all n toys with n packs is n! / n^n

n! is an expression that increases very quickly with n (1, 2, 6, 24, 120, 720, 5040 and so on) but the denominator n^n increases even more quickly (1, 4, 27, 256, 3125, 46656 and so on).  The table below shows n, n!, n^n and the ratio n! / n^n which is the probability of getting n toys with n packs.  For the original question - what's the probability of getting 10 toys in 10 packs, the answer is 10! / 10^10 which is 0.036% (less than one in a thousand).
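
The figures for n, n!, n^n and the ratio n! / n^n can be reproduced with a few lines of Python. This is a minimal sketch, with the function name full_set_probability chosen purely for illustration and the range of n = 1 to 10 being my own choice; it assumes each toy is equally likely in every pack.

```python
from fractions import Fraction
from math import factorial

def full_set_probability(n):
    """Exact probability of getting all n toys in exactly n packs: n! / n^n."""
    return Fraction(factorial(n), n**n)

# Print one row per value of n: n, n!, n^n and the ratio as a percentage.
print(f"{'n':>2} {'n!':>9} {'n^n':>12} {'n!/n^n':>10}")
for n in range(1, 11):
    p = full_set_probability(n)
    print(f"{n:>2} {factorial(n):>9} {n**n:>12} {float(p):>10.4%}")
```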

Next time, I'll look at the harder question:  with 10 toys (or a number larger than three or four), how many packs do I have to buy to be 50%, 70% or 90% sure of having the full set?

Thursday, 1 September 2011

My X Factor predictions for 2011

Now, I reckon I'm a fairly optimistic person.  I look for the best in people and in situations, as a rule, and I will try to give people the benefit of the doubt.  However, I can also be quite cynical.  I don't see it as cynical, though other people do; I see it as identifying trends and patterns and expecting them to be repeated - even though I hope for the best.

There is, however, one area where I am just plain cynical - or, alternatively, very good at spotting patterns and trends - and that is with the Saturday evening television monstrosity known as the X-Factor (which I do call the X-Factory given its aim of mass production of pop music and cardboard cut-out pop stars).

Here, then, based on previous years' viewing (despite myself) are my predictions for what we can expect from the Simon Cowell juggernaut this year.

*  At least one finalist to have estranged parent or sibling - I appreciate I'm late with this, given that immediately after the first episode, one of the judges discovered a brother she never knew she had.

*  Gary Barlow to have one of his Take That mates at the judges' house stage (and it won't be Robbie)
*  One of the finalists to have been bullied at school
*  There will be the formation of a boy group and girl group, made up of the boy dregs and girl dregs at the end of the boot camp stage.  "We want to put you together into a group [because we haven't got enough groups already]."
*  These synthetic dregs-groups to go through to the live shows (you didn't think the judges would put them together and not let them go through, did you?).
*  These synthetic groups to get eliminated in first two weeks - first the girls (who will dress inappropriately) and then the boys (who can't sing as a team)
*  Simon Cowell to make a guest appearance, to much fanfare and flashing lights
*  Last year's winner (whoever that was) to release album just in time for Christmas
*  Louis Walsh to pick a wildcard act (or just a wild act) which is no good, but which secures the votes of those who deliberately vote for the worst (Jedward, Wagner).
*  There will be extensive media coverage of an apparent spat between two of the judges, probably the two ladies, but possibly the two blokes
*  One of the acts to suffer with a cough/cold/laryngitis/glandular fever part way through the TV shows
*  Two of the acts to form a 'secret' relationship, again with much media coverage

Print out the list, and tick them off.  If there are any left by Christmas, I think I'll genuinely be surprised.  In fact, give me a week or two, and I'll probably have some more predictions.