
Friday, 17 May 2024

Multi-Armed Bandit Testing

 I have worked in A/B testing for over 12 years.  I ran my first A/B test on my own website (long since deleted and now only in pieces on a local hard-drive) about 14 years ago.  However, it has taken me this long to actually look into other ways of running online A/B tests apart from the equal 50-50 split that we all know and love.

My recent research led me to discover multi-armed bandit testing, which sounds amazing, confusing and possibly risky (don't bandits wear black eye-masks and operate outside the law??). 


The term multi-armed bandit comes from a mathematical problem, which can be phrased like this:

A gambler must choose between multiple slot machines, or "one-armed bandits", each of which has a different, unknown likelihood of winning. The aim is to find the best or most profitable outcome through a series of choices. At the beginning of the experiment, when odds and payouts are unknown, the gambler must try each one-armed bandit to measure its payout rate, and then find a strategy to maximize winnings.


Over time, this will mean putting more money into the machine(s) which provide the best return.

Hence, the multiple one-armed bandits make this the “multi-armed bandit problem,” from which we derive multi-armed bandit testing.

The solution - to put more money into the machine which returns the best prizes most often - translates to online testing: the testing platform dynamically changes the allocation of new test visitors, shifting it towards the recipes which are showing the best performance so far.  Normally, traffic is allocated randomly between the recipes in fixed proportions, but with multi-armed bandit testing traffic is skewed towards the winning recipe(s).  Instead of the normal fixed 50-50 split (or 25-25-25-25, or whichever), the traffic split is updated on a daily (or per-visit) basis.
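As a rough illustration of how a platform might turn "performance so far" into tomorrow's traffic split, here is a minimal sketch. It assumes a Thompson-sampling style approach (a Beta distribution over each recipe's conversion rate), which is one common way to do this and not necessarily what any particular testing tool uses. The function name `next_day_split` and the visitor and conversion numbers are made up for the example.

```python
import random

def next_day_split(conversions, visitors, draws=10_000, seed=0):
    """Estimate the chance that each recipe is the best one,
    and use those chances as the next day's traffic split."""
    rng = random.Random(seed)
    wins = [0] * len(visitors)
    for _ in range(draws):
        # Draw one plausible conversion rate per recipe from a
        # Beta(1 + successes, 1 + failures) distribution.
        samples = [
            rng.betavariate(1 + c, 1 + (n - c))
            for c, n in zip(conversions, visitors)
        ]
        # Count which recipe came out on top in this draw.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Hypothetical results so far: Recipe A 40/1000, Recipe B 60/1000.
split = next_day_split(conversions=[40, 60], visitors=[1000, 1000])
print(split)  # share of tomorrow's traffic for A and B
```

The more evidence there is that one recipe is ahead, the larger its share becomes; when the recipes are close, the split stays near even.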

We see two phases of traffic distribution while the test is running:  initially, we have the 'exploration' phase, where the platform tests and learns, measuring which recipe(s) are providing the best performance (insert your KPI here).  After a potential winner becomes apparent, we move into the 'exploitation' phase: the percentage of traffic to that recipe starts to increase, while the losers see less and less traffic.  Eventually, the winner will see the vast majority of traffic - although the platform will continue to send a very small proportion of traffic to the losers, to continue to validate its measurements.
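The two phases can be sketched with a simple "epsilon-greedy" rule: most visitors get the recipe with the best observed conversion rate, while a small fraction (epsilon) are sent to a random recipe to keep checking the others. This is a simplified simulation under assumed numbers, not how any specific platform works; the function name `run_epsilon_greedy`, the true conversion rates and the 10% exploration share are all hypothetical.

```python
import random

def run_epsilon_greedy(true_rates, visitors=20_000, epsilon=0.1, seed=1):
    """Simulate a bandit test; return (times shown, conversions) per recipe."""
    rng = random.Random(seed)
    shown = [0] * len(true_rates)
    converted = [0] * len(true_rates)
    for _ in range(visitors):
        if 0 in shown or rng.random() < epsilon:
            # Exploration: send this visitor to a random recipe.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploitation: send this visitor to the recipe with the
            # best observed conversion rate so far.
            arm = max(range(len(true_rates)),
                      key=lambda i: converted[i] / shown[i])
        shown[arm] += 1
        # Simulate whether this visitor converts (assumed true rates).
        if rng.random() < true_rates[arm]:
            converted[arm] += 1
    return shown, converted

# Hypothetical true conversion rates: Recipe A 5%, Recipe B 10%.
shown, converted = run_epsilon_greedy([0.05, 0.10])
print("Visitors per recipe:", shown)
print("Conversions per recipe:", converted)
```

With these assumed rates, Recipe B should end up with most of the traffic, while Recipe A keeps receiving a trickle of visitors from the exploration share.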

The graph for the traffic distribution over time may look something like this:


...where Recipe B is the winner.

So, why do a multi-armed bandit test instead of a normal A/B test?

If you need to test, learn and implement in a short period of time, then multi-armed bandit testing may be the way forwards.  For example, if marketing want to know which of two or three banners should accompany the current sales campaign (back to school; Labour Day; holiday weekend), you aren't going to have time to run a normal test, analyze the results and push the winner: the campaign will have ended while you were tinkering with your spreadsheets.  With multi-armed bandit testing, the platform identifies the best recipe while the test is running, and shifts traffic to it while the campaign is still active.  When the campaign has ended, you will have maximized your sales performance by showing the winner to most visitors while the campaign was active.







Tuesday, 14 May 2024

TV Review: Doctor Who The Space Babies and The Devil's Chord

Doctor Who needs no real introduction. My first memories of Doctor Who were posters of The Master during the early 1980s, and seeing episodes with the Psychic Circus during the mid 80s. It was deemed too scary for me to watch, and I wasn't ready to understand it either.

The Christopher Eccleston series started the week before I got married. I watched it with great interest and thoroughly enjoyed it. I also enjoyed David Tennant's Doctor, and although I saw less as time went on, I still saw some of Peter Capaldi's episodes.

I also watched Jodie Whittaker's Doctor, but it didn't really work for me. The first few episodes were so different from the Doctor Who I had seen before. With dramatic changes such as a female Doctor and new companions, it would have been helpful (if not essential) to have carried over more of the core components of Doctor Who into this series: things like the Tardis, the sonic screwdriver, familiar monsters or villains, or supporting characters (such as UNIT).  However, these were absent from many of the episodes, especially the first few, and it felt far more like watching a new TV series instead of a continuation of an existing story. Subsequently, I gave up and only watched the last few Jodie Whittaker episodes, which interestingly featured many of the major monsters and villains from classic Doctor Who.

So, what did I expect from the latest regeneration of the Doctor? The return of Russell T Davies to the helm meant I was optimistic, especially after the conclusion of the Jodie Whittaker series. I didn't like the Christmas special, which was a strange musical, but I very much enjoyed the first episode of New New Who, Space Babies. It was cute, it was snotty, full of bodily functions and toilet humour, with a scary monster that is saved at the end.  This is definitely Dr Who, not just a generic sci fi episode, and it's good.


The second episode stands in stark contrast to the first, as a real psychological thriller. The androgynous Maestro is stealing music from the whole world, starting with London in the 1960s. The villain in this episode is not only stealing music but is a serial killer too. And the Doctor? Hero of countless Dalek battles, winner of the Time War?  He is, as Ruby points out, a coward. What's going on?  Yes, he's camp, OK, but a coward? This is new, and will need some explaining.

I like the cause and effect thread running through these stories: if you do tread on a butterfly in this universe, you do change history. If you let the Maestro steal all the music in the 1960s, then the 2020s look very different. We've jumped from Doctor Who's parallel worlds (Father's Day, etc.) to Star Trek or Back to the Future, where it's one timeline with consequences for stepping on butterflies, and you'd better be more careful, Doctor.  


The Maestro is the Toymaker's child - there's some motivation for you - and is truly scary, and clearly powerful and motivated.  For most of the story, the episode narrowly manages to avoid becoming a musical, and a disaster, and genuinely piqued my interest.  That was, however, until the musical number at the end.  I suppose it was as inevitable as the defeat of the Maestro (who I was hoping was the latest regeneration of The Master, but never mind).  Even my daughter, who has seen a handful of Dr Who episodes, said, "What on Earth is that?" as the musical number played towards the end of the episode.  Not obviously Doctor Who, is the answer to that one.

So, two good episodes that could have been better with a few minor tweaks.  I am a little irked by the introduction of all the musicality - first in the Christmas special, and now in a regular episode.  I do like music, but I don't watch Doctor Who for it - and it takes up so much unwarranted time in the episode that it almost looks like padding or filler.

Allons-y!