
Friday 17 May 2024

Multi-Armed Bandit Testing

I have worked in A/B testing for over 12 years.  I ran my first A/B test on my own website (long since deleted, and now surviving only in pieces on a local hard drive) about 14 years ago.  However, it has taken me this long to actually look into other ways of running online A/B tests apart from the equal 50-50 split that we all know and love.

My recent research led me to discover multi-armed bandit testing, which sounds amazing, confusing and possibly risky (don't bandits wear black eye-masks and operate outside the law??). 


The term multi-armed bandit comes from a mathematical problem, which can be phrased like this:

A gambler must choose between multiple slot machines, or "one-armed bandits", each of which has a different, unknown likelihood of winning. The aim is to find the best or most profitable outcome through a series of choices. At the beginning of the experiment, when odds and payouts are unknown, the gambler must try each one-armed bandit to measure its payout rate, and then find a strategy to maximize winnings.


Over time, this will mean putting more money into the machine(s) which provide the best return.

Hence, the multiple one-armed bandits make this the “multi-armed bandit problem,” from which we derive multi-armed bandit testing.
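To make that strategy concrete, here is a minimal sketch in Python of one classic approach to the problem, epsilon-greedy: mostly play the machine with the best observed payout rate so far, but occasionally pick one at random. The payout probabilities, the number of pulls and the epsilon value are all invented for illustration.

```python
import random

# Hypothetical payout probabilities for three one-armed bandits
# (unknown to the gambler; used here only to simulate pulls).
true_payout_rates = [0.04, 0.11, 0.07]

pulls = [0, 0, 0]   # times each machine has been played
wins = [0, 0, 0]    # payouts observed per machine
epsilon = 0.1       # fraction of pulls spent exploring at random

def observed_rate(i):
    return wins[i] / pulls[i] if pulls[i] else 0.0

for _ in range(10_000):
    if random.random() < epsilon:
        machine = random.randrange(3)               # explore: random machine
    else:
        machine = max(range(3), key=observed_rate)  # exploit: best so far
    pulls[machine] += 1
    if random.random() < true_payout_rates[machine]:
        wins[machine] += 1

for i in range(3):
    print(f"Machine {i}: {pulls[i]} pulls, observed rate {observed_rate(i):.3f}")
```

Run for long enough, most of the pulls (and the money) end up on the machine with the best true payout rate, which is exactly the behaviour described above.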

The solution - to put more money into the machine which returns the best prizes most often - translates to online testing: the testing platform dynamically changes the allocation of new test visitors towards the recipes which are showing the best performance so far.  Normally, traffic is allocated randomly and evenly between the recipes, but with multi-armed bandit testing traffic is skewed towards the winning recipe(s).  Instead of the normal fixed 50-50 split (or 25-25-25-25, or whichever), the traffic split is recalculated on a daily (or per-visit) basis.

We see two phases of traffic distribution while the test is running.  Initially, we have the 'exploration' phase, where the platform tests and learns, measuring which recipe(s) are providing the best performance (insert your KPI here).  Once a potential winner becomes apparent, the percentage of traffic sent to that recipe starts to increase, while the losers see less and less.  Eventually, the winner receives the vast majority of traffic, although the platform continues to send a small proportion to the losers so that it can keep validating its measurements; this is the 'exploitation' phase.
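Different platforms implement that shifting allocation in different ways; one common scheme is Thompson sampling.  The idea: maintain a Beta distribution over each recipe's conversion rate, built from its observed successes and failures, and send each new visitor to whichever recipe draws the highest sample.  The sketch below assumes made-up conversion rates for two recipes and is illustrative, not any particular vendor's implementation.

```python
import random

# Assumed "true" conversion rates for two recipes - for simulation only;
# in a real test these are exactly what we don't know.
true_rates = {"A": 0.05, "B": 0.06}

# Beta(1, 1) priors: [successes + 1, failures + 1] per recipe.
stats = {name: [1, 1] for name in true_rates}

def assign_visitor():
    # Draw a plausible conversion rate for each recipe from its Beta
    # distribution and send the visitor to the highest draw.
    draws = {name: random.betavariate(s, f) for name, (s, f) in stats.items()}
    return max(draws, key=draws.get)

allocation = {"A": 0, "B": 0}
for _ in range(20_000):
    recipe = assign_visitor()
    allocation[recipe] += 1
    if random.random() < true_rates[recipe]:
        stats[recipe][0] += 1   # conversion observed
    else:
        stats[recipe][1] += 1   # no conversion

print(allocation)  # traffic skews heavily towards B as evidence accumulates
```

Early on, the two Beta distributions overlap heavily, so both recipes get plenty of traffic (exploration); as the evidence mounts, Recipe B wins more and more of the draws (exploitation), while Recipe A still receives the occasional visitor for ongoing validation - exactly the pattern described above.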

The graph of traffic distribution over time may look something like this:

[Graph: each recipe's share of traffic plotted over time, with Recipe B's share climbing while the others tail off]

...where Recipe B is the winner.

So, why do a multi-armed bandit test instead of a normal A/B test?

If you need to test, learn and implement in a short period of time, then multi-armed bandit testing may be the way forward.  For example, if marketing want to know which of two or three banners should accompany the current sales campaign (back to school; Labour Day; holiday weekend), you aren't going to have time to run a classic test, analyze the results and push the winner; the campaign will have ended while you were tinkering with your spreadsheets.  With a multi-armed bandit, the platform identifies the best recipe while the test is running and implements it while the campaign is still active.  When the campaign ends, you will have maximized your sales performance by showing the winner while the campaign was still active.






