
Wednesday 10 January 2024

Statistics: Type 1 and Type 2 Errors

In statistics (and by extension, in A/B testing), a Type I error is a false positive conclusion (we think a test recipe won when it didn't), while a Type II error is a false negative conclusion (we think the test recipe made no difference, when it actually did).

Making a statistical decision always involves uncertainty, because we're sampling instead of looking at the whole population. This means the risk of making these errors is unavoidable in hypothesis testing: we don't know everything because we can't measure everything. However, that doesn't mean we don't know anything. It just means we need to understand what we do and don't know.


The probability of making a Type I error is the significance level, or alpha (α), while the probability of making a Type II error is beta (β). Incidentally, the statistical power of a test is 1 − β: the probability of detecting a real difference when there is one. I'll be looking at the statistical power of a test in a future blog post.
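One way to get a feel for alpha is to simulate A/A tests, where both recipes are identical, so every "winner" is by definition a Type I error. The sketch below assumes a pooled two-proportion z-test and made-up settings (10% conversion rate, 500 visitors per recipe, 1,000 simulated tests); with α = 0.05, roughly 5% of the flat tests should still come out "significant".

```python
import random
from statistics import NormalDist


def p_value_two_proportions(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rate(rate=0.10, visitors=500, tests=1000, alpha=0.05, seed=42):
    """Simulate A/A tests (both recipes identical) and return the share declared 'winners'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(tests):
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if p_value_two_proportions(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1  # a Type I error: we called a winner on flat results
    return false_positives / tests


print(aa_false_positive_rate())  # expect something in the region of alpha = 0.05
```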

These risks can be minimized through careful planning in your test design.

To reduce Type 1 errors, which mean falsely rejecting the null hypothesis (calling a winner when the results were actually flat), it is crucial to choose an appropriate significance level before the test starts and stick to it. Being cautious when interpreting results, and considering what the findings actually mean, can also help mitigate Type 1 errors. Different companies use different confidence levels when testing, depending on how cautious or ambitious they want their testing program to be. If there are millions of dollars at risk per year, or developing a new site or design will cost months of work, then adopting a higher confidence level (90% or higher, which means a significance level α of 0.10 or lower) may be the order of the day. Conversely, if you're a smaller operator with less traffic, or you're making a change that can easily be unpicked if things don't go as expected, then you could use a lower confidence level (80% or higher, which means α of 0.20 or lower).

It's worth saying at this point that human beings are generally lousy at understanding and interpreting probabilities. Confidence levels and probabilities are related but are not directly interchangeable. The difference between 90% and 80% confidence is not the same as the difference between 80% and 70%. It becomes more and more 'difficult' to increase a confidence level as you approach 100% confidence. After all, can you really say something is 100% certain to happen when you've only taken a sample (even if it's a really large sample)? On the other hand, it's so easy as to be almost inevitable that a small sample will give you a 50% confidence level. What did you prove? That a coin is equally likely to give you heads or tails?
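The non-linearity is easier to see in the critical z-values behind each confidence level. This is a sketch assuming a two-sided test. It uses z squared relative to 80% confidence as a rough proxy for how much more traffic each step costs, and it deliberately ignores the power term in the full sample-size formula.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Critical z-value for a two-sided test at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


baseline = z_for_confidence(0.80) ** 2
for level in (0.70, 0.80, 0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    # The significance part of a sample-size calculation scales with z squared,
    # so z**2 relative to 80% gives a rough feel for the extra traffic needed.
    print(f"{level:.0%}: z = {z:.2f}, traffic vs 80% = {z ** 2 / baseline:.1f}x")
```

Each ten-point step up the confidence scale asks for a bigger jump in z than the one before, which is why the last few percent are the most expensive.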


Type 2 errors can be minimised by designing the test for high statistical power, most directly (and unsurprisingly) by using a larger sample size. The sample size determines the degree of sampling error, which in turn limits the test's ability to detect a real difference. A larger sample size reduces sampling error, so a genuine difference is more likely to show up as significant, and this increases the test's power. Bear in mind the trade-off: for a fixed sample size, demanding a higher confidence level (a smaller α) makes a Type 2 error more likely, not less.
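To see how sample size drives power, here is a sketch using the normal approximation for a two-sided, two-proportion z-test. The far rejection tail is ignored, and the conversion rates (5% for control, 5.5% for the test recipe) are hypothetical.

```python
from statistics import NormalDist


def approx_power(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se = ((p_control * (1 - p_control) + p_variant * (1 - p_variant)) / n_per_arm) ** 0.5
    effect = abs(p_variant - p_control) / se
    # Probability that the observed z lands beyond the critical value (far tail ignored).
    return norm.cdf(effect - z_crit)


for n in (1_000, 5_000, 10_000, 20_000, 40_000):
    print(n, round(approx_power(0.05, 0.055, n), 2))
```

Re-running with `alpha=0.01` shows the trade-off described above: the same traffic buys less power.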

Practically speaking, Type 1 and Type 2 errors (false positives and false negatives) are an inherent feature of A/B testing, and the best way to minimize them is to have a pre-agreed minimum sample size and a pre-determined confidence level that everyone (business teams, marketing, testing team) has agreed on. Otherwise, there'll be discussions and debates afterwards about what's a winner, what's confident, what's significant and what's actually a winner.
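Agreeing the minimum sample size up front is easier with a number on the table. This is a sketch of the standard normal-approximation formula for comparing two conversion rates. The default 95% confidence and 80% power, and the 5% to 5.5% example rates, are assumptions for illustration, not recommendations.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, confidence=0.95, power=0.80):
    """Visitors needed in each recipe to detect p_control -> p_variant (normal approximation)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_beta = norm.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    n = (z_alpha + z_beta) ** 2 * variance / (p_variant - p_control) ** 2
    return math.ceil(n)  # round up: you can't recruit part of a visitor


print(sample_size_per_arm(0.05, 0.055))
print(sample_size_per_arm(0.05, 0.055, confidence=0.90))
```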

Thursday 28 December 2023

Firsts of 2023

I've written these "firsts" posts for the last two years, and this is the third. Even in my mid forties, I'm doing things for the first time:

First time I saw a deer in the wild - while on holiday in Devon. We were staying in a secluded bungalow (actually a converted horse stable) which overlooked some fields, and one morning we saw a wild deer walking across the field. We're going back in 2024, and this time I'll have my camera on permanent standby!

First time I dressed up as Captain Picard... and also as Hannibal from The A-Team. I've decided to embrace my grey hair and receding hairline and make them work for me. I mean - Darth Vader is still first choice, but I'm diversifying. The Hannibal costume needs work: I went with a friend who dressed as Murdock, and people still couldn't tell who we were. This also made it the first year I dressed up as four different characters: Vader, Batman, Hannibal and Picard.

First time we hosted a colony of bees under a loose roof tile. This was during the summer, when a colony of bumblebees borrowed some of our roof space for about three weeks. If you know me, you'll know I like bees, but this was a little too close for comfort. They were very well behaved and kept themselves to themselves, but it was still a little disconcerting, and we'll get the tile fixed in spring.

First time I was in a group that was nominated for a community award.  I unofficially joined our local Star Wars/Cosplay group, the Endon Stormtroopers, in 2023 and in 2024 was made official (I have been given my own T Shirt).  We didn't win, losing out to a group that was doing even more for their community, but it was cool to be nominated.  That's me as Batman in the picture below.


2023 was also a challenging year. After discovering drops of blood in my wee in August, I was fast-tracked through the GP and hospital process to see if I had cancer. This opened up a long list of firsts: first CT scan (lie on a bed and have a giant polo mint go backwards and forwards around me), first cannula (I forgot to mention they injected me with iodine to make my insides show up better in the scan), and also a little camera.

This is where I was injected, on the inside of my elbow: the red rash is from the sticky dressing that was an effective exfoliant!

I'm fine. I have learned a lot about the human body - my human body in particular - and been given the thumbs up. I'm having an ultrasound in January to check on my kidney stones (the most likely cause of this entire episode).

This year was also the first time one of my children has gone overseas for a school trip - Germany, by coach, for four nights.  

There were other highlights too, like the first time I got to see the Red Arrows fly over my house, watching them from my front doorstep (I wasn't expecting them, otherwise I'd have had a photo to prove it).

I constructed my first scale model diorama this year - I'd dabbled with models during my time as a student, but this year I actually built and finished a full scene. Not just one, but three (with others in progress).


This year also saw me learn to play "All Things Bright and Beautiful", which is a sneaky little tune. Most of us sang it at primary school, so we think it's easy to play, because it's easy to sing.  Ha!  It's a very complicated little ditty, with sharps and modulations all the way through it.  And it's written as a hymn with close four-part harmony, so you can't just whack out some chords and hope for the best.  I had to learn to play it from memory (I can't sight read fast enough to read and play).  Why? I played at my brother-in-law's wedding and this was the song he'd requested.

As well as seeing a wild deer on holiday, this year I also chased a fox off the front garden for the first time, early one summer morning.

Finally, in 2023, I carried out Careers Interviews with Year 10 and Year 11 students (aged 14-16) at my children's high school. For two days, I interviewed students to help them practise their interview technique, asking them the kinds of questions they'd be asked during interviews for jobs or college places.  It was fascinating to see the difference between the Year 10 and Year 11 students, and between those who had prepared and those who hadn't!  Those who hadn't probably thought they'd done well, but were not even close to those who had.

So, 2023 had some firsts that I would have preferred to have dodged, but which have given me useful experience for the future.  Happy New Year!

Tuesday 29 August 2023

Team-Building Exercises and Games for Remote Workers

The pandemic and lockdown period of 2020-2021 generated a quantum leap in the way we work. I mean - I had been working remotely for nine years before the pandemic, but for the vast majority of my colleagues, there was a significant change in working patterns. My team went completely remote (all working from home), and in September 2021 I became the team's manager. Since then, we've had some team members leave, I've recruited some new team members, and I've had the challenging task of building a team.

My team comprises eight project managers plus me, spread across four countries and multiple timezones, so you can imagine that meeting in person isn't going to happen any time soon.  However, that did not stop us from building a strong positive team ethos, and getting to know each other better.  

How?  By playing games.  I mean 'completing exercises' ;-). Here's a list, and a few additional points at the end.

1.  Whose Desk Is This?

Each team member takes a photograph of their desk, and uploads the picture to a shared PowerPoint deck. On a team call, we all review the photographs one by one, and try to determine whose desk this is. It's fun, it's different, and it works as the team tries to deduce who is most likely to have a tidy desk, an untidy desk, toys on their desk... pictures on their wall... the widescreen monitor and so on. The hilarity follows when the team realise they were completely wrong!


2.  Whose Fridge Is This?
Same principle, but more personal than the desk. We discovered that one team member keeps a thermos flask in the fridge (it's her husband's, and she goes mad about it), and that fridges are a lot more expensive in Brazil than in Europe. We also quizzed each other about our eating habits, and what that stuff is in the glass bowl on the top shelf. Again, entertaining and a better ice-breaker than 'tell us about yourself'.

3.  Whose Car Is This? and its near neighbour, Whose Front Door Is This?
You get the idea. For Whose Car Is This, we went further by finding stock photos of our cars as well as taking our own photos as usual. We were getting so good at scanning the photos for clues (items on top of the fridge, the general scene of the kitchen) that we stepped it up by adding the web-sourced photos. Front doors are front doors; there's no disguising them.

For an added bonus, when you get close to the holiday season, you can have Whose Christmas Tree Is This?  We did, and everybody took part - Christmas tree, seasonal decorations, whichever.  

4.  Online Pictionary (draw the object).  

For this, I prepared a list of seasonal objects - snowflake, Santa Claus, elf, snowman, wise man, shepherd, and so on.  The team split themselves into two groups, and each person shared their screen and used Paint to draw the word.  I am sure there are better online versions of this game, but there was something charming and hilarious about watching everyone try and draw using MS Paint that worked well.  It worked for us in the pre-Christmas period, but could work at any time of year with a given theme.

5.  Two Truths and a Lie

This has become our team favourite, and it requires no extra work from the manager.  You just tell everyone to prepare two truths and a lie, and to either update a shared deck or mail their statements to you.  They don't even have to tell you which is the lie, and then you can play along too.  And, if you take notes and listen to what everyone says, you'll have some great source material for the next activity...

6.  Online Quizzes

We use Kahoot, which is simple, straightforward and requires no costs if you have 10 or fewer team members.  As the manager, you become the question master and ask a series of multiple choice or true/false questions.  You'll need to write the questions in advance, and these can cover any (or multiple) topics.  Some questions are work-related - "What is your analysis of this data?" "What does this data show?" "What is the most important thing to remember about XYZ situation?" and some aren't - "Which member of the team said they most wanted to spend an afternoon with Agatha Christie?" "Which member of the team said that they wanted to do a parachute jump?"

The great thing is that most of these games can be used with teams of any size and discipline (engineers, sales, purchasing and HR can all play these games). What else works well in generating and building team spirit?

A.  Regular team meetings.
I can't emphasise this enough.  Even if you don't have a team-building exercise planned for the week, and everybody's rushed off their feet, and there isn't much to add to all the emails, keep the call anyway.  Your team will appreciate face-time with you, and a chance to catch up with you.  You might not have much to say, but they may have questions for you, so don't cancel!

B.  Regular one-to-ones alongside the team meetings
I meet with each member of my team once a fortnight (or once every two weeks, or bi-weekly). This gives each team member a chance to ask questions that are specific and relevant to them, on any topic that they want to discuss, and which may not be suitable in a team forum. Cancel these at your own risk!

Friday 30 June 2023

Goals (and why I haven't posted recently)

I'll probably blog sometime soon about goals, objectives, strategies and measures. They're important in business, and useful to have in life generally. For now, though, I'll explain why I haven't blogged much recently: I've found a new (old) hobby, constructing Airfix models. I started with Airfix models when I was about 10 or 11 years old - old enough to wait patiently for the glue to dry, and careful enough to plan how to construct each model. I had a second wave of interest in my late teens and early 20s, and another earlier this year (courtesy of my 11-year-old, now 12-year-old, son).

So this is what's been filling my time - building with my son.

Here's my first solo-ish project in 25 years:


The set is the Airfix 25 pdr Field Gun with Quad - one that I bought and built during my time at university.  I enjoyed the set, mostly because of the various figures that come with the set.  I've not painted any of my sets in snow camouflage before - and you'll soon see that I'm not a stickler for historical accuracy: I paint what I like!

I identified this figure as a troop commander (his flat cap compares with the helmets that the rest of the troop are wearing).  The Quad truck has a gap in the roof, and I decided I was going to stand the commander in the vehicle, peering through the roof.  Yes, he's a very easy target standing there like that, but I figure - why not? 

I drew out the overall scene on a piece of wooden board, sketching the position of the vehicle, trailer and gun, and the key figures.  We also have some injured casualties in our collection, and they featured too.  The trees were obtained cheaply from Amazon, and they are cheap, low-quality and quite small for 1/72 scale.  I used scenic roll (green) and white spray paint (generic matt paint) to deliver the snow.  The spray paint was cheap and it didn't spray evenly, but that worked to my advantage to give patchy but heavy coverage.

The final diorama included a Metcalfe model pillbox (the square version), some additional bushes (not shown here) and a good complement of trees.  I added some crater marks (but no depth to the scene) to explain the casualties, and then added some medics too (they were trickier).

Next?  A village scene, with a pair of Tigers ploughing through the remains of a continental village (somewhere).  As ever, it's all about the modelling, and has very little to do with historical accuracy!

Sunday 30 April 2023

Personalization, Segmentation or Targeting

Following all my recent posts on targeting (or personalization), I was discussing website content changes with a colleague. I was explaining how we could test some form of interactive, real-time changes on our site. His comment was that this wasn't real 1-to-1 personalization, and that what I was actually doing was just segmentation and content retargeting. This started me thinking, so I'd like to share my thoughts on whether 1-to-1 targeting is possible, easy and worth the effort. Or should we be satisfied with segmentation and retargeting?


1-to-1 targeting requires the ability to show any content to any user. It probably needs a huge repository of content that can be drawn on to show content that isn't shown to other users, but which is deemed optimal for a particular user. That raises three questions:

1. How do you decide which type of user this particular user should be classed as?  
2.  How do you determine which content to show this particular user (or type of user)?
3.  When the targeting doesn't give great results, how can you tell if the problem is with 1. or 2.?

And, as a follow-up question, why is "targeted" content drawn from a library held in higher esteem than retargeting existing content? Is it better because it's so difficult to set up?

Content retargeting - moving existing content on the page - does not require new content, but "isn't real 1-to-1 targeting."  This is true, but I would argue that the difference - mathematically at least - is negligible.  The huge library of targeted content isn't going to be able to match the potential combinations of content that can be achieved just by flipping page content around to promote a particular group of products.

In previous examples, I've looked at having four product categories that can be targeted.

How many combinations are there for the four products A, B, C, D?

4 * 3 * 2 * 1 = 24

There are four options for the first placement, leaving three for the second placement, two options for the third and only one left for the final place.

This is a relatively simple example - most websites have more than just four products or product categories in their catalogue (even Apple, with its limited product range, has more than four).

Let's jump up to six products:
6 * 5 * 4 * 3 * 2 * 1 = 720.
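The counting above is the number of permutations of the categories. As a quick check (using placeholder letters rather than real category names), Python's `itertools.permutations` lists every ordering, and `math.factorial` gives the count directly:

```python
import math
from itertools import permutations

# Every possible order for four categories, A to D.
orderings = ["".join(p) for p in permutations("ABCD")]
print(len(orderings))       # 24, i.e. 4 * 3 * 2 * 1
print(orderings[:6])        # the six orderings that put A first
print(math.factorial(6))    # 720 orderings for six categories
```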

At this point, retargeting is going to start scaling far more easily than 1-to-1 personalization. 

Admittedly, it's highly unlikely that all 720 combinations are going to be used and shown with equal probability. We will probably see maybe 6-10 combinations that are shown most often, as users visit just one or two product categories and identify themselves as menswear, casual clothes, or womenswear customers. The remaining three or four categories aren't relevant to these customers, so we don't retarget that content. I mean: if a user is visiting menswear and men's shoes, then they aren't going to be interested in womenswear and casual clothing, so the sequence of those categories is going to be irrelevant and can stay unchanged.
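Here is a minimal sketch of that "self-segmenting" rule, under stated assumptions: the six category names are hypothetical, a user's visited categories move to the front in visit order, and everything else keeps its default position. If users visit at most two categories, only 30 of the 720 possible layouts can ever occur, because only the first two slots change.

```python
from itertools import permutations

# Hypothetical default order for six product categories.
DEFAULT_ORDER = ["menswear", "mens-shoes", "womenswear", "womens-shoes", "casual", "accessories"]


def layout_for(visited, default=DEFAULT_ORDER):
    """Visited categories first (in visit order); everything else keeps its default position."""
    rest = [category for category in default if category not in visited]
    return tuple(list(visited) + rest)


# Collect every layout produced by users who visit zero, one or two categories.
layouts = {layout_for(())}
for k in (1, 2):
    for visited in permutations(DEFAULT_ORDER, k):
        layouts.add(layout_for(visited))
print(len(layouts))  # 30: one- and zero-visit layouts coincide with two-visit ones
```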

So, we can group users into one of 720 "segments", based not on how we segment them, but on how they segment themselves. This leads to a pseudo-bespoke browsing experience (it isn't 1-to-1, but the numbers are high enough for it to be indistinguishable) that doesn't require the overhead of a huge library of product content waiting to be accessed.

When do true personalization and segmented retargeting become indistinguishable? Are we chasing true 1-to-1 personalization when it isn't even beneficial to the customer's experience?

I would say that it's when the number of combinations of retargeted content becomes so large that users are seeing a targeted experience each time they come to the page.  Or, when the number of combinations is greater than the number of users who visit the page.  Personalization is usually perceived - and presented - as the holy grail of Web experience, but in my view it's unnecessary, unattainable and frequently unlikely to actually get off the drawing board. Why not try something that could give actual results, provide improved customer experience and could be set up this side of Christmas?
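The second test (more combinations than visitors) is easy to check. This sketch finds the smallest number of resequenced categories whose orderings outnumber a given visitor count; the visitor figures are hypothetical round numbers, not real traffic.

```python
import math


def categories_needed(visitors):
    """Smallest n such that n! distinct orderings outnumber the visitors."""
    n = 1
    while math.factorial(n) <= visitors:
        n += 1
    return n


for visitors in (1_000, 100_000, 10_000_000):
    print(visitors, categories_needed(visitors))
```

Because factorials grow so quickly, even a page with ten million visitors needs only a handful of resequenced categories to pass this threshold.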


Tuesday 28 March 2023

Non-zero Traffic

Barely a month ago, I wrote about a challenge I was having tracking the traffic on this blog.  I had identified that I wasn't seeing any visitors on mobile phones.  Not one.

I'd taken some steps to fix this, and I had been able to track my own visits from my phone, but only when I arrived via organic search. I was still not tracking actual visitor traffic.

Then I found an article that explains how to connect Blogger to Google Analytics 4 (and this has become justification for me to move to GA4).  

All I had to do was enter my GA4 measurement ID into Blogger's analytics setting... and that was it. It took me weeks to track down the solution, but since then, I've been tracking mobile traffic:

At the time of writing, I've had the fix live for three weeks, and this is how it's looking compared to the three weeks prior to the fix.

I've carried out a number of checks to make sure the data is valid:

Are the mobile and desktop numbers different (or am I double-counting users)?

Are the desktop numbers all even numbers (another indication of double counting)?

Are the mobile numbers cannibalising the desktop numbers, or are they additional?

In all cases, the data look good, showing clear and distinct differences between the mobile and desktop traffic, and I'm glad I was able to validate the data accuracy.
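The three checks can be expressed as a small function. This is a sketch under assumptions: the inputs are lists of daily user counts, the 10% tolerance for a desktop dip is an arbitrary choice, and the example numbers are made up purely to show the shape of the check (they are not this blog's data).

```python
from statistics import mean


def validate_device_split(desktop, mobile, desktop_before, tolerance=0.10):
    """Sanity checks on daily user counts after an analytics tracking fix.

    desktop, mobile: daily users per device since the fix.
    desktop_before: daily desktop users for a comparable period before the fix.
    tolerance: how far desktop may dip before we suspect cannibalisation (an assumption).
    """
    return {
        # Identical series would suggest the same visits are counted twice.
        "mobile_differs_from_desktop": desktop != mobile,
        # Desktop figures that are all even can be a hint (not proof) of double counting.
        "desktop_not_all_even": not all(day % 2 == 0 for day in desktop),
        # Mobile should be additional traffic, not carved out of the desktop figures.
        "mobile_is_additional": mean(desktop) >= mean(desktop_before) * (1 - tolerance),
    }


# Made-up numbers purely for illustration.
print(validate_device_split(desktop=[12, 9, 15], mobile=[4, 6, 3], desktop_before=[11, 13, 10]))
```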

Tuesday 21 March 2023

Why Personalization Programs Struggle

So why aren’t we living in a world of perfect personalization? We've been hearing for a while that it'll be the next big thing, so why isn't it happening?

Because it’s hard.  There's just too much to consider, especially if you're after the ultimate goal of 1-to-1 personalization.


In my experience, there are three areas where personalization strategies come completely unstuck.  The first is in the data capture, the second is the classification and design of ‘personas’, and the third is in the visual design.

1. Data capture:  what data can you access?

Search keywords?
PPC campaign information?
Marketing campaign engagement?
Browsing history?
Purchase history?
Can you get geographic or demographic information?
Surely you can’t form a 1-to-1 relationship between each individual user and their experience?
Previous purchaser?  And are you going to try and sell them another one of what they just bought?
Traffic source:  search/display/social?
What products are they looking at?
What have they added to basket?

2. Classification:  how are you going to decide how to aggregate and categorise all this data?  

Is it a new user?  Return user?

And the biggest crunch:  how are you going to then transfer these classifications to your Content Management System, or to your Targeting engine, so that it knows which category to place User #12345 into.  And that’s just where the fun begins.
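To make the classification step concrete, here is a minimal rule-based sketch. Everything in it is hypothetical: the `Visit` fields mirror a few of the data points listed above, basket additions are assumed to count three times as much as page views, and the output is a simple label (such as `returning-menswear`) that a CMS or targeting engine could key content on.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Visit:
    """A hypothetical bundle of captured data for one user."""
    is_returning: bool
    traffic_source: str  # e.g. "search", "display", "social" (not used by the simple rule below)
    viewed_categories: list = field(default_factory=list)
    basket_categories: list = field(default_factory=list)


def classify(visit, basket_weight=3):
    """Return a segment label; the rule and the weight are illustrative, not a recommendation."""
    prefix = "returning" if visit.is_returning else "new"
    scores = Counter(visit.viewed_categories)
    for category in visit.basket_categories:
        scores[category] += basket_weight
    if not scores:
        return f"{prefix}-unknown"
    top_category, _ = scores.most_common(1)[0]
    return f"{prefix}-{top_category}"


print(classify(Visit(True, "search", ["shoes", "menswear", "shoes"], ["menswear"])))  # returning-menswear
```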

And how do you choose the right data?  I'm personally becoming bored of seeing recommendations based on items I've bought:  "You bought this printer... how about this printer?" and "You recently purchased a new pair of shoes... would you like to buy a pair of shoes?" As an industry we seem to lack the sophistication that says, "You bought this printer - would you like to buy some ink for it?" or "You bought these shoes, would you like to buy this polish, or these laces?"
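The "buy ink, not another printer" idea can be sketched as a complements lookup. The mapping below is hypothetical and hand-written. A real site would need to derive it from its own catalogue or purchase data.

```python
# Hypothetical mapping from a purchased product type to complementary types.
COMPLEMENTS = {
    "printer": ["ink", "paper"],
    "shoes": ["polish", "laces"],
}


def recommend(purchased_types, catalogue_types):
    """Suggest complements to recent purchases, never more of the same type."""
    suggestions = []
    for bought in purchased_types:
        for extra in COMPLEMENTS.get(bought, []):
            in_stock = extra in catalogue_types
            if in_stock and extra not in purchased_types and extra not in suggestions:
                suggestions.append(extra)
    return suggestions


print(recommend(["printer"], {"printer", "ink", "paper", "laptop"}))  # ['ink', 'paper']
```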

3. Visual Design

For each category or persona that you identify, you will need to have a corresponding version of your site.  For example, you’ll need to have a banner that promotes a particular product category (a holiday in France, the Caribbean, the Mediterranean, the USA); or you may need to have links to content about men’s shoes; women’s shoes; slippers or sports shoes. 

And your site merchandising team now needs to multiply its efforts for its campaigns. 

Previously, they needed one banner for the pre-Christmas campaign; now, they need to produce four, five or more instead.  This comes as they are approaching their busiest period (because that’s when you’ll get more traffic in and want to maximise its performance) and haven’t got time to generate duplicated content just for one banner.

Fortunately, there are ways of minimizing the headaches that you can encounter when you’re trying to get personalization up and running (or keeping it going).

Why not take the existing content, and show it to users in a different order?  Years ago, there was a mantra (with a meme, probably) going around that told us to 'Remember: There is no fold' but I've never subscribed to that view.  Analytics regularly shows us that most users don't scroll down to see our wonderful content lying just below the edge of their monitor (or their phone screen).  So, if you can identify a customer as someone looking for men's shoes, or women's sports shoes, or a 4x4, or a hatchback, or a plasma TV, then why not show that particular product category first (i.e. above the fold, or at least the first thing below it)?
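Resequencing existing content can be as simple as a stable reorder. This sketch assumes each homepage block is tagged with a category, and it uses hypothetical block names for a shoe retailer. Blocks matching the user's category move to the top, and everything else keeps its usual order.

```python
def resequence(blocks, preferred_category):
    """Put blocks matching the user's category first; keep everything else in its usual order."""
    matching = [block for block in blocks if block["category"] == preferred_category]
    others = [block for block in blocks if block["category"] != preferred_category]
    return matching + others


homepage = [  # hypothetical homepage blocks, in their default order
    {"id": "hero-sale", "category": "sale"},
    {"id": "womens-sports", "category": "womens-sports-shoes"},
    {"id": "slippers", "category": "slippers"},
    {"id": "mens-shoes", "category": "mens-shoes"},
]
print([block["id"] for block in resequence(homepage, "mens-shoes")])
```

No new content is needed: the same blocks appear, just in an order that puts the likely-relevant one above the fold.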



4. Solutions

The flavour du jour in our house is Airfix modelling (building 1/72 or 1/48 scale vehicles and aircraft), so let's use that as an example and visit one of the largest online modelling stores in the UK, Wonderland Models.

Their homepage has a very large leading banner, which rotates like a carousel around five different images: a branding image; radio controlled cars; toy animals and figures; toys and playsets; and plastic model kits.  The opportunity here is to target users (either return visitors, which is easier, or new users, which is trickier) and show them the banner which is most relevant to them. 

The Wonderland Models homepage.  The black line is the fold on my desktop.

How do you select which banner? By using the data that users are sharing with you: their previous visits, items they've browsed (or added to cart), what they're looking for in your site search, and so on. Here, the question of targeted content is simpler (show them the existing banner that most closely matches their needs), but the data is trickier. However, the banners and categories will help you determine the data categorization you need; you'll probably find it in your site architecture.

However, here's the bonus: once you've classified (or segmented) your user, you can use this data again... lower down on the same page. Most sites duplicate their links, or have multiple links around similar themes, and Wonderland Models is no exception. Here, the secondary categories are Radio Control; Models and Kits; Toys and Collectables; Paints, Tools and Materials; Model Railways; and Sale. These overlap with the banner categories, and with a bit of tweaking, the same data source could be used to drive targeting in both sections. The win here is that with six categories (and a large part of the web page being targeted), there are thirty different combinations for just the first two slots: six options for the first position, and five for the second. This will be useful because the content is long and requires considerable scrolling.

The second and third folds on Wonderland Models.  The black lines show the folds.
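Here is a minimal sketch of one segment label driving both the banner and the secondary links. The link names come from the homepage described above. The segment labels and the segment-to-content mapping are hypothetical assumptions, not how Wonderland Models actually works.

```python
import math

# Secondary link categories as listed on the homepage described above.
SECONDARY_LINKS = ["Radio Control", "Models and Kits", "Toys and Collectables",
                   "Paints, Tools and Materials", "Model Railways", "Sale"]

# Hypothetical segment -> (banner to show, secondary links to promote) mapping.
SEGMENTS = {
    "rc": ("radio controlled cars", ["Radio Control"]),
    "kits": ("plastic model kits", ["Models and Kits", "Paints, Tools and Materials"]),
    "toys": ("toys and playsets", ["Toys and Collectables"]),
}


def page_for(segment):
    """Pick the banner and reorder the secondary links from the same segment label."""
    banner, promoted = SEGMENTS.get(segment, ("branding image", []))
    links = promoted + [link for link in SECONDARY_LINKS if link not in promoted]
    return banner, links


print(page_for("kits"))
print(math.perm(6, 2))  # 30 possible pairings for the first two link slots
```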

Most analytics packages integrate with CMSs or targeting platforms. Adobe Analytics pairs with Adobe Target, Adobe's testing and targeting tool. It's possible to connect the data from Analytics into Target (and I suspect your Adobe support team would be happy to help) and then use it to make an educated guess about which content to show your visitors. At the very least, you could run an A/B test.

5. The Challenge

The main reason personalization programs struggle to get going is (and I hate to use this expression, but here goes) that they aren't agile enough.  At a time when ecommerce is starting to use the product model and forming agile teams, it seems like personalization is often stuck in a waterfall approach.  There's no plan to form a minimum viable product, and try small steps - instead, it's wholesale all-in build the monolith, which takes months, then suffers a "funding reprioritization" since the program has nothing to show for its money so far...  this makes it even harder to gain traction (and funding) next time around.

6. The Start

So, don't be afraid to start small. If you're resequencing the existing content on your home page, and you have three pieces of content, then there are six different ways that the content can be shown. Without getting into the maths, there's ABC, ACB, BAC, BCA, CAB and CBA. And you've already created six segments for six personas. Or at least you've started, and that's what matters. I've mentioned in a previous article about personalization and sequencing that if you can add more content into your 'content bank', then the number of variations you can show increases rapidly. So if you can show the value of resequencing what you already have, then you are in a stronger position to ask for additional content. Engaging with an already-overloaded merchandising team is going to slow you down and frustrate them, so only work with them when you have something up and running to demonstrate.
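To put numbers on the "content bank" point, here is a sketch comparing two assumed set-ups. In the first, every piece in the bank is shown and only the order changes (n!). In the second, the page has a fixed three slots filled from the bank (ordered choices, `math.perm(n, 3)`). The bank sizes are illustrative.

```python
import math


def variations(bank_size, slots=None):
    """Ordered ways to show content: all pieces if slots is None, else `slots` pieces from the bank."""
    return math.perm(bank_size, slots)  # math.perm(n, None) == n!


for bank_size in range(3, 7):
    print(bank_size, variations(bank_size), variations(bank_size, 3))
```

Either way, each extra piece of content multiplies the number of variations rather than just adding one.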

Remember - start small, build up your MVP and only bring in stakeholders when you need to.  If you want to travel far, travel together, but if you want to travel quickly, travel light!