The Truth About Email Testing

Everyone and their brother/sister are now singing the praises of A/B and multivariate email testing.  But as a marketer I like to differentiate myself so I’ll play devil’s advocate.  This post challenges these email testing methods to see how accurate they really are.  The results are a little scary.

Background

After almost a decade in Demand Generation (and with the advent of technology that makes it much easier than in the past) I’ve finally gotten myself into the habit of testing EVERYTHING.  As you have probably already heard from the gazillion blog posts (slight exaggeration) written on email testing, the results are often surprising and can lead to big improvements in your marketing.

Proving the value of different email testing methods usually focuses on comparing two options: testing versus not testing.  This makes sense but doesn’t necessarily prove the reliability of the test results or address any natural variations that can occur in the results.  So I thought I’d do a little experiment…

The Email Testing Experiment

On a recent email I decided to A/B test nothing.  That doesn’t mean I didn’t run a test – it just means that nothing was varied.  To be clearer, Group A was identical to Group B: same email, same target audience, same sending date/time.  The only difference was the contacts in each list, which were randomly selected, so in theory there should be no meaningful difference in results.  My email testing was therefore focused not on any variable but on the natural variation that occurs between two random data sets.
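A random split like the one described can be sketched in a few lines of Python.  This is only an illustration: the function name split_ab and the seed parameter are my own placeholders, and most email platforms do this step for you.

```python
import random

def split_ab(contacts, seed=None):
    """Randomly split a contact list into two groups of (nearly) equal size.

    `split_ab` and `seed` are hypothetical names used for illustration only.
    """
    rng = random.Random(seed)      # a seeded generator makes the split repeatable
    shuffled = list(contacts)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Example with placeholder addresses
contacts = [f"contact{i}@example.com" for i in range(10)]
group_a, group_b = split_ab(contacts, seed=42)
print(len(group_a), len(group_b))
```

Because both groups come from the same shuffled list, any difference in their results comes from chance rather than from how the contacts were chosen.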

I used a sample size of over 15,000 emails.  One would think a sample of this size is substantial enough that natural variations between A and B should essentially evaporate.  I’d like to explore this.

Measuring Email Testing

I track the following metrics, as they are the focus of most email testing:

Unique Open Rate (UOR): Of the contacts that received an email, this is the percentage of unique contacts that opened it.  Note that this is not based on the total number of opens, which can be skewed by contacts that open multiple times.  This metric can be used to determine the effectiveness of your subject line and sender name.

Unique Click Rate by Email Recipients (UCR): Of the contacts that received an email, this is the percentage of unique contacts that clicked through.  This is a helpful metric to determine the overall effectiveness of an email since it encompasses opens and clicks (an open must occur for a click to occur).  In general I don’t use this metric for email testing since it doesn’t isolate a single variable, but I’ve included it since it does show the overall effectiveness.

Unique Click Rate by Email Opens (UCO): Of the contacts that actually opened an email, this is the percentage of unique contacts that clicked through.  This is a powerful metric because it tells you how effective your email content is at converting opens into clicks and is not skewed by variables that impact email opens.
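To make the three definitions concrete, here is a minimal sketch that computes them from raw counts.  The function name email_metrics and the counts in the example are made up for illustration and are not the figures from my send.

```python
def email_metrics(delivered, unique_opens, unique_clicks):
    """Return UOR, UCR and UCO as fractions (multiply by 100 for percentages).

    `email_metrics` is a hypothetical helper, not part of any email platform.
    """
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    uor = unique_opens / delivered        # Unique Open Rate
    ucr = unique_clicks / delivered       # Unique Click Rate by Email Recipients
    # Unique Click Rate by Email Opens; defined as 0 when nobody opened
    uco = unique_clicks / unique_opens if unique_opens else 0.0
    return {"UOR": uor, "UCR": ucr, "UCO": uco}

# Made-up counts: 1,000 delivered, 140 unique opens, 15 unique clicks
metrics = email_metrics(1000, 140, 15)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note that UCO is simply UCR divided by UOR, which is why it moves whenever either of the other two moves.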

The Results

Now for the results…

Metric                          Group A   Group B
Unique Open Rate                14.3%     13.9%
Unique Click Rate/Recipients    1.5%      1.7%
Unique Click Rate/Opens         10.6%     12.4%

The Unique Open Rates showed a delta of 0.4 percentage points, which I consider to be insignificant.  Unique Click Rate by Email Recipients showed an even smaller delta of 0.2 percentage points.  I believe these are completely acceptable natural variations and would not sway my opinion as to which email was more effective.

However, the Unique Click Rate by Email Opens shows a delta of 1.8 percentage points, which is substantial.  Part of the reason is that this metric is calculated only on the contacts who opened, a much smaller base than the full recipient list, so random swings have more room to show up.  Had I actually been testing two different content variables, this would have suggested that one piece of content can boost my click-through rate by almost 2 percentage points.  Multiply that by a couple thousand opens and you can see why someone would want to use content from Group B.  This can lead to the false perception that you are using better content following your test.

Another potential outcome is that Group A actually has more effective content but this boost is cancelled out in the results due to the false natural boost in Group B.  In either case, this natural variation can really throw a wrench in your email testing and decision-making.
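One way to check whether a gap like this is bigger than chance alone would produce is a two-proportion z-test.  The sketch below is a standard textbook version using only the Python standard library.  The counts are assumptions: I have taken 7,500 recipients per group as a round number and back-calculated opens and clicks from the percentages in the table, so treat the output as illustrative rather than as the actual figures from this send.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the assumption that both groups share one true rate
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMED counts (7,500 recipients per group, rounded from the table's rates)
opens_a, clicks_a = 1073, 114   # about 14.3% opens, 10.6% clicks per open
opens_b, clicks_b = 1043, 129   # about 13.9% opens, 12.4% clicks per open

z, p_value = two_proportion_z_test(clicks_a, opens_a, clicks_b, opens_b)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under these assumed counts the p-value comes out at roughly 0.2, well above the usual 0.05 cutoff, which is what you would expect if the gap is just natural variation.  Plug in your own counts before reading anything into it.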

Conclusion

In conclusion I have three pieces of advice:

  1. Before you draw any solid conclusions about these results I would advise you to test this on your own.  It is entirely possible that this particular experiment resulted in a variation that is greater than other attempts might show.  Only further testing will prove this either way.
  2. Don’t assume anything about anything and don’t be afraid to test everything – even your tests!  Marketers are great at developing creative campaigns.  Why not extend this creativity to your data?
  3. Keep email testing!  Even though this shows that you may get a false positive, chances are that in the long run your testing will still increase your effectiveness as a marketer even if a few tests fail along the way.
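
If you want a feel for the first suggestion before sending any real email, you can rehearse an "A/B test of nothing" in simulation.  The sketch below is only a rough model: the group size, open rate and click rate are assumed values picked to be in the neighborhood of my results, and the function names are hypothetical.

```python
import random

def simulate_group_uco(n_recipients, open_rate, click_given_open, rng):
    """Simulate one group and return its Unique Click Rate by Email Opens."""
    opens = sum(rng.random() < open_rate for _ in range(n_recipients))
    clicks = sum(rng.random() < click_given_open for _ in range(opens))
    return clicks / opens if opens else 0.0

def share_of_large_gaps(trials, threshold, n_recipients=7500,
                        open_rate=0.14, click_given_open=0.115, seed=0):
    """Fraction of simulated A/A tests whose UCO gap is at least `threshold`.

    The defaults are ASSUMED values for illustration, not measured ones.
    """
    rng = random.Random(seed)
    large = 0
    for _ in range(trials):
        # Both groups share identical true rates, so any gap is pure chance
        uco_a = simulate_group_uco(n_recipients, open_rate, click_given_open, rng)
        uco_b = simulate_group_uco(n_recipients, open_rate, click_given_open, rng)
        if abs(uco_a - uco_b) >= threshold:
            large += 1
    return large / trials

if __name__ == "__main__":
    share = share_of_large_gaps(trials=200, threshold=0.018)
    print(f"Share of A/A runs with a gap of 1.8 points or more: {share:.0%}")
```

With rates like these, a normal approximation suggests a gap of that size turns up in something on the order of one run in five, so a single test showing it is not much evidence on its own.  Change the assumed rates to match your own list and see how the picture shifts.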