Choosing Size of Control Groups
Drilling Down Newsletter #107 12/2009
Drilling Down - Turning Customer
Data into Profits with a Spreadsheet
Customer Valuation, Retention, Loyalty, Defection
Many folks use a calculator or "rule" for choosing the
size of control groups. Often they opt for the minimum size that
results in a statistically significant test at some level of
confidence the team agrees to.
While the stats logic makes sense, there are a number of other
"soft" factors that should also be considered when making
these choices. We'll cover hidden costs of running sample sizes
too "tight" in this month's newsletter.
Let's go directly to the Drillin'...
Questions from Fellow Drillers
Choosing Size of Control Groups
Q: I am a big fan of your web site and read your Drilling Down
book. Great work on these efforts!
A: Thanks for the kind words!
Q: I was wondering if you could help me pick the right control group size for a project of ours?
The population is 25 million telco customers, for which we want to do a long-term impact analysis (month by month) of revenue increase
versus control group. The marketing initiatives are a mix of retention, lifecycle and tactical/seasonal activities.
We want to measure revenue increase through any of the marketing activities compared to control group.
A: Great project, this is the kind of idea that can really improve margins if you can find out which specific tactics drop the most profit to the bottom line.
Q: I have searched the web for some help and found calculators. For a population of 25 million, a smallest expected uplift of 0.1% and a highest likely rate of > 5%, the calculator gives 250k (1%).
Is that sufficient to calculate the net impact on the remaining base?
Would be very grateful if you could give me your thoughts.
A: Well, it could be and might not be...
If you were manufacturing widgets, producing drugs, etc, where the outcomes are clear (unit is defective or not defective), you might use this approach to
But in Marketing we're talking about human behavior, and there is
quite a lot more variability in outcomes and more room for
interpretation. You can encounter a number of problems down the road by running a control this "tight" to the statistically correct size.
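To put rough numbers on how variability changes the answer, here is a minimal Python sketch of a textbook two-sample z-test calculation. It is not the web calculator mentioned in the question, and it is not a formula from this newsletter. The function name `min_detectable_uplift`, the 80% power setting, and the coefficient of variation `cv` (standard deviation of revenue per customer divided by its mean) are all assumptions for illustration; plug in your own figures.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_uplift(n_control, n_test, cv, confidence=0.95, power=0.80):
    """Smallest relative uplift in mean revenue per customer that a
    two-sided, two-sample z-test could detect (textbook approximation).

    cv is the ASSUMED coefficient of variation of revenue per customer.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * cv * sqrt(1 / n_control + 1 / n_test)


if __name__ == "__main__":
    population = 25_000_000
    for pct in (1, 5, 10):
        n_control = population * pct // 100
        n_test = population - n_control
        for cv in (0.5, 1.0, 2.0):  # hypothetical levels of variability
            mde = min_detectable_uplift(n_control, n_test, cv)
            print(f"{pct:>2}% control, cv={cv}: detectable uplift ~ {mde:.3%}")
```

The more spread out revenue per customer is (a higher `cv`), the larger the uplift has to be before a control of a given size can pick it up, which is why a size that works for widgets may be too tight for customer behavior.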
From a practical perspective, when you do a test of this magnitude
(and I assume strategic importance), you don't want test to just
"beat control", you want to beat control beyond a shadow of
any executive's possible doubt.
From personal experience, I can tell you that executives are far more likely to be non-believers with a 1% control than with a 5% or 10% control. So some of this control size choice is culture-based - if the exec team is a bunch of engineers who understand and believe in statistical sampling methods, then 1% is probably OK in terms of believing the results are predictive of future events.
But if you need to convince a CFO or somebody who will be working from gut or risk management rather than "science", then 1% may not be enough; there is too much perceived "room for error" with a 1% sample (even with the science).
This is in effect a "perceived confidence interval" argument - the difference between 95% confidence and 99.999% confidence. Engineers may be OK with 95% because they intimately understand the derivation of it; CFOs not so much.
CFOs may understand the math behind confidence, but intuitively they perceive that a 10% control is "more likely to be accurate" than a 1% control.
Said another way, do you want people to argue about the math and stats and waver on their belief in the outcome, or do you want them to just look at a
simple chart of test versus control numbers and say, "Congratulations, that's a tremendous success!".
A 10% control gets you complete agreement on the results without any quibbling.
At 1%, you may get "what about the chance we are wrong" arguments.
Now, there are financial implications to using very large controls - some positive (reduced expense), and some negative (potential revenue foregone).
So choosing control group size can be impacted by these other issues.
In small population tests these financial impacts are usually negligible, so I always go for large controls.
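The trade-off is simple arithmetic, and a rough sketch may help when you make the case internally. The function name `holdout_net_cost` and the dollar figures below are hypothetical, not results from any real test; the only idea taken from above is that a control group saves marketing expense but gives up whatever revenue the marketing would have produced.

```python
def holdout_net_cost(n_control, uplift_per_customer, cost_per_customer):
    """Net monthly cost of holding n_control customers out of marketing.

    A positive result means foregone revenue exceeds the expense saved.
    """
    foregone_revenue = n_control * uplift_per_customer
    saved_expense = n_control * cost_per_customer
    return foregone_revenue - saved_expense


# Hypothetical figures, for illustration only
UPLIFT = 0.50  # assumed incremental revenue per marketed customer per month
COST = 0.10    # assumed marketing cost per customer per month

if __name__ == "__main__":
    for population in (100_000, 25_000_000):
        for pct in (1, 10):
            n_control = population * pct // 100
            net = holdout_net_cost(n_control, UPLIFT, COST)
            print(f"population {population:>10,}, {pct:>2}% control: "
                  f"net cost per month ~ {net:,.0f}")
```

With these made-up inputs, a 10% control on a 100,000 customer base costs a few thousand a month, while the same 10% on 25 million runs to about a million a month.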
But in a population of 25 million, maybe not so. Which brings us to the second consideration - segmentation or "drill
down" after the test.
Nothing is quite so painful as gearing up for a test of this magnitude, producing a stunning positive result on a "macro" basis across all initiatives, and then having the execs ask, "What is the driving force behind this increased profitability in the test
group? Is it retention, lifecycle or tactical / seasonal?"
Or as often happens in telco (usually from an ops GM or VP), "What was the result
of this test in my region or on my platform?"
With a 1% control across the entire population, you frequently are "boxed in" when it comes to sub-populations because you lose significance (both perceived and scientific) as you drill in.
You may be OK on a couple of large scale events on large populations, but as we know, every answer begs another question and you can run out of statistically significant answers pretty quickly.
If you use a large control at the macro level, you are (as a rough example)
99% confident at the macro level, 98% confident one segment down, 97% confident two segments down, 95% confident three segments down, etc.
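A related way to see the "boxed in" problem is to hold confidence fixed and watch how large an uplift has to be before you can detect it as the segments get smaller. The sketch below is an illustration under stated assumptions, not the author's method: each drill-down level is assumed to split test and control into equal-sized segments, the z-test formula is the standard textbook approximation, and the names `relative_mde`, `mde_by_drill_depth` and the `cv` value are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(n_control, n_test, cv, confidence=0.95, power=0.80):
    """Minimum detectable relative uplift for a two-sided two-sample z-test."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)
    return z * cv * sqrt(1 / n_control + 1 / n_test)


def mde_by_drill_depth(population, control_count, splits_per_level, levels, cv):
    """Return (depth, control customers per segment, detectable uplift) rows,
    ASSUMING every level splits each segment into equal parts."""
    rows = []
    for depth in range(levels + 1):
        segments = splits_per_level ** depth
        n_control = control_count // segments
        n_test = (population - control_count) // segments
        rows.append((depth, n_control, relative_mde(n_control, n_test, cv)))
    return rows


if __name__ == "__main__":
    # 25 million customers, 1% vs 10% control, 4-way split per level, assumed cv
    for control_count in (250_000, 2_500_000):
        print(f"control = {control_count:,}")
        for depth, n_control, mde in mde_by_drill_depth(
                25_000_000, control_count, 4, 3, cv=1.0):
            print(f"  depth {depth}: {n_control:>9,} in control, "
                  f"detectable uplift ~ {mde:.2%}")
```

Under the equal-split assumption, every 4-way cut roughly doubles the smallest uplift you can still call significant, so a larger control at the macro level buys you more levels of drill down before the answers run out.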
One way to handle this is to build the test from subsegments up to the macro level.
Let's say at a minimum you want 3 subsegments in the test - retention, lifecycle, and tactical / seasonal - and you want to be 95% confident in each of them.
Since some of these programs are triggered by behavior (lifecycle) and some by calendar (seasonal) I'd guess the sizes of the populations and number of executions could be vastly different.
Meaning, you may only need 1% control on the seasonal promotions but more like 5% or 10% control for some of the lifecycle stuff to be 95% confident on
the outcomes of those.
When you sum all these segments up, you often end up with more like 2% or 3% of the entire population in control groups to always be at least 95% confident at all the desired subsegments, which means you end up with even higher confidence at the macro "all campaigns" level - a very good thing.
And much better than trying to explain why you can't answer a subsegment question because you used 250K instead of 400K or 600K in the control group, if you know what I mean!
That's when people forget the arguments about foregone revenue and
start saying stuff like "How could you even think about not using
a larger control group?"
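The bottom-up sizing described above can be sketched in a few lines. Everything specific here is hypothetical: the segment sizes, the control fractions, the assumption that the three segments do not overlap, and the name `roll_up_controls`. In practice the per-segment fractions would come from whatever sample-size calculation you trust for each program.

```python
# Hypothetical segments: name -> (eligible customers, control fraction
# assumed to be needed for 95% confidence within that segment)
SEGMENTS = {
    "seasonal": (20_000_000, 0.01),
    "lifecycle": (3_000_000, 0.10),
    "retention": (2_000_000, 0.05),
}


def roll_up_controls(segments):
    """Size each segment's control, then sum up to the macro level.

    ASSUMES the segments are disjoint, so control counts can simply be added.
    """
    controls = {name: round(size * fraction)
                for name, (size, fraction) in segments.items()}
    total_population = sum(size for size, _ in segments.values())
    total_control = sum(controls.values())
    return controls, total_control, total_control / total_population


if __name__ == "__main__":
    controls, total_control, share = roll_up_controls(SEGMENTS)
    for name, count in controls.items():
        print(f"{name:<10} control: {count:>9,}")
    print(f"total control: {total_control:,} ({share:.1%} of the base)")
```

With these made-up inputs the controls sum to 600,000 customers, or 2.4% of the base, which is the kind of "more like 2% or 3%" total that building from the subsegments up tends to produce.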
In the end, you will thank yourself again and again for using a larger than minimum required control at the macro level because you WILL come up with that unexpected "must know" question and be thrilled to find out you actually can answer it at a decent level of confidence.
Good luck with it, let me know what you learn!
That's it for this month's edition of the Drilling Down newsletter.
If you like the newsletter, please forward it to a friend!
'Til next time, keep Drilling Down!
- Jim Novo
Copyright 2009, The Drilling Down Project by Jim Novo. All
rights reserved. You are free to use material from this
newsletter in whole or in part as long as you include complete
credits, including live web site link and e-mail link. Please
tell me where the material will appear.