I have no problem with the thesis of the post. He's completely right that A/B testing is a terrific way to determine whether one message is better than another. Every elementary schooler knows that a good experiment has a control group and a test group, and the way to show that the variable under test is significant is to demonstrate that the test group responds significantly differently from the control.
The problem, though, is that the example given in the post doesn't prove anything! We'll look at the top two messages in his AdWords result list ("Startup Marketing Advice" and "Marketing with Adwords") and show that there is no statistically significant difference in their clickthrough rates.
The first problem that should stand out is that the number of clicks is so small that the proportions cannot be accurately estimated. The general rule of thumb for estimating a proportion is that you need at least 5 positive and 5 negative examples. The trial with the most clicks in his experiment received only 4 clicks. Therefore the CTR percentages given by Google hardly mean anything: the standard error (calculated as sqrt(p(1-p)/n) for a binomial proportion like CTR) can't be accurately determined, but it is sure to be a large fraction of the CTR itself. For a small p the standard error is roughly the CTR divided by the square root of the number of clicks, so even with 4 clicks it is about half the CTR.
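To make that concrete, here is a minimal Python sketch of the standard error formula and the rule-of-thumb check. The 4 clicks come from the post; the 727 impressions are my own assumption, back-calculated so that the CTR comes out near 0.55%, and the function names are just illustrative.

```python
import math

def proportion_standard_error(clicks, impressions):
    """Standard error of a binomial proportion: sqrt(p * (1 - p) / n)."""
    p = clicks / impressions
    return math.sqrt(p * (1 - p) / impressions)

def meets_rule_of_thumb(clicks, impressions, minimum=5):
    """True if there are at least `minimum` clicks and `minimum` non-clicks."""
    return clicks >= minimum and (impressions - clicks) >= minimum

# 4 clicks is from the post; 727 impressions is an ASSUMED figure,
# chosen only so that 4 / 727 is close to a 0.55% CTR.
clicks, impressions = 4, 727

ctr = clicks / impressions
se = proportion_standard_error(clicks, impressions)

print(f"CTR = {ctr:.2%}, standard error = {se:.2%}")
print(f"standard error / CTR = {se / ctr:.2f}")
print("rule of thumb met:", meets_rule_of_thumb(clicks, impressions))
```

Under these assumed numbers the standard error is about half of the CTR, and the rule of thumb fails because there are fewer than 5 clicks.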
The second problem is the assessment of statistical significance between the different choices. The standard error of the difference between two estimators is sqrt(S_1^2/n_1 + S_2^2/n_2), where for a proportion S^2 = p(1-p). Calculating this for the top two AdWords campaigns, we see that the standard error for the difference between their clickthrough rates is somewhere in the vicinity of 0.35%. [Again, the statistics here are all inaccurate because we've already failed the "minimum of 5 clicks" rule of thumb, but we'll ignore that.] Because the measured CTR difference is 0.55% - 0.30% = 0.25%, we calculate a Z score of (0.25 / 0.35) = 0.71. Looking this up in a table of the normal distribution, we find that this corresponds to roughly 76% one-sided confidence that campaign 1 is better than campaign 2. This doesn't sound too terrible, but it is well short of the conventional 95% threshold and hardly scientific, especially when you consider the small sample sizes mentioned earlier.
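The same calculation can be sketched in a few lines of Python. The CTRs (0.55% and 0.30%) are from the post; the impression counts (727 and 667) are assumptions of mine, picked so that the standard error lands near the 0.35% quoted above. The normal table lookup is replaced by the standard library's error function.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (standard error of p1 - p2, Z score) for two proportions.

    Uses sqrt(S_1^2/n_1 + S_2^2/n_2) with S^2 = p * (1 - p).
    """
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return se_diff, (p1 - p2) / se_diff

def normal_cdf(z):
    """Standard normal CDF, i.e. what the Z table lookup gives."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# CTRs are from the post; impression counts are ASSUMED for illustration.
p1, n1 = 0.0055, 727   # "Startup Marketing Advice"
p2, n2 = 0.0030, 667   # "Marketing with Adwords"

se_diff, z = two_proportion_z(p1, n1, p2, n2)
confidence = normal_cdf(z)

print(f"standard error of difference = {se_diff:.2%}")
print(f"Z = {z:.2f}")
print(f"one-sided confidence that campaign 1 is better = {confidence:.0%}")
```

With these assumed impression counts, the Z score comes out a little above 0.7 and the confidence lands in the mid-70s percent, in line with the hand calculation and nowhere near 95%.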