
Thursday, August 07, 2008

Advice for Startup Marketing Advice - Learn Statistics

An interesting blog post, entitled Startup Marketing Advice, came across my delicious network this past week. In it, the author espouses the merits of the scientific method for finding catchy, effective marketing messaging on the cheap.

I have no problem with the thesis of the post. He's completely right that A/B testing is a terrific way to determine whether one message is better than another. Every elementary schooler knows that a good experiment has a control group and a test group, and the way to show that the variable under test is significant is to demonstrate that the test group responds significantly differently from the control.

The problem, though, is that the example given in the post doesn't prove anything! We'll look at the top two messages in his AdWords result list ("Startup Marketing Advice" and "Marketing with Adwords") and show that there is no significant difference in their clickthrough rates.

The first problem that should stand out is that the number of clicks is so small that the proportions cannot be accurately estimated. The general rule of thumb for estimating a proportion is that you need at least 5 positive and 5 negative examples; the trial with the most clicks in his experiment received only 4. The CTRs reported by Google therefore hardly mean anything: the standard error (calculated as sqrt(p(1-p)/n) for a binomial proportion like CTR) can't be determined accurately, but is sure to be nearly as large as the CTR itself.
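To make that concrete, here is a quick sketch of the standard error formula. The post doesn't report impression counts, so the numbers below (4 clicks on 730 impressions, which yields roughly the reported 0.55% CTR) are made up purely for illustration:

```python
from math import sqrt

def ctr_standard_error(clicks, impressions):
    """Standard error of a binomial proportion: sqrt(p*(1-p)/n)."""
    p = clicks / impressions
    return p, sqrt(p * (1 - p) / impressions)

# Hypothetical numbers: 4 clicks on 730 impressions. The actual
# impression counts aren't given in the post.
p, se = ctr_standard_error(4, 730)
print(f"CTR = {p:.2%}, standard error = {se:.2%}")
```

With these hypothetical numbers the standard error comes out at roughly half the CTR itself, so the reported rate is little better than noise.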

The second problem is the assessment of statistical significance between the different choices. The standard error of the difference between two estimators is sqrt(S_1^2/n_1 + S_2^2/n_2). Calculating this for the top two AdWords campaigns, the standard error of the difference between their clickthrough rates comes out somewhere in the vicinity of 0.35%. [Again, the statistics here are all inaccurate because we've already failed the "minimum of 5 clicks" rule of thumb, but we'll ignore that.] Because the measured CTR difference is 0.55% - 0.30% = 0.25 percentage points, we calculate a Z score of (0.25 / 0.35) = 0.71. Looking this up in a table of the normal distribution shows roughly 76% one-sided confidence that campaign 1 is better than campaign 2. That doesn't sound too terrible, but it's hardly scientific, especially given the tiny sample sizes mentioned earlier.
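The calculation above can be sketched in a few lines, using the same unpooled standard-error formula. Again, the impression counts below are invented (chosen so the CTRs land near the reported 0.55% and 0.30%), so the exact z value is illustrative only:

```python
from math import sqrt
from statistics import NormalDist

def ctr_z_test(clicks1, n1, clicks2, n2):
    """Z score for the difference between two CTRs, using the
    unpooled standard error sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)."""
    p1, p2 = clicks1 / n1, clicks2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)  # one-sided confidence level

# Hypothetical counts: 4/730 (~0.55%) vs. 3/1000 (0.30%).
z, confidence = ctr_z_test(4, 730, 3, 1000)
print(f"z = {z:.2f}, one-sided confidence = {confidence:.0%}")
```

With these invented counts the z score lands well under 2, i.e. nowhere near the conventional 95% threshold.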

Note: I am not a statistician. I just play one on a blog.

Thursday, May 29, 2008

Confidence intervals for Jaccard Similarity?

Hoping that someone is googling for the right terms here:

Anyone out there know how to calculate a confidence interval around an estimate of the Jaccard similarity coefficient?

For Pearson correlation you can use Fisher's Z-prime transformation, but I can't quite figure out a principled way to do the same for Jaccard similarity.
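For reference, here is what the Fisher approach looks like for Pearson's r: transform to z-space where the sampling distribution is approximately normal with standard error 1/sqrt(n-3), build the interval there, and map back with tanh. (For Jaccard, a bootstrap over the underlying sets is a common fallback, though not the principled analytic answer I'm after.)

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Confidence interval for Pearson's r via Fisher's z transformation."""
    z = atanh(r)                 # variance-stabilizing transform
    se = 1 / sqrt(n - 3)         # approximate standard error in z-space
    zcrit = NormalDist().inv_cdf((1 + level) / 2)
    return tanh(z - zcrit * se), tanh(z + zcrit * se)

lo, hi = pearson_ci(0.5, 100)
print(f"95% CI for r = 0.5 (n = 100): [{lo:.3f}, {hi:.3f}]")
```

Note how the back-transformed interval is asymmetric around r, since tanh compresses values near plus or minus 1.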