Don't ship a winner that was just luck
The most common A/B testing mistake is calling a result early. A variant jumps 18% on day two, everyone cheers, the test ends, and the “win” evaporates in production. With small samples, conversion rates swing wildly; significance testing exists to tell the signal from that noise.
This calculator answers one precise question: given the numbers you collected, how confident can you be that the variant genuinely differs from the control? It does not tell you whether you've run the test long enough, that's a sample-size question you should settle before launch. The two together keep you honest.
Reading the result
- p < 0.05 (95%+): significant. The difference is unlikely to be chance, act on it.
- 85–95%: trending. Promising but not proven; gather more data.
- Below 85%: noise. Don't read anything into it yet.