Don't ship a winner that was just luck
The most common A/B testing mistake is calling a result early. A variant jumps 18% on day two, everyone cheers, the test ends — and the “win” evaporates in production. With small samples, conversion rates swing wildly; significance testing exists to tell the signal from that noise.
This calculator answers one precise question: given the numbers you collected, how confident can you be that the variant genuinely differs from the control? It does not tell you whether you've run the test long enough — that's a sample-size question you should settle before launch. The two together keep you honest.
Reading the result
- p < 0.05 (95%+): significant. The difference is unlikely to be chance — act on it.
- 85–95%: trending. Promising but not proven; gather more data.
- Below 85%: noise. Don't read anything into it yet.