Free tool · PostWyse

A/B Test Significance Calculator

Enter visitors and conversions for your control and variant. Get the relative uplift, statistical significance, p-value, confidence level, and a plain-English verdict on whether the result is real or just noise. No signup.

Control (A)

Visitors

Conversions

Conversion rate

5.00%

Variant (B)

Visitors

Conversions

Conversion rate

6.21%

Result

Significant, variant wins

At 99.0% confidence (p = 0.010), this result is unlikely to be chance. You can act on it.

Relative uplift

+24.2%

Confidence

99.0%

P-value

0.010

Z-score

2.57

Two-tailed pooled z-test for two proportions at a 95% confidence threshold (p < 0.05). This tests whether the observed difference is larger than chance alone would plausibly produce, not whether your sample is large enough. Run a sample-size check before you start, and avoid peeking at the result and stopping the moment it crosses the line.

Don't ship a winner that was just luck

The most common A/B testing mistake is calling a result early. A variant jumps 18% on day two, everyone cheers, the test ends, and the “win” evaporates in production. With small samples, conversion rates swing wildly; significance testing exists to tell the signal from that noise.

This calculator answers one precise question: given the numbers you collected, how confident can you be that the variant genuinely differs from the control? It does not tell you whether you've run the test long enough; that's a sample-size question you should settle before launch. The two together keep you honest.

Reading the result

95% or higher (p < 0.05): significant. The difference is unlikely to be chance; act on it.
85–95%: trending. Promising but not proven; gather more data.
Below 85%: noise. Don't read anything into it yet.
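
The three bands above can be written as a small lookup. This is only a sketch: the function name `verdict` and the short labels it returns are hypothetical, and how a result sitting exactly on the 85% boundary is treated is an assumption, since the page does not say.

```python
def verdict(p_value):
    """Map a two-tailed p-value to the bands described above.

    Confidence is taken as 1 - p_value, so 95% confidence is p < 0.05
    and 85% confidence is p <= 0.15 (boundary handling is an assumption).
    """
    if p_value < 0.05:
        return "significant"
    if p_value <= 0.15:
        return "trending"
    return "noise"


print(verdict(0.010))  # the p-value from the example result on this page
```

Working from the p-value rather than the confidence percentage avoids rounding surprises right at the thresholds.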

Want this on autopilot?

PostWyse drafts the variants, runs the test, and only surfaces a winner once it clears significance, so you act on real lifts, not lucky days.

Try PostWyse

Frequently asked

What is statistical significance in an A/B test?

A result is statistically significant when the difference between your control and variant is too large to be plausibly explained by random chance alone. The convention is 95% confidence (a p-value below 0.05), meaning there's less than a 5% chance you'd see a difference this large if the two versions actually performed identically.

What is a p-value?

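The p-value is the probability of observing a difference at least as extreme as yours if there were truly no difference between the variants. A p-value of 0.03 means that, if the two versions really performed the same, a gap this large would show up only about 3% of the time. It is not the probability that your result is noise. Below 0.05 is the standard bar for declaring a winner.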

How does this calculator work?

It runs a two-tailed pooled z-test for two proportions, the standard test for comparing conversion rates. It pools the two groups to estimate the shared conversion rate, computes the standard error, derives a z-score, and converts that to a p-value and confidence level.
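
The steps described above can be sketched in a few lines of Python. This is a minimal illustration of a two-tailed pooled z-test, not the calculator's actual source. The function name `ab_significance` is hypothetical, and the visitor and conversion counts are made-up inputs chosen to give conversion rates of about 5.00% and 6.21%; the page does not state the counts behind its example.

```python
import math


def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-tailed pooled z-test for two proportions."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pool both groups to estimate the shared conversion rate under
    # the assumption that there is no real difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)

    # Standard error of the difference between the two rates.
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

    # With no variation at all (e.g. zero conversions in both groups)
    # there is nothing to test, so treat the z-score as 0.
    z = (rate_b - rate_a) / se if se > 0 else 0.0

    # Two-tailed p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "uplift": (rate_b - rate_a) / rate_a if rate_a > 0 else float("nan"),
        "z": z,
        "p_value": p_value,
        "confidence": 1 - p_value,
    }


# Hypothetical inputs: 4,800 visitors per group, 240 vs 298 conversions.
result = ab_significance(4800, 240, 4800, 298)
print(result)
```

With these assumed inputs the z-score comes out near 2.57 and the p-value near 0.010, in line with the example result shown on this page.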

Why shouldn't I stop a test as soon as it hits significance?

Because of 'peeking.' If you check repeatedly and stop the instant p dips below 0.05, you dramatically inflate your false-positive rate, because random fluctuations will cross the line temporarily even when nothing has changed. Decide your sample size up front, run to it, then read the result once. This calculator tells you if a result is significant, not whether you've collected enough data.
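
A quick simulation makes the peeking effect concrete. The sketch below runs A/A tests, where both groups share the same true conversion rate and every "win" is therefore a false positive. It compares how often p dips below 0.05 at any of several interim looks with how often it is below 0.05 at the single planned final look. All parameters (400 runs, 2,000 visitors per group, 10 looks, a 5% rate, the seed) are arbitrary assumptions for illustration.

```python
import math
import random


def p_value(n_a, x_a, n_b, x_b):
    """Two-tailed pooled z-test p-value for two proportions."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to test
    z = (x_b / n_b - x_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(runs=400, visitors=2000, looks=10, rate=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = visitors // looks
    crossed_any_look = 0
    significant_at_end = 0
    for _ in range(runs):
        x_a = x_b = 0
        crossed = False
        p = 1.0
        for look in range(1, looks + 1):
            # Add one batch of visitors to each group; both groups
            # convert at the same true rate.
            x_a += sum(rng.random() < rate for _ in range(step))
            x_b += sum(rng.random() < rate for _ in range(step))
            p = p_value(look * step, x_a, look * step, x_b)
            if p < 0.05:
                crossed = True  # a peeker would have stopped here
        crossed_any_look += crossed
        significant_at_end += p < 0.05
    return crossed_any_look / runs, significant_at_end / runs


peek_rate, final_rate = simulate_peeking()
print(f"stop at first significant look: {peek_rate:.1%}")
print(f"single look at the end:         {final_rate:.1%}")
```

The single-look rate should hover around the nominal 5%, while the stop-on-first-crossing rate typically lands well above it, even though no variant is actually better.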

How much traffic do I need for an A/B test?

It depends on your baseline conversion rate and the smallest uplift worth detecting. Smaller effects need much larger samples. As a rough rule, low single-digit conversion rates often need several thousand to tens of thousands of visitors per variant to detect a 10–20% relative lift. Run a sample-size calculation before you launch.
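
As a rough guide, the standard normal-approximation formula for comparing two proportions gives a per-variant sample size. The sketch below assumes a two-sided 5% significance level and 80% power, which are common defaults but are not stated on this page. The function name `visitors_per_variant` is hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2

    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


n_20 = visitors_per_variant(0.05, 0.20)  # 5% baseline, 20% relative lift
n_10 = visitors_per_variant(0.05, 0.10)  # 5% baseline, 10% relative lift
print(n_20, n_10)
```

Under these assumptions, a 5% baseline needs roughly 8,000 visitors per variant to detect a 20% relative lift and roughly 31,000 to detect a 10% lift. Halving the detectable effect roughly quadruples the required sample.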