Writing

Notes on making experiments trustworthy

Field notes from the unglamorous half of experimentation: where wins turn out to be noise, metrics quietly disagree, and rigor is the difference between shipping on signal and shipping on luck.

Experimentation

Why your A/B test winner didn't hold up

You shipped a winning variant and the lift vanished. The four reasons winners fade after launch — peeking, regression to the mean, novelty effects, and multiple comparisons — and how to stop shipping on noise.

Read the essay →

Experimentation

When your sequential test says "keep running" at Z = 3.997

A +36% lift and a 3.997 Z-score, and the boundary still says keep running. Why that's the O'Brien-Fleming design working as paid for, and the discipline of not flinching at an interim peek.

Read the essay →

Experimentation

Your A/B test revenue metric was inflating 59x. The bug was one line.

Every variant showed one purchaser and zero revenue, except one cell. The culprit was a single ifNull in the revenue CTE, and it survived every aggregate sanity check.

Read the essay →

Experimentation

When churn and charges both rise in the same A/B test

A retention feature lifted engagement and both ends of the funnel at once. Why polarization hides in the net, and the action-vs-outcome diagnostic that catches it.

Read the essay →

More essays coming, on guardrail metrics, attribution drift across channels, and measuring launches you can't A/B test.