Experimentation

Your A/B test revenue metric was inflating 59x. The bug was one line.

4 min readBy David Arzumanian

A teammate's revenue A/B test showed a strange result: every variant, every day, exactly one purchaser with $0 revenue. Except one cell, control on day two, with 65 purchasers and roughly $9K. Everything else: one and zero.

The instinct is to blame the join. It's almost never the join.

What ifNull actually does

The bug was one line in the revenue CTE:

ifNull(rp.recur_total, O.amount) AS net_sales

The intent was reasonable: if a recurring payment row exists for this order, use its total; otherwise fall back to the order amount. The problem is what ifNull actually does. It only falls back when the first argument is NULL. Zero is not NULL. So any order with a recurring-payment record whose total had been zeroed out, by a refund, a cancellation, a partial reversal, silently overwrote a real O.amount with 0.

Those orders passed every upstream filter (status = 'completed', amount > 0, not a refund), joined to the experiment user, and landed in the purchase CTE with a real ID and total_net_sales = 0. Counted as a purchaser. Contributed nothing to revenue.

Now flip it

If a few recurring rows had inflated totals, a stacked subscription, a currency mix-up, the SUM carries the experiment. A single order silently overwritten to 59 times its real value looks exactly like a real lift. Same query, same variance, same z-score, opposite conclusion. You'd ship it.

That's the pattern with revenue metrics: the dangerous bugs don't break the query, they bias one variant. They pass every sanity check that looks at aggregates. They only show up when you stare at the per-user distribution.

Two habits that catch this before the readout

  1. For any revenue metric, plot the per-user distribution per variant before computing the test statistic. The zero-inflated mass, the tail, and the count of suspiciously round values tell you more than the mean ever will.
  2. Treat every COALESCE / ifNull / NULLIF in a revenue pipeline as a place a silent override can happen. Write the fallback against the actual failure mode, ifNull(NULLIF(rp.recur_total, 0), O.amount), not the one you imagined.

If you run revenue experiments and have never audited the numerator end to end, start with the test you're most confident in. That's usually where the most expensive surprise is hiding.

Has your revenue numerator ever been audited end to end?

The free 90-second diagnostic flags where your pipeline is most likely lying. Or book a call and we'll look at one real test together.

← All writing