
When churn and charges both rise in the same A/B test

4 min read · By David Arzumanian

Most A/B test reads stop at "did the metric move." That's where the interesting question usually starts.

A recent example: a retention feature lifted a proxy metric, engagement with a queue action, by 3.4% (p < 0.001). The kind of result that gets a Slack celebration.

Then we looked one layer down. The same variant lifted unsubscribes by about 2% (significant) and nudged charges up by about 1% (not significant). Both ends of the funnel got fatter at once.
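Reading "one layer down" means running the same significance check on each outcome, not just on the proxy. A minimal sketch, assuming a pooled two-proportion z-test is an acceptable read for rate metrics (the post does not say which test was used). The function name and every count below are made up for illustration and are not the figures from this test.

```python
import math

def two_proportion_ztest(x_control, n_control, x_variant, n_variant):
    """Return (relative_lift, z, two_sided_p) for a difference in two rates."""
    p_c = x_control / n_control
    p_v = x_variant / n_variant
    # Pooled rate under the null hypothesis that both arms share one rate.
    pooled = (x_control + x_variant) / (n_control + n_variant)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    z = (p_v - p_c) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (p_v - p_c) / p_c, z, p_value

# Hypothetical counts: (events in control, users in control,
#                       events in variant, users in variant)
outcomes = {
    "unsubscribes": (5_000, 100_000, 5_100, 100_000),
    "charges": (40_000, 100_000, 40_400, 100_000),
}
for name, counts in outcomes.items():
    lift, z, p = two_proportion_ztest(*counts)
    print(f"{name}: lift={lift:+.1%}  z={z:+.2f}  p={p:.3f}")
```

The point of the loop is the habit: every outcome metric gets its own lift and p-value in the same table as the proxy, so a win at one end of the funnel is never read without the other end.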

Action vs. outcome

This is the diagnostic I now run before approving any "winner": does the proxy metric measure an action, or an outcome?

Actions are things users do: click, add, engage. Outcomes are things the business gets: charges, retention, gross profit. When a feature lifts an action metric, it has activated previously passive users. That's it. It hasn't told you what they do next.

Activated users go somewhere. Some convert. Some churn. If your population sits on both sides of a fence, poking them harder pushes some over each side. The action moves cleanly. The outcomes split. That's polarization, and it is invisible if you only read the net.
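A toy illustration of why the net hides this. It assumes you can label each user with the state they were in before the test and count one outcome (here, whether they were charged) per arm. The segment names, the function name, and all counts are hypothetical, chosen so the two segments cancel exactly.

```python
def charge_rate_deltas(counts):
    """Per-segment and pooled difference in charge rate (treatment - control).

    counts: {segment: {"control": (charged, users),
                       "treatment": (charged, users)}}
    """
    deltas = {}
    totals = {"control": [0, 0], "treatment": [0, 0]}
    for segment, arms in counts.items():
        rates = {}
        for arm in ("control", "treatment"):
            charged, users = arms[arm]
            rates[arm] = charged / users
            totals[arm][0] += charged
            totals[arm][1] += users
        deltas[segment] = rates["treatment"] - rates["control"]
    # The pooled "net" read, which is all a topline dashboard shows.
    deltas["ALL"] = (totals["treatment"][0] / totals["treatment"][1]
                     - totals["control"][0] / totals["control"][1])
    return deltas

# Hypothetical data: one segment converts more, the other churns more.
toy = {
    "passive_skippers": {"control": (100, 1000), "treatment": (130, 1000)},
    "active_subscribers": {"control": (800, 1000), "treatment": (770, 1000)},
}
for segment, delta in charge_rate_deltas(toy).items():
    print(f"{segment}: {delta:+.3f}")
```

In this made-up case the pooled row shows no change while each segment moves by three points in opposite directions. Real data will rarely cancel this neatly, but the same split is what exposes the movement.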

Three rules I now write into every test doc before launch

  1. Name the action metric and the outcome metric separately. If your primary KPI is an action, you don't have a primary KPI; you have a leading indicator. Pick the downstream economic metric and power the test for it, even if it takes longer.
  2. Read both tails of the funnel. Polarization (churn up and charges up) is invisible if you only look at the net. Decompose by the state users were in going in (passive skippers, active subscribers, payment-failed) and check whether the lift in one segment is paid for by losses in another.
  3. Treat "engagement up, economics flat" as a failure mode, not a partial win. If gross profit per visitor didn't move, the mechanic didn't earn its place. The concept may have legs, but the implementation owes you a second iteration, not a ship decision.
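For the first rule, "power the test for it" can be made concrete before launch. A rough sketch using the standard normal-approximation formula for comparing two proportions; the function name, the baseline rate, and the lift sizes are assumptions for illustration, and your own power tooling may use a slightly different formula.

```python
import math
from statistics import NormalDist

def users_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a relative lift in a rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Hypothetical planning inputs: same baseline, two different effect sizes.
for lift in (0.10, 0.01):
    print(f"relative lift {lift:.0%}: {users_per_arm(0.10, lift):,} users per arm")
```

Required sample size grows roughly with the inverse square of the effect, so an outcome metric that moves a tenth as much as the action metric needs on the order of a hundred times the traffic. That is the "even if it takes longer" cost, and it is better priced in the test doc than discovered afterward.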

The expensive mistake isn't shipping a bad test. It's shipping a test where the action moved, calling it a win, and finding the polarization in a quarterly retention review three months later.

Action metrics tell you the feature is alive. Outcome metrics tell you whether it's earning.

Does your metric tree separate actions from outcomes?

Take the free 90-second diagnostic, or book a call and I'll give your metric tree a second pair of eyes before launch.
