EP3: The Causality Gap — Measuring the True Impact of Voluntary Adoption in Digital Marketplaces
Randomized Encouragement Design + DoubleML for voluntary adoption
A feature can create value and still fail the experiment.
In marketplaces, many features depend on users or partners choosing to opt in — loyalty programs, vouchers and incentives, partner optimization tools, email messaging. That creates a surprisingly tricky measurement problem:
- Only a small fraction of eligible users adopt.
- The people who adopt love it.
- Yet the A/B test reads flat.
We call this the causality gap.
The wrong question: “should we kill it?”
Faced with a flat A/B result, the default instinct is to kill the feature. But that conclusion quietly assumes the feature and its adoption are the same thing. They are not.
A weak experiment result can mean two very different things:
- The feature creates little value, or
- The feature creates real value, but very few users adopted it.
Those two stories lead to three very different actions:
- 🎯 Improve the product — adopters aren’t getting enough value.
- 📣 Improve adoption — adopters love it, you just need more of them.
- 🧭 Stop investing — the value isn’t there, even for adopters.
Reading a flat A/B as “the feature doesn’t work” collapses three different decisions into one. That is the trap.
Two questions, two distinct estimands
In the article, Kexin Fei and I walk through how Randomized Encouragement Designs (RED) combined with DoubleML can answer the two questions a marketplace team actually cares about:
- What is the effect on users who actually adopt the feature? (strictly, the effect among users who adopt because they were encouraged, known as LATE or CACE)
- What is the impact of offering it to everyone eligible, whether or not they adopt? (the intent-to-treat effect, ITT)
These are different numbers with different uses, and most teams unintentionally mix them up. RED + DoubleML lets you estimate both cleanly from the same experiment.
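To make the two estimands concrete, here is a minimal simulation sketch in plain Python. It is not the DoubleML workflow from the article: it uses the simple Wald ratio (ITT on the outcome divided by ITT on adoption) with no covariates, and every number in it (the adoption shares, a true effect of 5.0, a baseline of 20) is a made-up assumption chosen for illustration.

```python
import random

TRUE_EFFECT = 5.0  # hypothetical lift for a user who adopts


def avg(values):
    return sum(values) / len(values)


def simulate_encouragement(n=200_000, seed=0):
    """Toy randomized encouragement design; all parameters are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        encouraged = rng.random() < 0.5          # randomized nudge (e.g. a banner)
        u = rng.random()
        always_taker = u < 0.02                  # adopts with or without the nudge
        complier = 0.02 <= u < 0.12              # adopts only if nudged
        adopted = always_taker or (complier and encouraged)
        # Unobserved motivation is higher for anyone inclined to adopt.
        motivation = rng.gauss(0, 1) + (1.0 if (always_taker or complier) else 0.0)
        outcome = 20 + 2 * motivation + TRUE_EFFECT * adopted + rng.gauss(0, 1)
        rows.append((encouraged, int(adopted), outcome))
    return rows


def wald_estimates(rows):
    """Return (ITT on outcome, ITT on adoption, Wald estimate of LATE)."""
    y1 = [y for z, _, y in rows if z]
    y0 = [y for z, _, y in rows if not z]
    d1 = [d for z, d, _ in rows if z]
    d0 = [d for z, d, _ in rows if not z]
    itt_outcome = avg(y1) - avg(y0)    # effect of being encouraged
    itt_adoption = avg(d1) - avg(d0)   # extra adoption caused by the nudge
    return itt_outcome, itt_adoption, itt_outcome / itt_adoption


rows = simulate_encouragement()
itt_outcome, itt_adoption, late = wald_estimates(rows)
print(f"ITT on outcome:  {itt_outcome:.3f}")
print(f"ITT on adoption: {itt_adoption:.3f}")
print(f"LATE (Wald):     {late:.3f}")
```

In this toy setup the encouragement moves adoption by about ten percentage points, so the ITT comes out at roughly a tenth of the effect among encouraged adopters: a small number that is easy to read as flat, sitting on top of a large one.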
Why the naïve fixes don’t save you
The instinctive next move is to compare adopters to “similar” non-adopters by adjusting for what you can measure — past activity, engagement, demographics. That is the spirit behind regression adjustment, matching, weighting, and their machine-learning variants.
Here is the catch. Adopters and non-adopters differ in things you can measure and in things you cannot — how motivated they were to begin with, how much they would have engaged anyway. Adjustment only handles the measurable differences. The invisible ones — exactly the ones that drove the adoption decision in the first place — stay invisible.
It gets worse. When you match adopters to non-adopters on observable proxies for motivation, you systematically pick the most motivated non-adopters as your control group. You end up comparing a mixed-motivation treatment group against a hand-picked super-motivated control. Broken by design.
Adding more controls does not fix this. Adjusting harder on the wrong variables just produces a wrong answer with tighter confidence intervals.
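The point about adjustment can be checked with a small simulation. This is a sketch under invented assumptions, not the article's analysis: motivation is unobserved and drives both adoption and the outcome, `past_activity` is a hypothetical noisy proxy for it, and the true effect of adoption is fixed at 1.0. The code compares the raw adopter versus non-adopter gap with a gap stratified on the proxy.

```python
import math
import random
from collections import defaultdict

TRUE_EFFECT = 1.0  # hypothetical effect of adoption on the outcome


def avg(values):
    return sum(values) / len(values)


def simulate_voluntary_adoption(n=100_000, seed=0):
    """Toy opt-in data with an unobserved confounder; all numbers are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivation = rng.gauss(0, 1)                   # never observed by the analyst
        past_activity = motivation + rng.gauss(0, 1)   # observed, noisy proxy
        p_adopt = 1 / (1 + math.exp(3 - 1.5 * motivation))
        adopted = rng.random() < p_adopt
        outcome = 10 + 2 * motivation + TRUE_EFFECT * adopted + rng.gauss(0, 1)
        rows.append((past_activity, adopted, outcome))
    return rows


def naive_difference(rows):
    """Raw gap between adopters and non-adopters."""
    treated = [y for _, d, y in rows if d]
    control = [y for _, d, y in rows if not d]
    return avg(treated) - avg(control)


def stratified_difference(rows, width=0.5, lo=-3.0, hi=3.0):
    """Gap within bins of the observed proxy, averaged with bin-size weights."""
    strata = defaultdict(lambda: ([], []))  # bin -> (control outcomes, treated outcomes)
    for x, d, y in rows:
        key = math.floor(min(max(x, lo), hi) / width)
        strata[key][1 if d else 0].append(y)
    total, weighted = 0, 0.0
    for control, treated in strata.values():
        if control and treated:  # skip bins that lack one of the groups
            size = len(control) + len(treated)
            weighted += size * (avg(treated) - avg(control))
            total += size
    return weighted / total


rows = simulate_voluntary_adoption()
naive = naive_difference(rows)
adjusted = stratified_difference(rows)
print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"naive adopter gap:      {naive:.2f}")
print(f"adjusted on the proxy:  {adjusted:.2f}")
```

Under these assumptions both estimates overshoot the true effect. Stratifying on the proxy narrows the gap but cannot close it, because adopters are still more motivated than non-adopters with the same past activity.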
In the article, we walk through the mechanics step by step — and show what randomized encouragement gives you that pure adjustment never can.
Read the article — or listen to the episode
👉 The Causality Gap: Measuring the True Impact of Voluntary Adoption in Digital Marketplaces →
If it lands for you, a few claps on Medium really do help it reach more practitioners working on the same problem.
Prefer audio? The same ideas on the podcast: