EP3: The Causality Gap — Measuring the True Impact of Voluntary Adoption in Digital Marketplaces

Randomized Encouragement Design + DoubleML for voluntary adoption

observational causal inference

randomized encouragement design

instrumental variables

DoubleML

LATE

Author

Lin Jia

Published

May 22, 2026

A feature can create value and still fail the experiment.

In marketplaces, many features depend on users or partners choosing to opt in — loyalty programs, vouchers and incentives, partner optimization tools, email messaging. That creates a surprisingly tricky measurement problem:

Only a small fraction of eligible users adopt.
The people who adopt love it.
Yet the A/B test reads flat.

We call this the causality gap.

The wrong question: “should we kill it?”

Faced with a flat A/B result, the default instinct is to kill the feature. But that conclusion quietly assumes the feature and its adoption are the same thing. They are not.

A weak experiment result can mean two very different things:

The feature creates little value, or
The feature creates real value, but very few users adopted it.

Those two stories point to three very different actions:

🎯 Improve the product — adopters aren’t getting enough value.
📣 Improve adoption — adopters love it, you just need more of them.
🧭 Stop investing — the value isn’t there, even for adopters.

Reading a flat A/B as “the feature doesn’t work” collapses three different decisions into one. That is the trap.
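To see how the second story produces a flat readout, write Z for the randomized encouragement, D for adoption, and Y for the outcome. Under the standard encouragement-design assumptions (the encouragement changes outcomes only through adoption, and it never discourages anyone from adopting), the experiment-wide lift factors into the share of eligible users the encouragement moved into adoption, π_c, times the effect among those users:

```latex
\begin{aligned}
\text{ITT}_Y &= \mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0] \\
\text{ITT}_D &= \mathbb{E}[D \mid Z=1] - \mathbb{E}[D \mid Z=0] = \pi_c \\
\text{ITT}_Y &= \pi_c \cdot \text{LATE}
  \quad\Longrightarrow\quad
  \text{LATE} = \frac{\text{ITT}_Y}{\text{ITT}_D}
\end{aligned}
```

With hypothetical numbers: if the encouragement moves 4% of eligible users into adoption (π_c = 0.04) and each of them gains 1.5 orders per month, the experiment-wide lift is 0.04 × 1.5 = 0.06 orders per eligible user. That can sit inside the noise of many experiments even though the effect on the users who adopted is large.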

Two questions, two distinct estimands

In the article, Kexin Fei and I walk through how Randomized Encouragement Designs (RED) combined with DoubleML can answer the two questions a marketplace team actually cares about:

What is the effect on users who actually adopt the feature? (LATE / CACE: the effect among compliers, the users whose adoption is moved by the encouragement)
What is the impact of rolling it out to everyone eligible? (the intent-to-treat effect — ITT)

These are different numbers with different uses, and teams often mix them up without realizing it: the ITT is diluted by every eligible user who never adopts, while LATE / CACE is not. RED + DoubleML lets you estimate both cleanly from the same experiment.
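A minimal sketch of how both estimands can come out of one experiment, assuming a simulated encouragement design: the covariates, coefficients, and the true adoption effect of 2.0 are all hypothetical. It follows the partialling-out logic of a partially linear IV model with cross-fitting. Outcome, adoption, and encouragement are each residualized on pre-period covariates using models fit on the other folds, and the ITT and the adoption effect are read off the same residuals. Polynomial least squares stands in for the ML learners to keep the example numpy-only. This is not the DoubleML library's API; that library also provides an interactive IV model (DoubleMLIIVM) aimed at the LATE for a binary encouragement and binary adoption.

```python
import numpy as np

def simulate_red(n, seed=0):
    """Hypothetical encouragement experiment; the true effect of adoption is 2.0."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))                    # observed pre-period covariates
    motivation = rng.normal(size=n)                # unobserved by the analyst
    z = rng.integers(0, 2, size=n).astype(float)   # randomized encouragement
    logit = -3.0 + 1.5 * motivation + 0.8 * X[:, 0] + 1.0 * z
    d = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)  # voluntary adoption
    y = 2.0 * d + motivation + X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=n)
    return X, z, d, y

def poly_features(X):
    """Degree-2 polynomial basis, standing in for a flexible ML learner."""
    k = X.shape[1]
    cols = [np.ones(len(X))] + [X[:, j] for j in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def crossfit_residuals(X, t, folds):
    """Out-of-fold residuals t - E_hat[t | X]: each unit is predicted by a model fit without it."""
    resid = np.empty(len(t))
    for k in np.unique(folds):
        test = folds == k
        beta, *_ = np.linalg.lstsq(poly_features(X[~test]), t[~test], rcond=None)
        resid[test] = t[test] - poly_features(X[test]) @ beta
    return resid

def dml_red(X, z, d, y, n_folds=5, seed=0):
    """ITT and adoption effect from one encouragement experiment (partialling-out IV)."""
    # Random, balanced fold labels shared by all three nuisance regressions.
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    y_res, d_res, z_res = (crossfit_residuals(X, t, folds) for t in (y, d, z))
    itt = (z_res @ y_res) / (z_res @ z_res)    # covariate-adjusted ITT on the outcome
    late = (z_res @ y_res) / (z_res @ d_res)   # IV ratio: ITT scaled by the first stage
    return itt, late

X, z, d, y = simulate_red(200_000, seed=3)
itt, late = dml_red(X, z, d, y)
print(f"ITT = {itt:.3f}, effect of adoption = {late:.2f}")
```

Because the encouragement here is randomized independently of the covariates, residualizing on them is not needed for validity; it mainly strips outcome noise and tightens both estimates. If encouragement probabilities varied with covariates and effects were heterogeneous, the partially linear estimate would become a weighted average of covariate-specific effects, which is where a dedicated LATE estimator such as the interactive IV model is the safer choice.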

Why the naïve fixes don’t save you

The instinctive next move is to compare adopters to “similar” non-adopters by adjusting for what you can measure — past activity, engagement, demographics. That is the spirit behind regression adjustment, matching, weighting, and their machine-learning variants.

Here is the catch. Adopters and non-adopters differ in things you can measure and in things you cannot — how motivated they were to begin with, how much they would have engaged anyway. Adjustment only handles the measurable differences. The invisible ones — exactly the ones that drove the adoption decision in the first place — stay invisible.

It gets worse. When you match adopters to non-adopters on observable proxies for motivation, you systematically pick the most motivated non-adopters as your control group. You end up comparing a mixed-motivation treatment group against a hand-picked super-motivated control. Broken by design.

Adding more controls does not fix this. Adjusting harder on the wrong variables just produces a wrong answer with tighter confidence intervals.
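A small simulation can make the "wrong answer with tighter confidence intervals" point concrete. It assumes, hypothetically, that unobserved motivation drives both adoption and outcomes, that the analyst only sees a noisy proxy for it (the illustrative `past_activity`), and that the true effect of adoption is 2.0. Regression adjustment on the proxy stays biased however large the sample, while the encouragement-based Wald ratio (ITT on the outcome divided by ITT on adoption) centers on the true effect.

```python
import numpy as np

def simulate(n, seed=0):
    """Hypothetical experiment: motivation drives adoption and outcome (true effect = 2.0)."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)                    # randomized encouragement
    motivation = rng.normal(size=n)                   # unobserved
    past_activity = motivation + rng.normal(size=n)   # observed, noisy proxy for motivation
    p = 1 / (1 + np.exp(-(-3.0 + 1.5 * motivation + 1.0 * z)))
    d = (rng.random(n) < p).astype(float)             # voluntary adoption
    y = 1.0 + motivation + 2.0 * d + rng.normal(size=n)
    return z, d, y, past_activity

def adjusted_effect(d, y, x):
    """OLS of y on [1, d, x]: coefficient on d and its classical standard error."""
    X = np.column_stack([np.ones_like(d), d, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], se

def wald_late(z, d, y):
    """Encouragement-based estimate: ITT on the outcome divided by ITT on adoption."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

for n in (10_000, 1_000_000):
    z, d, y, x = simulate(n, seed=2)
    est, se = adjusted_effect(d, y, x)
    print(f"n={n:>9,}  adjusted={est:.2f} ± {1.96 * se:.2f}  encouragement={wald_late(z, d, y):.2f}")
```

As n grows, the adjusted interval shrinks around a biased value. The encouragement estimate is noisier at small n, because only the users moved by the nudge carry information, but it targets the right quantity.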

In the article, we walk through the mechanics step by step — and show what randomized encouragement gives you that pure adjustment never can.

Read the article — or listen to the episode

👉 The Causality Gap: Measuring the True Impact of Voluntary Adoption in Digital Marketplaces →

If it lands for you, a few claps on Medium really do help it reach more practitioners working on the same problem.

Prefer audio? The same ideas on the podcast:

Copyright

© 2026 Lin Jia. All rights reserved. This content is created independently on personal time and is based exclusively on public domain academic knowledge. No proprietary, confidential, or employer-owned materials, data, or intellectual property are included. The views and opinions expressed in this work are strictly my own and do not reflect those of any current or former employer. All methodology discussed is sourced from publicly available scientific literature.