Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement

Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-e...

Full description

Bibliographic Details
Main Authors: Gordon, Brett R., Moakler, Robert, Zettelmeyer, Florian
Format: Text
Language:unknown
Published: 2022
Subjects:
DML
Online Access:http://arxiv.org/abs/2201.07055
Description
Summary:Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-experimental approaches need to "undo" this selection. We analyze 663 large-scale experiments at Facebook to investigate whether this is possible with the data typically logged at large ad platforms. With access to over 5,000 user-level features, these data are richer than what most advertisers or their measurement partners can access. We investigate how accurately two non-experimental methods -- double/debiased machine learning (DML) and stratified propensity score matching (SPSM) -- can recover the experimental effects. Although DML performs better than SPSM, neither method performs well, even using flexible deep learning models to implement the propensity and outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper, middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively, indicating significant relative measurement errors. We further characterize the circumstances under which each method performs comparatively better. Overall, despite having access to large-scale experiments and rich user-level data, we are unable to reliably estimate an ad campaign's causal effect.