Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement

Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-e...

Bibliographic Details
Main Authors: Gordon, Brett R., Moakler, Robert, Zettelmeyer, Florian
Format: Text
Language: unknown
Published: 2022
Subjects:
DML
Online Access: http://arxiv.org/abs/2201.07055
id ftarxivpreprints:oai:arXiv.org:2201.07055
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2201.07055 2023-09-05T13:19:05+02:00 Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement Gordon, Brett R. Moakler, Robert Zettelmeyer, Florian 2022-01-18 http://arxiv.org/abs/2201.07055 unknown http://arxiv.org/abs/2201.07055 Economics - Econometrics text 2022 ftarxivpreprints 2023-08-16T16:53:14Z Text DML ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Economics - Econometrics
topic_facet Economics - Econometrics
description Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-experimental approaches need to "undo" this selection. We analyze 663 large-scale experiments at Facebook to investigate whether this is possible with the data typically logged at large ad platforms. With access to over 5,000 user-level features, these data are richer than what most advertisers or their measurement partners can access. We investigate how accurately two non-experimental methods -- double/debiased machine learning (DML) and stratified propensity score matching (SPSM) -- can recover the experimental effects. Although DML performs better than SPSM, neither method performs well, even using flexible deep learning models to implement the propensity and outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper, middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively, indicating significant relative measurement errors. We further characterize the circumstances under which each method performs comparatively better. Overall, despite having access to large-scale experiments and rich user-level data, we are unable to reliably estimate an ad campaign's causal effect.
format Text
author Gordon, Brett R.
Moakler, Robert
Zettelmeyer, Florian
title Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement
publishDate 2022
url http://arxiv.org/abs/2201.07055
genre DML
genre_facet DML
op_relation http://arxiv.org/abs/2201.07055
_version_ 1776199901219127296