Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement

Bibliographic Details
Main Authors:	Gordon, Brett R.; Moakler, Robert; Zettelmeyer, Florian
Format:	Text
Language:	unknown
Published:	2022-01-18
Subjects:	Economics - Econometrics; DML
Online Access:	http://arxiv.org/abs/2201.07055

Collection:	ArXiv.org (Cornell University Library)
Description:	Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-experimental approaches need to "undo" this selection. We analyze 663 large-scale experiments at Facebook to investigate whether this is possible with the data typically logged at large ad platforms. With access to over 5,000 user-level features, these data are richer than what most advertisers or their measurement partners can access. We investigate how accurately two non-experimental methods, double/debiased machine learning (DML) and stratified propensity score matching (SPSM), can recover the experimental effects. Although DML performs better than SPSM, neither method performs well, even using flexible deep learning models to implement the propensity and outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper, middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively, indicating significant relative measurement errors. We further characterize the circumstances under which each method performs comparatively better. Overall, despite having access to large-scale experiments and rich user-level data, we are unable to reliably estimate an ad campaign's causal effect.
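To make the size of the gap concrete, the median lifts quoted in the description can be compared directly. The sketch below is only illustrative arithmetic on those reported figures: the names `RCT`, `DML`, `SPSM`, `overstatement_ratio`, and `compare` are hypothetical, and the ratio of medians computed here is an assumption made for illustration, not necessarily the error metric the authors use (a median of per-experiment errors would generally differ).

```python
# Median lifts in percent, copied from the description above.
# Funnel stages: upper, middle, lower.
RCT = {"upper": 29.0, "middle": 18.0, "lower": 5.0}
DML = {"upper": 83.0, "middle": 58.0, "lower": 24.0}
SPSM = {"upper": 173.0, "middle": 176.0, "lower": 64.0}


def overstatement_ratio(estimate, benchmark):
    """Ratio of a non-experimental median lift to the RCT median lift.

    Hypothetical helper: a value of 1.0 would mean the two medians agree.
    """
    return estimate / benchmark


def compare(method):
    """Return the ratio for each funnel stage, rounded to two decimals."""
    return {
        stage: round(overstatement_ratio(method[stage], RCT[stage]), 2)
        for stage in RCT
    }


dml_ratios = compare(DML)
spsm_ratios = compare(SPSM)

for stage in RCT:
    print(f"{stage:>6}: DML x{dml_ratios[stage]}, SPSM x{spsm_ratios[stage]}")
```

Under this simple comparison, both methods overstate the median lift at every funnel stage, and the ratio grows toward the lower funnel, where the RCT lift is smallest.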
