Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement
Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-e...
Main Authors: | Gordon, Brett R.; Moakler, Robert; Zettelmeyer, Florian |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2022 |
Subjects: | Economics - Econometrics |
Online Access: | http://arxiv.org/abs/2201.07055 |
id | ftarxivpreprints:oai:arXiv.org:2201.07055 |
---|---|
record_format | openpolar |
spelling | ftarxivpreprints:oai:arXiv.org:2201.07055 2023-09-05T13:19:05+02:00 Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement Gordon, Brett R. Moakler, Robert Zettelmeyer, Florian 2022-01-18 http://arxiv.org/abs/2201.07055 unknown http://arxiv.org/abs/2201.07055 Economics - Econometrics text 2022 ftarxivpreprints 2023-08-16T16:53:14Z Text DML ArXiv.org (Cornell University Library) |
institution | Open Polar |
collection | ArXiv.org (Cornell University Library) |
op_collection_id | ftarxivpreprints |
language | unknown |
topic | Economics - Econometrics |
description | Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-experimental approaches need to "undo" this selection. We analyze 663 large-scale experiments at Facebook to investigate whether this is possible with the data typically logged at large ad platforms. With access to over 5,000 user-level features, these data are richer than what most advertisers or their measurement partners can access. We investigate how accurately two non-experimental methods -- double/debiased machine learning (DML) and stratified propensity score matching (SPSM) -- can recover the experimental effects. Although DML performs better than SPSM, neither method performs well, even using flexible deep learning models to implement the propensity and outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper, middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively, indicating significant relative measurement errors. We further characterize the circumstances under which each method performs comparatively better. Overall, despite having access to large-scale experiments and rich user-level data, we are unable to reliably estimate an ad campaign's causal effect. |
format | Text |
author | Gordon, Brett R.; Moakler, Robert; Zettelmeyer, Florian |
author_sort | Gordon, Brett R. |
title | Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement |
title_sort | close enough? a large-scale exploration of non-experimental approaches to advertising measurement |
publishDate | 2022 |
url | http://arxiv.org/abs/2201.07055 |
genre | DML |
_version_ | 1776199901219127296 |
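
The description above quotes median lifts by funnel stage for the RCT benchmark and for the two non-experimental methods. As a rough illustration of the size of the gap, the sketch below divides each non-experimental median by the corresponding RCT median. This ratio of medians is an assumption made here for illustration; it is not necessarily the error measure used by the authors, and the variable and function names are hypothetical.

```python
# Median lifts (in percent) as quoted in the record's description, by funnel stage.
rct = {"upper": 29, "middle": 18, "lower": 5}
dml = {"upper": 83, "middle": 58, "lower": 24}
spsm = {"upper": 173, "middle": 176, "lower": 64}


def overstatement(estimate: float, benchmark: float) -> float:
    """Ratio of a non-experimental median lift to the RCT median lift.

    Illustrative only: a ratio of medians, not a median of per-experiment errors.
    """
    return estimate / benchmark


# For each method, compute the ratio per funnel stage, rounded to one decimal.
ratios = {
    method: {stage: round(overstatement(lifts[stage], rct[stage]), 1) for stage in rct}
    for method, lifts in (("DML", dml), ("SPSM", spsm))
}

for method, by_stage in ratios.items():
    print(method, by_stage)
```

On these figures, the DML medians are roughly three to five times the RCT medians and the SPSM medians roughly six to thirteen times, which is consistent with the description's statement that DML performs better than SPSM while neither recovers the experimental effects well.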