Double Machine Learning and Automated Confounder Selection -- A Cautionary Tale

Bibliographic Details
Main Authors:	Hünermund, Paul; Louw, Beyers; Caspi, Itamar
Format:	Text
Language:	unknown
Published:	arXiv 2021
Subjects:	Econometrics; econ.EM; FOS Economics and business; DML
Online Access:	https://dx.doi.org/10.48550/arxiv.2108.11294 https://arxiv.org/abs/2108.11294

Description
Summary:	Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This paper demonstrates that DML is very sensitive to the inclusion of only a few "bad controls" in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way. (v4: published version)

Similar Items