Double machine learning and automated confounder selection: A cautionary tale
Published in: | Journal of Causal Inference |
---|---|
Main Authors: | , , |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | De Gruyter 2023 |
Subjects: | |
Online Access: | https://doi.org/10.1515/jci-2022-0078 https://doaj.org/article/6ee6a807d4aa4e5581151710701dcbd0 |
Summary: | Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way. |
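
The sensitivity to "bad controls" described in the summary can be illustrated with a small simulation. This is a minimal sketch and not the article's own experiment. The data-generating process, the variable names (`X`, `d`, `y`, `m`), and the helper functions `ols_predict` and `partialling_out` are all assumptions made for illustration. Plain least squares stands in for the machine-learning nuisance learners, so only the cross-fitted partialling-out step of DML is reproduced. The variable `m` is a collider, caused by both the treatment and the outcome, and is therefore endogenous.

```python
import numpy as np


def ols_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def partialling_out(y, d, controls, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    Assumption: OLS replaces the flexible learners that DML would normally use.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Residualize outcome and treatment on the controls, out of fold.
        y_res[test] = y[test] - ols_predict(controls[train], y[train], controls[test])
        d_res[test] = d[test] - ols_predict(controls[train], d[train], controls[test])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


# Hypothetical data-generating process with a true effect of 1.0.
rng = np.random.default_rng(42)
n, p, theta = 4000, 5, 1.0
X = rng.normal(size=(n, p))                       # legitimate confounders
d = X @ np.full(p, 0.5) + rng.normal(size=n)      # treatment
y = theta * d + X @ np.full(p, 0.5) + rng.normal(size=n)  # outcome
m = d + y + rng.normal(size=n)                    # collider: a "bad control"

theta_good = partialling_out(y, d, X)
theta_bad = partialling_out(y, d, np.column_stack([X, m]))

print(f"controls = X only:       {theta_good:.3f}")
print(f"controls = X + collider: {theta_bad:.3f}")
```

With only the confounders as controls, the estimate should land close to the true effect of 1.0. Adding the single collider pulls it far away; in this particular linear design with unit-variance noise, the population coefficient on the treatment becomes zero. Other causal structures would produce a different bias, which is the point the summary makes about the bias depending on the underlying causal model.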