Regularizing double machine learning in partially linear endogenous models

The linear coefficient in a partially linear model with confounding variables can be estimated using double machine learning (DML). However, this DML estimator has a two-stage least squares (TSLS) interpretation and may produce overly wide confidence intervals. To address this issue, we propose a re...

Bibliographic Details
Main Authors: Emmenegger, Corinne (ORCID: 0000-0003-0353-8888); Bühlmann, Peter
Format: Article in Journal/Newspaper
Language: English
Published: Cornell University 2021
Subjects:
DML
Online Access: https://hdl.handle.net/20.500.11850/525119
https://doi.org/10.3929/ethz-b-000525119
id ftethz:oai:www.research-collection.ethz.ch:20.500.11850/525119
institution Open Polar
collection ETH Zürich Research Collection
op_collection_id ftethz
language English
topic Double machine learning
Endogenous variables
Generalized method of moments
Instrumental variables
K-class estimation
Partially linear model
Regularization
Semiparametric estimation
Two-stage least squares
description The linear coefficient in a partially linear model with confounding variables can be estimated using double machine learning (DML). However, this DML estimator has a two-stage least squares (TSLS) interpretation and may produce overly wide confidence intervals. To address this issue, we propose a regularization and selection scheme, regsDML, which leads to narrower confidence intervals. It selects either the TSLS DML estimator or a regularization-only estimator, depending on whose estimated variance is smaller. The regularization-only estimator is tailored to have a low mean squared error. The regsDML estimator is fully data driven. It converges at the parametric rate, is asymptotically Gaussian, and is asymptotically equivalent to the TSLS DML estimator, but it exhibits substantially better finite-sample properties. The regsDML estimator uses the idea of k-class estimators, and we show how DML and k-class estimation can be combined to estimate the linear coefficient in a partially linear endogenous model. Empirical examples demonstrate our methodological and theoretical developments. Software code for our regsDML method is available in the R package dmlalg. ISSN: 1935-7524
format Article in Journal/Newspaper
author Emmenegger, Corinne
id_orcid:0000-0003-0353-8888
Bühlmann, Peter
title Regularizing double machine learning in partially linear endogenous models
publisher Cornell University
publishDate 2021
url https://hdl.handle.net/20.500.11850/525119
https://doi.org/10.3929/ethz-b-000525119
genre DML
op_source Electronic Journal of Statistics, 15 (2)
op_relation info:eu-repo/semantics/altIdentifier/doi/10.1214/21-ejs1931
info:eu-repo/semantics/altIdentifier/wos/000740666000062
info:eu-repo/grantAgreement/EC/H2020/786461
http://hdl.handle.net/20.500.11850/525119
doi:10.3929/ethz-b-000525119
op_rights info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by/4.0/
Creative Commons Attribution 4.0 International
op_doi https://hdl.handle.net/20.500.11850/525119
https://doi.org/10.3929/ethz-b-000525119
https://doi.org/10.1214/21-ejs1931