Double/Debiased Machine Learning for Treatment and Structural Parameters

We revisit the classic semiparametric problem of inference on a low-dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0. We depart from the classical setting by allowing η_0 to be so high-dimensional that the traditional assumptions, such as Donsker properties, that limit the complexity of the parameter space for this object break down. To estimate η_0, we consider the use of statistical or machine learning (ML) methods, which are particularly well suited to estimation in modern, very high-dimensional cases. ML methods perform well by employing regularization to reduce variance and trading off regularization bias with overfitting in practice. However, both regularization bias and overfitting in estimating η_0 cause a heavy bias in estimators of θ_0 that are obtained by naively plugging ML estimators of η_0 into estimating equations for θ_0. This bias results in the naive estimator failing to be N^(-1/2) consistent, where N is the sample size.

We show that the impact of regularization bias and overfitting on estimation of the parameter of interest θ_0 can be removed by using two simple, yet critical, ingredients: (1) using Neyman-orthogonal moments/scores that have reduced sensitivity with respect to nuisance parameters to estimate θ_0, and (2) making use of cross-fitting, which provides an efficient form of data-splitting. We call the resulting set of methods double or debiased ML (DML). We verify that DML delivers point estimators that concentrate in an N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements. The generic statistical theory of DML is elementary and relies only on weak theoretical requirements, which admit the use of a broad array of modern ML methods for estimating the nuisance parameters, such as random forests, lasso, ridge, deep neural nets, boosted trees, and various hybrids and ensembles of these methods. We illustrate the general ...
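To make the two ingredients concrete, the sketch below applies them to the partially linear model Y = D·θ_0 + g_0(X) + U, D = m_0(X) + V, combining a Neyman-orthogonal (partialling-out) score with K-fold cross-fitting. This is a minimal illustration under stated assumptions, not the authors' implementation: the use of scikit-learn random forests as nuisance learners, the simulated data, the fold count, and all names (dml_plr, ml_l, ml_m) are hypothetical choices made here for exposition.

```python
# Minimal DML sketch for the partially linear model (illustrative assumptions only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta_0 using the Neyman-orthogonal
    partialling-out score psi = (Y - l(X) - theta*(D - m(X))) * (D - m(X)),
    where l_0(X) = E[Y|X] and m_0(X) = E[D|X]."""
    N = len(y)
    res_y = np.zeros(N)  # Y - l_hat(X), predicted out of fold
    res_d = np.zeros(N)  # D - m_hat(X), predicted out of fold
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Cross-fitting: nuisance functions are fit on the training folds and
        # evaluated on the held-out fold, so their overfitting does not
        # contaminate the estimating equation for theta.
        ml_l = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X[train], y[train])
        ml_m = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X[train], d[train])
        res_y[test] = y[test] - ml_l.predict(X[test])
        res_d[test] = d[test] - ml_m.predict(X[test])
    # Solve the empirical moment condition (1/N) sum psi_i = 0 for theta.
    theta_hat = np.sum(res_d * res_y) / np.sum(res_d ** 2)
    # Plug-in standard error from the sandwich form of the orthogonal score.
    psi = (res_y - theta_hat * res_d) * res_d
    J = np.mean(res_d ** 2)
    se = np.sqrt(np.mean(psi ** 2) / (J ** 2 * N))
    return theta_hat, se

if __name__ == "__main__":
    # Simulated example with theta_0 = 0.5 and nonlinear nuisance functions.
    rng = np.random.default_rng(0)
    N, p, theta0 = 1000, 10, 0.5
    X = rng.normal(size=(N, p))
    d = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=N)
    y = theta0 * d + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=N)
    theta_hat, se = dml_plr(y, d, X)
    print(f"theta_hat = {theta_hat:.3f}, 95% CI = "
          f"[{theta_hat - 1.96 * se:.3f}, {theta_hat + 1.96 * se:.3f}]")
```

Swapping the random forests for lasso, ridge, boosted trees, or neural nets only changes the two nuisance learners; the orthogonal score and the cross-fitting loop stay the same, which is the point of the abstract's two ingredients.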

Bibliographic Details
Main Authors: Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, James Robins
Format: Report
Language: English
Subjects: DML
Online Access: http://www.nber.org/papers/w23564.pdf
Collection: RePEc (Research Papers in Economics)