An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings

Machine-learning algorithms have gained popularity and attracted the attention of many researchers in statistics and computer science in recent decades. Because of their computational capabilities with big data, many researchers have attempted to incorporate machine learning in...

Bibliographic Details
Main Author: Alanazi, Sami Saad
Format: Text
Language: unknown
Published: Scholarship & Creative Works @ Digital UNC 2022
Subjects:
DML
Online Access: https://digscholarship.unco.edu/dissertations/919
https://digscholarship.unco.edu/context/dissertations/article/1874/viewcontent/Alanazi_unco_0161D_11061.pdf
id ftuninorthcoloir:oai:digscholarship.unco.edu:dissertations-1874
record_format openpolar
spelling ftuninorthcoloir:oai:digscholarship.unco.edu:dissertations-1874 2023-11-12T04:16:26+01:00 An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings Alanazi, Sami Saad 2022-12-01T08:00:00Z application/pdf https://digscholarship.unco.edu/dissertations/919 https://digscholarship.unco.edu/context/dissertations/article/1874/viewcontent/Alanazi_unco_0161D_11061.pdf unknown Scholarship & Creative Works @ Digital UNC https://digscholarship.unco.edu/dissertations/919 https://digscholarship.unco.edu/context/dissertations/article/1874/viewcontent/Alanazi_unco_0161D_11061.pdf Dissertations text 2022 ftuninorthcoloir 2023-10-30T09:45:12Z Text DML Scholarship & Creative Works @ Digital UNC (University of Northern Colorado)
institution Open Polar
collection Scholarship & Creative Works @ Digital UNC (University of Northern Colorado)
op_collection_id ftuninorthcoloir
language unknown
description Machine-learning algorithms have gained popularity and attracted the attention of many researchers in statistics and computer science in recent decades. Because of their computational capabilities with big data, many researchers have attempted to incorporate machine learning into prediction and inference problems. One recent method that has attracted considerable attention is the double machine learning (DML) method. This method estimates the effect of a treatment variable in the presence of a high-dimensional nuisance function by incorporating machine-learning algorithms. Previous studies have shown that the DML method can reduce the bias in estimating the target parameter when many covariates are present in the dataset. This dissertation proposes a method referred to as the double super learner (DSL) method. Because the many machine-learning algorithms in existence today differ in their search strategies, there is no way to know in advance which algorithm performs best for a given dataset. The proposed DSL method was developed in parallel with the DML method and works by incorporating several machine-learning algorithms via the super learner function. Numerical simulations were performed across various data settings in terms of the sample, the number of associated covariates, and the type of treatment variable. Compared with the original DML method, the simulations showed that the proposed method reduced bias and provided valid confidence intervals in situations where the original method did not. A package called DoubleSL was then developed and made public for those who wish to use this method in their research. In addition, real-data examples were included in the package to demonstrate the use of the method.
format Text
author Alanazi, Sami Saad
spellingShingle Alanazi, Sami Saad
An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings
author_facet Alanazi, Sami Saad
author_sort Alanazi, Sami Saad
title An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings
title_short An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings
title_full An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings
title_fullStr An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings
title_full_unstemmed An Ensemble Machine Learning Approach To Causal Inference in High-Dimensional Settings
title_sort ensemble machine learning approach to causal inference in high-dimensional settings
publisher Scholarship & Creative Works @ Digital UNC
publishDate 2022
url https://digscholarship.unco.edu/dissertations/919
https://digscholarship.unco.edu/context/dissertations/article/1874/viewcontent/Alanazi_unco_0161D_11061.pdf
genre DML
genre_facet DML
op_source Dissertations
op_relation https://digscholarship.unco.edu/dissertations/919
https://digscholarship.unco.edu/context/dissertations/article/1874/viewcontent/Alanazi_unco_0161D_11061.pdf
_version_ 1782333520578871296
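
The description in this record outlines two ideas: estimating a treatment effect after machine-learning the nuisance functions (DML), and letting cross-validation choose among several candidate learners (the super learner). The sketch below illustrates how these ideas might fit together for a partially linear model. It is not the DoubleSL package and does not reproduce the dissertation's procedure. As simplifying assumptions, it uses a discrete super learner (pick the candidate with the lowest cross-validated error) over ridge regressions with different penalties, and every function name, parameter, and simulated dataset here is hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return lambda X_new: y_mean + (X_new - x_mean) @ beta


def make_folds(n, k, rng):
    """Randomly split the indices 0..n-1 into k folds."""
    return np.array_split(rng.permutation(n), k)


def discrete_super_learner(X, y, lams, k, rng):
    """Choose the ridge penalty with the lowest cross-validated squared
    error, then refit that candidate on all of (X, y)."""
    n = len(y)
    cv_error = []
    for lam in lams:
        total = 0.0
        for test in make_folds(n, k, rng):
            train = np.setdiff1d(np.arange(n), test)
            pred = ridge_fit(X[train], y[train], lam)(X[test])
            total += np.sum((y[test] - pred) ** 2)
        cv_error.append(total / n)
    best_lam = lams[int(np.argmin(cv_error))]
    return ridge_fit(X, y, best_lam)


def dml_plr(X, d, y, lams=(0.1, 1.0, 10.0, 100.0), k=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in
    y = theta * d + g(X) + noise, with a standard error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    res_y, res_d = np.empty(n), np.empty(n)
    for test in make_folds(n, k, rng):
        train = np.setdiff1d(np.arange(n), test)
        # Nuisance functions are learned on the other folds only.
        l_hat = discrete_super_learner(X[train], y[train], lams, k, rng)
        m_hat = discrete_super_learner(X[train], d[train], lams, k, rng)
        res_y[test] = y[test] - l_hat(X[test])
        res_d[test] = d[test] - m_hat(X[test])
    # Regress outcome residuals on treatment residuals.
    theta = (res_d @ res_y) / (res_d @ res_d)
    score = res_d * (res_y - theta * res_d)
    se = np.sqrt(np.mean(score ** 2) / n) / np.mean(res_d ** 2)
    return theta, se


# Simulated data (an assumption for illustration): true effect is 0.5.
rng = np.random.default_rng(42)
n, p = 500, 20
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 0.5 * d + X[:, 0] - X[:, 2] + rng.normal(size=n)

theta_hat, se_hat = dml_plr(X, d, y)
print(f"theta_hat = {theta_hat:.3f}, se = {se_hat:.3f}")
```

A full super learner would typically combine the candidates with cross-validated weights rather than select a single one, and would draw on a more varied library of learners. The discrete variant is used here only to keep the sketch short.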