A Simulation Study to Evaluate Bayesian LASSO’s Performance in Zero-Inflated Poisson (ZIP) Models


Bibliographic Details
Main Author: Dong, Yue
Other Authors: Liu, Juxin, Li, Longhai, Sowa, Artur, Lamb, Eric
Format: Thesis
Language: unknown
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10388/7313
Description
Summary: In many applications, count data contain an excess of zeros. My thesis concentrates on variable selection in zero-inflated Poisson (ZIP) models. The work is motivated by Brown et al. (2015), who accounted for excess zeros and site-specific random effects in their data and used the Bayesian LASSO for variable selection in their study of post-fire tree recruitment in interior Alaska, USA and north Yukon, Canada. However, that study did not carry out systematic simulations to evaluate Bayesian LASSO’s performance under different scenarios. My thesis therefore conducts a series of simulation studies to evaluate Bayesian LASSO’s performance under different settings of three simulation factors: the number of subjects (N), the number of repeated measurements (R), and the true values of the regression coefficients in the ZIP models. Under each setting, performance is evaluated using three indicators: sensitivity, specificity, and the exact fit rate. For applied practitioners, my thesis is a useful example showing under what circumstances one can expect Bayesian LASSO to perform well in ZIP models. The simulation results show that Bayesian LASSO’s performance is jointly affected by all three simulation factors, and that this method of variable selection is more reliable when the true coefficients are not close to zero. My thesis also has some limitations. Primarily, given the time constraints of the thesis, it is impossible to consider all the factors that could affect the simulation results, and the use of penalty forms other than the L1 penalty is left for future research.
Moreover, the current variable selection method applies only to fixed effects, while variable selection for mixed effects in ZIP models can be a ...
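
The three performance indicators named in the summary can be illustrated with a short sketch. The summary does not define them, so the definitions below are the conventional ones and should be read as assumptions: sensitivity is the share of truly nonzero coefficients that are selected, specificity is the share of truly zero coefficients that are excluded, and the exact fit rate is the share of simulation replicates in which the selected set matches the true set exactly. The function name `selection_metrics` and the example data are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Sequence


def selection_metrics(true_nonzero: Sequence[bool],
                      selected_runs: Sequence[Sequence[bool]]) -> Dict[str, float]:
    """Summarise variable-selection accuracy over simulation replicates.

    true_nonzero[j]: True if coefficient j is truly nonzero.
    selected_runs[r][j]: True if replicate r selected variable j.
    Definitions are the conventional ones (an assumption, see lead-in).
    """
    n_signal = sum(true_nonzero)              # truly nonzero coefficients
    n_noise = len(true_nonzero) - n_signal    # truly zero coefficients
    n_runs = len(selected_runs)
    if n_runs == 0:
        raise ValueError("at least one replicate is required")

    true_pos = true_neg = exact = 0
    for run in selected_runs:
        if len(run) != len(true_nonzero):
            raise ValueError("each replicate needs one flag per coefficient")
        true_pos += sum(s and t for s, t in zip(run, true_nonzero))
        true_neg += sum((not s) and (not t) for s, t in zip(run, true_nonzero))
        exact += list(run) == list(true_nonzero)

    nan = float("nan")  # returned when a rate has an empty denominator
    return {
        "sensitivity": true_pos / (n_runs * n_signal) if n_signal else nan,
        "specificity": true_neg / (n_runs * n_noise) if n_noise else nan,
        "exact_fit_rate": exact / n_runs,
    }


# Hypothetical example: 4 coefficients (2 truly nonzero), 4 replicates.
truth = [True, True, False, False]
runs = [
    [True, True, False, False],   # exact fit
    [True, False, False, False],  # misses one true effect
    [True, True, True, False],    # one false positive
    [True, True, False, False],   # exact fit
]
print(selection_metrics(truth, runs))
# {'sensitivity': 0.875, 'specificity': 0.875, 'exact_fit_rate': 0.5}
```

In a simulation study of the kind described, each replicate would correspond to one data set generated under a given setting of N, R and the true coefficients, and the flags in `selected_runs` would come from whatever selection rule is applied to the posterior of each coefficient. That rule is not specified in the summary.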