Outlier accommodation with semiparametric density processes: A study of Antarctic snow density modelling

Bibliographic Details
Published in: Statistical Modelling
Main Authors: Sheanshang, Daniel M., White, Philip A., Keeler, Durban G.
Format: Article in Journal/Newspaper
Language: English
Published: SAGE Publications 2021
Online Access: http://dx.doi.org/10.1177/1471082x211043946
http://journals.sagepub.com/doi/pdf/10.1177/1471082X211043946
http://journals.sagepub.com/doi/full-xml/10.1177/1471082X211043946
Description
Summary: In many settings, data acquisition generates outliers that can obscure inference. Therefore, practitioners often either identify and remove outliers or accommodate outliers using robust models. However, identifying and removing outliers is often an ad hoc process that affects inference, and robust methods are often too simple for some applications. In our motivating application, scientists drill snow cores and measure snow density to infer densification rates that aid in estimating snow water accumulation rates and glacier mass balances. Advanced measurement techniques can measure density at high resolution over depth but are sensitive to core imperfections, making them prone to outliers. Outlier accommodation is challenging in this setting because the distribution of outliers evolves over depth and the data demonstrate natural heteroscedasticity. To address these challenges, we present a two-component mixture model using a physically motivated snow density model and an outlier model, both of which evolve over depth. The physical component of the mixture model has a mean function with normally distributed depth-dependent heteroscedastic errors. The outlier component is specified using a semiparametric prior density process constructed through a normalized process convolution of log-normal random variables. We demonstrate that this model outperforms alternatives and can be used for various inferential tasks.
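The two-component structure described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exponential mean curve, the decaying standard deviation, the Gaussian smoothing kernel, the Gaussian outlier components, and every function name and parameter value below are assumptions chosen only for illustration. The sketch shows one plausible reading of a "normalized process convolution of log-normal random variables": positive log-normal draws at depth knots are kernel-smoothed and then normalized, giving outlier mixture weights that change with depth.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def physical_mean(depth, rho_surface=350.0, rho_ice=917.0, rate=0.03):
    """Hypothetical densification curve (assumption, not the paper's mean function)."""
    return rho_ice - (rho_ice - rho_surface) * np.exp(-rate * depth)


def physical_sd(depth, sd_surface=30.0, decay=0.02):
    """Hypothetical depth-dependent error scale (heteroscedasticity)."""
    return sd_surface * np.exp(-decay * depth)


def convolution_weights(depth, knots, latent, bandwidth=10.0):
    """Depth-varying weights from kernel-smoothed positive latent variables.

    depth:  (n,) depths at which weights are needed
    knots:  (L,) depth locations of the latent variables
    latent: (J, L) positive (log-normal) draws, one row per outlier component
    Returns an (n, J) array whose rows sum to one.
    """
    kernel = np.exp(-0.5 * ((depth[:, None] - knots[None, :]) / bandwidth) ** 2)
    unnormalized = kernel @ latent.T  # (n, J), strictly positive
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


def mixture_density(y, depth, outlier_prob, weights, comp_means, comp_sd):
    """Two-component mixture: physical model plus depth-varying outlier density."""
    physical = normal_pdf(y, physical_mean(depth), physical_sd(depth))
    outlier = (weights * normal_pdf(y[:, None], comp_means[None, :], comp_sd)).sum(axis=1)
    return (1.0 - outlier_prob) * physical + outlier_prob * outlier


# Usage: evaluate the density of a few density readings at several depths.
rng = np.random.default_rng(0)
knots = np.linspace(0.0, 100.0, 11)            # hypothetical knot depths (m)
comp_means = np.linspace(100.0, 900.0, 5)      # hypothetical outlier component centers
latent = rng.lognormal(mean=0.0, sigma=1.0, size=(comp_means.size, knots.size))

depth = np.array([5.0, 25.0, 50.0, 90.0])
y = np.array([400.0, 600.0, 300.0, 880.0])
weights = convolution_weights(depth, knots, latent)
density = mixture_density(y, depth, 0.1, weights, comp_means, 50.0)
```

Because the latent variables and the kernel are both strictly positive, the normalized weights form a valid probability vector at every depth, which is what lets the outlier density evolve smoothly over depth in this sketch.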