Adaptive Basis Density Estimation for High-Dimensional Data

Bibliographic Details
Main Author: Susan Buchman
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2010
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.210.6128
Description
Summary: All high-dimensional density estimation techniques must make some assumptions about the underlying data distribution in order to be practical. In this proposal, I present work on a new method for high-dimensional density estimation which assumes the ability to cheaply sample from an instrumental distribution that captures the low-dimensional structure in the data distribution. This assumption is satisfied in the application area of interest: modeling the distribution of tracks of tropical cyclones (TCs) in the North Atlantic Ocean. Physical models are capable of generating realistic tracks, but not in the correct distribution over track space; my method allows for their use as instrumental distributions, anchoring the observed data in the vast high-dimensional space. Using orthogonal series density estimation with a basis that is adapted to the instrumental distribution, I produce a density for the data distribution with respect to the instrumental distribution rather than the Lebesgue measure, which has the potential to improve the rates of convergence of quantities of interest. Initial simulations support this hypothesis. I propose to extend this work to conditional density estimation to allow for the introduction of covariates, which, when applied to the TC track data, will reveal the relationship between the spatial locations of TCs and climatic predictors. Furthermore, I will explore plug-in criteria for choosing optimal truncation points of the series and for validating high-dimensional density estimates. I will establish consistency results for the procedures.
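
The central idea in the summary, an orthogonal series estimate of the data density taken with respect to an instrumental distribution, can be illustrated with a small one-dimensional sketch. Everything here is an assumption made for illustration and is not taken from the proposal: the instrumental distribution is a standard normal, the data are drawn from a shifted normal, and the "adapted" basis is simply a set of polynomials orthonormalized against the instrumental samples (the proposal's own basis construction is not described in this record). The function names `fit_adapted_basis`, `eval_basis`, `fit_coefficients`, and `density_ratio` are hypothetical.

```python
import numpy as np


def fit_adapted_basis(z, degree):
    """Orthonormalize 1, x, ..., x^degree under the empirical measure of the
    instrumental samples z. Returns the triangular factor R such that
    psi(x) = [1, x, ..., x^degree] @ inv(R) is orthonormal on z."""
    V = np.vander(z, degree + 1, increasing=True)
    _, R = np.linalg.qr(V / np.sqrt(len(z)))
    return R


def eval_basis(x, R):
    """Evaluate the adapted basis functions at the points x (one row per point)."""
    V = np.vander(x, R.shape[0], increasing=True)
    # Solve psi @ R = V for psi without forming inv(R) explicitly.
    return np.linalg.solve(R.T, V.T).T


def fit_coefficients(x, R):
    """Series coefficients: sample means of the basis functions over the data."""
    return eval_basis(x, R).mean(axis=0)


def density_ratio(x, beta, R):
    """Truncated series estimate of the data density relative to the
    instrumental distribution. It is not constrained to be nonnegative."""
    return eval_basis(x, R) @ beta


rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=20000)  # assumed instrumental samples (cheap to draw)
x = rng.normal(0.5, 1.0, size=2000)   # assumed observed data

R = fit_adapted_basis(z, degree=4)    # degree plays the role of the truncation point
beta = fit_coefficients(x, R)
w = density_ratio(z, beta, R)         # estimated ratio at the instrumental samples

# Expectations under the data distribution become weighted averages over z.
mean_estimate = np.mean(w * z)
```

Because the estimate is a density relative to the instrumental distribution, quantities of interest can be computed by reweighting instrumental samples, without ever evaluating a Lebesgue density. The truncation degree is fixed by hand in this sketch; choosing it from the data is one of the open questions the summary mentions.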