Using PPCA to estimate EOFs in the presence of missing values
Main Authors: | , |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | 2004 |
Subjects: | |
Online Access: | http://nora.nerc.ac.uk/id/eprint/109645/ http://ams.allenpress.com/amsonline/?request=get-abstract&issn=1520-0426&volume=021&issue=09&page=1471 https://doi.org/10.1175/1520-0426(2004)021<1471:UPTEEI>2.0.CO;2 |
Summary: | One of the problems encountered when using satellite-derived sea surface temperature (SST) data is the impossibility of retrieving data where the ocean surface is obscured by cloud. Empirical orthogonal function (EOF) analysis cannot be carried out easily when there are missing values within the dataset. One possible solution is to interpolate using the existing data. In this paper an alternative technique is investigated, probabilistic principal component analysis (PPCA), and applied to calculate the principal EOFs of North Atlantic SSTs. This analysis uses results obtained from interpolating the SST data using a simplified Kalman filter, with data randomly removed to simulate missing values, and then reconstructs the data using PPCA, obtaining the principal EOFs. The calculation of the EOFs was quicker than traditional EOF analysis, as the covariance matrix was estimated rather than calculated. The replacement of missing values was also computationally more efficient than using the Kalman filter, taking a fraction of the time. The expectation–maximization (EM) algorithm produced similar results to those produced through standard procedures. However, the choice of the number of EOFs to be retained had a significant effect on the accuracy of the interpolated dataset, with more EOFs reducing the accuracy of the reconstructed dataset. |
---|
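
The reconstruction idea in the summary (estimate the leading EOFs, use them to fill the gaps, repeat) can be illustrated with a minimal sketch. This is not the authors' code and it is not full PPCA: it is a simplified EM-style iteration based on a truncated SVD, which omits the noise-variance term that PPCA estimates. The function name `em_pca_impute`, its parameters, and the synthetic data are all hypothetical, and the sketch assumes a time-by-space array in which every column has at least one observed value.

```python
import numpy as np


def em_pca_impute(X, n_eofs, n_iter=200, tol=1e-8):
    """Fill NaNs in X (time x space) by iterating a rank-n_eofs SVD reconstruction.

    Hypothetical helper: a simplified stand-in for PPCA's EM procedure,
    without the isotropic noise variance that PPCA also estimates.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: fill each gap with the mean of the observed values in its column.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # "M-like" step: leading EOFs of the currently filled, centred data.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)

        # "E-like" step: re-estimate only the missing entries from those EOFs.
        recon = (U[:, :n_eofs] * s[:n_eofs]) @ Vt[:n_eofs] + mean
        change = np.max(np.abs(recon[missing] - filled[missing])) if missing.any() else 0.0
        filled[missing] = recon[missing]
        if change < tol:
            break

    # EOFs (spatial patterns) and singular values of the final filled dataset.
    mean = filled.mean(axis=0)
    _, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
    return filled, Vt[:n_eofs], s[:n_eofs]


# Synthetic example: a rank-2 "field" with 20% of the values removed at random.
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 30)) + rng.normal(size=30)
gaps = rng.random(truth.shape) < 0.2
observed = np.where(gaps, np.nan, truth)

filled, eofs, singular_values = em_pca_impute(observed, n_eofs=2)
rmse = np.sqrt(np.mean((filled[gaps] - truth[gaps]) ** 2))
print(f"RMSE at the removed points: {rmse:.4f}")
```

The `n_eofs` argument corresponds to the number of retained EOFs, which the summary reports as having a significant effect on reconstruction accuracy; rerunning the sketch with different values is one way to explore that sensitivity on synthetic data.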