Constrained ordination analysis in metagenomics microbial diversity studies

Bibliographic Details
Main Author: Thas, Olivier
Format: Conference Object
Language: English
Published: 2010
Online Access: https://biblio.ugent.be/publication/1853238
http://hdl.handle.net/1854/LU-1853238
Description
Summary: Canonical or constrained correspondence analysis (CCA) is a very popular method for the analysis of species abundance distributions (SAD), particularly when the study objective is to explain differences between the SADs at different sampling sites in terms of local environmental characteristics. These methods have been used successfully for moderately sized studies with several tens of sites and species. Current high-throughput sequencing techniques in molecular genomics allow estimation of SADs of several tens of thousands of microbial species at each sampling site. A consequence of these deep sequencing results is that the SADs are sparse, in the sense that many microbial species have very small or zero abundances at many sampling sites. Because it is well known that CCA is sensitive to such sparseness, and because CCA depends on restrictive assumptions, there is a need for a more appropriate statistical method for this type of metagenomics data. We have developed a constrained ordination technique that can cope with sparse high-throughput abundance data. The method is related to the statistical models of Yee (2004, Ecological Monographs, 74(4), pp. 685-701), Zhu et al. (2005, Ecological Modelling, 187, pp. 524-536) and Yee (2006, Ecology, 87(1), pp. 203-213). However, instead of assuming a Poisson model for the abundances, we consider a hurdle model with a truncated Poisson component. We also show how our methods relate to the models of rank abundance distributions (RAD) of Foster and Dunstan (2010, Biometrics, 66, pp. 186-195). The new method is applied to a study on microbial communities in Antarctic lakes. The Roche 454 sequencing technique is used to give SADs of several thousand microbial species in samples from 50 lakes. The study objective is to estimate the relative importance of environmental lake characteristics and of the geographic locations of the lakes in explaining differences between the SADs.
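To make the phrase "hurdle model with a truncated Poisson component" concrete, the following is a minimal Python sketch of a generic hurdle distribution for a single count. It is not the author's method: the summary does not give the parameterization, so the function names `hurdle_logpmf` and `hurdle_mean` and the parameters `pi_zero` (probability of a zero abundance) and `lam` (rate of the Poisson that is truncated at zero) are assumptions introduced for illustration only. In a constrained ordination, both parameters would presumably depend on environmental covariates, which this sketch leaves out.

```python
import math


def hurdle_logpmf(y, pi_zero, lam):
    """Log-probability of a count y under a generic hurdle model.

    pi_zero: probability that the count is zero (hypothetical name), in (0, 1).
    lam:     rate of the zero-truncated Poisson for positive counts, > 0.
    """
    if y == 0:
        # The "hurdle" part: zeros get their own probability.
        return math.log(pi_zero)
    # Ordinary Poisson log-pmf at y.
    log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
    # Truncate at zero: divide by P(Y > 0) = 1 - exp(-lam).
    log_trunc = log_pois - math.log1p(-math.exp(-lam))
    # Positive counts occur with probability 1 - pi_zero.
    return math.log1p(-pi_zero) + log_trunc


def hurdle_mean(pi_zero, lam):
    """Expected count: P(Y > 0) times the zero-truncated Poisson mean."""
    return (1.0 - pi_zero) * lam / (1.0 - math.exp(-lam))
```

Unlike a plain Poisson model, the share of zeros here is set by `pi_zero` independently of `lam`, which is why such a model can accommodate the many zero abundances described in the summary.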