INFERENCE AND INVESTIGATION OF MARINE MICROBIAL COMMUNITY STRUCTURES IN THE GLOBAL OCEANS

Marine microbial communities are complex, and represent a serious analytical challenge. The Bayesian model for inference of microbial community structure (BioMiCo) was used to characterize microbial populations using 16S rRNA within polar, tropical, and temperate environmental zones. Global-scale an...

Bibliographic Details
Main Author: Bashwih, Rana Omer
Other Authors: Department of Computational Biology and Bioinformatics, Master of Science, n/a, Dr. Robert Beiko, Dr. Robert Lee, Dr. Morgan Langille, Dr. Joseph Bielawski, Dr. Hong Gu, Not Applicable
Language: English
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10222/72144
id ftdalhouse:oai:DalSpace.library.dal.ca:10222/72144
record_format openpolar
spelling ftdalhouse:oai:DalSpace.library.dal.ca:10222/72144 2023-05-15T13:42:32+02:00 INFERENCE AND INVESTIGATION OF MARINE MICROBIAL COMMUNITY STRUCTURES IN THE GLOBAL OCEANS Bashwih, Rana Omer Department of Computational Biology and Bioinformatics Master of Science n/a Dr. Robert Beiko Dr. Robert Lee Dr. Morgan Langille Dr. Joseph Bielawski Dr. Hong Gu Not Applicable 2016-08-31T13:28:46Z http://hdl.handle.net/10222/72144 en eng http://hdl.handle.net/10222/72144 2016 ftdalhouse 2022-03-06T00:10:10Z Other/Unknown Material Antarc* Antarctic Arctic North Atlantic Dalhousie University: DalSpace Institutional Repository Antarctic Arctic Bedford ENVELOPE(-67.150,-67.150,-66.467,-66.467)
institution Open Polar
collection Dalhousie University: DalSpace Institutional Repository
op_collection_id ftdalhouse
language English
description Marine microbial communities are complex and represent a serious analytical challenge. The Bayesian model for inference of microbial community structure (BioMiCo) was used to characterize microbial populations using 16S rRNA within polar, tropical, and temperate environmental zones. Global-scale and local analyses were performed on 356 microbial samples and 72853 OTUs within the ICOMM database. Global analysis showed that polar and tropical zones had distinct community structures with high predictive value and little seasonal variation, although seasonal variation was noticeable in the temperate zone. Local analysis of polar communities demonstrated that the Arctic and Antarctic zones had distinct community structures. Within the North Atlantic, temporal heterogeneity differed locally, and this impeded the predictive value of models for the entire North Atlantic. Training a model on a single, well-sampled North Atlantic site, L4 in the English Channel, substantially improved the predictive value of the model. Finally, the model for the L4 site had predictive value for other English Channel sites, but not for more distant sites within the western and eastern North Atlantic. This result appears to be due to differences among North Atlantic sites in the timing of their seasonal community transitions, and because most other sites have not been nearly as well sampled as the L4 site. The only other well-sampled site in the North Atlantic (Bedford Basin) also exhibits regular seasonal transitions from year to year. Taken together, these results suggest that environmental changes are the primary drivers of marine biogeographic patterns within the North Atlantic.
Four methodological investigations were applied to Arctic and Antarctic samples, and to the samples from the L4 station in the English Channel, to explore how users' choices affect inferences made with BioMiCo. The first explored different ways of defining the predominant OTUs within an assemblage. The size of the assemblage was very sensitive to the method. I recommend defining predominant OTUs as those having >0.01 posterior probability, as this was the most conservative. The second explored the impact of “burn-in”. As expected, increasing burn-in yielded more stable assemblages; however, the burn-in did not need to exceed 1000 iterations. The third explored the effect of training and testing design on prediction of Arctic and Antarctic samples. The results showed that better predictions were obtained from larger training sets. However, training on more than 2/3 of the data did not generate significant improvement. Thus, designs such as leave-one-out cross-validation can be reserved for cases where the total sample size is very small. Otherwise, users should run several replicates on data randomly divided into 2/3 training sets and 1/3 test sets. The fourth explored the effect of pre-specifying different numbers of assemblages (the value of L within the model). The results showed that running the model with 25 assemblages was sufficient. In conclusion, the choices that users make when running the MCMC can impact their results, but the approach is robust, and good results can be obtained with just L=25 if the training data are of sufficient size and a sufficient amount of burn-in is discarded.
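The recommendations in the description (discard burn-in, threshold posterior probabilities at 0.01, and replicate random 2/3 : 1/3 splits) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use BioMiCo or reproduce its interface, every function name is hypothetical, and the posterior for an assemblage is assumed to be available as a plain mapping from OTU identifier to posterior probability.

```python
import random


def split_two_thirds(sample_ids, seed):
    """Randomly divide sample IDs into a 2/3 training set and a 1/3 test set."""
    rng = random.Random(seed)          # seeded so each replicate is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]


def replicate_splits(sample_ids, n_replicates=5):
    """Build several independent 2/3 : 1/3 splits, one per replicate."""
    return [split_two_thirds(sample_ids, seed) for seed in range(n_replicates)]


def discard_burn_in(draws, burn_in=1000):
    """Drop the first `burn_in` MCMC draws before summarizing the chain."""
    return draws[burn_in:]


def predominant_otus(posterior, threshold=0.01):
    """Return the OTUs whose posterior probability exceeds `threshold`.

    `posterior` is an assumed data layout: {otu_id: posterior probability}.
    """
    return sorted(otu for otu, p in posterior.items() if p > threshold)


if __name__ == "__main__":
    # Hypothetical sample identifiers; the real study used 356 samples.
    samples = [f"sample_{i}" for i in range(9)]
    for train, test in replicate_splits(samples, n_replicates=3):
        print(len(train), "training samples,", len(test), "test samples")
```

The strict inequality in `predominant_otus` follows the ">0.01" wording of the description; the default burn-in of 1000 reflects the statement that burn-in did not need to exceed 1000 iterations, and the number of replicates is an arbitrary choice for the sketch.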
author2 Department of Computational Biology and Bioinformatics
Master of Science
n/a
Dr. Robert Beiko
Dr. Robert Lee
Dr. Morgan Langille
Dr. Joseph Bielawski
Dr. Hong Gu
Not Applicable
author Bashwih, Rana Omer
title INFERENCE AND INVESTIGATION OF MARINE MICROBIAL COMMUNITY STRUCTURES IN THE GLOBAL OCEANS
title_sort inference and investigation of marine microbial community structures in the global oceans
publishDate 2016
url http://hdl.handle.net/10222/72144
long_lat ENVELOPE(-67.150,-67.150,-66.467,-66.467)
geographic Antarctic
Arctic
Bedford
genre Antarc*
Antarctic
Arctic
North Atlantic
op_relation http://hdl.handle.net/10222/72144
_version_ 1766169167465021440