Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity

Bibliographic Details
Published in: The Annals of Applied Statistics
Main Authors: Arbel, Julyan; Mengersen, Kerrie L.; Rousseau, Judith
Other Authors: Modelling and Inference of Complex and Structured Stochastic Systems (MISTIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Laboratoire Jean Kuntzmann (LJK), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes 2016-2019 (UGA 2016-2019)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes 2016-2019 (UGA 2016-2019), Collegio Carlo Alberto, Università degli studi di Torino = University of Turin (UNITO), Queensland University of Technology Brisbane (QUT), CEntre de REcherches en MAthématiques de la DEcision (CEREMADE), Université Paris Dauphine-PSL, Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Économie et Statistique (CREST), Ecole Nationale de la Statistique et de l'Analyse de l'Information Bruz (ENSAI)-École polytechnique (X)-École Nationale de la Statistique et de l'Administration Économique (ENSAE Paris)-Centre National de la Recherche Scientifique (CNRS)
Format: Article in Journal/Newspaper
Language: English
Published: HAL CCSD 2016
Online Access: https://hal.science/hal-01203345
https://hal.science/hal-01203345/document
https://hal.science/hal-01203345/file/methods.pdf
https://doi.org/10.1214/16-AOAS944
Description
Summary: We introduce a dependent Bayesian nonparametric model for the probabilistic modeling of membership of subgroups in a community based on partially replicated data. The focus here is on species-by-site data, i.e. community data where observations at different sites are classified into distinct species. Our aim is to study the impact of additional covariates, for instance environmental variables, on the data structure, and in particular on the community diversity. To that purpose, we introduce dependence a priori across the covariates and show that it improves posterior inference. We use a dependent version of the Griffiths-Engen-McCloskey distribution defined via the stick-breaking construction. This distribution is obtained by transforming a Gaussian process whose covariance function controls the desired dependence. The resulting posterior distribution is sampled by Markov chain Monte Carlo. We illustrate the application of our model to a soil microbial dataset acquired across a hydrocarbon contamination gradient at the site of a fuel spill in Antarctica. This method allows for inference on a number of quantities of interest in ecotoxicology, such as diversity or effective concentrations, and is broadly applicable to the general problem of community response to environmental variables.
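
Illustration (not part of the original record): the stick-breaking construction mentioned in the summary can be sketched in a few lines of Python. This is a toy sketch under stated assumptions, not the authors' implementation. It assumes stick fractions with Beta(1, m) marginals obtained from standard normal draws through the probability integral transform, replaces the Gaussian process by a single pair of Gaussians with correlation rho (standing in for two covariate values), truncates the infinite sequence at n_sticks, and uses the Shannon index as one possible diversity measure. All function and parameter names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def beta1m_from_gaussian(z, m):
    """Map a standard normal draw to a Beta(1, m) draw (assumed transform).

    u = Phi(z) is Uniform(0, 1); the Beta(1, m) quantile function is
    1 - (1 - u) ** (1 / m).
    """
    u = normal_cdf(z)
    return 1.0 - (1.0 - u) ** (1.0 / m)


def stick_breaking(fractions):
    """Weights p_j = V_j * prod_{l < j} (1 - V_l) from stick fractions V_j."""
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # leftover mass caused by truncation
    return weights


def dependent_gem_pair(m, rho, n_sticks, rng):
    """Two truncated GEM-like weight vectors with correlated stick fractions.

    rho is the correlation of the underlying Gaussians; it plays the role
    that the covariance function plays in a full Gaussian process.
    """
    fractions_a, fractions_b = [], []
    for _ in range(n_sticks):
        z_a = rng.gauss(0.0, 1.0)
        z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        fractions_a.append(beta1m_from_gaussian(z_a, m))
        fractions_b.append(beta1m_from_gaussian(z_b, m))
    return stick_breaking(fractions_a), stick_breaking(fractions_b)


def shannon_diversity(weights):
    """Shannon index -sum p log p, skipping zero weights."""
    return -sum(p * math.log(p) for p in weights if p > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    site_a, site_b = dependent_gem_pair(m=3.0, rho=0.9, n_sticks=50, rng=rng)
    print(shannon_diversity(site_a), shannon_diversity(site_b))
```

With rho close to 1 the two weight vectors, and hence their diversity values, are nearly identical; with rho equal to 0 they are independent draws. This is the sense in which the correlation of the underlying Gaussians controls the dependence between communities.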