Discrimination of fish populations using parasites: Random Forests on a ‘predictable’ host-parasite system

Bibliographic Details
Published in: Parasitology
Main Authors: Pérez-Del-Olmo, A., Montero, E. E., Fernández, M., Barrett, J., Raga, J. A., Kostadinova, A. (Aneta)
Format: Article in Journal/Newspaper
Language: English
Published: 2010
Online Access: https://doi.org/10.1017/S0031182010000739
http://hdl.handle.net/11104/0192704
Description
Summary: We address the effect of spatial scale and temporal variation on model generality when forming predictive models for fish assignment by applying a new data mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain. The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means for model cross-validation on the baseline data, which allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF depends on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change, which stresses the importance of both temporal replication and model validation in parasite tagging studies.
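
Result (ii) concerns cross-validation on the baseline data itself. The summary does not say which procedure was used; a common mechanism in Random Forests is the out-of-bag (OOB) estimate, in which each tree is tested on the samples its bootstrap draw left out. The sketch below assumes that mechanism and only illustrates why it needs no separate hold-out set: with bootstrap resampling, each fish is left out of roughly a third of the trees. The function name `oob_counts` and the sample sizes are hypothetical and are not taken from the study.

```python
import random

def oob_counts(n_fish, n_trees, seed=0):
    """For each fish, count the trees whose bootstrap sample left it out."""
    rng = random.Random(seed)
    counts = [0] * n_fish
    for _ in range(n_trees):
        # Bootstrap: draw n_fish indices with replacement for this tree.
        in_bag = {rng.randrange(n_fish) for _ in range(n_fish)}
        for fish in range(n_fish):
            if fish not in in_bag:
                counts[fish] += 1  # this tree can act as a test for this fish
    return counts

# Hypothetical sizes for illustration only (not from the study).
n_fish, n_trees = 60, 500
counts = oob_counts(n_fish, n_trees)
mean_oob_fraction = sum(counts) / (n_fish * n_trees)

print(f"mean out-of-bag fraction: {mean_oob_fraction:.3f}")
print(f"fewest out-of-bag trees for any fish: {min(counts)}")
```

Because every fish is out of bag for many trees, each one can be classified by trees that never saw it, so a small baseline sample can serve for both training and validation.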