A new procedure to optimize the selection of groups in a classification tree

Agglomerative cluster analyses encompass many techniques that have been widely used in various fields of science. In biology, and specifically ecology, datasets are generally highly variable and may contain outliers, which make it harder to identify the number of clusters. Here we present...

Bibliographic Details
Main Author: Guidi, L
Format: Article in Journal/Newspaper
Language: unknown
Published: 2009
Subjects:
Online Access: http://plymsea.ac.uk/id/eprint/5750/
id ftplymouthml:oai:plymsea.ac.uk:5750
record_format openpolar
institution Open Polar
collection Plymouth Marine Science Electronic Archive (PlyMSEA - Plymouth Marine Laboratory, PML)
op_collection_id ftplymouthml
language unknown
description Agglomerative cluster analyses encompass many techniques that have been widely used in various fields of science. In biology, and specifically ecology, datasets are generally highly variable and may contain outliers, which make it harder to identify the number of clusters. Here we present a new criterion to determine statistically the optimal level of partition in a classification tree. The robustness of the criterion is tested against perturbed data (outliers) using an observation or variable with randomly generated values. The technique, called the Random Simulation Test (RST), is (1) tested on the well-known Iris dataset [Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7, 179–188], (2) tested on simulated data with predetermined numbers of clusters following Milligan and Cooper [Milligan, G.W., Cooper, M.C., 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179], and (3) applied to real copepod community data previously analyzed in Beaugrand et al. [Beaugrand, G., Ibanez, F., Lindley, J.A., Reid, P.C., 2002. Diversity of calanoid copepods in the North Atlantic and adjacent seas: species associations and biogeography. Mar. Ecol. Prog. Ser. 232, 179–195]. The technique is compared with several standard techniques. RST generally performed better than existing algorithms on simulated data and proved especially efficient with highly variable datasets.
format Article in Journal/Newspaper
author Guidi, L
title A new procedure to optimize the selection of groups in a classification tree
publishDate 2009
url http://plymsea.ac.uk/id/eprint/5750/
long_lat ENVELOPE(159.100,159.100,-81.767,-81.767)
geographic Lindley
genre North Atlantic
Copepods
op_relation Guidi, L. 2009 A new procedure to optimize the selection of groups in a classification tree. Ecological Modelling, 220 (4). 451-461.