Monothetic cluster analysis with extensions to circular and functional data

Monothetic clustering is a divisive clustering method that uses a hierarchical, recursive partitioning of multivariate responses based on binary decision rules that are built from individual response variables. This clustering technique is helpful for applications where the rules of groupings of obs...


Bibliographic Details
Main Author: Tran, Tan Vinh
Other Authors: Chairperson, Graduate Committee: Mark Greenwood
Mark C. Greenwood was a co-author of the article, 'Choosing the number of clusters in monothetic cluster analysis' submitted to the journal 'Electronic journal of applied statistical analysis' which is contained within this dissertation.
Mark C. Greenwood, John C. Priscu and Marie Sabacka were co-authors of the article, 'Visualization and monothetic clustering data with circular variables' submitted to the journal 'Journal of environmental statistics' which is contained within this dissertation.
Mark C. Greenwood was a co-author of the article, 'Clustering on functional data' submitted to the journal 'PeerJ - the journal of life and environmental sciences' which is contained within this dissertation.
Mark C. Greenwood was a co-author of the article, 'Monothetic clustering and partitioning using local subregions: the R packages monoClust and PULS' submitted to the journal 'The journal of open source software' which is contained within this dissertation.
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: Montana State University - Bozeman, College of Letters & Science 2019
Subjects: Cluster analysis; Statistics
Online Access: https://scholarworks.montana.edu/xmlui/handle/1/16369
id ftmontanastateu:oai:scholarworks.montana.edu:1/16369
record_format openpolar
institution Open Polar
collection Montana State University (MSU): ScholarWorks
op_collection_id ftmontanastateu
language English
topic Cluster analysis
Statistics
description Monothetic clustering is a divisive clustering method that recursively partitions multivariate responses into a hierarchy using binary decision rules, each built from a single response variable. The technique is helpful in applications where both the rules that define the groups and the assignment of new subjects to clusters are important. Based on the ideas of classification and regression trees, a monothetic clustering algorithm was implemented in R to allow further exploration and modification. A common problem in clustering is deciding whether a cluster structure is present and, if so, how many clusters are 'enough'. Some well-established techniques are reviewed, and new methods based on cross-validation and on permutation-based hypothesis tests at each split are suggested. Monothetic clustering can be applied in a variety of situations, including data sets with circular variables, whose values do not lie on a linear scale. A method for monothetic clustering and visualization of clusters with circular variables was developed that could also be used in other classification and regression tree settings. Clustering is also of interest for data sets where the responses can be transformed into functional data, which have unique properties that need exploring. Partitioning Using Local Subregions (PULS), a clustering technique inspired by monothetic clustering and designed to overcome some of its disadvantages in clustering functional data, is discussed. In this algorithm, clusters are formed by aggregating information from several variables or time intervals. In both monothetic clustering and PULS, the set of feasible splitting variables can be limited so that new observations can be assigned to clusters without observing all variables or times. R packages for these methods have been developed for others to use and test and to support the proposed research, and a detailed vignette ...
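As an illustration of the binary decision rules the description refers to, the following is a minimal sketch of a single monothetic split. It is not the dissertation's R implementation and does not use the monoClust or PULS packages. It assumes the split criterion is the total within-cluster sum of squared Euclidean distances to the cluster mean, which the record does not state; the function names `inertia`, `best_monothetic_split` and `assign_cluster` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def inertia(rows: Sequence[Sequence[float]]) -> float:
    """Sum of squared Euclidean distances from each row to the cluster mean.

    Assumed criterion for this sketch; the source record does not name one.
    """
    if not rows:
        return 0.0
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(p))


def best_monothetic_split(
    rows: Sequence[Sequence[float]],
) -> Tuple[int, float, float]:
    """Search every rule of the form x[var] < cut, one variable at a time.

    Returns (variable index, cut value, total inertia of the two children)
    for the rule with the smallest total inertia.
    """
    best = None
    p = len(rows[0])
    for var in range(p):
        values = sorted(set(r[var] for r in rows))
        # Candidate cuts are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r for r in rows if r[var] < cut]
            right = [r for r in rows if r[var] >= cut]
            total = inertia(left) + inertia(right)
            if best is None or total < best[2]:
                best = (var, cut, total)
    if best is None:
        raise ValueError("no split possible: every variable is constant")
    return best


def assign_cluster(row: Sequence[float], var: int, cut: float) -> int:
    """Assign a new observation using only the splitting variable."""
    return 0 if row[var] < cut else 1


# Toy data (hypothetical): two groups separated on the first variable.
rows: List[List[float]] = [
    [1.0, 0.0], [1.0, 1.0], [2.0, 0.0],
    [9.0, 1.0], [10.0, 0.0], [10.0, 1.0],
]
var, cut, total = best_monothetic_split(rows)
print(var, cut)  # prints "0 5.5"
```

Applying the same search recursively to each child gives the divisive hierarchy. Because each rule reads one variable, `assign_cluster` can place a new observation without the other variables being observed.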
author2 Chairperson, Graduate Committee: Mark Greenwood
Mark C. Greenwood was a co-author of the article, 'Choosing the number of clusters in monothetic cluster analysis' submitted to the journal 'Electronic journal of applied statistical analysis' which is contained within this dissertation.
Mark C. Greenwood, John C. Priscu and Marie Sabacka were co-authors of the article, 'Visualization and monothetic clustering data with circular variables' submitted to the journal 'Journal of environmental statistics' which is contained within this dissertation.
Mark C. Greenwood was a co-author of the article, 'Clustering on functional data' submitted to the journal 'PeerJ - the journal of life and environmental sciences' which is contained within this dissertation.
Mark C. Greenwood was a co-author of the article, 'Monothetic clustering and partitioning using local subregions: the R packages monoClust and PULS' submitted to the journal 'The journal of open source software' which is contained within this dissertation.
format Doctoral or Postdoctoral Thesis
author Tran, Tan Vinh
title Monothetic cluster analysis with extensions to circular and functional data
publisher Montana State University - Bozeman, College of Letters & Science
publishDate 2019
url https://scholarworks.montana.edu/xmlui/handle/1/16369
op_coverage Arctic regions
Antarctica
geographic Arctic
genre Antarc*
Antarctica
Arctic
op_relation https://scholarworks.montana.edu/xmlui/handle/1/16369
op_rights Copyright 2019 by Tan Vinh Tran
_version_ 1766261200807526400