Cluster analysis for symbolic interval data using linear regression method

Symbolic data records are becoming an increasingly powerful instrument for dealing with large data sets. Interval-valued data are a special type of symbolic data, for which each observation is a vector of intervals. Typical K-means methods for interval-valued data assume that the data separate into spherical...


Bibliographic Details
Main Author: Liu, Fei
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: uga 2016
Subjects:
Online Access: http://hdl.handle.net/10724/36237
http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd
id ftunivgeorgia:oai:athenaeum.libs.uga.edu:10724/36237
record_format openpolar
institution Open Polar
collection University of Georgia: Athenaeum@UGA
op_collection_id ftunivgeorgia
language English
topic Symbolic data analysis
Cluster analysis
Interval-valued data
Linear regression
Orthogonal regression
Measurement error model
spellingShingle Symbolic data analysis
Cluster analysis
Interval-valued data
Linear regression
Orthogonal regression
Measurement error model
Liu, Fei
Cluster analysis for symbolic interval data using linear regression method
topic_facet Symbolic data analysis
Cluster analysis
Interval-valued data
Linear regression
Orthogonal regression
Measurement error model
description Symbolic data records are becoming an increasingly powerful instrument for dealing with large data sets. Interval-valued data are a special type of symbolic data, for which each observation is a vector of intervals. Typical K-means methods for interval-valued data assume that the data separate into spherical clusters, and they usually fail to converge to the correct clusters when the data are not spherically clustered. We propose a K-regressions based clustering method for interval-valued data to recover a more complicated data structure. Assuming the response and predictor variables follow K different linear relationships, the data are initially split into K groups at random. Then, we apply the newly developed "symbolic variation" least squares to estimate the parameters of the K symbolic regressions. Each data point is then relocated to its closest group in terms of its symbolic distance to the regression lines. This two-step dynamic clustering algorithm continues until the clusters are stable. Further, we introduce an orthogonal regression clustering algorithm (ORCA) for interval-valued data to avoid specifying a response variable. Two orthogonal regression methods are proposed: the simple orthogonal regression method and the general orthogonal regression method. We use four different methods to determine the optimal number of clusters. A simulation study is conducted to investigate the performance of the ORCA algorithm. We use the Iris data (Fisher, 1936) to test the effectiveness of the ORCA algorithm. PhD Statistics Statistics Lynne Billard Lynne Billard Paul Schliekelman Jaxk Reeves William McCormick Pengsheng Ji
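The two-step loop described in the abstract (fit K regressions, then reassign each point to its closest line, and repeat until the groups stop changing) can be illustrated with a minimal Python sketch. This is not the thesis's method: the record does not define the "symbolic variation" least squares estimator or the symbolic distance, so the sketch assumes, purely for illustration, that each interval is represented by its midpoint, that each group is fitted by ordinary least squares, and that distance is the absolute vertical residual. The function name `k_regressions` and all of its parameters are hypothetical.

```python
import numpy as np


def k_regressions(x_lo, x_hi, y_lo, y_hi, k, n_iter=100, seed=0):
    """Clusterwise linear regression on interval midpoints (illustrative only).

    Assumptions not taken from the record: intervals are reduced to their
    midpoints, groups are fitted by ordinary least squares, and the distance
    of a point to a line is its absolute vertical residual.
    """
    rng = np.random.default_rng(seed)
    x = (np.asarray(x_lo, dtype=float) + np.asarray(x_hi, dtype=float)) / 2.0
    y = (np.asarray(y_lo, dtype=float) + np.asarray(y_hi, dtype=float)) / 2.0
    n = x.shape[0]

    # Initial step: split the observations into k groups at random.
    labels = rng.integers(0, k, size=n)
    coefs = np.zeros((k, 2))  # one row per group: (intercept, slope)

    for _ in range(n_iter):
        # Step 1: fit one regression line per group.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= 2:  # groups that are too small keep their old line
                design = np.column_stack([np.ones(mask.sum()), x[mask]])
                coefs[j] = np.linalg.lstsq(design, y[mask], rcond=None)[0]

        # Step 2: move each point to the group whose line is closest.
        fitted = coefs[:, 0] + coefs[:, 1] * x[:, None]  # shape (n, k)
        new_labels = np.abs(y[:, None] - fitted).argmin(axis=1)

        # Stop once the groups are stable.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, coefs


if __name__ == "__main__":
    # Synthetic intervals scattered around two different lines.
    rng = np.random.default_rng(1)
    centers = rng.uniform(0.0, 10.0, size=60)
    truth = np.where(np.arange(60) < 30, 1.0 + 2.0 * centers, 20.0 - 1.5 * centers)
    labels, coefs = k_regressions(
        centers - 0.3, centers + 0.3, truth - 0.2, truth + 0.2, k=2
    )
    print(labels)
    print(coefs)
```

Like K-means, a loop of this kind can settle in a local optimum that depends on the random initial split, so in practice one would typically rerun it from several seeds and keep the best fit.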
format Doctoral or Postdoctoral Thesis
author Liu, Fei
author_facet Liu, Fei
author_sort Liu, Fei
title Cluster analysis for symbolic interval data using linear regression method
title_short Cluster analysis for symbolic interval data using linear regression method
title_full Cluster analysis for symbolic interval data using linear regression method
title_fullStr Cluster analysis for symbolic interval data using linear regression method
title_full_unstemmed Cluster analysis for symbolic interval data using linear regression method
title_sort cluster analysis for symbolic interval data using linear regression method
publisher uga
publishDate 2016
url http://hdl.handle.net/10724/36237
http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd
long_lat ENVELOPE(170.967,170.967,-71.833,-71.833)
ENVELOPE(-67.983,-67.983,-67.133,-67.133)
geographic McCormick
Reeves
geographic_facet McCormick
Reeves
genre Orca
genre_facet Orca
op_relation liu_fei_201605_phd
http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd
http://hdl.handle.net/10724/36237
op_rights On Campus Only Until 2018-05-01
_version_ 1766161192870477824