Cluster analysis for symbolic interval data using linear regression method
Symbolic data records are becoming an increasingly powerful tool for handling large data sets. Interval-valued data are a special type of symbolic data in which each observation is a vector of intervals. Typical K-means methods for interval-valued data assume that the data separate into spherical...
Main Author: | Liu, Fei |
---|---|
Format: | Doctoral or Postdoctoral Thesis |
Language: | English |
Published: | uga 2016 |
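Subjects: | Symbolic data analysis; Cluster analysis; Interval-valued data; Linear regression; Orthogonal regression; Measurement error model |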
Online Access: | http://hdl.handle.net/10724/36237 http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd |
id |
ftunivgeorgia:oai:athenaeum.libs.uga.edu:10724/36237 |
---|---|
record_format |
openpolar |
spelling |
ftunivgeorgia:oai:athenaeum.libs.uga.edu:10724/36237 2023-05-15T17:53:29+02:00 Cluster analysis for symbolic interval data using linear regression method Liu, Fei 2016-05 http://hdl.handle.net/10724/36237 http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd eng eng uga liu_fei_201605_phd http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd http://hdl.handle.net/10724/36237 On Campus Only Until 2018-05-01 Symbolic data analysis Cluster analysis Interval-valued data Linear regression Orthogonal regression Measurement error model Dissertation 2016 ftunivgeorgia 2020-09-24T10:07:27Z Symbolic data records are becoming an increasingly powerful tool for handling large data sets. Interval-valued data are a special type of symbolic data in which each observation is a vector of intervals. Typical K-means methods for interval-valued data assume that the data separate into spherical clusters, and they usually fail to converge to the correct clusters when the data are not spherically clustered. We propose a K-regressions based clustering method for interval-valued data to recover a more complicated data structure. Assuming the response and predictor variables follow K different linear relationships, the data are initially split into K groups at random. We then apply the newly developed "symbolic variation" least squares to estimate the parameters of the K symbolic regressions. Each data point is then relocated to its closest group in terms of its symbolic distance to the regression lines. This two-step dynamic clustering algorithm continues until the clusters are stable. Further, we introduce an orthogonal regression clustering algorithm (ORCA) for interval-valued data to avoid specifying a response variable. Two orthogonal regression methods are proposed: the simple orthogonal regression method and the general orthogonal regression method. We use four different methods to determine the optimal number of clusters.
A simulation study is conducted to investigate the performance of the ORCA algorithm, and the Iris data (Fisher, 1936) are used to test its effectiveness. PhD Statistics Statistics Lynne Billard Lynne Billard Paul Schliekelman Jaxk Reeves William McCormick Pengsheng Ji Doctoral or Postdoctoral Thesis Orca University of Georgia: Athenaeum@UGA McCormick ENVELOPE(170.967,170.967,-71.833,-71.833) Reeves ENVELOPE(-67.983,-67.983,-67.133,-67.133) |
institution |
Open Polar |
collection |
University of Georgia: Athenaeum@UGA |
op_collection_id |
ftunivgeorgia |
language |
English |
topic |
Symbolic data analysis Cluster analysis Interval-valued data Linear regression Orthogonal regression Measurement error model |
spellingShingle |
Symbolic data analysis Cluster analysis Interval-valued data Linear regression Orthogonal regression Measurement error model Liu, Fei Cluster analysis for symbolic interval data using linear regression method |
topic_facet |
Symbolic data analysis Cluster analysis Interval-valued data Linear regression Orthogonal regression Measurement error model |
description |
Symbolic data records are becoming an increasingly powerful tool for handling large data sets. Interval-valued data are a special type of symbolic data in which each observation is a vector of intervals. Typical K-means methods for interval-valued data assume that the data separate into spherical clusters, and they usually fail to converge to the correct clusters when the data are not spherically clustered. We propose a K-regressions based clustering method for interval-valued data to recover a more complicated data structure. Assuming the response and predictor variables follow K different linear relationships, the data are initially split into K groups at random. We then apply the newly developed "symbolic variation" least squares to estimate the parameters of the K symbolic regressions. Each data point is then relocated to its closest group in terms of its symbolic distance to the regression lines. This two-step dynamic clustering algorithm continues until the clusters are stable. Further, we introduce an orthogonal regression clustering algorithm (ORCA) for interval-valued data to avoid specifying a response variable. Two orthogonal regression methods are proposed: the simple orthogonal regression method and the general orthogonal regression method. We use four different methods to determine the optimal number of clusters. A simulation study is conducted to investigate the performance of the ORCA algorithm, and the Iris data (Fisher, 1936) are used to test its effectiveness. PhD Statistics Statistics Lynne Billard Lynne Billard Paul Schliekelman Jaxk Reeves William McCormick Pengsheng Ji |
format |
Doctoral or Postdoctoral Thesis |
author |
Liu, Fei |
author_facet |
Liu, Fei |
author_sort |
Liu, Fei |
title |
Cluster analysis for symbolic interval data using linear regression method |
title_short |
Cluster analysis for symbolic interval data using linear regression method |
title_full |
Cluster analysis for symbolic interval data using linear regression method |
title_fullStr |
Cluster analysis for symbolic interval data using linear regression method |
title_full_unstemmed |
Cluster analysis for symbolic interval data using linear regression method |
title_sort |
cluster analysis for symbolic interval data using linear regression method |
publisher |
uga |
publishDate |
2016 |
url |
http://hdl.handle.net/10724/36237 http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd |
long_lat |
ENVELOPE(170.967,170.967,-71.833,-71.833) ENVELOPE(-67.983,-67.983,-67.133,-67.133) |
geographic |
McCormick Reeves |
geographic_facet |
McCormick Reeves |
genre |
Orca |
genre_facet |
Orca |
op_relation |
liu_fei_201605_phd http://purl.galileo.usg.edu/uga_etd/liu_fei_201605_phd http://hdl.handle.net/10724/36237 |
op_rights |
On Campus Only Until 2018-05-01 |
_version_ |
1766161192870477824 |
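
The two-step procedure outlined in the description field (random initial split, one regression fit per group, relocation of each observation to its closest line, repeated until the groups are stable) can be illustrated with a rough sketch. This is not the thesis's method: the record does not define the "symbolic variation" least squares or the symbolic distance, so the sketch substitutes ordinary least squares over the four corner points of each interval rectangle and a mean squared vertical residual over those corners. The function names `vertices`, `fit_line`, and `k_regressions`, and all parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np

def vertices(x_lo, x_hi, y_lo, y_hi):
    """Return the four corner points of each interval rectangle, shape (n, 4, 2)."""
    xs = np.stack([x_lo, x_lo, x_hi, x_hi], axis=1)
    ys = np.stack([y_lo, y_hi, y_lo, y_hi], axis=1)
    return np.stack([xs, ys], axis=2)

def fit_line(pts):
    """Ordinary least squares fit of y = a + b*x over an (m, 2) array of points."""
    design = np.column_stack([np.ones(len(pts)), pts[:, 0]])
    coef, *_ = np.linalg.lstsq(design, pts[:, 1], rcond=None)
    return coef  # (intercept, slope)

def k_regressions(x_lo, x_hi, y_lo, y_hi, k, max_iter=100, seed=0):
    """Alternate between fitting k lines and reassigning observations.

    Assumption: corner-point OLS and a corner-averaged squared residual
    stand in for the symbolic estimator and distance, which the record
    does not specify.
    """
    rng = np.random.default_rng(seed)
    v = vertices(np.asarray(x_lo, float), np.asarray(x_hi, float),
                 np.asarray(y_lo, float), np.asarray(y_hi, float))
    n = len(v)
    labels = rng.integers(k, size=n)          # random initial split
    coefs = np.zeros((k, 2))
    for _ in range(max_iter):
        # Step 1: fit one regression line per current group.
        for j in range(k):
            members = v[labels == j]
            if len(members) == 0:             # reseed an empty group
                members = v[rng.integers(n, size=2)]
            coefs[j] = fit_line(members.reshape(-1, 2))
        # Step 2: move each observation to the line it is closest to.
        pred = coefs[:, 0] + coefs[:, 1] * v[:, :, 0, None]    # (n, 4, k)
        dist = ((v[:, :, 1, None] - pred) ** 2).mean(axis=1)   # (n, k)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # groups are stable
            break
        labels = new_labels
    return labels, coefs

# Usage on synthetic intervals centred on two hypothetical lines.
demo_rng = np.random.default_rng(1)
cx = demo_rng.uniform(0, 10, size=60)
cy = np.where(np.arange(60) < 30, 1 + 2 * cx, 10 - cx)
labels, coefs = k_regressions(cx - 0.2, cx + 0.2, cy - 0.3, cy + 0.3, k=2)
print(coefs)
```

Like K-means, this kind of alternating scheme can settle on a poor local solution depending on the random initial split, so in practice several seeds would likely be compared.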