A Computational Cost-Effective Clustering Algorithm In Multidimensional Space Using The Manhattan Metric: Application To The Global Terrorism Database

The growing volume of collected data has strained the performance of current analysis algorithms. Developing new algorithms that are cost-effective in terms of complexity, scalability, and accuracy has therefore attracted significant interest. In this paper, a modified k-means-based algorithm is deve...


Bibliographic Details
Main Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami
Format: Text
Language: English
Published: Zenodo 2017
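Subjects: Pattern recognition; partitional clustering; K-means clustering; Manhattan distance; terrorism data analysis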
Online Access: https://dx.doi.org/10.5281/zenodo.1130685
https://zenodo.org/record/1130685
id ftdatacite:10.5281/zenodo.1130685
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language English
topic Pattern recognition
partitional clustering
K-means clustering
Manhattan distance
terrorism data analysis.
description The growing volume of collected data has strained the performance of current analysis algorithms. Developing new algorithms that are cost-effective in terms of complexity, scalability, and accuracy has therefore attracted significant interest. In this paper, a modified k-means-based algorithm is developed and evaluated experimentally. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. It uses the City Block (Manhattan) distance and a new stopping criterion to guarantee convergence. Experiments on a real data set show that it performs well compared with the original k-means algorithm.
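The record does not spell out the authors' procedure, so the following is only a minimal sketch of the general idea named in the description: a k-means-style loop that assigns points using the City Block (Manhattan) distance. The function names, the `tol`, `max_iter`, and `seed` parameters, the mean-based centroid update, and the stopping rule (total centroid shift below a tolerance) are all assumptions made for illustration; the paper's "new stop criterion" is not given here and may differ.

```python
import random


def manhattan(p, q):
    """City Block (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans_manhattan(points, k, tol=1e-6, max_iter=100, seed=0):
    """Illustrative k-means-style clustering with Manhattan assignment.

    Assumptions (not taken from the paper): centroids are coordinate-wise
    means, and the loop stops once the total L1 shift of the centroids
    is at most `tol` or after `max_iter` passes.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the L1 distance.
        labels = [
            min(range(k), key=lambda j: manhattan(p, centroids[j]))
            for p in points
        ]
        # Update step: coordinate-wise mean of each cluster's members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        # Assumed stopping rule: total centroid movement is negligible.
        shift = sum(manhattan(c, n) for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels
```

A call such as `kmeans_manhattan([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` returns the final centroids and one cluster label per point. Computing the L1 distance needs only additions and absolute values, with no squaring or square roots, which is one plausible source of the reduced computational load the description mentions. A coordinate-wise median would be the L1-optimal centre (the k-medians variant); the mean is kept here only to stay close to the k-means framing.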
format Text
author Semeh Ben Salem
Naouali, Sami
Moetez Sallami
author_sort Semeh Ben Salem
title A Computational Cost-Effective Clustering Algorithm In Multidimensional Space Using The Manhattan Metric: Application To The Global Terrorism Database
title_sort computational cost-effective clustering algorithm in multidimensional space using the manhattan metric: application to the global terrorism database
publisher Zenodo
publishDate 2017
url https://dx.doi.org/10.5281/zenodo.1130685
https://zenodo.org/record/1130685
genre sami
genre_facet sami
op_relation https://dx.doi.org/10.5281/zenodo.1130684
op_rights Open Access
Creative Commons Attribution 4.0
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
op_rightsnorm CC-BY
op_doi https://doi.org/10.5281/zenodo.1130685
https://doi.org/10.5281/zenodo.1130684
_version_ 1766187026284019712