Identifying the optimal number of topics in text mining: a case study on reindeer pastoralism literature
Text mining and topic analysis algorithms, which group textual contents in the most efficient way, are becoming increasingly useful to summarise the main information contained in large data corpora of complex scientific fields. Using the literature about reindeer pastoralism as a case study, this meth...
Published in: | Italian Journal of Animal Science |
---|---|
Main Authors: | Contiero, Barbara; Holand, Øystein; Cozzi, Giulio |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Taylor and Francis Ltd. 2024 |
Subjects: | Number of topic; reindeer pastoralism; simulation; text mining; topic analysis |
Online Access: | https://hdl.handle.net/11577/3540862 https://doi.org/10.1080/1828051x.2024.2398168 |
_version_ | 1821693253055938560 |
---|---|
author | Contiero, Barbara; Holand, Øystein; Cozzi, Giulio |
collection | Padua Research Archive (IRIS - Università degli Studi di Padova) |
container_issue | 1 |
container_start_page | 1348 |
container_title | Italian Journal of Animal Science |
container_volume | 23 |
description | Text mining and topic analysis algorithms, which group textual contents in the most efficient way, are becoming increasingly useful to summarise the main information contained in large data corpora of complex scientific fields. Using the literature about reindeer pastoralism as a case study, this methodological investigation addressed the problem of identifying the number of topics that provides the best in-depth interpretation of a large data corpus. A total of 2,875 documents on the scientific literature of reindeer pastoralism, extracted from Scopus®, were used. Four simulations with 8, 10, 12, and 20 topics were carried out to define the optimal number of topics that best explained the issues related to reindeer husbandry. The results showed that a reasonable trade-off between the number of articles and the number of topics, based on the reduction of the variance explained within the group, leads to an optimal choice in the search for the most meaningful simulation. Adopting too many topics, which excessively fragments the data corpus into small aggregations of documents, encourages the emergence of topics without any technical or practical meaning, solely as a result of the unsupervised iterative process. HIGHLIGHTS: Text mining for insight into vast and complex scientific fields: a case study on reindeer pastoralism. Optimising topic identification to strike a balance between the size of the article corpus and the number of topics and achieve the most insightful results. Too many topics can lead to fragmentation and irrelevant results, while too few may oversimplify the complexity of the dataset. |
format | Article in Journal/Newspaper |
genre | reindeer husbandry |
id | ftunivpadovairis:oai:www.research.unipd.it:11577/3540862 |
institution | Open Polar |
language | English |
op_collection_id | ftunivpadovairis |
op_container_end_page | 1357 |
op_doi | https://doi.org/10.1080/1828051x.2024.2398168 |
op_relation | info:eu-repo/semantics/altIdentifier/wos/WOS:001304463800001 volume:23 issue:1 firstpage:1348 lastpage:1357 numberofpages:10 journal:ITALIAN JOURNAL OF ANIMAL SCIENCE https://hdl.handle.net/11577/3540862 doi:10.1080/1828051x.2024.2398168 |
publishDate | 2024 |
publisher | Taylor and Francis Ltd. |
record_format | openpolar |
title | Identifying the optimal number of topics in text mining: a case study on reindeer pastoralism literature |
topic | Number of topic; reindeer pastoralism; simulation; text mining; topic analysis |
url | https://hdl.handle.net/11577/3540862 https://doi.org/10.1080/1828051x.2024.2398168 |