Khairov M. R. (Supplementary education teacher, Peoples' Friendship University of Russia named after Patrice Lumumba, Moscow)
Sabirova D. I. (Laboratory assistant, MIREA - Russian Technological University, Moscow)
Novikova D. S. (Senior teacher of supplementary education, Peoples' Friendship University of Russia named after Patrice Lumumba, Moscow)
This paper investigates the problem of evaluating the accuracy of text clustering. For the study, an expert-labeled dataset of 1800 texts was created; the texts are grouped into three topics (IT innovations, education, and politics) and additionally by text size.
The study comprised text preprocessing, construction of vector models, and application of several clustering algorithms: K-means, Affinity Propagation and DBScan.
The results showed that the K-means and Affinity Propagation algorithms achieved good text clustering accuracy (82% and 85%, respectively), while DBScan showed low accuracy (52%) owing to the characteristics of the data. In addition, K-means outperformed the other algorithms in clustering completeness, reaching 78%.
Keywords: text clustering, text vector models, TF-IDF, K-means, Affinity Propagation, DBScan, clustering accuracy
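The workflow summarized above (TF-IDF vector model, three clustering algorithms, accuracy against expert labels) can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name. The six-text corpus is a hypothetical stand-in for the 1800-text dataset, and the parameter values (`n_clusters=3`, `eps=0.7`, `min_samples=2`, cosine distance) are illustrative choices, not the authors' settings. The abstract also does not define how accuracy was computed; the helper `majority_vote_accuracy` uses one common convention, in which each cluster is assigned its most frequent expert topic.

```python
from collections import Counter

from sklearn.cluster import DBSCAN, AffinityPropagation, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the expert-labeled dataset (not the authors' data).
texts = [
    "new software startup releases cloud computing platform",
    "cloud software platform uses machine learning chips",
    "university students attend school lectures and exams",
    "school teachers prepare students for university exams",
    "parliament election results announced by government",
    "government ministers debate election law in parliament",
]
topics = ["it", "it", "education", "education", "politics", "politics"]


def majority_vote_accuracy(true_labels, cluster_ids):
    """Share of texts whose topic equals the majority topic of their cluster.

    Assumption: noise points (cluster id -1, as produced by DBSCAN) count as errors.
    """
    members = {}
    for label, cid in zip(true_labels, cluster_ids):
        if cid != -1:
            members.setdefault(cid, []).append(label)
    # For each cluster, count the texts that carry its most frequent topic.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return correct / len(true_labels)


# Build the TF-IDF vector model; a dense array keeps all three estimators happy.
vectors = TfidfVectorizer(stop_words="english").fit_transform(texts).toarray()

# Illustrative parameter values, not taken from the paper.
models = {
    "K-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Affinity Propagation": AffinityPropagation(random_state=0),
    "DBScan": DBSCAN(eps=0.7, min_samples=2, metric="cosine"),
}

results = {}
for name, model in models.items():
    cluster_ids = model.fit_predict(vectors)
    results[name] = majority_vote_accuracy(topics, cluster_ids)
    print(f"{name}: {results[name]:.2f}")
```

On a corpus this small the printed scores say nothing about the algorithms themselves; the sketch only shows how the pieces fit together. Unlike K-means, Affinity Propagation and DBSCAN choose the number of clusters on their own, so their scores depend strongly on the chosen parameters.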
Citation link: Khairov M. R., Sabirova D. I., Novikova D. S. STUDY OF THE ACCURACY OF CLUSTERING ALGORITHMS FOR TEXTS WRITTEN IN EUROPEAN LANGUAGES // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. -2024. -№07/2. -С. 190-195 DOI 10.37882/2223-2966.2024.7-2.37