Журнал «Современная Наука»


STUDY OF THE ACCURACY OF CLUSTERING ALGORITHMS FOR TEXTS WRITTEN IN EUROPEAN LANGUAGES

Khairov M. R. (Supplementary education teacher, Peoples' Friendship University of Russia named after Patrice Lumumba, Moscow)

Sabirova D. I. (Laboratory assistant, MIREA - Russian Technological University, Moscow)

Novikova D. S. (Senior teacher of supplementary education, Peoples' Friendship University of Russia named after Patrice Lumumba, Moscow)

This paper investigates the problem of evaluating the accuracy of text clustering. For the study, an expert-labeled dataset of 1800 texts was created and divided by topic (IT innovations, education, and politics) and by text size. The research comprised text preprocessing, building vector models, and applying several clustering algorithms: K-means, Affinity Propagation, and DBSCAN. K-means and Affinity Propagation achieved good clustering accuracy (82% and 85%, respectively), while DBSCAN showed low accuracy (52%) owing to characteristics of the data. In addition, K-means outperformed the other algorithms in clustering completeness, reaching 78%.
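
The abstract does not state how accuracy and completeness were computed, so the following is only a minimal sketch of one common way to score a clustering against expert labels. It assumes that accuracy means majority-vote accuracy (each cluster is assigned its most frequent expert label) and that completeness is the entropy-based measure 1 - H(cluster | label) / H(cluster). The function names and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter, defaultdict
from math import log


def majority_vote_accuracy(true_labels, cluster_ids):
    """Share of texts whose cluster's most common expert label matches their own."""
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)
    # For each cluster, count the texts that carry its most frequent label.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return correct / len(true_labels)


def completeness(true_labels, cluster_ids):
    """Entropy-based completeness: 1 - H(cluster | label) / H(cluster)."""
    n = len(true_labels)
    cluster_counts = Counter(cluster_ids)
    label_counts = Counter(true_labels)
    joint_counts = Counter(zip(true_labels, cluster_ids))
    h_cluster = -sum(c / n * log(c / n) for c in cluster_counts.values())
    if h_cluster == 0.0:
        return 1.0  # a single cluster is trivially complete
    h_cluster_given_label = -sum(
        c / n * log(c / label_counts[label]) for (label, _), c in joint_counts.items()
    )
    return 1.0 - h_cluster_given_label / h_cluster


# Toy data (invented for illustration): nine texts, three expert topics.
expert = ["it", "it", "it", "edu", "edu", "edu", "pol", "pol", "pol"]
predicted = [0, 0, 1, 1, 1, 1, 2, 2, 2]  # one "it" text lands in the "edu" cluster

print("accuracy:", majority_vote_accuracy(expert, predicted))
print("completeness:", completeness(expert, predicted))
```

Under these assumptions, a completeness of 1 means every topic falls entirely within a single cluster, and splitting a topic across clusters lowers the score. For algorithms such as DBSCAN that mark some points as noise, how those points are counted is a further choice that this sketch does not address.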

Keywords: text clustering, text vector models, TF-IDF, K-means, Affinity Propagation, DBSCAN, clustering accuracy

 


Citation link:
Khairov M. R., Sabirova D. I., Novikova D. S. Study of the Accuracy of Clustering Algorithms for Texts Written in European Languages // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2024. No. 07/2. Pp. 190-195. DOI: 10.37882/2223-2966.2024.7-2.37
LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© ООО "Научные технологии"