Журнал «Современная Наука»


STUDY OF VECTORIZATION METHODS OF SCIENTIFIC TEXTS FOR MULTI-TASK CLASSIFICATION BASED ON VARIOUS DATA VOLUMES

Potapova Ksenia Aleksandrovna  (Senior Lecturer, MIREA – Russian Technological University)

Isaeva Irina Andreevna  (Senior Lecturer, MIREA – Russian Technological University)

Gabrielyan Gaik Ashotovich  (Senior Lecturer, MIREA – Russian Technological University)

The study compares vectorization methods for multi-task classification of scientific texts. Three methods were selected for vectorizing scientific articles: two statistical methods, bag of words and TF-IDF, and one neural network model, word2vec. A comparative analysis of classification models was conducted, after which two were selected for the experiment: a modification of logistic regression and a random forest. To assess how the volume of input data affects classification quality, three scenarios were used: titles only; titles and abstracts; and titles, abstracts, and full article texts. Each scenario was tested with every vectorization method and both selected classification models, which made it possible to identify the relationship between data completeness, vectorization type, and the resulting classification quality metrics.
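The two statistical vectorizers and the three input-volume scenarios described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy records, the field names (`title`, `abstract`, `text`), the helper names, and the particular TF-IDF variant (term frequency normalized by document length, unsmoothed natural-log IDF) are all assumptions made for illustration. The word2vec model and the classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical toy records; field names and texts are illustrative only.
ARTICLES = [
    {"title": "neural networks for text classification",
     "abstract": "we train neural models on scientific abstracts",
     "text": "the full text describes experiments with neural models"},
    {"title": "random forest for tabular data",
     "abstract": "we compare tree ensembles on tabular benchmarks",
     "text": "the full text reports accuracy of tree ensembles"},
]

# The three input-volume scenarios named in the abstract.
SCENARIOS = {
    "title": ("title",),
    "title+abstract": ("title", "abstract"),
    "title+abstract+text": ("title", "abstract", "text"),
}

def build_documents(articles, fields):
    """Concatenate the chosen fields of each article and split into tokens."""
    return [" ".join(a[f] for f in fields).lower().split() for a in articles]

def bag_of_words(docs):
    """Return (vocabulary, vectors) where each vector holds raw term counts."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Return (vocabulary, vectors) with tf = count / len(doc), idf = ln(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = [
        [(c / len(doc)) * w for c, w in zip(row, idf)]
        for row, doc in zip(counts, docs)
    ]
    return vocab, vectors

for name, fields in SCENARIOS.items():
    docs = build_documents(ARTICLES, fields)
    vocab, _ = bag_of_words(docs)
    print(name, "vocabulary size:", len(vocab))
```

In this sketch, adding abstracts and full texts enlarges the vocabulary and therefore the dimensionality of both representations, while TF-IDF assigns zero weight to terms that occur in every document. Either set of vectors could then be passed to a classifier such as logistic regression or a random forest.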

Keywords: vectorization, scientific articles, machine learning, classification, semantic analysis

 




Citation link:
Potapova K. A., Isaeva I. A., Gabrielyan G. A. STUDY OF VECTORIZATION METHODS OF SCIENTIFIC TEXTS FOR MULTI-TASK CLASSIFICATION BASED ON VARIOUS DATA VOLUMES // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2025. No. 05/2. P. 99-105. DOI: 10.37882/2223-2966.2025.05-2.18
LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© ООО "Научные технологии"