Potapova Ksenia Aleksandrovna (Senior Lecturer, MIREA – Russian Technological University)
Isaeva Irina Andreevna (Senior Lecturer, MIREA – Russian Technological University)
Gabrielyan Gaik Ashotovich (Senior Lecturer, MIREA – Russian Technological University)
The study analyzes vectorization methods for classification tasks. Three methods were selected for vectorizing scientific articles: two statistical methods, bag of words and TF-IDF, and one neural network model, word2vec. A comparative analysis of classification models was conducted, after which two were selected for the experiment: a modification of logistic regression and a random forest. To assess how input data volume affects classification quality, three scenarios were used: titles only; titles and abstracts; and titles, abstracts, and full article texts. Each scenario was tested with every vectorization method and both classification models, which made it possible to identify the relationship between data completeness, vectorization type, and the resulting classification quality metrics.
Keywords: vectorization, scientific articles, machine learning, classification, semantic analysis
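The two statistical vectorizers and the three input scenarios named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the field names (`title`, `abstract`, `text`), the whitespace tokenizer, the toy articles, and the smoothed IDF variant `ln((1 + n) / (1 + df)) + 1` are all assumptions made for illustration. word2vec and the classifiers are omitted because they require external libraries.

```python
import math
from collections import Counter

# Hypothetical scenario definitions: which article fields feed the vectorizer.
SCENARIOS = {
    "title": ("title",),
    "title+abstract": ("title", "abstract"),
    "title+abstract+text": ("title", "abstract", "text"),
}


def build_input(article, scenario):
    """Concatenate the fields that the given scenario makes available."""
    return " ".join(article[field] for field in SCENARIOS[scenario])


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (sorted vocabulary, list of raw term-count vectors)."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(docs):
    """Weight raw counts by a smoothed IDF (assumed variant, no normalization)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]


# Toy articles, invented purely to exercise the functions.
articles = [
    {"title": "text classification", "abstract": "vectors for text",
     "text": "bag of words and tf-idf"},
    {"title": "random forest", "abstract": "tree ensembles",
     "text": "forest of decision trees"},
]

for name in SCENARIOS:
    docs = [build_input(article, name) for article in articles]
    vocab, _ = bag_of_words(docs)
    print(name, "vocabulary size:", len(vocab))
```

Each scenario yields a different vocabulary, and therefore feature vectors of a different length, which is the kind of variation in input the three scenarios are meant to probe.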
For citation: Potapova K. A., Isaeva I. A., Gabrielyan G. A. STUDY OF VECTORIZATION METHODS OF SCIENTIFIC TEXTS FOR MULTI-TASK CLASSIFICATION BASED ON VARIOUS DATA VOLUMES // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2025. No. 05/2. P. 99-105. DOI 10.37882/2223-2966.2025.05-2.18