Журнал «Современная Наука»

SELF-LEARNING FORMATION OF DATA QUALITY METRICS FOR UNEXPLORED STREAMS

Ulanov Kirill Anatolyevich (Postgraduate Student, Department of Information Systems, Moscow State Technological University "Stankin")

The article addresses automatic quality control of "dark" data streams: Kafka topics that have neither reference labels nor a schema known in advance. The aim of the work is to develop a self-learning method for generating quality metrics for streaming data that can evaluate the reliability of unexplored events in real time without manually written rules. A streaming algorithm is proposed in which a lightweight online encoder extracts features, a Boolean augmentation mask creates positive and negative examples, and a ranking loss function drives training on the principle of self-learning ranking. On the open NYC Taxi dataset, the method outperformed rule-based checks, Isolation Forest and Deep SVDD: P1000 rose to 0.74, and the error detection delay fell to 32 seconds at a load of 0.55 vCPU. The findings confirm that self-learning ranking is an effective and resource-efficient framework for end-to-end data quality control in streaming systems.

Keywords: streaming data processing, data quality control, self-supervised learning, learning to rank, Apache Kafka, unexplored streams.
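
The abstract names three components (an online encoder, a Boolean augmentation mask, and a ranking loss) but gives no implementation details. The sketch below is only one possible reading, built on assumptions that are not taken from the article: the encoder is replaced by running per-field z-scores, the mask corrupts fields with a fixed additive shift, the scorer is linear, and the loss is a pairwise hinge. All names (`RunningEncoder`, `corrupt`, `train`, the `shift`, `lr` and `warmup` parameters) are hypothetical.

```python
import random


class RunningEncoder:
    """Online per-field standardiser (Welford's algorithm); an assumed stand-in for the encoder."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per field

    def update(self, x):
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def encode(self, x):
        # Feature i is the absolute z-score of field i under the running statistics.
        feats = []
        for i, v in enumerate(x):
            var = self.m2[i] / (self.n - 1) if self.n > 1 else 1.0
            feats.append(abs(v - self.mean[i]) / (var ** 0.5 + 1e-9))
        return feats


def corrupt(x, mask, rng, shift=20.0):
    """Negative example: shift every field whose mask bit is True (assumed corruption)."""
    return [v + shift * rng.choice((-1.0, 1.0)) if m else v for v, m in zip(x, mask)]


def score(w, feats):
    """Linear quality score; higher means more trustworthy."""
    return -sum(wi * fi for wi, fi in zip(w, feats))


def pairwise_hinge(s_pos, s_neg, margin=1.0):
    """Margin ranking loss: zero once the positive outranks the negative by `margin`."""
    return max(0.0, margin - (s_pos - s_neg))


def train(stream, dim, rng, lr=0.05, warmup=20):
    enc, w = RunningEncoder(dim), [0.0] * dim
    for x in stream:
        enc.update(x)
        if enc.n < warmup:
            continue  # let the running statistics settle first
        mask = [rng.random() < 0.5 for _ in range(dim)]
        mask[rng.randrange(dim)] = True  # corrupt at least one field
        f_pos, f_neg = enc.encode(x), enc.encode(corrupt(x, mask, rng))
        if pairwise_hinge(score(w, f_pos), score(w, f_neg)) > 0.0:
            # Subgradient step on the hinge: d(s_pos - s_neg)/dw = f_neg - f_pos
            w = [wi + lr * (fn - fp) for wi, fn, fp in zip(w, f_neg, f_pos)]
    return enc, w


# Synthetic three-field stream (not the NYC Taxi data used in the article).
rng = random.Random(0)
stream = [[rng.gauss(5.0, 1.0), rng.gauss(20.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(300)]
enc, w = train(stream, dim=3, rng=rng)
clean, broken = [5.0, 20.0, 1.0], [5.0, 500.0, 1.0]
print(score(w, enc.encode(clean)), score(w, enc.encode(broken)))
```

The learned score can then serve as a label-free quality metric: records are ranked by `score`, and the lowest-ranked ones are flagged for inspection. In a real deployment the records would come from a Kafka consumer rather than an in-memory list.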

 

Citation link:
Ulanov K. A. SELF-LEARNING FORMATION OF DATA QUALITY METRICS FOR UNEXPLORED STREAMS // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2025. No. 06. pp. 241-245. DOI 10.37882/2223-2966.2025.06.46
© ООО "Научные технологии"