Журнал «Современная Наука»


The method for detecting and analyzing anomalous HTTP traffic using natural language models and vector representation of HTTP requests

Liashkov Mikhail Andreevich  (graduate student, Derzhavin Tambov State University)

Pchelintsev Sergey Yurevich  (graduate student, Derzhavin Tambov State University)

Kovaleva Olga Alexandrovna  (D.Sc., associate professor, Derzhavin Tambov State University; Tambov State Technical University)

The paper proposes using modern unsupervised learning approaches to automatically construct a representation of HTTP requests and then using that representation to classify anomalies in traffic. The solution builds on techniques from natural language processing, such as Doc2Vec, which can potentially achieve a deep understanding of HTTP messages and thereby improve the performance of an intrusion detection system. An important property of such a model is its interpretability. To test how the solution would work in practice, a RoBERTa language model, adapted from natural language processing, was trained on normal network traffic, and its ability to detect anomalous traffic that it had not seen before was measured. The proposed method is evaluated on the publicly available CSIC 2010 and CSE-CIC-IDS 2018 datasets. According to the results, training the model exclusively on normal network traffic makes it possible to detect anomalous HTTP requests well. The approach does not require expert labeling, and the vector representations provide interpretability: the system can indicate the specific places in a particular HTTP request that it considers anomalous. In most cases normal network traffic is easy to collect, whereas a sufficient amount of malicious traffic is relatively difficult to collect, because systems are not under attack most of the time and isolating malicious traffic from the overall flow requires either expert time or a configured external system. The paper explains the results in terms of the clusters that form in the space of vectorized requests and a simple logistic regression classifier. Good separation after t-SNE indicates that HTTP requests in the specified datasets are easy to separate, and the vector representation of requests makes it possible to retrieve semantically similar requests from history.

Keywords: anomaly detection, HTTP traffic, language models, model training.
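
The idea in the abstract of fitting a language model on normal traffic only, and then flagging requests it finds surprising, can be illustrated with a deliberately small sketch. It is not the authors' implementation: an add-one smoothed unigram model stands in for the RoBERTa model, and the tokenizer, the `UnigramModel` class, and the sample requests are all hypothetical names and data chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: alphanumeric runs and single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]")


def tokenize(request: str) -> list:
    return TOKEN_RE.findall(request.lower())


class UnigramModel:
    """Add-one smoothed unigram model fitted on normal requests only.

    A stand-in for the much richer language model named in the abstract.
    """

    def __init__(self, normal_requests):
        self.counts = Counter(t for r in normal_requests for t in tokenize(r))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen tokens

    def surprisal(self, token: str) -> float:
        """Surprisal of a token in bits; unseen tokens get the maximum."""
        p = (self.counts[token] + 1) / (self.total + self.vocab)
        return -math.log2(p)

    def score(self, request: str) -> float:
        """Mean per-token surprisal; higher means more anomalous."""
        tokens = tokenize(request)
        if not tokens:
            return 0.0
        return sum(self.surprisal(t) for t in tokens) / len(tokens)

    def explain(self, request: str, top_k: int = 3) -> list:
        """Return the top_k most surprising distinct tokens in the request."""
        scored = {t: self.surprisal(t) for t in tokenize(request)}
        return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Made-up sample traffic, not taken from CSIC 2010 or CSE-CIC-IDS 2018.
normal = [
    "GET /shop/index.jsp?id=12 HTTP/1.1",
    "GET /shop/cart.jsp?id=7 HTTP/1.1",
    "POST /shop/login.jsp?user=anna HTTP/1.1",
]
model = UnigramModel(normal)
benign = "GET /shop/index.jsp?id=7 HTTP/1.1"
attack = "GET /shop/index.jsp?id=7' OR '1'='1 HTTP/1.1"

print(model.score(benign) < model.score(attack))
print(model.explain(attack, top_k=2))
```

In this toy setting the per-token surprisal plays the role of the interpretability described above: the tokens returned by `explain` are the places in the request that the model considers least like normal traffic. No labeled attack data is needed to fit the model.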

 


Citation link:
Liashkov M. A., Pchelintsev S. Y., Kovaleva O. A. The method for detecting and analyzing anomalous HTTP traffic using natural language models and vector representation of HTTP requests // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2022. No. 04. P. 109-117. DOI: 10.37882/2223-2966.2022.04.23
LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© ООО "Научные технологии"