Journal «Современная Наука»


Vocabulary learning via optimal transport for neural machine translation

Zhong Ruiyu (Postgraduate, National Research University of Electronic Technology)

The choice of token vocabulary affects the performance of machine translation. This paper investigates what makes a good vocabulary and whether the optimal vocabulary can be found without trial training. To answer these questions, we first offer an alternative view of the role of vocabulary from the perspective of information theory. Motivated by this view, we formulate vocabularization, the search for the best token dictionary of a proper size, as an optimal transport (OT) problem. We propose VNMT, a simple and efficient solution that requires no trial training. Empirical results show that VNMT outperforms widely used vocabularies in diverse scenarios, including WMT-14 English-German and TED multilingual translation. For example, on English-German translation VNMT reduces vocabulary size by almost 70% while gaining 0.5 BLEU.
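For readers unfamiliar with optimal transport, the following is a minimal sketch of the generic entropy-regularised OT problem solved with Sinkhorn iterations. It is not the VNMT algorithm itself, whose details are not given in this abstract. The function name `sinkhorn`, the parameters `eps` and `iters`, and the toy histograms are all illustrative assumptions.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, iters=500):
    """Approximate an entropy-regularised transport plan between two histograms.

    a, b  : source and target masses (each should sum to 1)
    cost  : cost[i][j] is the cost of moving mass from source i to target j
    eps   : regularisation strength (hypothetical default, chosen for the toy case)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so they sum to a, then columns so they sum to b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy data (assumed for illustration only): two sources, two targets.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
total_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
```

The returned `plan` is a non-negative matrix whose row sums approximate `a` and whose column sums approximate `b`; in a vocabulary setting the two histograms would be replaced by whatever quantities the paper transports between, which the abstract does not specify.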

Keywords: natural language processing, machine translation, vocabulary, optimal transport, machine learning, multilingual translation.

 


Citation link:
Zhong R. Vocabulary learning via optimal transport for neural machine translation // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2022. No. 04. P. 159-165. DOI: 10.37882/2223-2966.2022.04.39
LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© ООО "Научные технологии"