Articles | Open Access | DOI: https://doi.org/10.37547/supsci-ojp-06-03-26

REQUIREMENTS FOR PARALLEL CORPORA USED IN TRAINING MACHINE TRANSLATION MODELS

Gulshoda Shamsiyeva

Abstract

The rapid development of artificial intelligence, natural language processing, and multilingual machine translation has made parallel corpora considerably more important in translation modeling and translation education. Modern translation systems increasingly rely on semantically adapted multilingual corpora for contextual interpretation, semantic representation learning, and adaptive translation generation. This study examines the linguistic, semantic, technological, and pedagogical requirements that parallel corpora must meet when they are used to teach translation modeling in a multilingual computing environment.

Keywords

parallel corpus, modeling, computational linguistics, machine translation, multilingual corpus, semantic matching, intelligent translation systems, NLP.

How to Cite

Shamsiyeva, G. (2026). REQUIREMENTS FOR PARALLEL CORPORA USED IN TRAINING MACHINE TRANSLATION MODELS. Oriental Journal of Philology, 6(03). https://doi.org/10.37547/supsci-ojp-06-03-26