Neural Machine Translation for Native Peruvian Languages

September 2025

Bridging Worlds: The Story of the AiMara Translator Project

The Neural Machine Translation for Native Peruvian Languages project was born from a deeply personal mission. It was initiated by members of AiMara Lab, systems and computer engineers who are deeply rooted in Aymara culture and speak Aymara as their native language. Their vision was to build a bridge between their ancestral heritage and the forefront of technological evolution.

What began as a personal initiative has since grown into a collaborative movement, attracting more individuals passionate about a shared goal: translating the Aymara language for a global audience. Our mission is twofold: to ensure the Aymara community is an active participant in the digital age, and to develop practical solutions that directly benefit native communities.

From Theory to Practice: Our Technology

Our progress is built on a solid foundation of cutting-edge research. The current translator prototype was developed using principles from the seminal papers that power modern AI:

- "Attention Is All You Need" - Vaswani et al. (2017)
- "Effective Approaches to Attention-based Neural Machine Translation" - Luong et al. (2015)
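
For readers unfamiliar with these papers, the central operation in the first one is scaled dot-product attention: each query is compared with every key, the scores are scaled by the square root of the key dimension and normalized with a softmax, and the result weights the values. The following is a minimal pure-Python sketch of that general mechanism, not the prototype's actual code; the function names and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors.

    queries: list of query vectors
    keys:    list of key vectors (same dimension as the queries)
    values:  list of value vectors (one per key)
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example: two identical keys receive equal weight,
# so the output is the average of the two value vectors.
result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 0.0]],
)
```

In a real translation model this operation runs on learned matrices and in several heads in parallel; the list-based version above only shows the arithmetic.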

This approach has yielded promising results, as measured by standard machine translation metrics such as BLEU, ROUGE, and METEOR. The prototype is a tangible result of our dedicated work.
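
To give a sense of what a metric like BLEU measures, here is a minimal sketch of unsmoothed sentence-level BLEU against a single reference: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration only, not our evaluation pipeline; the function names and the placeholder tokens are assumptions, and real evaluations are normally run at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one tokenized candidate and one tokenized reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            # Without smoothing, a zero precision at any order gives BLEU = 0.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Placeholder tokens stand in for real translated words.
reference = "w1 w2 w3 w4 w5 w6".split()
perfect = sentence_bleu("w1 w2 w3 w4 w5 w6".split(), reference)
partial = sentence_bleu("w1 w2 w3 w4 w5 x".split(), reference)
```

An exact match scores 1.0, and the score drops as n-grams stop matching or the candidate becomes shorter than the reference.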

Try the AiMara Translator Prototype: https://translate.aimaralab.com/

SITA: The Heart of Our Corpus Creation

To power our models, we developed a comprehensive system for corpus creation and validation, named SITA. This integrated platform allows us to upload sentences in Spanish (currently with a Peruvian context), which are then presented to human translators. Our registered translators can provide accurate Aymara translations and, crucially, record voice audio for each phrase. This audio data is a vital component for our future Speech-to-Text projects.
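
The workflow above can be pictured as a simple record that starts with a Spanish sentence and is later completed with a translation and an audio recording. The sketch below is hypothetical: the class, field, and function names are assumptions chosen for illustration and do not describe SITA's actual schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CorpusEntry:
    """One sentence moving through a SITA-like workflow (hypothetical schema)."""
    spanish: str                          # uploaded source sentence
    aymara: Optional[str] = None          # filled in by a registered translator
    audio_path: Optional[str] = None      # voice recording of the phrase, if any
    translator_id: Optional[str] = None   # who provided the translation

    def is_translated(self) -> bool:
        return bool(self.aymara and self.aymara.strip())

    def has_audio(self) -> bool:
        return self.audio_path is not None


def parallel_pairs(entries: List[CorpusEntry]) -> List[Tuple[str, str]]:
    """Spanish/Aymara pairs ready for training a translation model."""
    return [(e.spanish, e.aymara) for e in entries if e.is_translated()]


def speech_pairs(entries: List[CorpusEntry]) -> List[Tuple[str, str]]:
    """Audio/text pairs that could later feed a speech-to-text model."""
    return [
        (e.audio_path, e.aymara)
        for e in entries
        if e.is_translated() and e.has_audio()
    ]


# Placeholder strings are used instead of real translations.
entries = [
    CorpusEntry("Buenos días.", "<Aymara translation 1>", "audio/0001.wav", "t-01"),
    CorpusEntry("¿Dónde está el centro de salud?", "<Aymara translation 2>"),
    CorpusEntry("La escuela abre mañana."),  # still waiting for a translator
]
text_data = parallel_pairs(entries)
audio_data = speech_pairs(entries)
```

Keeping text and audio on the same record means one translated sentence can serve both the translation corpus and the future speech corpus.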

Explore the SITA Platform: https://sita.aimaralab.com/

Fueling the Future: The Peruvian Data Initiative

A robust translation model requires a vast and diverse dataset. To that end, AiMara Lab is actively recovering and structuring data from various Peruvian contexts, including news, health, education, and legal sectors. This initiative ensures we have a rich, relevant, and extensive data foundation to be translated into Aymara, making our models more accurate and useful.

View Our Data Initiative Progress: https://datos.aimaralab.com/

Building the Future, Together

This project is more than just code; it's a commitment to cultural preservation and digital inclusion. We are continuously working to improve our models, expand our datasets, and build new tools for the Aymara community and the world.