Armenian Language Speech-to-Text Data Collection Challenge
The College of Science and Engineering (CSE) at AUA, in collaboration with NVIDIA, has launched a project to enhance the Armenian speech-to-text database. The database is part of the open-source Mozilla Common Voice project, which periodically releases its collected datasets under a free-to-use license (Creative Commons CC0). The next release is scheduled for mid-December.
Currently, the database contains approximately 5 hours of Armenian voice recordings, compared to 154 hours for Georgian and 3,400 hours for English. The goal of the project is to expand the Armenian dataset to 300 hours, enabling Armenian scientists and students to effectively train their models.
CSE is hosting an in-person event at AUA where anyone interested can attend and ask questions. Afterward, the organizers will conduct recording sessions in their labs with participants who would like more information. Details can be found here.
You can also contribute to the challenge without coming to AUA: dedicating just 5 minutes a day is enough to participate remotely. On December 4, the top contributors will be awarded prizes.
During the challenge, participants are expected to complete short Armenian voice recording and validation tasks online. For more details about the challenge and how to participate, please refer to the guidelines below:
Register: Join Mozilla Common Voice. Even if you prefer not to register, you can still participate in the project anonymously.
Contribute: Record and validate Armenian speech online.
Submit: Share your dashboard statistics by December 4, 8 a.m.
Win: Get a chance to win NVIDIA gifts and attend the award event.
Detailed information on how the tasks should be completed can be found on the webpage.
You may also find a video tutorial explaining the challenge steps here.
If you have further questions, join the event on December 1 at 3 p.m. in Lab 003 at AUA, where you can learn about the online challenge from representatives, ask questions, and practice the tasks in real time.
The survival of the Armenian language depends on its inclusion in modern technologies. If the language is overlooked by technological advancements, it risks becoming obsolete. Your support in preserving and enhancing the Armenian language through this technological initiative is greatly appreciated!
