Kyrgyz Speech Synthesis Model Kani TTS 2 Ranks Among the Top Three TTS Models on Hugging Face

Арестова Татьяна Society

Kyrgyz IT specialists have again drawn international attention with a new development, according to the High Technology Park (HTP) of Kyrgyzstan.

The NineNineSix team has presented Kani TTS 2, an updated version of its speech synthesis model, which has already secured a place among the top TTS models on Hugging Face, the world's largest platform for artificial intelligence models.

Kani TTS 2 builds on the team's earlier work and shows significant improvements: the model can now generate up to 40 seconds of continuous speech in a single run, more than twice as long as the first version could.

The HTP emphasized that it is an exceptional and important achievement for an open model from Kyrgyzstan to reach the top three TTS models on Hugging Face.

About the NineNineSix Team

NineNineSix is a team of developers from Kyrgyzstan working in the field of artificial intelligence and known for their innovations in language technologies.

Previously, the team introduced the first version of Kani TTS and developed a voice speaker and the AI assistant AkylAi, which became the first artificial intelligence to speak the Kyrgyz language.

Voice for Low-Resource Languages

Major companies in the AI field typically focus on English and other widely spoken languages, leaving low-resource languages without proper attention. NineNineSix chose a different approach.

Kani TTS 2 supports English, Spanish, and Kyrgyz, and its architecture allows the model to be adapted to other languages, accents, and dialects.

A distinctive feature of the project is the publication of the complete pre-training code, enabling any country or research team to create their own voice model based on Kani TTS 2.

“Kani TTS 2 is an evolution of our first version: we improved the stability of speech generation and expanded the model's capabilities to work with longer segments. We strive to create compact and open models that are easier to adapt to various languages and accents, including those with limited representation. We want to demonstrate that world-class technologies can be developed in Kyrgyzstan, which is why we opened not only the model weights but also the entire pre-training code so that any team can train TTS for their language,” noted Nursultan Bakashov, co-founder of nineninesix.ai.

Kani TTS 2 includes the following key improvements:

* The ability to stably generate up to 40 seconds of speech in one pass;

* Support for zero-shot voice cloning, that is, cloning a voice from a short audio fragment;

* Complete openness of the architecture and training code;

* Entry into the top 3 TTS models on Hugging Face.

According to the HTP, the model has about 400 million parameters and was pre-trained on approximately 10,000 hours of speech data. It can run on a GPU with about 3 GB of video memory, which makes it practical to use both locally and on servers.
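
As a rough illustration of how the parameter count relates to the memory figure, the sketch below estimates the space needed for the model weights alone. The storage precisions are assumptions, since the HTP statement does not say which one the model uses, and the helper name `weight_memory_gb` is hypothetical. Activations, the audio codec, and framework overhead are not counted, so this is a lower bound rather than a measurement of Kani TTS 2.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# "About 400 million parameters", as reported by the HTP.
NUM_PARAMS = 400_000_000

# Assumed storage precisions; the article does not state which one is used.
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2}

estimates = {
    name: weight_memory_gb(NUM_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights")
```

Under these assumptions the weights take roughly 1.6 GB in 32-bit precision or 0.8 GB in 16-bit precision, which is consistent with the reported figure of about 3 GB of video memory once runtime overhead is added.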

“Kani TTS 2 is not just another AI model. It is a clear demonstration that specialists from Kyrgyzstan can develop world-class technologies and compete in the global artificial intelligence market. NineNineSix shows that Kyrgyzstan can be not only a consumer but also a creator of advanced AI solutions,” the HTP noted.