Project to develop verb conjugators and speech synthesis technologies for Indigenous languages

 

Kawennón:nis is a verb conjugator developed by the NRC and collaborators at the Onkwawenna Kentyohkwa Language School as a teaching tool. It displays verb conjugations for the Western dialect of Kanyen'kéha (the Mohawk language) as spoken in Six Nations. The team then worked with the Kahnawà:ke Mohawk community to extend the technology to the Eastern dialect. The team has also begun building verb conjugators for Algonquin and Michif.

Most Indigenous languages in Canada, including Mohawk, are polysynthetic, meaning that a single word is often built by combining 7 to 10 morphemes. Verbs are formed this way too, which makes verb conjugation one of the most difficult aspects of these languages to learn. It also makes representing these conjugations in a printed textbook almost impossible, as even the most common verbs can take on thousands of possible forms. Fortunately, verb conjugation software like Kawennón:nis can generate all these conjugations, making it a valuable resource for language learners. Kawennón:nis is now being used in classrooms in both Six Nations and Kahnawà:ke.
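To see why the number of forms grows so quickly, consider a rough sketch in which a verb is assembled from several slots, each with a handful of possible morphemes. The slot names and counts below are illustrative assumptions only and are not taken from Kanyen'kéha grammar; the point is simply that the total is the product of the choices in each slot.

```python
import math

# Hypothetical number of options in each morpheme slot of a single verb.
# These counts are illustrative assumptions, not figures from Kanyen'kéha grammar.
slot_options = {
    "slot_1": 4,
    "slot_2": 15,
    "slot_3": 3,
    "slot_4": 2,
    "slot_5": 3,
    "slot_6": 5,
}

# If every slot is chosen independently, the number of distinct forms
# is the product of the option counts.
total_forms = math.prod(slot_options.values())
print(total_forms)  # 5400
```

Even with only six small slots, a single verb already yields thousands of forms under these assumptions, which is why a generated table is more practical than a printed one.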

Just as printing a text that covers all possible conjugations for Mohawk verbs is not feasible, neither is recording audio for them. The team worked with collaborators in both Six Nations and Kahnawà:ke to create a preliminary speech synthesis system that speaks Mohawk conjugations out loud. A general-purpose speech synthesis system capable of producing audio for whole sentences is also being researched and developed.

Collaborators

Objectives

  • Build Kawennón:nis, a Mohawk verb conjugator for the Western dialect, using WordWeaver grammar building software
  • Extend the Mohawk verb conjugator to the Eastern dialect, using WordWeaver grammar building software
  • Extend verb conjugation tools to other Indigenous languages
  • Develop a simpler grammar building software: Gramble
  • Build a speech synthesis technology to help learners master verb conjugation pronunciation

Deliverables

  • Technology transfer of the Kawennón:nis verb conjugator (Western dialect) to the Onkwawenna Kentyohkwa Mohawk-language immersion school, which has made it available online
  • Technology transfer of a version of the Kawennón:nis verb conjugator adapted to the Eastern dialect to the Kanien'kehá:ka Onkwawén:na Raotitióhkwa Language and Cultural Center
  • WordWeaver: Source code and graphical user interface to create verb conjugators for Iroquoian languages, made available on the NRC's GitHub account with an open-source licence
  • Gramble: Simple grammar building software
  • Speech synthesis technology

Activities

Verb conjugator for Western dialect

The focus of the NRC's collaboration with Onkwawenna Kentyohkwa is Kawennón:nis, meaning 'wordmaker' in Kanyen'kéha. Kawennón:nis is a verb conjugator meant to assist learners and educators at the school as well as students of the language, wherever they might be. The idea for the tool was suggested by Owennatekha.

The creation and extension of this tool involves a number of researchers at the NRC, Owennatekha, and two other educators from Onkwawenna Kentyohkwa. The Onkwawenna Kentyohkwa experts supplied the linguistic knowledge incorporated in Kawennón:nis, and collaboratively designed the tool with NRC developers to meet the needs of adult immersion learners.

Kawennón:nis's user interface is closely linked to the school's curriculum and was designed by students, educators, and NRC researchers. Kawennón:nis is hosted online by the school. It is available on the web and on mobile devices, online or offline, and the interface is available in both English and Mohawk.

Verb conjugator for Eastern dialect

The team then worked with the Kahnawà:ke Mohawk community to extend the technology to the Eastern dialect. This work included collaborating with language expert Akwiratékha’ Martin to adapt the existing language model to the Kahnawà:ke dialect, and transferring the resulting conjugator to the Kanien’kehá:ka Onkwawén:na Raotitióhkwa Language and Cultural Center (KORLCC), which holds and manages it.

WordWeaver and Gramble: Underlying technologies

The language model that powers Kawennón:nis is called WordWeaver and is the first of its kind for any Iroquoian language. It is based on a finite-state transducer (FST) incorporating manually encoded rules; this language model interfaces with web and mobile front-ends. It was used to build Kawennón:nis for the Western dialect and also powers the extension to the Eastern dialect. WordWeaver is a language-independent technology and has been released on the NRC's GitHub account with an open-source licence.
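For readers unfamiliar with finite-state transducers, the following is a minimal sketch of the general idea, not WordWeaver's actual implementation or rule format. It walks through a small set of states, consuming one abstract grammatical tag at a time and emitting a piece of the surface word for each. All tags, state names, and output strings are invented placeholders rather than real Kanyen'kéha morphemes.

```python
# Toy finite-state transducer: maps a sequence of abstract tags to a surface string.
# Everything here is a hypothetical placeholder, not real Kanyen'kéha data.

# TRANSITIONS[state][input_tag] = (output_string, next_state)
TRANSITIONS = {
    "start": {"AGENT1": ("p1-", "root"), "AGENT2": ("p2-", "root")},
    "root": {"VERB_X": ("rootx", "aspect"), "VERB_Y": ("rooty", "aspect")},
    "aspect": {"ASPECT_A": ("-a", "end"), "ASPECT_B": ("-b", "end")},
}
FINAL_STATES = {"end"}


def transduce(tags):
    """Follow one arc per input tag and concatenate the emitted outputs."""
    state = "start"
    output = []
    for tag in tags:
        arcs = TRANSITIONS.get(state, {})
        if tag not in arcs:
            raise ValueError(f"no arc for {tag!r} in state {state!r}")
        emitted, state = arcs[tag]
        output.append(emitted)
    if state not in FINAL_STATES:
        raise ValueError("input ended before reaching a final state")
    return "".join(output)


print(transduce(["AGENT1", "VERB_X", "ASPECT_B"]))  # p1-rootx-b
```

A production FST would also encode the sound changes that occur where morphemes meet, which this sketch leaves out.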

Despite these successes, WordWeaver and other software frameworks for building grammars are designed for computational linguists. So that Indigenous communities can create their own software for grammar, the NRC team has been working on a new software framework called Gramble that radically simplifies the process. Gramble enables people familiar with the grammar of a language to enter its rules in a spreadsheet. The team hopes to release Gramble in 2021-22.
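The spreadsheet idea can be illustrated with a small sketch. The column layout below (slot, tag, form) is a hypothetical one chosen for this example and is not Gramble's actual format; likewise, the morphemes are invented placeholders. The sketch reads a table of morphemes and produces every combination that takes one row from each slot.

```python
import csv
import io
from itertools import product

# Hypothetical sheet layout (NOT Gramble's actual format): one row per morpheme.
SHEET = """slot,tag,form
1,AGENT1,p1-
1,AGENT2,p2-
2,VERB_X,rootx
3,ASPECT_A,-a
3,ASPECT_B,-b
"""


def load_slots(text):
    """Group (tag, form) pairs by slot number and return them in slot order."""
    slots = {}
    for row in csv.DictReader(io.StringIO(text)):
        slots.setdefault(int(row["slot"]), []).append((row["tag"], row["form"]))
    return [slots[number] for number in sorted(slots)]


def conjugate(slots):
    """Map each tag combination to the concatenation of its forms."""
    table = {}
    for combo in product(*slots):
        tags = "+".join(tag for tag, _ in combo)
        table[tags] = "".join(form for _, form in combo)
    return table


table = conjugate(load_slots(SHEET))
print(len(table))  # 4
print(table["AGENT2+VERB_X+ASPECT_B"])  # p2-rootx-b
```

The appeal of a tabular format is that someone who knows the grammar can add a row without touching any code.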

Speech synthesis

If printed in a book, the number of possible Mohawk verb conjugations could fill a 40-story building. With so many possibilities, it is hard for learners to imagine what each conjugation sounds like. The team worked with expert speakers Akwiratekha Martin and Rohahí:yo Brant from Kahnawà:ke and Six Nations to create a preliminary speech synthesis technology that reads the Kanyen’kéha conjugations out loud.

First, they recorded high-quality pronunciations of sample verb conjugations. They then developed a speech synthesis system that is able to rearrange the sounds in those words to create new words. With this method, recording just 852 conjugations was enough to synthesize the first 122,966 conjugations in the verb conjugator. The team is now developing a general-purpose speech synthesis system capable of producing audio for whole sentences.
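The reason a small set of recordings can cover a much larger set of words can be sketched as follows. This is only an illustration of the reuse principle, under the assumption that words break into reusable units; the text does not specify what units the real system uses, and the real system works on audio rather than strings. The hyphen-delimited words below are invented placeholders.

```python
def units(word):
    """Split a hyphen-delimited placeholder word into its units."""
    return word.split("-")


def build_inventory(recorded):
    """Remember, for each unit, the first recording and position where it occurs."""
    inventory = {}
    for word in recorded:
        for position, unit in enumerate(units(word)):
            inventory.setdefault(unit, (word, position))
    return inventory


def plan(word, inventory):
    """Return the (recording, position) source for each unit, or None if one is missing."""
    sources = []
    for unit in units(word):
        if unit not in inventory:
            return None
        sources.append(inventory[unit])
    return sources


recorded = ["p1-rootx-a", "p2-rooty-b"]
inventory = build_inventory(recorded)

# A word that was never recorded can still be assembled from recorded pieces.
print(plan("p1-rooty-a", inventory))
# [('p1-rootx-a', 0), ('p2-rooty-b', 1), ('p1-rootx-a', 2)]

# A word containing an unrecorded unit cannot.
print(plan("p3-rootx-a", inventory))  # None
```

Choosing which words to record so that their units cover as many target words as possible is what keeps the recording effort small relative to the number of synthesized forms.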

The key question is whether the system sounds like a fluent speaker of the language. The synthesized speech has been evaluated by expert speakers, and feedback from Mohawk educators is encouraging. If the team succeeds in producing acceptable synthetic Kanyen’kéha speech, it will explore the application of this technology to other Indigenous languages.

Project team

Anna Kazantseva, PhD

Computational linguistics of literature (novels and stories); modelling discourse structure of long informal documents; computational linguistics of Iroquoian languages.

Aidan Pine

Development of software for supporting Indigenous languages; he has developed tools in collaboration with Gitksan & Heiltsuk communities.

Contact us

Owennatekha Brian Maracle
Founder and Head, Onkwawenna Kentyohkwa
Telephone: 519-445-1250
Email: onkwawenna@gmail.com

Akwiratekha Martin
Expert speaker and language activist, Kahnawà:ke Mohawk Community
Email: tekhaluvsyou@hotmail.com

Roland Kuhn
Project Leader, Indigenous Languages Technology Project, NRC
Telephone: 613-993-0821
Email: Roland.Kuhn@nrc-cnrc.gc.ca
LinkedIn: Roland Kuhn

Related links

Publications