Canadian Indigenous languages technology project

Status: Active

Overview

We are working on speech- and text-based technologies that aim to assist the stabilization, revitalization and reclamation of Indigenous languages by supporting Indigenous language educators and students, promoting the accessibility of audio recordings, and supporting Indigenous language translators, transcribers and other language professionals.

  • Language-independent technology (such as software) will be released to communities as open-source software.
  • We will be working under the direction and advice of an Advisory committee, and in close collaboration and partnership with Indigenous community organizations and Indigenous communities across Canada.
  • All research done within this project will be compliant with the Tri-Council Research Ethics Policy.
  • Budget 2017 invested $89.9M over three years to support Indigenous languages and cultures. We were granted $6M of this funding.
  • This project is managed by the NRC's Digital Technologies Research Centre.

Technologies

Speech-based technologies

The context

  • There are thousands of hours of recordings of Indigenous languages from across the country.
  • The recordings can be difficult for Indigenous communities to access and make use of because they are not always fully transcribed, and sometimes are missing metadata (information about what languages are being spoken, who is speaking, etc.).

Our aim

  • To create software that will automatically segment and label audio files while they're being recorded (or shortly afterwards).
  • To build and test audio-indexation software that makes it possible to search through existing recordings, including recordings made decades ago, to find key words or phrases.

Text-based technologies

The context

  • The complexity of words in Indigenous languages poses difficulties for software applications (including both educational and professional software) that lack language-specific word-handling capabilities. Single, long words made up of many small pieces known as morphemes can often express what other languages express with entire clauses.
  • Teaching how to form words is a central concern in Indigenous language education.
  • Word complexity, and, in some languages, the complexity of the writing systems, mean that writing in accordance with official community standards is difficult for many learners.

Our aim

  • To design, in collaboration with instructors, educational tools that support exploratory learning of word formation.
  • To develop tools for spell-checking and grammar-checking, for integration with desktop and mobile applications, to help language users at all levels to follow their community's writing standards.

Languages

We are taking a "first deep, then broad" approach. Each software tool we build will initially be specialized to one or two Indigenous languages in Canada, but built in a way that allows customization for additional languages.

We are currently working with:

  • Kanyen'kéha (Mohawk)
  • Inuktitut
  • Cree

Through thoughtful design, and subsequent testing, we will attempt to ensure that the tools we develop in this way will be adaptable to many different languages after this initial development period.

Collaborations

We are collaborating formally and informally with:

7000 Languages

Website: 7000 Languages

Project description: Initiative for Creating Online Indigenous Language Courses (COILC initiative)

The NRC has partnered with the experts at 7000 Languages, a non-profit, non-Indigenous organization based in the United States that creates courses for endangered languages around the world. The NRC will fund selected community teams who wish to create online courses for their languages. Find out more about COILC.

Alberta Language Technology Lab, University of Alberta

Website: Alberta Language Technology Lab, University of Alberta

Project description: Since 2013, the Alberta Language Technology Lab (ALTLab) at the University of Alberta, headed by Dr. Antti Arppe, has been combining research on language structure with the creation of computational tools for Indigenous languages, starting with Plains Cree. The lab has been building on earlier work by its Norwegian partners on Saami and other threatened Uralic languages of Northern Eurasia, which resulted in the Giella linguistic software development infrastructure. This infrastructure allows for the straightforward, rapid creation of end-user applications for morphologically complex languages.

Another section of this webpage describes the NRC's collaboration with the Onkwawenna Kentyohkwa Mohawk-language immersion school to build an educational tool called Kawennón:nis. This tool – which is currently being extended to other Iroquoian languages – was built within the Giella infrastructure. It would have been much more difficult for the NRC team to create Kawennón:nis without the help of the ALTLab team's Giella expertise. An NRC software developer, Eddie Santos, is currently embedded in the ALTLab to enhance the synergy between the two teams.

Canadian Broadcasting Corporation (CBC)

Website: Canadian Broadcasting Corporation

Project description: CBC creates programming by and for Indigenous peoples, providing services in eight Indigenous/Inuit languages. CBC is providing the Computer Research Institute of Montreal (CRIM) with access to East James Bay Cree recordings, as part of the NRC's Indigenous languages technology project, so that CRIM can develop audio segmentation and analysis tools suitable for indexing audio recordings in Indigenous languages. CBC has shared over 1,343 hours of radio programming originally broadcast by CBC North from January 2015 to December 2016. These 1,312 audio files, which contain studio/telephone quality speech as well as music, are highly appreciated by the NRC and CRIM project teams and will be critical to the success of the project.

Carleton University

Website: Professor Marie-Odile Junker of Carleton University and her team have developed several websites for languages of the Algonquian family, in partnership with Indigenous organizations.

Project description: Algonquian Dictionaries Project (East Cree and Innu)

The collaboration with the NRC is focused on updating online language lessons developed earlier by the Carleton team, in partnership with Cree Programs and Institut Tshakapesh, aimed at supporting East Cree (2006‑2011) and Innu (2009‑2012) literacy.

The online lessons/games/exercises platform supports the creation of multimedia interactive online lessons with auto‑generated exercises/games. In this platform, users are able to listen to a word or phrase in several dialects. They then play computer‑generated interactive activities that test and enhance their vocabulary, orthography and grammar acquisition. They can also engage in more advanced grammatical and textual activities. Teachers can go online to develop additional lesson plans, and track students' progress. Language experts can access an administrative interface to develop new content.

Unfortunately, the rapid pace of change in the software industry has stranded these educational tools: many of the key functionalities no longer work as intended. The collaboration is aimed at updating the platform to align with current technology. The platform update is also an opportunity to improve the experience of second language learners (these tools were originally developed with first language speakers in mind) and to carry out user testing of the lessons.

Computer Research Institute of Montreal (CRIM)

Website: Computer Research Institute of Montreal

Project description: News release about indexation of Indigenous language audio recordings to enable keyword search

The Computer Research Institute of Montréal (CRIM) is an applied research and expertise centre in information technology. Its speech and text team has a long and distinguished record of accomplishments in technologies related to speech recognition. Its audio content indexing technology indexes the spoken content of very large audio databases, making such content accessible through search engines. CRIM has applied this technology to the archives of the National Film Board (NFB) and to the collected testimonies of the Bastarache investigative commission. CRIM's speaker recognition technology, which identifies the person who generated a particular segment of speech, is world-class. It has consistently ranked among the top entries in international evaluations of speaker recognition systems, and is now used all over the world.

The NRC's collaboration with CRIM is focused on applying audio indexing and speaker recognition technologies to Indigenous languages. Over the years, hundreds of thousands of hours of speech have been recorded in various Indigenous languages. Unfortunately, these recordings are typically not annotated or indexed. Surprisingly, even speech data being collected now by Indigenous communities and linguists have this problem: because there is a lack of tools for segmenting speech data as they are being recorded, the stock of unannotated speech data in Indigenous languages is constantly growing.

We are tackling two aspects of this problem:

  • We are developing simple tools that will segment speech as it is being recorded. The tools will separate audio files into speech and non-speech data, and will label the speech segments by the identity of the current speaker. This should make annotation of speech currently being collected easier, for a variety of languages.
  • We also plan to build systems that will make it possible to search for particular words or phrases in audio recordings in some Indigenous languages. This will not be full speech recognition and we will not be creating systems that are able to produce high-quality transcriptions of everything that was said in a recording. Rather, the systems will enable audio keyword search, so that users will be able to search quickly through long audio recordings for particular words or topics. We are currently targeting Inuktut and Cree. The Pirurvik Centre is providing valuable assistance on the Inuktut part of this project.
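To make the first aim above concrete, here is a minimal sketch of speech/non-speech segmentation. It is not the method used by the NRC or CRIM, which this page does not describe. It assumes a simple energy threshold over fixed-size frames, and the function name, frame length and threshold value are all hypothetical choices made for illustration. Labelling segments by speaker identity is a separate, harder step that this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start sample, end sample, label)


def segment_speech(
    samples: Sequence[float],
    frame_len: int = 160,     # assumed frame size, in samples
    threshold: float = 0.01,  # assumed mean-energy threshold
) -> List[Segment]:
    """Label each frame by mean energy and merge adjacent frames
    that share a label into one segment."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        label = "speech" if energy >= threshold else "non-speech"
        end = start + len(frame)
        if segments and segments[-1][2] == label:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


# Synthetic signal: silence, a louder stretch, then silence again.
signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
for seg in segment_speech(signal):
    print(seg)
```

A fixed energy threshold is easily fooled by music and background noise, both of which occur in broadcast recordings, so a practical tool would need a more robust detector.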

First Peoples' Cultural Council

Website: First Peoples' Cultural Council

Project description: News release about Upgrades to FPCC's FirstVoices Language Tutor software

Official Languages, Department of Culture and Heritage, Government of Nunavut

Website: Official Languages, Department of Culture and Heritage, Government of Nunavut

Project description: Coming soon

Onkwawenna Kentyohkwa Language School

Website: Onkwawenna Kentyohkwa Language School

Project description: Kawennón:nis verb conjugator

Onkwawenna Kentyohkwa is an immersion school for teaching Kanyen'kéha (the "Mohawk" language) to adult learners. It is located on the Six Nations of the Grand River reserve in southwestern Ontario. Onkwawenna Kentyohkwa was established in 1999 by Owennatekha (Brian Maracle) and Onekiyohstha (Audrey Maracle). Owennatekha is the lead instructor at the school. Many of the school's 100 graduates have gone on to teach the Kanyen'kéha language at the pre-school, elementary, secondary, university or community level.

The focus of the NRC's collaboration with Onkwawenna Kentyohkwa is Kawennón:nis, meaning 'wordmaker' in Kanyen'kéha. Kawennón:nis is a verb conjugator meant to assist learners and educators at the school, as well as students of the language wherever they might be. The idea for the tool was suggested by Owennatekha. The creation and extension of this tool involves a number of researchers at the NRC, Owennatekha, and two other educators from Onkwawenna Kentyohkwa. The language model that powers Kawennón:nis is the first of its kind for any Iroquoian language. Kawennón:nis's user interface is closely linked to the school's curriculum, and is being designed collaboratively by students and educators there and NRC researchers. Kawennón:nis will be hosted by the school online and on Android and iOS devices; language-independent technology developed for it will be released with an open-source licence.

Pirurvik Centre

Website: Pirurvik Centre

Project description: Pirurvik is a centre of excellence for Inuit language, culture and well-being. It was founded in the fall of 2003 and is based in Nunavut's capital, Iqaluit. The main focus of the NRC's collaboration with Pirurvik is the transcription into written form of audio recordings of spoken Inuktut. The project's selection criteria favour materials in original language with a depth of vocabulary, rather than speech that reflects 'thinking in English' while speaking Inuktut.

The transcribed Inuktut speech data will subsequently be used by the NRC and one of its other partners, the Computer Research Institute of Montreal, to develop speech recognition tools that will make it possible to search other Inuktut speech recordings using text queries. This will make it easier for people who speak Inuktut to access and navigate audiovisual documents in their language.
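As a rough illustration of searching recordings with text queries, the sketch below builds a small inverted index from time-aligned words to positions in recordings. It is not the project's system. It assumes that word-level timestamps are already available, and every name and data value in it is a hypothetical placeholder. Real audio keyword search has to match queries against uncertain recognizer output, not exact text, which this sketch does not model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (recording id, start time in seconds)


def build_index(
    aligned: Dict[str, List[Tuple[float, str]]]
) -> Dict[str, List[Hit]]:
    """Map each lower-cased word to the places where it occurs."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for recording_id, words in aligned.items():
        for start_time, word in words:
            index[word.lower()].append((recording_id, start_time))
    return dict(index)


def search(index: Dict[str, List[Hit]], query: str) -> List[Hit]:
    """Return every hit for the query word, ordered by recording and time."""
    return sorted(index.get(query.lower(), []))


# Placeholder data: the "words" are stand-ins, not real transcriptions.
aligned = {
    "rec1": [(0.0, "alpha"), (1.5, "beta")],
    "rec2": [(3.0, "Alpha")],
}
index = build_index(aligned)
print(search(index, "ALPHA"))
```

Exact matching on whole words is a poor fit for morphologically complex languages, where one stem can surface in many long word forms, so a practical index would likely need to work on units smaller than the word.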

As the project proceeds, collaborations with other organizations will be developed and this list will be updated regularly.

Publications

The following is a list of selected publications by the project team and their collaborators relating to research in Indigenous languages technology.

Our project team

Anna Kazantseva, PhD

Computational linguistics of literature (novels and stories); modeling discourse structure of long informal documents; computational linguistics of Iroquoian languages.

Roland Kuhn, PhD (project lead)

Automatic speech recognition; machine translation.

Patrick Littell, PhD (project advisor)

Computational linguistics of low-resource languages; he has worked with several Indigenous languages, including Kwak'wala/Bak'wamk'ala, Gitksan, and Nłeʔkepmxcín (Thompson River Salish).

Aidan Pine

Development of software for supporting Indigenous languages; he has developed tools in collaboration with Gitksan and Heiltsuk communities.

Eddie Antonio Santos

Software engineering; applied language modeling; Unicode wrangler.

Alain Désilets

Natural language processing applications developer. Leads the WeBInuk project, which allows translators to search large amounts of English-Inuktut parallel content.

Eric Joanis

Computational linguistics; statistical natural language processing; machine translation; software optimization and robustness.

Advisory committee

We are committed to developing technology in collaboration with Indigenous stakeholders, and have implemented an Indigenous Advisory committee to advise on collaborative methodologies and evaluate project implementations.

Heather Souter

Chair of the NRC's Indigenous Languages Technology Project Advisory Committee
Secretary-Treasurer, Prairies to Woodlands Indigenous Language Revitalization Circle

Heather is currently directing a new Master-Apprentice Program in Manitoba and is the Secretary-Treasurer of the Prairies to Woodlands Indigenous Language Revitalization Circle. She holds a Bachelor of Arts from the University of British Columbia and a Master of Education in Indigenous Language Revitalization from the University of Victoria. Heather is reclaiming her heritage language and, in collaboration with Elders, has published educational resources for the Michif language, such as a conversational phrase book and a college-level beginner's course. Heather's interests include the use of the Internet to reach language learners in the diaspora and to create technology-mediated speech communities. She is a citizen of the Métis Nation and a member of the Manitoba Métis Federation.

Tessa Erickson

Youth Ambassador, Nak'azdli Whut'en First Nation

Tessa is a member of the Nak'azdli Whut'en First Nation and an eleventh grade student at DP Todd Secondary School in Prince George, BC. Tessa is also a graduate of the First Nations' Technology Council's "Bridging to Technology" program and runs a language project called Dak'elh K'una which is organizing the creation of a Dak'elh language app and immersion summer camp.

Amanda Evic-Kuluguqtuq

Senior Instructor, Pirurvik Centre

Amanda is a graduate of the Nunavut Teacher Education Program (NTEP) who started her career teaching in Apex, and later at Iqaluit’s Joamie School. As the Executive Director for Tumikuluit Saipaaqivik, she led Iqaluit’s only Inuktut immersion daycare. Amanda grew up in a rich Inuktut-speaking and cultural environment in Panniqtuuq with her maternal grandparents. As well as teaching courses in the Pirurvik Centre’s Inuktut Revitalization program for Inuit and Inuktut Second Language, Amanda assists with the design, writing and teaching of new programs and learning resources.

Blaire Gould

Director of Programs and Student Support, Mi'kmaw Kina'matnewey

Blaire is the Director of Programs and Student Support at Mi'kmaw Kina'matnewey. She comes from the Mi'kmaq district of Unama'ki and is a proud L'nu'skw and speaker. She strives to advance the educational opportunities and rights for the Mi'kmaq people. Blaire has continued to pursue new and innovative ways to infuse language and culture into the 21st century. She is part of an inspiring team of Mi'kmaq scholars and educators whose collective and individual contributions to Mi'kmaw education have created space for Mi'kmaq innovation in the education system.

Glenn Karonhiio Morrison

Senior Advisor, Natural Resources Canada

Glenn is a Senior Advisor at Natural Resources Canada (formerly Indigenous Policy Manager, Aboriginal Peoples Program, Canadian Heritage). As former Executive Director of the First Nations Confederacy of Cultural Education Centres in the 1990s, he managed the first online presence of a First Nations organization in Canada in 1992 using a bulletin board program and 3rd-party software. More recently, Glenn chaired the interdepartmental Indigenous Languages Translation/Technology Working Group involving Canadian Heritage, Library and Archives Canada, National Research Council of Canada, Parliamentary Translation Bureau and others. He has a longstanding interest in the revitalization of Indigenous languages. Currently, he is on one of NRCan's consultation teams working with BC First Nations on the Trans Mountain pipeline expansion project. He successfully completed the first level of Onkwawenna Kentyohkwa's online Kanien'kehá:ka language program and is a member of the Mohawks of Kahnawá:ke.

Gerry Lawson

Oral History and Language Lab Manager, Museum of Anthropology, University of British Columbia

Gerry is a proud member of the Heiltsuk First Nation and manages the Oral History and Language Lab at the UBC Museum of Anthropology. With over 15 years in the field of Information Management and Heritage Digitization, he works to develop practical, scalable resources for Indigenous cultural heritage preservation, and to decolonize information practices. Gerry also acts as the Technology Lead for the innovative UBC Indigitization Program and sits on the Board of Directors for the First Peoples' Cultural Council.

Delaney Lothian

Youth Ambassador, University of Alberta

Delaney is a fourth-year undergraduate student at the University of Alberta majoring in Computer Science and Math. Since her early teens she has worked with her home community of Lac Ste. Anne documenting culture and history. She is working on an interdisciplinary project to develop a language learning system for the Y-dialect of Cree under the supervision of University of Alberta Computing Science professor Dr. Carrie Demmans Epp and University of Alberta Cree professor Dorothy Thunder.

Megan Lukaniec

Megan Lukaniec is Wendat from the Huron-Wendat Nation of Wendake, Québec and an Assistant Professor of Indigenous Language Revitalization in the Department of Linguistics at the University of Victoria. Since 2006, she has been working with and for her community in order to reawaken and reclaim the Wendat (Iroquoian) language, which was dormant for well over a century. Within the scope of a SSHRC CURA grant (2007-2012) awarded to the Huron-Wendat Nation and Université Laval, her role as a linguist included reconstructing the language from legacy documentation, training language teachers, teaching introductory language courses, and creating pedagogical materials. In 2017, with the collaboration of the CDFM Huron-Wendat, she created the initial designs of and reconstructed content for an online trilingual dictionary (Wendat-French-English; wendatlanguage.com). She obtained her Ph.D. in Linguistics from the University of California, Santa Barbara, and for her dissertation work, she reconstructed and described the verb morphology of Wendat.

Onowa McIvor

Associate Professor, University of Victoria

tânisi kiyawaw (greetings to you all). Onowa is maskékow-ininiw (a Swampy Cree person) and Scottish-Canadian, born and raised in Treaty 6 territory. She has been a grateful visitor in SENĆOŦEN and Lekwungen speaking territories for over twenty years and is an urban nêhiyâwiwin language learner and Indigenous language warrior. Onowa is an Associate Professor of Indigenous Education at the University of Victoria, where she was formerly the Director of Indigenous Education in the Faculty of Education. Onowa is co-lead on a Social Sciences and Humanities Research Council of Canada (SSHRC) Partnership Grant entitled NEȾOLṈEW̱, which is working to build capacity among Indigenous people and maximize Indigenous language revitalization resources in Canada.

Marilyn Shirt

Language Team Lead, University nuhelot'įne thaiyots'į nistameyimâkanak Blue Quills (UnBQ)

Marilyn is a member of the Saddle Lake Cree Nation and has worked in adult education for twenty-seven years, four years in small business and four years in Cree Immersion Head Start programming before devoting her time to Language revitalization for both Cree and Dene at UnBQ. While at UnBQ Marilyn has spearheaded the development of a Bachelor of Arts in Cree and Dene, a Masters in Indigenous Languages, an Elders Senate, and a Language Resource Department that produces audio, video and written resources in both Cree and Dene.

Tina Jules Skayda.û

Director of the Yukon Native Language Centre

Tina is the Director of the Yukon Native Language Centre for the Council of Yukon First Nations. She is of Tlingit, Mountain Slavey and Cree ancestry and is a citizen of the Teslin Tlingit Council. Her Tlingit name is Skayda.û and she belongs to the Dakhlaweidí (Eagle) clan. Tina holds a Bachelor of Education from the University of Regina and is a proud graduate of the Yukon Native Teacher Education Program. Her Master's Degree in Education for curriculum and instruction is from Simon Fraser University. She is a passionate advocate for Indigenous language revitalization and indigenized education.

Nathan Thanyehténhas Brinklow

Lecturer, Queen's University

Nathan is an educator of Kanyen'kéha (Mohawk) with years of experience teaching both at the Tsi Tyónnheht Onkwawén:na Language and Cultural Centre (TTO) and at Queen's University. Nathan has a strong interest in how computational methods can be applied to language revitalization and pedagogy and has been involved in the development of Indigenous Language and Mohawk Language and Culture certificates in partnership with TTO and Queen's University.