Project to create Inuktut language software and perform new text alignment of the Nunavut Legislative Assembly proceedings

While Inuktut is an official language in the territory of Nunavut, there are far fewer technologies, tools and resources available for Inuktut learners and language professionals than for the territory's other two official languages, English and French.

The NRC is collaborating with the Pirurvik Centre and the Government of Nunavut to develop new technologies for Inuktut language learners and professionals, and to reinforce Inuktut's status as an official language in the territory.

Collaborators

Pirurvik Centre

The Pirurvik Centre is a centre of excellence for Inuit language, culture, and well-being. It was founded in the fall of 2003 and is based in Nunavut's capital, Iqaluit. The NRC's collaboration with the Pirurvik Centre began in summer 2018 and now covers audio indexing, the creation of software for language learners and professionals, and sentence alignment. Access to the Pirurvik Centre's expertise on the Inuktut family of languages is a tremendous asset for the NRC team.

Government of Nunavut

The Nunavut Legislative Assembly has kindly provided the NRC with an updated version of the Nunavut Hansard, covering proceedings between 1999 and 2017. The new corpus will be used to perform Inuktut-to-English sentence alignment.

Objectives

  • Develop a new suite of tools for people who work with or are learning Inuktut, including an update to WeBInuk
  • Perform automatic sentence alignment of a new Nunavut Hansard corpus (1999-2017)

Deliverables

  • Web search engine, aid to translators, spell checker, and other tools deployed for Inuktut language learners and professionals
  • New open source parallel corpus of Inuktut-English aligned sentences available to computational linguists and other language professionals

Activities

Software tools for Inuktut as an official language

In October 2018, the NRC and the Pirurvik Centre began to collaborate on building software tools to assist people who work with Inuktut. Though it is an official language of Nunavut, Inuktut still lacks tools that are taken for granted in English and French. This project aims to fill this gap by implementing and deploying a web search engine, an aid to translators, a spell checker, and other tools for learners of the language, linguists, and people who work with Inuktut on a regular basis, such as employees of the Nunavut government. The project builds on ground-breaking earlier work at the NRC on morphological analysis and on WeBInuk, a tool for translators. The first version of the new tools will be deployed in 2020 and will be freely accessible on the Web.

Automatic sentence alignment

In the past, research by computational linguists on Inuktut benefited greatly from a version of Nunavut Legislative Assembly proceedings – the Nunavut Hansard – with Inuktut and English sentences aligned with each other. This parallel corpus (body of text) was created and open-sourced by the NRC in 2005. The NRC project team is currently working on automatic sentence alignment of the Nunavut Hansard proceedings between 1999 and 2017. When completed, the new sentence-aligned corpus will be much larger than the version of the Nunavut Hansard released by the NRC in 2005.
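For readers unfamiliar with sentence alignment, the idea can be illustrated with a minimal sketch. It is not the NRC team's algorithm, which this page does not describe. It assumes a simple length-based approach: sentences that are translations of each other tend to have proportional lengths, so dynamic programming can pick the cheapest sequence of 1-1, 2-1, 1-2, 1-0 and 0-1 pairings. The function name `align` and the penalty values are hypothetical, and raw character length is only a crude signal, especially across different scripts.

```python
def align(src, tgt, skip_penalty=10.0, merge_penalty=2.0):
    """Align two sentence lists; return a list of (src_indices, tgt_indices).

    Illustrative only: the cost is the absolute difference in character
    length plus a fixed penalty per pairing shape (assumed values).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # (source sentences consumed, target sentences consumed, fixed penalty)
    shapes = [(1, 1, 0.0), (1, 0, skip_penalty), (0, 1, skip_penalty),
              (2, 1, merge_penalty), (1, 2, merge_penalty)]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in shapes:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + abs(s_len - t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the chosen pairings.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    path.reverse()
    return path
```

Given two source sentences whose combined length matches a single target sentence, this sketch pairs both source sentences with that one target sentence rather than leaving one unaligned.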

To check the quality of the automatic alignment, experts employed by the Pirurvik Centre will manually align around 8,500 sentence pairs from the 1999-2017 Nunavut Hansard. This "gold standard" alignment will enable the NRC team to improve its automatic alignment algorithm. When it is open-sourced, we expect the new sentence-aligned Nunavut corpus, along with the manually aligned "gold standard" subset, to encourage new work on Inuktut by the international research community.
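One common way to use a gold standard of this kind is to score the automatic output against it with precision, recall and F1. The sketch below assumes each alignment is represented as a set of (Inuktut sentence index, English sentence index) links. That representation and the function name `score_alignment` are illustrative assumptions, not a description of the project's actual evaluation procedure.

```python
def score_alignment(predicted, gold):
    """Compare predicted alignment links with gold-standard links.

    Both arguments are iterables of (source_index, target_index) pairs
    (an assumed representation). Returns (precision, recall, f1).
    """
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: four gold links, three predicted, two of them correct.
gold_links = {(0, 0), (1, 1), (2, 2), (3, 3)}
predicted_links = {(0, 0), (1, 1), (2, 3)}
precision, recall, f1 = score_alignment(predicted_links, gold_links)
```

Precision here is the share of predicted links that the human aligners also made, and recall is the share of human links that the automatic aligner found.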

Project team

Alain Désilets

Natural language processing applications developer. Leads the WeBInuk project, which allows translators to search large amounts of English-Inuktut parallel content.

Eric Joanis

Computational linguistics; statistical natural language processing; machine translation; software optimization and robustness.

Gavin Nesbitt

Director, Pirurvik Centre

Contact us

Janet Tamalik McGrath
Inuktut Language Consultant, Pirurvik Centre

Email: info@pirurvik.ca

The Legislative Assembly of Nunavut

Email: leginfo@assembly.nu.ca

Roland Kuhn
Project Leader, Indigenous Languages Technology Project, NRC

Telephone: 613-993-0821
Email: Roland.Kuhn@nrc-cnrc.gc.ca
LinkedIn: Roland Kuhn

Related links