Michif digital talking dictionary

This project mobilized and made accessible The Michif Dictionary: Turtle Mountain Chippewa Cree, an out-of-print Michif dictionary first published in 1983.

With the assistance of Michif first-language speakers, project partners and computational linguists, the team developed a digital, spoken version of this important resource, available both online and as a mobile application.

The project also helped build local capacity in technologies for Indigenous language documentation and revitalization. It was funded in part by the Indigenous Languages Technology Project of the National Research Council Canada (NRC).

Objectives

  • Develop a digital, spoken version of The Michif Dictionary: Turtle Mountain Chippewa Cree
  • Build local capacity in technologies for Indigenous language documentation and revitalization

Deliverables

  • Mother Tongues Michif dictionary: digital, talking dictionary available online and as a mobile application
  • Mother Tongues Michif dictionary mobile application: available soon on the App Store and Google Play

Activities

Capacity development

This project helped develop capacity by training emerging Métis community linguists, language workers and scholars in audio recording, the application of speech technologies, and annotation.

Recording

This project has produced more than 181 hours of high-quality audio recordings of the dictionary by four speakers. One speaker, Verna DeMontigny, recorded the entire dictionary from cover to cover, while the others recorded selected portions of it. As a result, all 350 pages of Michif lexical entries and example sentences have been recorded by at least one speaker, and some entries have been recorded by two or more. The recordings represent multiple Michif varieties. It was particularly important to include the Belcourt, North Dakota variety, because the original creators of the dictionary spoke it.

Annotation

Audio recordings were annotated using ELAN, an open-source software tool commonly used in language documentation and revitalization to produce time-aligned transcripts. Each recording was first automatically segmented into pause-delimited utterances using a deep neural network (DNN) voice activity detection service developed within the VESTA-ELAN project by the Computer Research Institute of Montréal (CRIM). This automatic segmentation saved a great deal of annotation time, since utterance boundaries did not have to be marked by hand.
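To illustrate what pause-delimited segmentation means, here is a minimal sketch. Note that it does not reproduce the CRIM service: the project used a DNN-based voice activity detector, while this example swaps in a much simpler energy-threshold detector. The function names and the parameter values (`frame_ms`, `threshold`, `min_pause_ms`, `min_speech_ms`) are hypothetical and would need tuning on real recordings.

```python
import math
from typing import List, Optional, Tuple


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return rms


def segment_utterances(samples: List[float], sample_rate: int,
                       frame_ms: int = 20, threshold: float = 0.02,
                       min_pause_ms: int = 300,
                       min_speech_ms: int = 100) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans of speech separated by pauses.

    Simple energy-based stand-in for a DNN voice activity detector (assumption):
    a frame counts as speech when its RMS exceeds `threshold`, and a run of
    silent frames at least `min_pause_ms` long closes the current utterance.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too short for this sample rate")
    energies = frame_rms(samples, frame_len)
    min_pause = math.ceil(min_pause_ms / frame_ms)
    min_speech = math.ceil(min_speech_ms / frame_ms)

    segments = []
    start: Optional[int] = None        # first speech frame of the open utterance
    last_speech: Optional[int] = None  # most recent speech frame
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_pause:
            # Pause is long enough: close the utterance at the last speech frame,
            # discarding very short bursts (likely clicks or noise).
            if last_speech - start + 1 >= min_speech:
                segments.append((start, last_speech + 1))
            start = None
    if start is not None and last_speech - start + 1 >= min_speech:
        segments.append((start, last_speech + 1))

    seconds_per_frame = frame_ms / 1000
    return [(round(a * seconds_per_frame, 3), round(b * seconds_per_frame, 3))
            for a, b in segments]


# Synthetic example at 1,000 samples/s: 0.5 s "speech", 0.5 s silence, 0.5 s "speech".
speech = [0.5, -0.5] * 250
two_utterances = speech + [0.0] * 500 + speech
print(segment_utterances(two_utterances, 1000))  # → [(0.0, 0.5), (1.0, 1.5)]
```

Pauses shorter than `min_pause_ms` do not split an utterance, so brief hesitations inside a dictionary entry stay in one segment; a learned detector like the one the project used is far more robust to background noise than a fixed energy threshold.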

The Michif text and English translations, taken from the optical character recognition (OCR) text of the dictionary, were then integrated into these transcripts by a team of Indigenous and non-Indigenous language workers and a student from an applied linguistics course.
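ELAN stores time-aligned transcripts as XML .eaf files. The sketch below, using Python's standard `xml.etree.ElementTree`, shows how pause-delimited segments carrying Michif text and English translations could be written as two time-aligned tiers that share the same time slots. It emits only a minimal subset of the EAF format as we understand it; the tier names, the linguistic type ID `utterance`, the MIME type, the media path and the placeholder texts are illustrative assumptions, not the project's actual workflow, and any output should be checked by opening it in ELAN.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_eaf(segments, media_url, tier_names=("Michif", "English")):
    """Build a minimal ELAN .eaf document and return it as an XML string.

    `segments` is a list of (start_ms, end_ms, texts), where `texts` holds one
    string per tier in `tier_names`. All tiers share the same time slots.
    """
    doc = ET.Element("ANNOTATION_DOCUMENT", {
        "AUTHOR": "",
        "DATE": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "FORMAT": "3.0",
        "VERSION": "3.0",
    })
    header = ET.SubElement(doc, "HEADER",
                           {"MEDIA_FILE": "", "TIME_UNITS": "milliseconds"})
    ET.SubElement(header, "MEDIA_DESCRIPTOR",
                  {"MEDIA_URL": media_url, "MIME_TYPE": "audio/x-wav"})
    time_order = ET.SubElement(doc, "TIME_ORDER")
    # TIER elements must come after TIME_ORDER in the file, so create them now.
    tiers = [ET.SubElement(doc, "TIER",
                           {"TIER_ID": name, "LINGUISTIC_TYPE_REF": "utterance"})
             for name in tier_names]

    ann_id = 0
    for seg_idx, (start_ms, end_ms, texts) in enumerate(segments):
        ts1, ts2 = f"ts{2 * seg_idx + 1}", f"ts{2 * seg_idx + 2}"
        ET.SubElement(time_order, "TIME_SLOT",
                      {"TIME_SLOT_ID": ts1, "TIME_VALUE": str(int(start_ms))})
        ET.SubElement(time_order, "TIME_SLOT",
                      {"TIME_SLOT_ID": ts2, "TIME_VALUE": str(int(end_ms))})
        for tier, text in zip(tiers, texts):
            ann_id += 1
            ann = ET.SubElement(tier, "ANNOTATION")
            aligned = ET.SubElement(ann, "ALIGNABLE_ANNOTATION", {
                "ANNOTATION_ID": f"a{ann_id}",
                "TIME_SLOT_REF1": ts1,
                "TIME_SLOT_REF2": ts2,
            })
            ET.SubElement(aligned, "ANNOTATION_VALUE").text = text

    # The linguistic type referenced by every tier is declared after the tiers.
    ET.SubElement(doc, "LINGUISTIC_TYPE", {
        "LINGUISTIC_TYPE_ID": "utterance",
        "TIME_ALIGNABLE": "true",
        "GRAPHIC_REFERENCES": "false",
    })
    return ET.tostring(doc, encoding="unicode")


# Hypothetical segments (times in ms) with placeholder texts.
segments = [
    (0, 500, ("michif text 1", "english text 1")),
    (1000, 1500, ("michif text 2", "english text 2")),
]
eaf_xml = build_eaf(segments, "file:///recordings/example.wav")
```

A fuller setup would more likely make the English tier a symbolic-association child of the Michif tier rather than a second independent tier, so that each translation stays attached to its Michif utterance if boundaries are later adjusted.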

Optical character recognition (OCR) corrections and review

Fourteen Carleton University undergraduate students manually reviewed and corrected the OCR text for all 349 pages of the dictionary as part of a community service learning project in the course ALDS 3903 Indigenous Languages in Canada (Winter 2021). Students used Transkribus Lite to identify and fix errors that the earlier OCR process had introduced into the computer-readable text of the dictionary. They corrected misspelled words and entered words or lines that appeared on the page but had been missed by the OCR software.

Errors were found and corrected on 1,600 lines of text, or 8.5% of the dictionary, substantially improving the overall quality of the text. The revised text was integrated into both the web and app versions of the dictionary.

Dissemination

The contents of the dictionary have been converted from legacy formats and loaded into the Mother Tongues web platform through a workflow that allows the dictionary to be added to, corrected and expanded over time. The online dictionary, which includes 15,422 entries, is now publicly available. The app is ready for launch and will soon be available on the App Store and Google Play.

Project team

Prairies to Woodlands Indigenous Language Revitalization Circle
  • Heather Souter, Project Co-Lead
  • Olivia Sammons, Project Co-Lead
  • Verna DeMontigny, Project Advisor and Self-Documentation Technician
  • Kai Pyle, Senior Documentation Technician
  • Wanda Smith, Documentation Technician
  • Karen Langan, Documentation Technician
  • Connie Henry, Documentation Technician and Annotator
  • Laura Grant, Project Management Support
Technical support
  • Chris Cox, Computational Linguist and Advisor
  • Jacob Collard, Computational Linguist and Technical Lead
  • Samantha Cornelius, Linguist and Annotation Lead
  • Fineen Davis, Computational Linguist
  • Anna Belew, Linguist and Advisor
  • Students of ALDS 3903-C, Carleton University, Winter 2021
  • Delaney Lothian, Computer Scientist
Turtle Mountain Community College – owners of the original dictionary
  • Dr. Kelly Hall
  • Dr. Terri Martin-Parisien
  • Dr. Teresa Delorme
  • Ms. Laisee Allery
Speakers
  • Verna DeMontigny, The Corner, MB
  • Sandra R. Houle, Belcourt, ND
  • Albert Parisien, Belcourt, ND
  • Connie Henry, Boggy Creek, MB
Annotators
  • Breanne Beaubien
  • Caitlin Bergen
  • Maddison Brooks
  • Awanigizhik (Roderick) Bruce
  • Jessica Charest
  • Amanda Desormeaux
  • Terri Dixon
  • Jeanelle Dunkley
  • Mackenzie Elliot
  • Alexandra Ethier
  • Briana Faubert
  • Kaitlyn Foley
  • Cassandra Gaudard
  • Ashlyn Hickey
  • Chantelle Jackson
  • Mira Kolodka
  • Kim Laberinto
  • Jessica Lagimodiere
  • Madissan Le Bouthillier
  • Calista Mawakeesic
  • Sophie Melanson-Hayes
  • Marta Meljnik
  • Nabilah Muhammad-Yusuf
  • Daniel Ondercin
  • Jane Pepabano
  • Latasia Phan-Dos Reis
  • Nicole Reel
  • Alaa Sarji
  • Talula Schegel
  • India Schegel
  • Samantha Schwab
  • Dominique Simard
  • Carly Sommerlot
  • Gail Welburn
  • Janelle Zazalak
Volunteers
  • Deanna Garand
  • Iwona Gniadek
  • Abby Graham
  • Rebecca Kirkpatrick
  • James Lavallee
  • Melanie Lavallee
  • Itziri Moreno
  • Bamidele Olowo-okere
  • Vasiliki Vita

Contact us

Heather Souter, MEd, Projects Director, Secretary-Treasurer and Co-Founder
Prairies to Woodlands Indigenous Language Revitalization Circle
Telephone: 204-647-0081
Email: p2wilrc@gmail.com

Roland Kuhn, Project Leader
Indigenous Languages Technology Project
Email: Roland.Kuhn@nrc-cnrc.gc.ca