Multilingual text processing

The National Research Council of Canada's (NRC) multilingual text processing team carries out research and development in multilingual natural language processing (NLP). This includes machine translation and other language technologies for multilingual contexts.

In particular, we collaborate with government, industry, academia, and other partners on language technologies to support Canada's official languages and the revitalization of Indigenous languages. We also conduct foundational research and excel in international competitions where the calibre of our research and technology is benchmarked against that of other leaders in the field.

What we offer

Housed within the NRC's Digital Technologies Research Centre, our team's core competencies include:

  • computer-assisted translation
  • machine learning for natural language applications
  • machine translation
  • multilingual text mining
  • social media analysis and modelling
  • translation quality evaluation

We apply our expertise to:

  • translation and language service providers, in support of the Government of Canada's Policy on Official Languages:
    • computer-assisted translation with the Translation Bureau, the Courts Administration Service, and private sector language service providers
    • machine translation quality evaluation and estimation with the Translation Bureau
    • parallel corpus filtering and cleaning with the Translation Bureau and the Université de Montréal
    • translation routing with the Translation Bureau
    • translation equivalence error detection with the Public Service Commission of Canada
  • learning technologies:
    • automatic language proficiency assessment and modelling
    • Indigenous Languages Technology Project: software and tools to support Indigenous language schools, educators, students, communities, and technology developers, with multiple partners
    • Language Comprehension Tool, a second-language reading assistant for Canadian government employees, with the Translation Bureau
    • machine translation for second-language writing with Dublin City University and the Université du Québec en Outaouais
  • intelligence, monitoring, and security:
    • detection of changes within an unfolding event in real time from news articles or social media
    • machine translation of social media content for business and security intelligence

Why work with us

Our team is a unique mix of world-class researchers with backgrounds in computational linguistics, engineering, and machine learning, combined with strong, savvy software developers. Our collaborators appreciate our deep technical knowledge, our ability to deliver software components that are easy to integrate, and the state-of-the-art results and models we can deliver from their data.

We can take translation and other language technologies from research concepts all the way to products suitable for distributors and end users. Past examples of language technologies we have developed and delivered include word alignment for terminology extraction, statistical machine translation for language comprehension, and cross-lingual semantic similarity for detecting translation errors.

International competitions and shared tasks

Our team is a regular participant and top performer in several tasks at the annual Conference on Machine Translation (WMT, formerly the Workshop on Machine Translation). We are also a leading participant in the International Workshops on Semantic Evaluation (SemEval), the Discriminating Similar Languages series, and the Native Language Identification evaluations.

Team members

Aidan Pine
Anna Kazantseva
Chi-kiu (Jackie) Lo
Cyril Goutte
Darlene Stewart
Eddie Santos
Éric Joanis
Gabriel Bernier-Colborne
Marc Tessier
Michel Simard
Patrick Littell
Rebecca Knowles
Roland Kuhn
Samuel Larkin
Serge Léger
Sowmya Vajjala
Yunli Wang

Contact us

Interested in applying our multilingual text processing expertise to your project? Contact our experts today!

Cyril Goutte
Team Leader, Multilingual Text Processing
Email: Cyril.Goutte@nrc-cnrc.gc.ca

Targeted industries

Information and communications technology; Analytics; Learning systems.

Locations

  • Moncton
  • Montréal Decelles
  • Ottawa Montreal Road
  • Edmonton
  • Victoria