Roles and responsibilities
I am a Technical Advisor, Application Development (CS3) with the NRC Digital Technologies Research Centre, in the Multilingual Text Processing (MTP) team. I am a hardcore computer geek, with extensive experience of tackling a variety of tough software and IT systems challenges. I especially like working with High Performance Computing (HPC) systems, such as advanced Linux compute clusters.
However, unlike some computer geeks, I enjoy the company of human beings - I’ve been praised by my colleagues for my communication skills and my commitment to teamwork. My role is to support these colleagues: I’ve become the go-to guy when they encounter difficult IT problems or apparently inexplicable bugs. I build new systems for the team, often ones based on machine learning that are trained on large amounts of data, and deliver these systems to clients. I test, simulate, debug, and troubleshoot new code, and fix issues from several different projects. In particular, I’m the prime tester / guinea pig for team projects that involve HPC.
Current research and/or projects
At the time of writing, I’m most active in two big projects in the MTP team, the Indigenous Languages Technology (ILT) project and the Portage machine translation project.
For a general overview of the ILT project, see Canadian Indigenous languages technology project - National Research Council Canada. My involvement in ILT is mostly with two subprojects:
- The development of the ReadAlong Studio software: ReadAlong Studio: Application for Indigenous audiobooks and videos project - National Research Council Canada. ReadAlong Studio is a web-based plug-in for Indigenous audiobooks developed by the NRC, Carleton University and Indigenous collaborators. Words on the screen are highlighted as they are read or sung out loud. The reader can click on any word to hear it spoken. I’ve enjoyed working on the technical challenges associated with aligning speech with text in a variety of Indigenous languages.
- Speech generation for Indigenous language education - National Research Council Canada. This exciting subproject is exploring the possibility of producing high-quality synthesized speech for three different Indigenous languages which are unrelated to each other and spoken in different parts of the country. If it succeeds, it may pave the way for the creation of artificial speech in many different languages spoken across Canada. We are working with teachers in the three communities to determine how this capability can be used to further language teaching. My role here has been testing the software and working closely with its main creator, Aidan Pine, to speed up training of the speech generation systems and to eliminate bugs.
The Portage machine translation project: from about 2004 to 2017, this project focused on building one of the best machine translation systems in the world based on the then-dominant statistical paradigm for machine translation. That phase of the project yielded several valuable tools, some of which are still in use: GitHub - nrc-cnrc/PortageTextProcessing: Text processing tools that came out of the Portage SMT project — Outils de traitement de texte issus du projet Portage de TAS. However, around the time I joined NRC (2016), the entire machine translation (MT) field was undergoing a paradigm shift to methods based on neural / deep learning systems. The Portage project now focuses on these methods as well. My role in the project has thus involved building and testing neural MT systems:
- I have been helping one of our most important clients, the House of Commons, integrate a new neural MT system that I trained into their pipeline; it will show translation suggestions to their translators. It replaces an older statistical MT system.
- The Jessica subproject: the federal Translation Bureau has given us access to an enormous corpus of their previous English-French translations. These translations are not all of good quality. I am exploring various filtering techniques on this enormous corpus, to find the optimal subset for training a neural MT system for translating between the two languages (in both directions).
Awards
2021 - NRC Intellectual Property Achievement Award (IPAA) for impactful open-source software contribution
2021 - NRC / Digital Technologies Instant Award
2016 - Finalists of the GTEC Awards 2016
2015 - Public Service Award of Excellence
2015 - Finalists of the GTEC Awards 2015
2012 - Finalists of the GTEC Awards 2012
2012 - Innovation Award of the Translation Bureau
2010 - Finalists of the GTEC Awards 2010
2009 - PWGSC Award of Excellence
2006 - Recognition Award for International Translation Day
2004 - Quality of Service Award of the Translation Bureau