Roles and responsibilities

I am a research officer in the Multilingual Text Processing team at the Digital Technologies Research Centre (DT), National Research Council of Canada (NRC-CNRC). I primarily conduct research in Natural Language Processing (NLP), and I occasionally work on application-focused projects through NRC's IRAP program and Digital Analytics Center.

Current research and/or projects

- Information extraction (e.g., Named Entity Recognition) and text classification

- Text generation (e.g., Keyphrase generation)

- Reproducibility of research

- Interface of NLP research with other disciplines (Education, Language Assessment, Economics, etc.)

Research and/or project statements

I am broadly interested in NLP research and its relevance to other disciplines and industry practice. Specifically, my current research interests are in information extraction and text classification, and in evaluating NLP performance beyond comparing various models. Apart from research, I am also interested in communicating about NLP to various audiences, including students and researchers from other disciplines who are starting out with NLP. 


PhD in Computational Linguistics, Eberhard Karls University of Tuebingen, Germany, 2015 (Summa Cum Laude).

M.S. in Computer Science and Engineering, International Institute of Information Technology, Hyderabad, India, 2009.

B.Eng. in Electronics and Communications Engineering, Osmania University, Hyderabad, India, 2005.

Professional activities/interests

- "Generative AI and Applied Linguistics", Invited talk in the American Association for Applied Linguistics webinar series, May 2024

- "Beyond the state of the art models: What is complex text, and what are we simplifying?", Invited talk at the EMNLP-2022 workshop on Text Accessibility, Readability and Simplification, December 2022

- "NLP without an annotated dataset", 90-minute tutorial delivered at Toronto Machine Learning Summit (2021) and Open Data Science Conference (2021) (Slides, Code and Other Materials)

- "NLP beyond NLPers: The many faces of NLP in academia and the real-world", Plenary talk at the 46th Conference of the Japan Association of English Corpus Studies (JAECS), 2020 (Slides)


Association for Computational Linguistics (ACL)

Association for Computing Machinery (ACM)

Key publications

"Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems", a book I co-authored with Bodhisattwa Majumder, Anuj Gupta and Harshit Surana, published by O'Reilly Media in 2020. [Translated into Chinese, Simplified Chinese, Japanese and Polish so far].

Full list of publications can be found here: Sowmya Vajjala - Google Scholar

Previous work experience

- 2018-19: Senior Data Scientist at "The Globe and Mail" (Toronto), and "AbacusNext" (Toronto)

    * Building data science teams: hiring and mentoring

    * Building research prototypes for various NLP use cases

    * Talking to various stakeholders to gather requirements

    * Working with DevOps teams to deploy applications

International experience and/or work

- January 2016-April 2018: Assistant Professor (Tenure Track), Iowa State University, USA

   * Teaching courses in data science, programming, natural language processing, and technical communication

   * Mentoring students

   * Conducting research and writing grant applications

