Overview of the tool
YiSi is open-source software that evaluates the accuracy of meaning in output sentences produced by machine translation systems. It uses word embeddings to estimate the semantic relationships between words, and assigns each translated sentence an accuracy score from 0 to 100. The software was developed by the National Research Council of Canada's Digital Technologies Research Centre.
Target users
- Developers of machine translation systems
- Computational linguists
Benefits to users
- YiSi can pinpoint problems in machine translation output, helping developers identify areas that require improvement
- YiSi's scores correlate highly with human judgments of the accuracy of meaning in translated sentences, helping developers evaluate and compare machine translation systems
System requirements
- YiSi was developed to run on Linux.
- YiSi is written in C++ and requires a version of g++ that supports C++11; we use GCC 4.9.3.
- YiSi requires make; we use GNU Make 3.81.
- YiSi requires bash; we use GNU Bash 4.1.2.
Technical tool description
YiSi is a family of semantic machine translation (MT) evaluation metrics with a flexible architecture for evaluating MT output in languages with differing amounts of training resources. Inspired by the MEANT 2.0 software (Lo, 2017), YiSi-1 measures the similarity between the human reference and the MT output by aggregating weighted distributional lexical semantic similarity and, optionally, shallow semantic structures. YiSi-0 is a degenerate, resource-free version that replaces distributional semantics with longest common character substring accuracy to evaluate lexical similarity between the human reference and the MT output. YiSi-2, in contrast, is a bilingual, reference-less version that uses bilingual word embeddings to evaluate cross-lingual lexical semantic similarity between the input and the MT output.
YiSi-1 achieved the highest average correlation with human direct assessment (DA) judgments across all language pairs at the system level, and the highest median correlation with DA relative ranking across all language pairs at the segment level, in the metrics task of the 2018 Third Conference on Machine Translation (WMT2018) (Ma et al., 2018). YiSi-1 also served successfully in the WMT2018 parallel corpus filtering task, while YiSi-2 showed comparable accuracy in the same task.
YiSi-0 is readily available for evaluating all languages. YiSi-1 requires a monolingual corpus in the output language to train the distributional lexical semantics model. YiSi-1_srl is designed for resource-rich languages that are equipped with an automatic semantic role labeler in the output language. YiSi-2 requires bilingual word embeddings, and YiSi-2_srl additionally requires an automatic semantic role labeler for both the input and output languages.
YiSi is available free of charge for research and commercial purposes. Contact us to find out more.
References
- Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte and Patrick Littell. Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering task. Third Conference on Machine Translation (WMT 2018). Brussels, Belgium: Nov 2018.
- Chi-kiu Lo. MEANT 2.0: Accurate semantic MT evaluation for any output language. Second Conference on Machine Translation (WMT 2017). Copenhagen, Denmark: Sept 2017.
- Qingsong Ma, Ondřej Bojar and Yvette Graham. Results of the WMT18 Metrics Shared Task: Both characters and embeddings achieve good performance. Third Conference on Machine Translation (WMT 2018). Brussels, Belgium: Oct 2018.
Download YiSi and its word embeddings
Master code used to run sentence evaluation:
Pretrained word embeddings (accessible in the NRC Digital Repository):
- Chinese, tokenized by Stanford Chinese segmenter