The Effect of Shallow Segmentation for English-Tigrinya Statistical Machine Translation
- Submitted by: Kazuhide Yamamoto
- Last updated: 21 November 2016 - 8:31am
- Document Type: Presentation Slides
- Document Year: 2016
- Presenters: Kazuhide Yamamoto
This paper presents initial research on English-Tigrinya statistical machine translation (SMT). Tigrinya is a highly inflected Semitic language spoken in Eritrea and Ethiopia. Translation involving morphologically complex languages is hampered by data sparseness and by difficult source-target word alignment, among other factors. We address these problems through morphological segmentation of Tigrinya words. After segmentation, the difference in token count between the English and Tigrinya sides of the corpus dropped from 37.7% to 0.1%, and the out-of-vocabulary rate was reduced by 46%. We compared phrase-based translation on the unsegmented and segmented corpora to study the effect of segmentation on translation quality. Preliminary results show a promising improvement in performance, obtained from a relatively small parallel corpus.
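The abstract reports a token-count difference and an out-of-vocabulary (OOV) rate but does not define either statistic. The following is a minimal sketch of one plausible way to compute them, not the authors' procedure. It assumes whitespace tokenization, a token-count difference taken relative to the larger side of the corpus, and an OOV rate counted over test tokens. The function names and the placeholder strings in the demo are hypothetical and are not real Tigrinya data.

```python
from typing import Iterable, List


def token_count(sentences: Iterable[str]) -> int:
    """Count whitespace-separated tokens over all sentences."""
    return sum(len(s.split()) for s in sentences)


def token_count_difference(source: List[str], target: List[str]) -> float:
    """Percentage gap between source and target token counts.

    Assumed definition: absolute difference divided by the larger count.
    """
    src, tgt = token_count(source), token_count(target)
    larger = max(src, tgt)
    if larger == 0:
        return 0.0
    return 100.0 * abs(src - tgt) / larger


def oov_rate(train: List[str], test: List[str]) -> float:
    """Percentage of test tokens that never occur in the training data."""
    vocab = {tok for s in train for tok in s.split()}
    test_tokens = [tok for s in test for tok in s.split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return 100.0 * unseen / len(test_tokens)


if __name__ == "__main__":
    # Placeholder data: "+" marks a hypothetical morpheme boundary.
    english = ["the boys came", "she saw the houses"]
    unsegmented = ["pre+root1 root2+suf", "root3+suf pre+root4"]
    # Shallow segmentation is imitated here by splitting at the boundary marker.
    segmented = [s.replace("+", " ") for s in unsegmented]

    print(token_count_difference(english, unsegmented))
    print(token_count_difference(english, segmented))
```

Under these assumptions, splitting affixes into separate tokens changes the target-side token count and lets shared stems and affixes be counted as known vocabulary, which is the kind of effect the reported figures describe.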