
The Effect of Shallow Segmentation for English-Tigrinya Statistical Machine Translation

Citation Author(s):
Yemane Tedla and Kazuhide Yamamoto
Submitted by:
Kazuhide Yamamoto
Last updated:
21 November 2016 - 8:31am
Document Type:
Presentation Slides
Document Year:
2016
Event:
Presenters:
Kazuhide Yamamoto
 

This paper presents initial research on English-Tigrinya statistical machine translation (SMT). Tigrinya is a highly inflected Semitic language spoken in Eritrea and Ethiopia. Translation involving morphologically complex languages is hampered by several factors, including data sparseness and difficult source-target word alignment. We attempt to address these problems through morphological segmentation of Tigrinya words. After segmentation, the difference in token count between the English and Tigrinya sides of the corpus dropped significantly, from 37.7% to 0.1%, and the out-of-vocabulary (OOV) rate was reduced by 46%. We analyzed phrase-based translation on the unsegmented and segmented corpora to study the effect of segmentation on translation quality. Preliminary results show promising performance improvements from a relatively small parallel corpus.
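The two corpus statistics cited in the abstract can be made concrete with a small sketch. This is not the authors' code: it assumes whitespace tokenization, measures the token-count difference relative to the English side, defines the OOV rate as the fraction of test tokens absent from the training vocabulary, and reads "reduced by 46%" as a relative reduction. All of these definitions and the function names are hypothetical assumptions made for illustration.

```python
def token_count_difference(src_sentences, tgt_sentences):
    """Relative difference in total token count between two sides of a parallel corpus.

    Assumption: whitespace tokenization, measured as |src - tgt| / src.
    """
    src_tokens = sum(len(s.split()) for s in src_sentences)
    tgt_tokens = sum(len(s.split()) for s in tgt_sentences)
    return abs(src_tokens - tgt_tokens) / src_tokens


def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens whose word type never occurs in the training data."""
    vocab = {tok for s in train_sentences for tok in s.split()}
    test_tokens = [tok for s in test_sentences for tok in s.split()]
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def relative_reduction(before, after):
    """Relative reduction of a rate, e.g. OOV before vs. after segmentation (assumed reading)."""
    return (before - after) / before


# Toy placeholder data (hypothetical, not from the paper's corpus).
english = ["the house is big", "he went home"]  # 7 tokens
tigrinya_like = ["a b c", "d e"]                # 5 tokens
print(token_count_difference(english, tigrinya_like))
print(oov_rate(["a b c"], ["a d", "b e"]))
```

Under these assumptions, segmentation lowers both numbers because splitting inflected Tigrinya words into stems and affixes yields more tokens (closer to the English count) and makes unseen surface forms decompose into morphemes already present in the training vocabulary.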
