Japanese Orthographical Normalization Does Not Work for Statistical Machine Translation

Citation Author(s):
Kazuhide Yamamoto, Kanji Takahashi
Submitted by:
Kanji Takahashi
Last updated:
21 November 2016 - 8:28pm
Document Type:
Presentation Slides
Document Year:
2016
Event:
Presenters:
Kanji Takahashi
Paper Code:
15
 

We investigated the effect of normalizing Japanese orthographical variants into a uniform orthography on statistical machine translation (SMT) between Japanese and English. Reportedly, 10% of Japanese words have more than one orthographical variant, which suggests that normalizing these variants could improve translation quality. However, the results show that SMT with normalization performs equivalently to SMT without normalization as measured by both BLEU and RIBES, even though normalization reduces the size of the language models, their perplexity, and the number of out-of-vocabulary words. We discuss the potential reasons in this paper.
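
As a rough illustration of what normalization does to vocabulary size and out-of-vocabulary (OOV) counts, the following minimal Python sketch maps orthographical variants to one canonical form through a lookup table. The table `VARIANTS`, the helper names `normalize` and `oov_count`, and the toy sentences are all assumptions made for this example; they are not the resources or the procedure used in the work itself, and the input is assumed to be already segmented into words.

```python
# Hypothetical variant table (an assumption for illustration only):
# each orthographical variant maps to one canonical written form.
VARIANTS = {
    "リンゴ": "りんご",  # katakana spelling of "apple"
    "林檎": "りんご",    # kanji spelling of "apple"
    "ネコ": "猫",        # katakana spelling of "cat"
    "ねこ": "猫",        # hiragana spelling of "cat"
}


def normalize(tokens):
    """Replace each token with its canonical form, if the table lists one."""
    return [VARIANTS.get(token, token) for token in tokens]


def oov_count(train_tokens, test_tokens):
    """Count test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    return sum(1 for token in test_tokens if token not in vocab)


# Toy, pre-segmented data made up for this sketch.
train = ["林檎", "を", "食べる", "猫", "が", "いる"]
test = ["リンゴ", "を", "食べる", "ねこ", "が", "いる"]

vocab_before = len(set(train + test))
vocab_after = len(set(normalize(train + test)))
oov_before = oov_count(train, test)
oov_after = oov_count(normalize(train), normalize(test))

print(vocab_before, vocab_after)  # vocabulary size without / with normalization
print(oov_before, oov_after)      # OOV tokens without / with normalization
```

In this toy setting the vocabulary shrinks and the OOV tokens disappear after normalization. The sketch only shows the mechanism; it says nothing about translation quality, which the abstract reports as unchanged.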
