Japanese Orthographical Normalization Does Not Work for Statistical Machine Translation
- Submitted by: Kanji Takahashi
- Last updated: 21 November 2016 - 8:28pm
- Document Type: Presentation Slides
- Document Year: 2016
- Presenters: Kanji Takahashi
- Paper Code: 15
We investigated the effect of normalizing Japanese orthographical variants into a uniform orthography on statistical machine translation (SMT) between Japanese and English. In Japanese, reportedly 10% of words have more than one orthographical variant, which suggests that normalizing these variants could improve translation quality. However, the results show that SMT with normalization performs equivalently to SMT without normalization, as measured by both BLEU and RIBES, even though normalization reduces the size of the language model, its perplexity, and the number of out-of-vocabulary words. We discuss the potential reasons in this paper.
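To make the idea of normalization concrete, the following is a minimal sketch of how mapping orthographical variants to one canonical form shrinks the vocabulary and the out-of-vocabulary (OOV) count. It is not the authors' pipeline: the variant table, the toy corpora, and the function names `normalize` and `oov_count` are all assumptions made for illustration, and the sentences are assumed to be already tokenized.

```python
# Hypothetical variant table: each orthographical variant maps to one
# canonical form. The entries are illustrative, not taken from the slides.
VARIANT_TO_CANONICAL = {
    "りんご": "林檎",
    "リンゴ": "林檎",
    "ねこ": "猫",
    "ネコ": "猫",
}


def normalize(tokens):
    """Replace each token with its canonical form when one is known."""
    return [VARIANT_TO_CANONICAL.get(t, t) for t in tokens]


def oov_count(train_sentences, test_sentences):
    """Count test tokens that never appear in the training data."""
    vocab = {t for sent in train_sentences for t in sent}
    return sum(1 for sent in test_sentences for t in sent if t not in vocab)


# Toy pre-tokenized corpora (assumed; real text needs a morphological analyzer).
train = [["林檎", "を", "食べる"], ["ネコ", "が", "いる"]]
test = [["リンゴ", "を", "食べる"], ["猫", "が", "いる"]]

# OOV tokens before and after normalizing both sides of the split.
raw_oov = oov_count(train, test)
norm_oov = oov_count([normalize(s) for s in train], [normalize(s) for s in test])

# Vocabulary size over the whole toy corpus, before and after normalization.
raw_vocab = len({t for s in train + test for t in s})
norm_vocab = len({t for s in train + test for t in normalize(s)})

print(raw_oov, norm_oov, raw_vocab, norm_vocab)
```

On this toy data both the OOV count and the vocabulary size drop after normalization, which mirrors the reductions the abstract reports. The abstract's finding is that such reductions did not produce a measurable difference in BLEU or RIBES.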