Knowledge and Data Engineering

The Effect of Shallow Segmentation for English-Tigrinya Statistical Machine Translation


This paper presents an initial study of English-Tigrinya statistical machine translation (SMT). Tigrinya is a highly inflected Semitic language spoken in Eritrea and Ethiopia. Translation involving morphologically complex languages is hampered by data sparseness and by difficulties in source-target word alignment. We address these problems by morphologically segmenting Tigrinya words. After segmentation, the difference in token count between the English and Tigrinya sides dropped from 37.7% to 0.1%, and the out-of-vocabulary rate fell by 46%.
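The two corpus statistics in this abstract can be made concrete with a short sketch. The abstract does not define them exactly, so the definitions below are assumptions: the token-count difference is taken relative to the English (source) side, the OOV rate is the share of test tokens never seen in training, and "reduced by 46%" is read as a relative reduction. The toy "+"-marked tokens only stand in for segmented morphemes and are not real Tigrinya.

```python
from typing import Iterable, List


def token_count(sentences: Iterable[str]) -> int:
    # Whitespace tokenization; assumes the corpus is already tokenized
    return sum(len(s.split()) for s in sentences)


def token_count_difference(src: List[str], tgt: List[str]) -> float:
    """Gap between target and source token counts, as a percentage of the
    source count (assumed definition; src must be non-empty)."""
    n_src, n_tgt = token_count(src), token_count(tgt)
    return abs(n_src - n_tgt) / n_src * 100


def oov_rate(train: List[str], test: List[str]) -> float:
    """Percentage of test tokens whose type never occurs in the training data."""
    vocab = {tok for s in train for tok in s.split()}
    test_tokens = [tok for s in test for tok in s.split()]
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens) * 100


def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    english = ["in the house", "to the house"]
    unsegmented = ["inhouse", "tohouse"]     # toy glued word forms
    segmented = ["in+ house", "to+ house"]   # toy prefix split
    print(token_count_difference(english, unsegmented))
    print(token_count_difference(english, segmented))
```

Segmenting the toy target side brings its token count closer to the English side and lets the stem "house" be shared between training and test, which is the intuition behind both reported reductions.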

Paper Details

Authors:
Yemane Tedla and Kazuhide Yamamoto
Submitted On:
21 November 2016 - 8:31am

Document Files

IALP-tig.pdf

(134 downloads)

[1] Yemane Tedla and Kazuhide Yamamoto, "The Effect of Shallow Segmentation for English-Tigrinya Statistical Machine Translation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1281. Accessed: Dec. 16, 2017.

Word Sense Implantation as Orthographical Conversion


We present a word sense disambiguation (WSD) tool for Japanese words written in Hiragana. Unlike conventional WSD tasks, which output a label such as “sense #3”, our task rewrites the target word as a Kanji word, that is, in a different orthography. The task is therefore a form of orthographical normalization as well as WSD. In this paper we present the task, our method, and its performance.
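To illustrate the task framing only (Hiragana in, Kanji out), here is a toy sketch. The abstract does not describe the method, so the candidate table, the cue words, and the overlap scoring below are all hypothetical and are not the authors' approach. The reading はし can be written 橋 (bridge), 箸 (chopsticks), or 端 (edge); the sketch picks the spelling whose cue characters occur most often in the sentence.

```python
from typing import Dict, List

# Hypothetical sense inventory: Hiragana reading -> Kanji spelling -> cue strings.
# The cue strings are illustrative and not taken from the paper.
CANDIDATES: Dict[str, Dict[str, List[str]]] = {
    "はし": {
        "橋": ["川", "渡"],    # bridge
        "箸": ["食", "ご飯"],  # chopsticks
        "端": ["道", "隅"],    # edge
    },
}


def implant_sense(sentence: str, target: str) -> str:
    """Rewrite the first occurrence of `target` as the Kanji candidate whose
    cue strings occur most often in the sentence (toy overlap scoring)."""
    options = CANDIDATES.get(target)
    if not options or target not in sentence:
        return sentence

    def score(kanji: str) -> int:
        return sum(sentence.count(cue) for cue in options[kanji])

    best = max(options, key=score)  # ties go to the first listed candidate
    return sentence.replace(target, best, 1)
```

The output is a rewritten sentence rather than a sense ID, which is what makes the task double as orthographical normalization.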

Paper Details

Authors:
Kazuhide Yamamoto and Yuki Mikami
Submitted On:
21 November 2016 - 8:27am

Document Files

IALP-wsd.pdf

(143 downloads)

[1] Kazuhide Yamamoto and Yuki Mikami, "Word Sense Implantation as Orthographical Conversion", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1280. Accessed: Dec. 16, 2017.

Detecting Representative Web Articles Using Heterogeneous Graphs

Paper Details

Authors:
Submitted On:
21 November 2016 - 3:13am

Document Files

slide.11.21_upload.pptx

(128 downloads)

[1] "Detecting Representative Web Articles Using Heterogeneous Graphs", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1279. Accessed: Dec. 16, 2017.

Annotation Schemes for Constructing Uyghur Named Entity Relation Corpus


Uyghur is a minority language in China and one of the official languages of the Xinjiang Uyghur Autonomous Region. More than 10 million people use Uyghur in daily life, including on the Internet. However, the lack of a Uyghur entity relation corpus constrains relation extraction applications for Uyghur. In this paper, we describe annotation schemes for creating an annotated corpus of Uyghur named entities and named entity relations.
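The abstract does not spell out the schemes themselves, so the following is only a hypothetical sketch of how entity and relation annotations are commonly stored: character-offset entity spans with a type label, and typed relations that must point at already-annotated entities. The label sets (`PER`, `LOC`, `ORG`, `works_in`) and the English toy sentence are assumptions for illustration, not the paper's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ENTITY_TYPES = {"PER", "LOC", "ORG"}  # assumed label set, illustration only


@dataclass
class Entity:
    eid: str
    etype: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Relation:
    rtype: str
    head: str  # eid of the first argument
    tail: str  # eid of the second argument


@dataclass
class Document:
    text: str
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_entity(self, ent: Entity) -> None:
        if ent.etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {ent.etype}")
        if not 0 <= ent.start < ent.end <= len(self.text):
            raise ValueError(f"span out of range: {ent.start}-{ent.end}")
        self.entities[ent.eid] = ent

    def add_relation(self, rel: Relation) -> None:
        # A relation may only link entities that are already annotated
        for arg in (rel.head, rel.tail):
            if arg not in self.entities:
                raise ValueError(f"relation references unknown entity: {arg}")
        self.relations.append(rel)

    def surface(self, eid: str) -> str:
        ent = self.entities[eid]
        return self.text[ent.start:ent.end]
```

Keeping relations as references to entity IDs, rather than copying spans, lets a consistency check catch relations whose arguments were never annotated.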

Paper Details

Authors:
Submitted On:
20 November 2016 - 3:27am

Document Files

Annotation Schemes for Constructing Uyghur Named Entity Relation Corpus

(86 downloads)

[1] "Annotation Schemes for Constructing Uyghur Named Entity Relation Corpus", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1277. Accessed: Dec. 16, 2017.

Construction of the Basic Sentence-pattern Instance Database Based on the International Chinese Textbook Treebank

Paper Details

Authors:
Shuqin Zhu, Yinxia Zhang, Weiming Peng, Jihua Song
Submitted On:
20 November 2016 - 2:21am

Document Files

Construction of the Basic Sentence-pattern Instance Database Based on the International Chinese Textbook Treebank.pdf

(90 downloads)

[1] Shuqin Zhu, Yinxia Zhang, Weiming Peng, Jihua Song, "Construction of the Basic Sentence-pattern Instance Database Based on the International Chinese Textbook Treebank", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1276. Accessed: Dec. 16, 2017.

Japanese Orthographical Normalization Does Not Work for Statistical Machine Translation


We investigated the effect on Japanese-English statistical machine translation (SMT) of normalizing Japanese orthographical variants into a uniform orthography. Reportedly, 10% of Japanese words have more than one orthographical variant, which suggests that normalizing these variants could improve translation quality.
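The normalization step being evaluated can be sketched as a dictionary lookup applied to already-tokenized Japanese text. This is a minimal sketch under stated assumptions: the variant groups below (子供 / 子ども / こども and 引っ越し / 引越し / 引越) are illustrative examples, not the paper's dictionary, and treating the first spelling in each group as canonical is an arbitrary choice. Per the title, this kind of preprocessing did not improve SMT in the authors' experiments.

```python
from typing import Dict, List

# Illustrative variant groups; the first spelling is treated as canonical (assumption).
VARIANT_GROUPS: List[List[str]] = [
    ["子供", "子ども", "こども"],
    ["引っ越し", "引越し", "引越"],
]


def build_normalizer(groups: List[List[str]]) -> Dict[str, str]:
    """Map every variant to the first spelling in its group."""
    table: Dict[str, str] = {}
    for group in groups:
        canonical = group[0]
        for variant in group:
            table[variant] = canonical
    return table


def normalize(tokens: List[str], table: Dict[str, str]) -> List[str]:
    # Tokens not covered by the dictionary pass through unchanged
    return [table.get(tok, tok) for tok in tokens]
```

Collapsing variants shrinks the vocabulary, which is why normalization looks promising for sparse SMT training data, even though the reported result was negative.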

Paper Details

Authors:
Kazuhide Yamamoto, Kanji Takahashi
Submitted On:
21 November 2016 - 8:28pm

Document Files

15-IALP2016.pdf

(126 downloads)

[1] Kazuhide Yamamoto, Kanji Takahashi, "Japanese Orthographical Normalization Does Not Work for Statistical Machine Translation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1273. Accessed: Dec. 16, 2017.

Fundamental Tools and Resource are Available for Vietnamese Analysis


This paper presents our work on developing fundamental tools and a resource for Vietnamese analysis: tools for word segmentation and part-of-speech tagging and for diacritics restoration, and an orthographical variant dictionary. Until now, such tools and resources have either not been publicly available or not achieved sufficient performance. We have developed them and released them to the public as both software packages and web tools. For development, we used state-of-the-art methods and achieved high accuracy.
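To show what diacritics restoration takes as input and produces as output, here is a minimal sketch of a common frequency-based baseline: strip the marks from a diacritized corpus, then map each bare syllable back to its most frequent marked form. This is not the paper's method (the abstract only says state-of-the-art methods were used), and the two-sentence corpus is a toy. Note that "đ" has no Unicode decomposition, so it is mapped to "d" by hand.

```python
import unicodedata
from collections import Counter, defaultdict
from typing import Dict, List


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese tone and vowel marks; 'đ'/'Đ' do not decompose, so map them by hand."""
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def train_restorer(corpus: List[str]) -> Dict[str, str]:
    """For each bare syllable, remember its most frequent diacritized form."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for syllable in sentence.split():
            counts[strip_diacritics(syllable)][syllable] += 1
    return {plain: forms.most_common(1)[0][0] for plain, forms in counts.items()}


def restore(text: str, table: Dict[str, str]) -> str:
    # Syllables never seen in training are left as typed
    return " ".join(table.get(s, s) for s in text.split())
```

A per-syllable lookup ignores context, so it cannot choose between several valid marked forms of the same bare syllable; that ambiguity is what context-aware methods aim to resolve.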

Paper Details

Authors:
Kanji Takahashi, Kazuhide Yamamoto
Submitted On:
21 November 2016 - 4:25am

Document Files

68-IALP2016.pdf

(122 downloads)

[1] Kanji Takahashi, Kazuhide Yamamoto, "Fundamental Tools and Resource are Available for Vietnamese Analysis", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1271. Accessed: Dec. 16, 2017.

Atypicality for Vector Gaussian Models

Paper Details

Authors:
Elyas Sabeti
Submitted On:
23 February 2016 - 1:44pm

Document Files

GlobalSIP 2015.pdf

(311 downloads)

[1] Elyas Sabeti, "Atypicality for Vector Gaussian Models", IEEE SigPort, 2015. [Online]. Available: http://sigport.org/494. Accessed: Dec. 16, 2017.

CRH: A Simple Benchmark Approach to Continuous Hashing


Paper Details

Authors:
Submitted On:
23 February 2016 - 1:43pm

Document Files

gsip_mc.pdf

(385 downloads)

gsip_mc.pdf

(262 downloads)

[1] "CRH: A Simple Benchmark Approach to Continuous Hashing", IEEE SigPort, 2015. [Online]. Available: http://sigport.org/248. Accessed: Dec. 16, 2017.
