Knowledge and Data Engineering

Improved Arabic Characters Recognition by Combining Multiple Machine Learning Classifiers


In this paper, we investigate a range of strategies for combining multiple machine learning techniques for recognizing Arabic characters when the input characters are imperfect and vary in dimensions. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than any of the techniques alone, and even more accurate results than majority-voting combination.
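The abstract does not spell out the combination rule, so the sketch below is only one plausible reading of a confidence-based backoff. It assumes each classifier returns a (label, confidence) pair and that classifiers are listed from most to least trusted (for example, by validation accuracy); the 0.8 threshold and the letter labels are hypothetical. It contrasts the backoff with plain majority voting.

```python
from collections import Counter
from typing import List, Tuple

# (predicted label, confidence in [0, 1]) from one classifier.
Prediction = Tuple[str, float]


def majority_vote(preds: List[Prediction]) -> str:
    """Return the label predicted by the most classifiers (ties go to the first seen)."""
    counts = Counter(label for label, _ in preds)
    return counts.most_common(1)[0][0]


def confidence_backoff(preds: List[Prediction], threshold: float = 0.8) -> str:
    """Accept the first classifier, in priority order, whose confidence reaches
    the (hypothetical) threshold; if none does, back off to a majority vote."""
    for label, confidence in preds:
        if confidence >= threshold:
            return label
    return majority_vote(preds)


# Classifiers are assumed to be listed from most to least trusted.
confident = [("alif", 0.92), ("ba", 0.60), ("ba", 0.70)]
uncertain = [("ta", 0.50), ("tha", 0.65), ("tha", 0.40)]

print(confidence_backoff(confident), majority_vote(confident))  # alif ba
print(confidence_backoff(uncertain), majority_vote(uncertain))  # tha tha
```

When the top-ranked classifier is confident, the backoff keeps its answer even if the others outvote it; otherwise it falls back to voting.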

Paper Details

Authors:
Maytham Alabbas, Raidah S. Khudeyer
Submitted On:
25 November 2016 - 3:49am

Document Files

Poster.pdf

(744 downloads)

[1] Maytham Alabbas, Raidah S. Khudeyer, "Improved Arabic Characters Recognition by Combining Multiple Machine Learning Classifiers", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1307. Accessed: Dec. 18, 2017.

A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters


In this study, we outline a potential problem in normalising texts written in modified versions of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu and Pashtu, use a modified version of the Arabic writing system. In data harvested from the Internet, many characters have exactly the same form but are encoded with different Unicode values (ambiguous characters).
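One way to picture the semi-automatic part is a tool that flags visually confusable code points occurring together in a corpus and applies only the mappings a human reviewer has approved. The sketch below assumes that reading; the confusable groups (Farsi Yeh, Yeh and Alef Maksura; Keheh and Kaf) are illustrative examples rather than the paper's list, and the choice of canonical form is a hypothetical, language-dependent assumption.

```python
import unicodedata
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical groups of code points that can look alike in some positions or
# fonts. The first member is the assumed canonical form; the right choice
# depends on the target language.
CONFUSABLE_GROUPS: List[Tuple[str, ...]] = [
    ("\u06CC", "\u064A", "\u0649"),  # FARSI YEH, YEH, ALEF MAKSURA
    ("\u06A9", "\u0643"),            # KEHEH, KAF
]


def find_ambiguous(text: str) -> Dict[str, Dict[str, int]]:
    """Report each group in which more than one variant occurs in the text."""
    counts = Counter(text)
    report = {}
    for group in CONFUSABLE_GROUPS:
        present = {ch: counts[ch] for ch in group if counts[ch] > 0}
        if len(present) > 1:
            report[group[0]] = present
    return report


def describe(report: Dict[str, Dict[str, int]]) -> None:
    """Print Unicode names and counts so a human can approve a mapping."""
    for variants in report.values():
        for ch, n in variants.items():
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {n}")


def unify(text: str, approved: Dict[str, str]) -> str:
    """Apply a reviewer-approved variant -> canonical mapping."""
    return text.translate(str.maketrans(approved))


# The word "kitab" spelled once with KAF and once with KEHEH.
sample = "\u0643\u062A\u0627\u0628 \u06A9\u062A\u0627\u0628"
describe(find_ambiguous(sample))
# U+06A9 ARABIC LETTER KEHEH: 1
# U+0643 ARABIC LETTER KAF: 1
unified = unify(sample, {"\u0643": "\u06A9"})  # mapping approved by a reviewer
```

Keeping the mapping as an explicit, reviewer-approved table makes the normalisation auditable per language instead of baking one choice into the pipeline.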

Paper Details

Authors:
Submitted On:
25 November 2016 - 3:44am

Document Files

sarar_jaf_poster.pdf

(165 downloads)

[1] "A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1306. Accessed: Dec. 18, 2017.

Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words


Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are then used to recognize and transliterate proper nouns in metadata.
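A minimal sketch of how acquired back-transliteration pairs might feed record linkage, assuming Japanese katakana as the target language, English as the source language, pre-tokenized metadata, and Jaccard overlap of recognized proper nouns as the linkage signal. The pair table and the scoring function are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Set

# Hypothetical pairs acquired by back-transliteration:
# transliterated word (target language) -> original word (source language).
ACQUIRED_PAIRS: Dict[str, str] = {
    "シェイクスピア": "Shakespeare",
    "ロンドン": "London",
}


def recognize_proper_nouns(tokens: List[str], pairs: Dict[str, str]) -> Set[str]:
    """Map tokens found in the pair table back to lowercased source-language words."""
    return {pairs[t].lower() for t in tokens if t in pairs}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Pre-tokenized metadata fields (tokenization is assumed to happen upstream).
ja_title = ["シェイクスピア", "と", "ロンドン", "の", "劇場"]
en_keywords = {"shakespeare", "london", "theatre"}

score = jaccard(recognize_proper_nouns(ja_title, ACQUIRED_PAIRS), en_keywords)
print(f"{score:.3f}")  # 0.667
```

The common noun "劇場" (theatre) is not in the pair table, so only the two proper nouns contribute to the match.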

Paper Details

Authors:
Taisuke Kimura, Biligsaikhan Batjargal, Akira Maeda
Submitted On:
22 November 2016 - 11:40am

Document Files

cross-language record linkage

(154 downloads)

[1] Taisuke Kimura, Biligsaikhan Batjargal, Akira Maeda, "Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1295. Accessed: Dec. 18, 2017.

Improving the Effectiveness of POI Search by Associated Information Summarization


Demand for map services has risen significantly in recent years owing to the popularity of mobile devices and wireless networks. Because new points of interest (POIs) are constantly emerging in the real world, mining user-shared POIs from the Web to enrich existing POI databases has become a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the first steps in constructing a POI database; descriptions of POIs, i.e., the services and products they provide, are especially essential for POI search.
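To illustrate why summarized descriptions matter for POI search, the sketch below indexes hypothetical POI summaries and ranks POIs by a simple summed TF-IDF score of the query terms. The POI data, whitespace tokenization and scoring scheme are assumptions for illustration; a query for a service such as "pastries" can only match a POI whose description mentions it.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical POIs with descriptions summarized from associated web pages.
POIS: Dict[str, str] = {
    "Cafe A": "espresso latte pastries wifi",
    "Bookshop B": "books magazines stationery",
    "Bakery C": "bread pastries cakes",
}


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_index(docs: Dict[str, str]) -> Tuple[Dict[str, Counter], Dict[str, float]]:
    """Term frequencies per POI and a smoothed inverse document frequency per term."""
    tf = {name: Counter(tokenize(desc)) for name, desc in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(docs)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    return tf, idf


def search(query: str, tf: Dict[str, Counter], idf: Dict[str, float]) -> List[str]:
    """Rank POIs by summed TF-IDF weight of the query terms; drop non-matches."""
    terms = tokenize(query)
    scores = {name: sum(c[t] * idf.get(t, 0.0) for t in terms) for name, c in tf.items()}
    hits = [name for name, s in scores.items() if s > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


tf, idf = build_index(POIS)
print(search("espresso pastries", tf, idf))  # ['Cafe A', 'Bakery C']
print(search("pizza", tf, idf))              # []
```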

Paper Details

Authors:
Hsiu-Min Chuang, Chia-Hui Chang, Chung-Ting Cheng
Submitted On:
22 November 2016 - 9:56am

Document Files

IALP_Text Mining Session

(188 downloads)

[1] Hsiu-Min Chuang, Chia-Hui Chang, Chung-Ting Cheng, "Improving the Effectiveness of POI Search by Associated Information Summarization", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1294. Accessed: Dec. 18, 2017.

Web Content Extraction Based on Maximum Continuous Sum of Text Density


Different websites generally have different web page structures, which can heavily affect extraction quality when web content is collected automatically. The maximum continuous sum of text density (MCSTD) method can extract web content from pages with different structures efficiently and effectively.
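The abstract names the method but not its details. Reading the name literally, one plausible sketch computes a text density for each line of HTML, subtracts a threshold so that link-heavy boilerplate scores negative, and keeps the contiguous run of lines with the maximum sum (Kadane's algorithm). The density definition, the line-level granularity and the 0.5 threshold are all assumptions, not the authors' specification.

```python
import re
from typing import List, Tuple

TAG = re.compile(r"<[^>]+>")


def text_density(line: str) -> float:
    """Fraction of a line's characters that remain after stripping HTML tags."""
    if not line:
        return 0.0
    return len(TAG.sub("", line)) / len(line)


def max_continuous_sum(scores: List[float]) -> Tuple[int, int]:
    """Kadane's algorithm: inclusive (start, end) of the max-sum contiguous run."""
    best_sum, best = float("-inf"), (0, -1)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i)
    return best


def extract_content(lines: List[str], threshold: float = 0.5) -> List[str]:
    """Shift densities by a threshold so boilerplate scores negative, then keep the best run."""
    scores = [text_density(line) - threshold for line in lines]
    start, end = max_continuous_sum(scores)
    return lines[start:end + 1]


page = [
    '<div class="nav"><a href="/">Home</a><a href="/x">News</a></div>',
    "<p>First paragraph of the article body text.</p>",
    "<p>Second paragraph with more article text here.</p>",
    '<div class="footer"><a href="/c">Contact</a></div>',
]
for line in extract_content(page):
    print(line)  # prints the two <p> lines
```

Because the run is chosen by its total score, a short low-density line inside the article body does not split the extracted content as long as the surrounding lines outweigh it.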

Paper Details

Authors:
Kai Sun, Miao Li, Jinhua Du, Lei Chen, Zhengxin Yang, Yi Gao, Sha Fu
Submitted On:
21 November 2016 - 9:34pm

Document Files

Kai Sun – IALP 2016.pptx

(0 downloads)

[1] Kai Sun, Miao Li, Jinhua Du, Lei Chen, Zhengxin Yang, Yi Gao, Sha Fu, "Web Content Extraction Based on Maximum Continuous Sum of Text Density", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1288. Accessed: Dec. 18, 2017.

Aicyber’s System for IALP 2016 Shared Task: Character-enhanced Word Vectors and Boosted Neural Networks

Paper Details

Authors:
Steven Du, Xi Zhang
Submitted On:
21 November 2016 - 6:56pm

Document Files

IALP2016-114-PDF

(153 downloads)

[1] Steven Du, Xi Zhang, "Aicyber’s System for IALP 2016 Shared Task: Character-enhanced Word Vectors and Boosted Neural Networks", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1286. Accessed: Dec. 18, 2017.

Recurrent Neural Network-based Language Models with Variation in Net Topology, Language, and Granularity


In this paper, we study language models based on recurrent neural networks on three corpora in two languages. We implement basic recurrent neural networks (RNNs) and refined RNNs with long short-term memory (LSTM) cells. We use the Penn Treebank (PTB) and AMI corpora in English and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate word-based and character-based language models. For character-based language models, we examine both the case where the inter-word space is treated as a token and the case where it is not.
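The three unit granularities compared on ASBC can be illustrated on a word-segmented sentence. The sketch assumes the inter-word space is represented by a hypothetical `<sp>` token; the abstract does not say how it is encoded.

```python
from typing import List

SPACE = "<sp>"  # hypothetical token standing for the inter-word space


def word_tokens(words: List[str]) -> List[str]:
    """Word-based units: the corpus's own segmentation is used as is."""
    return list(words)


def char_tokens(words: List[str], space_as_token: bool) -> List[str]:
    """Character-based units, optionally marking word boundaries with SPACE."""
    tokens: List[str] = []
    for i, word in enumerate(words):
        if space_as_token and i > 0:
            tokens.append(SPACE)
        tokens.extend(word)  # a str extends the list one character at a time
    return tokens


sentence = ["我們", "研究", "語言"]  # a word-segmented sentence, as in ASBC
print(word_tokens(sentence))                        # ['我們', '研究', '語言']
print(char_tokens(sentence, space_as_token=False))  # ['我', '們', '研', '究', '語', '言']
print(char_tokens(sentence, space_as_token=True))
# ['我', '們', '<sp>', '研', '究', '<sp>', '語', '言']
```

Keeping the space token lets a character-based model see word boundaries, at the cost of longer sequences.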

Paper Details

Authors:
Tzu-Hsuan Yang, Tzu-Hsuan Tseng, Chia-Ping Chen
Submitted On:
21 November 2016 - 10:21am

Document Files

40_RNN

(135 downloads)

[1] Tzu-Hsuan Yang, Tzu-Hsuan Tseng, Chia-Ping Chen, "Recurrent Neural Network-based Language Models with Variation in Net Topology, Language, and Granularity", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1285. Accessed: Dec. 18, 2017.

Verifying the Long-range Dependency of RNN Language Models


It has been argued that recurrent neural network language models are better than n-gram language models at capturing long-range dependencies. In this paper, we attempt to verify this claim by investigating the prediction accuracy and perplexity of these language models as a function of word position, i.e., the position of a word in a sentence. We expect that as word position increases, the advantage of recurrent neural network language models over n-gram language models will become increasingly evident.
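Measuring perplexity and accuracy as a function of word position amounts to bucketing per-word statistics by position k and computing, per bucket, PPL(k) = exp(-(1/N_k) * sum of log p(w_k | history)), where N_k is the number of sentences with at least k words. The sketch below assumes the per-word natural-log probabilities and top-1 predictions have already been produced by each model (n-gram or RNN); running it on both models' outputs gives the position-wise comparison.

```python
import math
from collections import defaultdict
from typing import Dict, List


def perplexity_by_position(sent_logprobs: List[List[float]]) -> Dict[int, float]:
    """sent_logprobs[s][k-1] is the natural-log probability a model assigned to
    the k-th word of sentence s. Returns perplexity for each 1-based position."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for logprobs in sent_logprobs:
        for k, lp in enumerate(logprobs, start=1):
            totals[k] += lp
            counts[k] += 1
    return {k: math.exp(-totals[k] / counts[k]) for k in sorted(totals)}


def accuracy_by_position(preds: List[List[str]], golds: List[List[str]]) -> Dict[int, float]:
    """Fraction of sentences whose k-th word was predicted correctly, per position."""
    hits: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for pred, gold in zip(preds, golds):
        for k, (p, g) in enumerate(zip(pred, gold), start=1):
            hits[k] += p == g
            counts[k] += 1
    return {k: hits[k] / counts[k] for k in sorted(counts)}


# Toy model output: two sentences of lengths 2 and 1.
toy = [[math.log(0.5), math.log(0.25)], [math.log(0.5)]]
ppl = perplexity_by_position(toy)
print({k: round(v, 6) for k, v in ppl.items()})  # {1: 2.0, 2: 4.0}

acc = accuracy_by_position([["the", "cat"], ["a"]], [["the", "dog"], ["the"]])
print(acc)  # {1: 0.5, 2: 0.0}
```

Note that later positions are averaged over fewer sentences, so their estimates are noisier.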

Paper Details

Authors:
Tzu-Hsuan Tseng, Tzu-Hsuan Yang, Chia-Ping Chen
Submitted On:
21 November 2016 - 10:24am

Document Files

41_ngram_rnn

(118 downloads)

[1] Tzu-Hsuan Tseng, Tzu-Hsuan Yang, Chia-Ping Chen, "Verifying the Long-range Dependency of RNN Language Models", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1284. Accessed: Dec. 18, 2017.

History Question Classification and Representation for Chinese Gaokao


In this paper, we propose a question representation based on entity labeling and question classification for an automatic question answering system for Chinese Gaokao history questions. A CRF model is used for entity labeling, and SVM, CNN and LSTM models are tested for question classification. Our experiments show that the CRF model performs well at labeling informative entities, while neural networks show promising performance on the question classification task.
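CRF entity labelers are commonly trained on BIO tags, although the abstract does not state the tagging scheme. The sketch below assumes BIO tags, converts between entity spans and tags, and builds a question representation from the labeled entities plus a question class. The entity type `EVENT` and the class `TIME` are hypothetical stand-ins for the CRF's and classifier's outputs.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as BIO tags, the per-token targets a CRF is trained on."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags (e.g. a CRF's output) back into entity spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


def represent(tokens: List[str], tags: List[str], question_class: str) -> Dict[str, object]:
    """Question representation: predicted class plus labeled entities."""
    entities = [("".join(tokens[s:e]), lab) for s, e, lab in bio_to_spans(tags)]
    return {"class": question_class, "entities": entities}


# "In which year did the Xinhai Revolution break out?", pre-segmented.
tokens = ["辛亥", "革命", "爆发", "于", "哪一年"]
tags = spans_to_bio(len(tokens), [(0, 2, "EVENT")])  # stand-in for CRF output
print(tags)  # ['B-EVENT', 'I-EVENT', 'O', 'O', 'O']
print(represent(tokens, tags, "TIME"))
# {'class': 'TIME', 'entities': [('辛亥革命', 'EVENT')]}
```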

Paper Details

Authors:
Ke Yu, Qiuzhi Liu, Yuqing Zheng, Tiejun Zhao, Dequan Zheng
Submitted On:
21 November 2016 - 9:27pm

Document Files

80.pdf

(148 downloads)

[1] Ke Yu, Qiuzhi Liu, Yuqing Zheng, Tiejun Zhao, Dequan Zheng, "History Question Classification and Representation for Chinese Gaokao", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1283. Accessed: Dec. 18, 2017.