
Knowledge and Data Engineering

Towards Building a Standard Dataset for Arabic Keyphrase Extraction Evaluation


Keyphrases are short phrases that best represent a document's content. They are useful in a variety of applications, including document summarization and retrieval models. In this paper, we introduce the first dataset of keyphrases for an Arabic document collection, obtained by means of crowdsourcing. We experimentally evaluate different strategies for aggregating crowdsourced answers and validate their performance against expert annotations to assess the quality of our dataset. We report our experimental results, the dataset features, some lessons learned, and ideas for future
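
To make the idea of aggregating crowdsourced answers and checking them against expert annotations concrete, here is a minimal sketch. It assumes a simple vote-threshold strategy, which is only one possible aggregation scheme and is not necessarily one the authors evaluated. The function names and the toy data are hypothetical.

```python
from collections import Counter


def aggregate_by_votes(worker_answers, min_votes):
    """Keep every keyphrase proposed by at least `min_votes` workers."""
    votes = Counter()
    for answers in worker_answers:
        # Normalise lightly and count each worker at most once per phrase.
        votes.update({a.strip().lower() for a in answers})
    return {phrase for phrase, n in votes.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Score an aggregated keyphrase set against expert annotations."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
workers = [
    ["keyphrase extraction", "crowdsourcing", "arabic"],
    ["Keyphrase Extraction", "dataset"],
    ["crowdsourcing", "keyphrase extraction", "evaluation"],
]
experts = {"keyphrase extraction", "crowdsourcing", "dataset"}

aggregated = aggregate_by_votes(workers, min_votes=2)
scores = precision_recall_f1(aggregated, experts)
print(sorted(aggregated), scores)
```

Raising `min_votes` typically trades recall for precision, which is the kind of effect a comparison against expert annotations can expose.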

Paper Details

Authors:
Marco Basaldella, Eddy Maddalena, Stefano Mizzaro, Gianluca Demartini
Submitted On:
30 November 2016 - 4:11am

[1] Marco Basaldella, Eddy Maddalena, Stefano Mizzaro, Gianluca Demartini, "Towards Building a Standard Dataset for Arabic Keyphrase Extraction Evaluation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1321. Accessed: Feb. 25, 2017.

Leveraging Arabic Morphology and Syntax for Achieving Better Keyphrase Extraction


Arabic is one of the fastest-growing languages on the Web, with an increasing amount of user-generated content being published by both native and non-native speakers all over the world. Despite the great linguistic differences between Arabic and Western languages such as English, most Arabic keyphrase extraction systems rely on approaches designed for Western languages, thus ignoring Arabic's rich morphology and syntax. In this paper we present a new approach that leverages Arabic morphology and syntax to generate a restricted set of meaningful candidates, among which keyphrases are selected.

Paper Details

Authors:
Dario De Nart, Dante Degl’Innocenti, Carlo Tasso
Submitted On:
30 November 2016 - 4:12am

[1] Dario De Nart, Dante Degl’Innocenti, Carlo Tasso, "Leveraging Arabic Morphology and Syntax for Achieving Better Keyphrase Extraction", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1320. Accessed: Feb. 25, 2017.

Annotating Chinese Noun Phrases Based on Semantic Dependency Graph


Annotating complicated noun phrases is a difficult problem in semantic analysis. In this paper we investigate the methods used to annotate noun phrases in NomBank, Chinese NomBank and Sinica Treebank, with the aim of proposing an annotation scheme for noun phrases based on semantic dependency graphs.

Paper Details

Authors:
Shao Yanqiu
Submitted On:
29 November 2016 - 4:06am

[1] Shao Yanqiu, "Annotating Chinese Noun Phrases Based on Semantic Dependency Graph", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1318. Accessed: Feb. 25, 2017.

Dimensional Sentiment Analysis of Traditional Chinese Words Using Pre-trained Not-quite-right Sentiment Word Vectors and Supervised Ensemble Models


This work focuses on two specific dimensions of sentiment for traditional Chinese words: valence, which represents the degree of pleasant and unpleasant feelings (i.e., sentiment orientation), and arousal, which represents the degree of excitement and calm (i.e., sentiment strength). To address this task, we propose supervised ensemble learning models to assign appropriate real-valued ratings to each

Paper Details

Authors:
Feixiang Wang, Yunxiao Zhou, Lan man
Submitted On:
27 November 2016 - 11:06pm

[1] Feixiang Wang, Yunxiao Zhou, Lan man, "Dimensional Sentiment Analysis of Traditional Chinese Words Using Pre-trained Not-quite-right Sentiment Word Vectors and Supervised Ensemble Models", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1314. Accessed: Feb. 25, 2017.

Importance Weighted Feature Selection Strategy for Text Classification


Feature selection, which aims at obtaining a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. Traditional feature selection metrics, such as Chi-square and information gain, fail to consider how important a feature is in a document: features are treated equally, no matter how much effective semantic information they hold. Intuitively, metrics calculated in this way are very likely to introduce considerable noise. In this study, we therefore extend the work of Li et al.
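
The limitation described above can be illustrated with the standard presence-based Chi-square score, computed from a 2x2 contingency table of document counts. The sketch below shows the traditional metric only, not the importance-weighted strategy the paper proposes. The helper names and the toy corpus are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def presence_counts(docs, labels, term, target):
    """Build the contingency counts from term presence alone."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Toy corpus, invented for illustration only.
docs = [
    ["goal", "match", "goal", "goal"],  # "goal" is central to this document
    ["goal", "election"],               # "goal" is incidental here
    ["vote", "election"],
    ["match", "team"],
]
labels = ["sport", "politics", "politics", "sport"]

goal_score = chi_square(*presence_counts(docs, labels, "goal", "sport"))
election_score = chi_square(*presence_counts(docs, labels, "election", "politics"))
print(goal_score, election_score)
```

Because only presence is counted, the three occurrences of "goal" in the first document weigh exactly as much as the single incidental occurrence in the second, so the term scores 0.0 for the "sport" class in this toy corpus.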

Paper Details

Authors:
Baoli Li
Submitted On:
27 November 2016 - 10:44am

[1] Baoli Li, "Importance Weighted Feature Selection Strategy for Text Classification", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1312. Accessed: Feb. 25, 2017.

Learning Dimensional Sentiment of Traditional Chinese Words with Word Embedding and Support Vector Regression


The dimensional sentiment analysis approach, which represents affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space, allows for more fine-grained analysis than the traditional categorical approach. In recent years, it has been applied in applications such as antisocial behavior detection, mood analysis and product review ranking. In this approach, an affective lexicon with dimensional sentiment values is a key resource, but building such a lexicon is costly.

Paper Details

Authors:
Submitted On:
27 November 2016 - 10:39am

[1] "Learning Dimensional Sentiment of Traditional Chinese Words with Word Embedding and Support Vector Regression", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1311. Accessed: Feb. 25, 2017.

Improved Arabic Characters Recognition by Combining Multiple Machine Learning Classifiers


In this paper, we investigate a range of strategies for combining multiple machine learning techniques for recognizing Arabic characters, where we are faced with imperfect and dimensionally variable input characters. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than each technique produces by itself, and even than the majority voting combination.
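
The abstract does not spell out how the backoff strategies work, so the following is only a rough sketch of one plausible reading: classifiers are consulted in a fixed priority order, the first sufficiently confident prediction is accepted, and majority voting is the fallback. The function names, the threshold and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: list of (label, confidence) pairs, one per classifier."""
    counts = Counter(label for label, _ in predictions)
    return counts.most_common(1)[0][0]


def confidence_backoff(predictions, threshold):
    """Accept the first classifier, in priority order, whose confidence
    reaches `threshold`; fall back to majority voting when none does."""
    for label, confidence in predictions:
        if confidence >= threshold:
            return label
    return majority_vote(predictions)


# Toy outputs of three hypothetical classifiers for one character image.
predictions = [("beh", 0.55), ("teh", 0.95), ("beh", 0.60)]

voted = majority_vote(predictions)
backed_off = confidence_backoff(predictions, threshold=0.9)
print(voted, backed_off)
```

In this toy case two weakly confident classifiers outvote a strongly confident one, while the backoff rule follows the confident classifier. Whether that is the right call depends on how well calibrated the confidences are.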

Paper Details

Authors:
Maytham Alabbas, Raidah S. Khudeyer
Submitted On:
25 November 2016 - 3:49am

[1] Maytham Alabbas, Raidah S. Khudeyer, "Improved Arabic Characters Recognition by Combining Multiple Machine Learning Classifiers", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1307. Accessed: Feb. 25, 2017.

A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters


In this study, we outline a potential problem in normalising texts that are based on a modified version of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu and Pashtu, use a modified version of the Arabic writing system. Many characters in data harvested from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters).
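
As a small illustration of the ambiguity described above, the sketch below uses two well-known look-alike pairs: ARABIC LETTER KAF (U+0643) versus ARABIC LETTER KEHEH (U+06A9), and ARABIC LETTER YEH (U+064A) versus ARABIC LETTER FARSI YEH (U+06CC). The mapping direction is an assumption suited to a Persian- or Kurdish-style orthography, and the table is hypothetical. It is not the semi-automatic procedure the paper describes.

```python
import unicodedata

# Hypothetical canonical mapping for a Persian- or Kurdish-style target
# orthography; other languages may need the opposite direction.
CANONICAL = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}
TABLE = str.maketrans(CANONICAL)


def unify(text):
    """Replace each ambiguous character with its canonical code point."""
    return text.translate(TABLE)


def report_variants(text):
    """List (code point, Unicode name) for characters that would be remapped."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ch in CANONICAL
    ]


word_a = "\u0643\u062A\u0627\u0628"  # spelled with ARABIC LETTER KAF
word_b = "\u06A9\u062A\u0627\u0628"  # spelled with ARABIC LETTER KEHEH

print(word_a == word_b)                # the raw strings differ
print(unify(word_a) == unify(word_b))  # they match after unification
print(report_variants(word_a))
```

A fixed table like this only covers pairs that are already known. Finding the candidate pairs in harvested text is the harder part of the problem.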

Paper Details

Authors:
Submitted On:
25 November 2016 - 3:44am

[1] "A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1306. Accessed: Feb. 25, 2017.

Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words


Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are then used to recognize and transliterate proper nouns in metadata.

Paper Details

Authors:
Taisuke Kimura, Biligsaikhan Batjargal, Akira Maeda
Submitted On:
22 November 2016 - 11:40am

[1] Taisuke Kimura, Biligsaikhan Batjargal, Akira Maeda, "Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1295. Accessed: Feb. 25, 2017.

Improving the Effectiveness of POI Search by Associated Information Summarization


The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. Since new points of interest (POIs) are always emerging in the real world, mining POIs shared by users on the Web to enrich existing POI databases has been a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the fundamentals of constructing a POI database; the descriptions of POIs, i.e., the services and products that POIs provide, are especially essential for POI search.

Paper Details

Authors:
Hsiu-Min Chuang, Chia-Hui Chang, Chung-Ting Cheng
Submitted On:
22 November 2016 - 9:56am

[1] Hsiu-Min Chuang, Chia-Hui Chang, Chung-Ting Cheng, "Improving the Effectiveness of POI Search by Associated Information Summarization", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1294. Accessed: Feb. 25, 2017.
