Knowledge and Data Engineering

Annotating Chinese Noun Phrases Based on Semantic Dependency Graph

Annotating complicated noun phrases is a difficult problem in semantic analysis. In this paper, we investigate the noun phrase annotation methods used in Nombank, Chinese Nombank, and Sinica Treebank, with the aim of proposing an annotation scheme for noun phrases based on the semantic dependency graph.

Annotating Chinese Noun Phrases Based on Semantic Dependency Graph_IALP.pdf

Annotating Chinese Noun Phrases Based on Semantic Dependency Graph_IALP.pdf (87)

Categories:: Knowledge and Data Engineering

5 Views

Dimensional Sentiment Analysis of Traditional Chinese Words Using Pre-trained Not-quite-right Sentiment Word Vectors and Supervised Ensemble Models

This work focuses on two specific types of sentiment information for traditional Chinese words: valence, which represents the degree of pleasant and unpleasant feelings (i.e., sentiment orientation), and arousal, which represents the degree of excitement and calm (i.e., sentiment strength). To address this task, we proposed supervised ensemble learning models to assign appropriate real-valued ratings to each

117_slides.pdf

IALP-117-slides (979)

Categories:: Knowledge and Data Engineering

14 Views

Importance Weighted Feature Selection Strategy for Text Classification

Feature selection, which aims to obtain a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. Traditional feature selection metrics, such as Chi-square and information gain, fail to consider how important a feature is within a document: features are treated equally, no matter how much effective semantic information they hold. Intuitively, metrics calculated this way are very likely to introduce considerable noise. In this study, we therefore extend the work of Li et al.
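To illustrate the limitation described above, here is a minimal sketch of the standard Chi-square score for one term and one class, computed from a 2x2 contingency table. It is not the importance weighted method of the paper, whose details are not given in this abstract. The function name `chi_square` and the toy documents are hypothetical.

```python
def chi_square(docs, labels, term, target):
    """Standard Chi-square statistic for a (term, class) pair.

    docs   -- list of token lists (hypothetical toy input)
    labels -- class label of each document
    """
    n = len(docs)
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens       # presence only; repeated occurrences are ignored
        in_class = label == target
        if present and in_class:
            a += 1                     # term present, document in class
        elif present:
            b += 1                     # term present, document not in class
        elif in_class:
            c += 1                     # term absent, document in class
        else:
            d += 1                     # term absent, document not in class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                     # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


# Toy data (an assumption for illustration only).
docs = [["goal", "goal", "goal"], ["goal"], ["vote"], ["vote", "tax"]]
labels = ["sport", "sport", "politics", "politics"]
score = chi_square(docs, labels, "goal", "sport")
```

Because the table counts only whether a term occurs, a document that mentions "goal" three times contributes exactly as much as one that mentions it once.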

IALP2016-113-baoli-v0.2.pdf

IALP2016-113-baoli-v0.2.pdf (476)

Categories:: Knowledge and Data Engineering

7 Views

Learning Dimensional Sentiment of Traditional Chinese Words with Word Embedding and Support Vector Regression

The dimensional sentiment analysis approach, which represents affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space, allows for more fine-grained analysis than the traditional categorical approach. In recent years, it has been used in applications such as antisocial behavior detection, mood analysis, and product review ranking. In this approach, an affective lexicon with dimensional sentiment values is a key resource, but building such a lexicon is costly.

IALP2016_124_Baoli_poster1.pdf

IALP2016_124_Baoli_poster1.pdf (1055)

Categories:: Knowledge and Data Engineering

9 Views

Improved Arabic Characters Recognition by Combining Multiple Machine Learning Classifiers

In this paper, we investigate a range of strategies for combining multiple machine learning techniques for recognizing Arabic characters, where we are faced with imperfect and dimensionally variable input characters. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than each technique produces by itself, and even more accurate results than the majority voting combination.
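The two combination styles mentioned above can be contrasted with a minimal sketch. The abstract does not specify how the backoff strategies are built, so the priority ordering, the `threshold` value, and the function names below are assumptions for illustration, not the authors' method.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers.

    predictions -- list of (label, confidence) pairs, one per classifier.
    Ties go to the label seen first.
    """
    labels = [label for label, _ in predictions]
    return Counter(labels).most_common(1)[0][0]


def confidence_backoff(predictions, threshold=0.8):
    """Accept the first prediction, in priority order, that is confident enough.

    If no classifier reaches the (hypothetical) threshold, back off to the
    single most confident prediction.
    """
    for label, confidence in predictions:
        if confidence >= threshold:
            return label
    return max(predictions, key=lambda p: p[1])[0]


# Hypothetical outputs of three classifiers for one character image.
predictions = [("beh", 0.55), ("teh", 0.95), ("beh", 0.60)]
voted = majority_vote(predictions)
backed_off = confidence_backoff(predictions)
```

In this toy case the vote follows two weakly confident classifiers, while the backoff rule follows the single strongly confident one, which is the kind of disagreement such strategies are meant to exploit.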

Poster.pdf

Poster.pdf (1611)

Categories:: Knowledge and Data Engineering

7 Views

A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters

In this study, we outline a potential problem in normalising texts that are based on a modified version of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu, and Pashtu, use a modified version of the Arabic writing system. Many characters in data harvested from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters).
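As a minimal sketch of the problem, the snippet below flags and unifies two well-known pairs of Arabic-script code points that can render alike in some positions and fonts. The `UNIFY` table, the choice of canonical forms, and the function names are assumptions for illustration; the abstract does not list the characters or the procedure used in the study, and in a semi-automatic workflow a person would review the flagged candidates before unifying them.

```python
import unicodedata

# Hypothetical unification table: variant code point -> chosen canonical code point.
# Which form counts as canonical depends on the target language (an assumption here).
UNIFY = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}


def find_ambiguous(text):
    """List (index, code point, Unicode name) for characters with a listed variant."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in UNIFY
    ]


def unify(text):
    """Replace every listed variant with its canonical code point."""
    return text.translate(str.maketrans(UNIFY))


# The same word typed with two different code points for its first letter.
with_kaf = "\u0643\u062A\u0627\u0628"
with_keheh = "\u06A9\u062A\u0627\u0628"
candidates = find_ambiguous(with_kaf)
```

The two strings compare as unequal until they are unified, so tokens that a reader would treat as the same word are counted separately in raw harvested text.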

sarar_jaf_poster.pdf

sarar_jaf_poster.pdf (884)

Categories:: Knowledge and Data Engineering

6 Views

Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words

Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method, in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are then used to recognize and transliterate proper nouns in metadata.

IALP-Oral-SONG-submitted.pdf

cross-language record linkage (360)

Categories:: Knowledge and Data Engineering

4 Views

Improving the Effectiveness of POI Search by Associated Information Summarization

The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. Since new points of interest (POIs) are always emerging in the real world, mining POIs shared by users on the Web to enrich existing POI databases has been a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the fundamentals of constructing a POI database; the descriptions of POIs, i.e., the services and products that POIs provide, are especially essential for POI search.

IALP_107_1122.pdf

IALP_Text Mining Session (924)

Categories:: Knowledge and Data Engineering

11 Views

Web Content Extraction Based on Maximum Continuous Sum of Text Density

Different websites generally have different web page structures, which can heavily affect extraction quality when web content is collected automatically. The maximum continuous sum of text density (MCSTD) method can extract web content from different web pages efficiently and effectively.
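The abstract does not define how text density is computed or how the sum is maximised, so the following is only a generic sketch of the idea the name suggests: subtract a threshold from per-line density scores so that boilerplate lines become negative, then take the contiguous run with the largest sum (Kadane's algorithm). The density values, the `threshold`, and the function name are assumptions, not the MCSTD method as published.

```python
def max_continuous_sum(scores):
    """Kadane's algorithm.

    Returns (best_sum, start, end) for the contiguous run of scores with the
    largest sum; start and end are inclusive indices.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best_sum = current = scores[0]
    best_start = best_end = start = 0
    for i in range(1, len(scores)):
        if current < 0:
            # A negative running sum can only hurt; restart the run here.
            current = scores[i]
            start = i
        else:
            current += scores[i]
        if current > best_sum:
            best_sum, best_start, best_end = current, start, i
    return best_sum, best_start, best_end


# Hypothetical per-line text densities of a page (e.g. text length per tag).
densities = [2, 1, 30, 42, 3, 38, 1, 2]
threshold = 10  # assumed cut-off between boilerplate and content
scores = [value - threshold for value in densities]
best_sum, first_line, last_line = max_continuous_sum(scores)
```

With these toy values the selected run spans lines 2 to 5, so a short low-density line inside the article body is kept while the navigation-like lines on either side are dropped.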