- Annotating Chinese Noun Phrases Based on Semantic Dependency Graph
Annotating complex noun phrases is a difficult problem in semantic analysis. In this paper, we investigate how noun phrases are annotated in NomBank, Chinese NomBank, and the Sinica Treebank, and propose an annotation scheme for noun phrases based on semantic dependency graphs.
- Dimensional Sentiment Analysis of Traditional Chinese Words Using Pre-trained Not-quite-right Sentiment Word Vectors and Supervised Ensemble Models
This work focuses on analyzing two types of sentiment information for traditional Chinese words. Valence represents the degree of pleasant versus unpleasant feeling (i.e., sentiment orientation). Arousal represents the degree of excitement versus calm (i.e., sentiment strength). To address this task, we propose supervised ensemble learning models that assign appropriate real-valued ratings to each …
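The abstract does not say how the ensemble members are combined. The following is a minimal sketch that assumes the simplest supervised ensemble. Several base regressors each predict a real-valued rating for a word, and the ensemble averages those predictions and clips the result to the rating scale. The function name `ensemble_rating`, the 1-9 scale, and the example predictions are all hypothetical.

```python
from statistics import mean


def ensemble_rating(predictions, low=1.0, high=9.0):
    """Combine real-valued ratings from several base regressors.

    predictions: one rating per (hypothetical) base model.
    low, high: bounds of the rating scale, assumed to be 1-9 here.
    """
    if not predictions:
        raise ValueError("need at least one base prediction")
    # Plain unweighted average of the base models' outputs.
    combined = mean(predictions)
    # Clip so the ensemble output stays on the rating scale.
    return min(high, max(low, combined))


# Hypothetical valence and arousal predictions for one word.
valence = ensemble_rating([6.2, 6.8, 7.1])
arousal = ensemble_rating([4.0, 3.4, 9.8])
print(f"valence={valence:.2f} arousal={arousal:.2f}")
```

The same combiner is applied separately to each dimension. A weighted or stacked ensemble would replace the plain mean.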
- Importance Weighted Feature Selection Strategy for Text Classification
Feature selection aims to obtain a compact and effective feature subset for better performance and higher efficiency, and it has been studied for decades. Traditional feature selection metrics, such as chi-square and information gain, do not consider how important a feature is within a document. All features are treated equally, regardless of how much useful semantic information they carry. Intuitively, metrics computed this way are likely to introduce considerable noise. In this study, we therefore extend the work of Li et al.
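The binary treatment criticised above is visible in the standard chi-square statistic. It is computed from a 2x2 table of document counts and only records whether a term occurs in a document at all. The sketch below computes it and, for contrast, a hypothetical importance-weighted variant in which each document contributes a weight in [0, 1] instead of 0 or 1. Two parts are illustrative assumptions, not the weighting of Li et al. or of this paper:

- the weighting function `max_tf_weight`, which divides the term's frequency by the document's highest term frequency;
- the toy corpus.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for one class from a 2x2 table.

    a: in-class documents containing the term
    b: out-of-class documents containing the term
    c: in-class documents without the term
    d: out-of-class documents without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_tf_weight(term, tokens):
    """Hypothetical importance: tf of the term / highest tf in the document."""
    counts = Counter(tokens)
    if counts[term] == 0:
        return 0.0
    return counts[term] / max(counts.values())


def contingency(docs, term, target, weight=None):
    """Build (a, b, c, d) from (label, tokens) pairs.

    weight=None gives the traditional binary view (1 if the term occurs).
    Otherwise each document is split between the "contains" cell (weight w)
    and the "does not contain" cell (1 - w).
    """
    a = b = c = d = 0.0
    for label, tokens in docs:
        if weight is None:
            w = 1.0 if term in tokens else 0.0
        else:
            w = weight(term, tokens)
        if label == target:
            a, c = a + w, c + (1.0 - w)
        else:
            b, d = b + w, d + (1.0 - w)
    return a, b, c, d


docs = [
    ("sports", ["goal", "goal", "match", "team"]),
    ("sports", ["match", "score", "goal"]),
    ("politics", ["vote", "match", "party", "vote"]),
    ("politics", ["vote", "party", "law"]),
]
for term in ("goal", "match"):
    binary = chi_square(*contingency(docs, term, "sports"))
    weighted = chi_square(*contingency(docs, term, "sports", max_tf_weight))
    print(term, round(binary, 3), round(weighted, 3))
```

On this toy corpus, "match" occurs in both sports documents but is a minor term in the first. Its weighted score (1.0) therefore falls below its binary score (about 1.33), while "goal" keeps its score of 4.0.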
- Learning Dimensional Sentiment of Traditional Chinese Words with Word Embedding and Support Vector Regression
The dimensional approach to sentiment analysis represents affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space. This allows more fine-grained analysis than the traditional categorical approach. In recent years, it has been used in applications such as antisocial behavior detection, mood analysis, and product review ranking. In this approach, an affective lexicon with dimensional sentiment values is a key resource, but building such a lexicon is costly.
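Because rating every word by hand is costly, a common idea is to predict VA values for unrated words from the embeddings of words that already have ratings. The title names support vector regression. To stay dependency-free, this sketch swaps in a simpler technique: a k-nearest-neighbour, similarity-weighted average over seed words. It is not the paper's SVR model. The 3-dimensional vectors and the ratings for 快樂 (happy), 悲傷 (sad) and 憤怒 (angry) are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_va(vector, seed_lexicon, k=2):
    """Estimate (valence, arousal) for an unrated word from its embedding.

    seed_lexicon maps word -> (embedding, (valence, arousal)). The result is
    the similarity-weighted mean of the k most similar seed words; negative
    similarities are given zero weight.
    """
    neighbours = sorted(
        ((cosine(vector, emb), va) for emb, va in seed_lexicon.values()),
        key=lambda item: item[0],
        reverse=True,
    )[:k]
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)
    if total == 0:
        raise ValueError("no seed word is positively similar to the input")
    valence = sum(w * va[0] for w, (_, va) in zip(weights, neighbours)) / total
    arousal = sum(w * va[1] for w, (_, va) in zip(weights, neighbours)) / total
    return valence, arousal


# Toy 3-d embeddings and ratings (hypothetical, for illustration only).
seed = {
    "快樂": ([1.0, 0.0, 0.0], (8.0, 6.0)),
    "悲傷": ([0.0, 1.0, 0.0], (2.0, 3.0)),
    "憤怒": ([0.0, 0.0, 1.0], (2.0, 8.0)),
}
print(predict_va([1.0, 1.0, 0.0], seed))
```

An SVR model would instead learn a regression function over the embedding dimensions, so it can generalise beyond the nearest seed words.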
- Improved Arabic Characters Recognition by Combining Multiple Machine Learning Classifiers
In this paper, we investigate a range of strategies for combining multiple machine learning techniques to recognize Arabic characters, where the input characters are imperfect and of variable dimensions. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than any single technique, and even more accurate results than majority-voting combination.
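The abstract contrasts majority voting with confidence-based backoff but does not define the latter. This is a minimal sketch of one plausible reading:

- classifiers are consulted in a fixed priority order;
- the first one whose confidence reaches a threshold decides;
- if none does, the single most confident prediction wins.

The 0.8 threshold, the priority order, and the example outputs (letters that differ only in their dots) are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: (label, confidence) pairs, one per classifier.

    Ties go to the label seen first (Counter.most_common keeps insertion order).
    """
    counts = Counter(label for label, _ in predictions)
    return counts.most_common(1)[0][0]


def confidence_backoff(predictions, threshold=0.8):
    """Back off through classifiers in priority order.

    The first classifier whose confidence reaches the threshold decides; if
    none does, the single most confident prediction is returned.
    """
    for label, confidence in predictions:
        if confidence >= threshold:
            return label
    return max(predictions, key=lambda p: p[1])[0]


# Hypothetical outputs of three classifiers, listed in priority order.
confident = [("ب", 0.55), ("ت", 0.90), ("ب", 0.60)]
uncertain = [("ث", 0.50), ("ت", 0.70), ("ث", 0.40)]
for preds in (confident, uncertain):
    print(majority_vote(preds), confidence_backoff(preds))
```

In the first case, two weak votes for ب outnumber one confident ت, so voting and backoff disagree. In the second case, no classifier clears the threshold, and backoff falls back to the most confident output.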
- A Semi-automatic Approach to Identifying and Unifying Ambiguously Encoded Arabic-Based Characters
In this study, we outline a potential problem in normalising texts written in modified versions of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu, and Pashtu, use a modified version of the Arabic writing system. In data harvested from the Internet, many characters may have exactly the same form but be encoded with different Unicode values (ambiguous characters).
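To illustrate the problem, the sketch below assumes a Persian target. In harvested Persian text, the Arabic KAF (U+0643), YEH (U+064A) and ALEF MAKSURA (U+0649) often appear in place of KEHEH (U+06A9) and FARSI YEH (U+06CC). The mapping table `CANONICAL` is a hand-written assumption, not the paper's inventory. In a semi-automatic workflow, a report such as `find_ambiguous` would let a person confirm each mapping before `unify` rewrites the text.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping for a Persian target: code points often harvested in
# place of the Persian letter, mapped to the form we treat as canonical.
CANONICAL = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
}


def find_ambiguous(text, mapping=CANONICAL):
    """Count characters the mapping would rewrite, keyed by Unicode name,
    so that a human can review them before normalising."""
    return Counter(unicodedata.name(ch) for ch in text if ch in mapping)


def unify(text, mapping=CANONICAL):
    """Rewrite every mapped code point to its canonical form."""
    return text.translate(str.maketrans(mapping))


sample = "\u0643\u062A\u0627\u0628 \u0639\u0644\u064A"  # kaf-teh-alef-beh, ain-lam-yeh
print(find_ambiguous(sample))
print(unify(sample))
```

The canonical direction depends on the language. For Arabic text, the kaf and yeh pairs would be unified the opposite way, and ALEF MAKSURA would be kept as a separate letter.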
- Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words
Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method, in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are then used to recognize and transliterate proper nouns in metadata.
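The abstract does not name the language pair or the recognition procedure. This sketch shows only the last step under stated assumptions:

- the acquired (transliteration, original) pairs are stored as a lexicon;
- proper nouns in a metadata string are recognized by greedy longest match against that lexicon.

The Japanese katakana entries, the function name, and the matching strategy are hypothetical illustrations.

```python
def recognize_proper_nouns(text, lexicon):
    """Greedy longest-match lookup of acquired transliterations.

    lexicon maps a transliterated form (target language) to its
    back-transliterated original (source language). Returns a list of
    (start_index, surface_form, original) tuples.
    """
    if not lexicon:
        return []
    longest = max(len(key) for key in lexicon)
    found = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer names win over substrings.
        for length in range(min(longest, len(text) - i), 0, -1):
            surface = text[i:i + length]
            if surface in lexicon:
                found.append((i, surface, lexicon[surface]))
                i += length
                break
        else:
            i += 1  # no entry starts here; move on one character
    return found


# Hypothetical acquired pairs: Japanese katakana -> English original.
lexicon = {"ロンドン": "London", "パリ": "Paris"}
print(recognize_proper_nouns("ロンドンとパリの地図", lexicon))
```

Pure string matching can fire on substrings of unrelated words. In practice, the lexicon lookup would be one signal among several.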
- Improving the Effectiveness of POI Search by Associated Information Summarization
The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. New points of interest (POIs) are constantly emerging in the real world, so mining user-shared POIs from the Web to enrich existing POI databases has become a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the basics of constructing a POI database. Descriptions of POIs, i.e., the services and products they provide, are especially essential for POI search.
- Web Content Extraction Based on Maximum Continuous Sum of Text Density
Different websites generally have different web page structures, which can heavily affect extraction quality when web content is collected automatically. The maximum continuous sum of text density (MCSTD) method can extract web content from diverse web pages efficiently and effectively.
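The abstract names the method but not its details. This sketch follows one natural reading of "maximum continuous sum of text density":

1. Score each HTML line by its text density.
2. Subtract a threshold so that link-heavy boilerplate scores negative.
3. Take the contiguous run of lines with the largest total. This is the maximum-subarray problem, solved here with Kadane's algorithm.

Several parts are hypothetical and may differ from the paper's definitions: the per-line density (visible characters per tag), the mean-density threshold, and all function names.

```python
import re

TAG = re.compile(r"<[^>]*>")


def line_density(line):
    """Hypothetical text density: visible characters per tag on one line."""
    tags = len(TAG.findall(line))
    text = TAG.sub("", line).strip()
    return len(text) / max(1, tags)


def max_continuous_sum(scores):
    """Kadane's algorithm: (start, end) of the maximum-sum run, end exclusive."""
    best_sum, best = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, score in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = score, i  # a non-positive prefix only hurts
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best


def extract_content(html_lines, threshold=None):
    """Return the visible text of the densest contiguous block of lines."""
    if not html_lines:
        return []
    densities = [line_density(line) for line in html_lines]
    if threshold is None:
        # Assumed threshold: mean density, so sparse lines score negative.
        threshold = sum(densities) / len(densities)
    start, end = max_continuous_sum([d - threshold for d in densities])
    return [TAG.sub("", line).strip() for line in html_lines[start:end]]


page = [
    '<div class="nav"><a href="/">Home</a> | <a href="/news">News</a></div>',
    "<h1>Title of the article</h1>",
    "<p>First paragraph of the main article body with plenty of text.</p>",
    "<p>Second paragraph continues the article with more running text.</p>",
    '<div class="footer"><a href="/about">About</a><a href="/c">Contact</a></div>',
]
print(extract_content(page))
```

With the mean as threshold, the short heading line falls below it and only the two paragraphs are returned. The threshold is the main knob that decides how much surrounding text is kept.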