Keyphrases are short phrases that best represent a document's content. They are useful in a variety of applications, including document summarization and retrieval models. In this paper, we introduce the first dataset of keyphrases for an Arabic document collection, obtained by means of crowdsourcing. We experimentally evaluate different strategies for aggregating crowdsourced answers and validate their performance against expert annotations to assess the quality of our dataset. We report our experimental results, the dataset features, some lessons learned, and ideas for future


Arabic is one of the fastest-growing languages on the Web, with an increasing amount of user-generated content being published by both native and non-native speakers all over the world. Despite the great linguistic differences between Arabic and Western languages such as English, most Arabic keyphrase extraction systems rely on approaches designed for Western languages, thus ignoring Arabic's rich morphology and syntax. In this paper, we present a new approach that leverages Arabic morphology and syntax to generate a restricted set of meaningful candidates from which keyphrases are selected.


Annotating complex noun phrases is a difficult problem in semantic analysis. In this paper, we investigate the annotation methods for noun phrases in Nombank, Chinese Nombank, and Sinica Treebank, and propose an annotation scheme for noun phrases based on semantic dependency graphs.


This work focuses on two specific types of sentiment information for traditional Chinese words: valence, which represents the degree of pleasant and unpleasant feelings (i.e., sentiment orientation), and arousal, which represents the degree of excitement and calm (i.e., sentiment strength). To address this task, we propose supervised ensemble learning models to assign appropriate real-valued ratings to each


Feature selection, which aims to obtain a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. Traditional feature selection metrics, such as chi-square and information gain, fail to consider how important a feature is within a document: features are treated equally, no matter how much effective semantic information they hold. Intuitively, metrics calculated this way are very likely to introduce considerable noise. In this study, we therefore extend the work of Li et al.


The dimensional approach to sentiment analysis, which represents affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space, allows for more fine-grained analysis than the traditional categorical approach. In recent years, it has been applied to tasks such as antisocial behavior detection, mood analysis, and product review ranking. In this approach, an affective lexicon with dimensional sentiment values is a key resource, but building such a lexicon is costly.


In this paper, we investigate a range of strategies for combining multiple machine learning techniques for recognizing Arabic characters, where we are faced with imperfect and dimensionally variable input characters. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than each technique produces by itself, and even more accurate results than the majority voting combination.
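
The abstract does not spell out how the backoff is defined, so the following is only a minimal sketch of one possible reading: classifiers are consulted in a fixed priority order, and the first prediction whose confidence reaches a threshold is accepted. The names `majority_vote` and `confidence_backoff`, the threshold value, and the sample predictions are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from typing import List, Tuple

# Each classifier output is a (label, confidence) pair, confidence in [0, 1].
Prediction = Tuple[str, float]


def majority_vote(preds: List[Prediction]) -> str:
    """Return the label predicted by the most classifiers (confidence ignored)."""
    counts = Counter(label for label, _ in preds)
    return counts.most_common(1)[0][0]


def confidence_backoff(preds: List[Prediction], threshold: float = 0.8) -> str:
    """Accept the first prediction (in priority order) that is confident enough.

    If no classifier reaches the threshold, back off to the single most
    confident prediction. Both rules are illustrative assumptions.
    """
    for label, conf in preds:
        if conf >= threshold:
            return label
    return max(preds, key=lambda p: p[1])[0]


# Hypothetical outputs of three classifiers for one character image.
preds = [("beh", 0.55), ("teh", 0.95), ("beh", 0.60)]
print(majority_vote(preds))
print(confidence_backoff(preds))
```

With these made-up inputs the two strategies disagree: two weakly confident classifiers outvote one highly confident classifier under majority voting, whereas the backoff rule follows the confident one.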


In this study, we outline a potential problem in normalising texts that are based on a modified version of the Arabic alphabet. One of the main resources available for processing resource-scarce languages is raw text collected from the Internet. Many less-resourced languages, such as Kurdish, Farsi, Urdu, and Pashtu, use a modified version of the Arabic writing system. Many characters in data harvested from the Internet may have exactly the same form but be encoded with different Unicode values (ambiguous characters).
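
As a rough illustration of the ambiguity described above, the sketch below folds two Arabic code points onto the visually similar code points commonly used in Persian-script text. The mapping table, the choice of target code points, and the function names are assumptions for illustration only; the abstract does not specify which characters or which normalisation the study covers.

```python
import unicodedata

# Illustrative (not exhaustive) folding table; the direction is an assumption.
CANONICAL = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}
_TABLE = str.maketrans(CANONICAL)


def normalise(text: str) -> str:
    """Replace each ambiguous character with its chosen canonical code point."""
    return text.translate(_TABLE)


def describe(text: str) -> list:
    """List the code point and Unicode name of every character in the text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The same word typed with two different code points for its first letter.
with_kaf = "\u0643\u062A\u0627\u0628"
with_keheh = "\u06A9\u062A\u0627\u0628"

print(with_kaf == with_keheh)                        # the raw strings differ
print(normalise(with_kaf) == normalise(with_keheh))  # equal after folding
print(describe(with_kaf)[0], "|", describe(with_keheh)[0])
```

Without such a step, the two spellings would be counted as different tokens in any frequency list or lexicon built from the harvested text.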


Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method, in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are then used to recognize and transliterate proper nouns in metadata.


The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. Because new points of interest (POIs) are always emerging in the real world, mining POIs shared by users on the Web to enrich existing POI databases is a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the fundamentals of constructing a POI database; the descriptions of POIs, i.e., the services and products that POIs provide, are especially essential for POI search.

