Sorry, you need to enable JavaScript to visit this website.

IALP 2016

Welcome to IASL 2016 - November 21-23, 2016, Tainan, Taiwan

The International Conference on Asian Language Processing (IALP) is a series of conferences with unique focus on Asian Language Processing. The conference aims to advance the science and technology of all the aspects of Asian Language Processing by providing a forum for researchers in the different fields of language study all over the world to meet.

Proper nouns in metadata are representative features for linking the identical records across data sources in different languages. In order to improve the accuracy of proper noun recognition, we propose a back-transliteration method, in which transliterated words in target language are back-transliterated to their original words in source language. The acquired words and their transliterations are employed to recognize and transliterate proper nouns in metadata.

Categories:
3 Views

The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. Since there are always emerging point-of-interest (POI) in the real world, mining POIs shared by users from the Web has been a challenging problem to enrich existing POI database. However, crawling address-bearing pages and extracting POI relations are only the fundamentals for constructing POI database, the description of POIs, i.e. the services and products that POIs provide are especially essential for POI search.

Categories:
9 Views

This paper describes a model to address the task of named-entity recognition on Indonesian microblog messages due to its usefulness for higher-level tasks or text mining applications on Indonesian microblogs. We view our task as a sequence labeling problem using machine learning approach. We also propose various word-level and orthographic features, including the ones that are specific to the Indonesian language. Finally, in our experiment, we compared our model with a baseline model previously proposed for Indonesian formal documents, instead of microblog messages.

Categories:
11 Views

The absence of manually annotated training data presents an obstacle for the development of machine-learning based NLP tools in Indonesia. Existing annotation tools lack a mobile-friendly interface which is a problem in Indonesia where most users access the internet using their smartphone. In this paper we propose the first mobile collaborative data annotation tool and evaluate it in an experiment involving 15 Indonesian students who annotated 1500 data records using their smartphones. Users confirmed

Categories:
62 Views

Semantic role labeling (SRL) is a task to as- sign semantic role labels to sentence elements. This pa- per describes the initial development of an Indonesian semantic role labeling system and its application to extract event information from Tweets. We compare two feature types when designing the SRL systems: Word-to-Word and Phrase-to-Phrase. Our experiments showed that the Word- to-Word feature approach outperforms the Phrase-to-Phrase approach. The application of the SRL system to an event extraction problem resulted overlap-based accuracy of 0.94 for the actor identification.

Categories:
6 Views

Generally different websites have different web page structures, which would heavily affect the extraction quality when the web content is automatically collected. The maximum continuous sum of text density (MCSTD) method can extract web content from different web pages efficiently and effectively.

Categories:
12 Views

Generally different websites have different web page structures, which would heavily affect the extraction quality when the web content is automatically collected. The maximum continuous sum of text density (MCSTD) method can extract web content from different web pages efficiently and effectively.

Categories:
16 Views

In this paper, we study language models based on recurrent neural networks on three databases in two languages. We implement basic recurrent neural networks (RNN) and refined RNNs with long short-term memory (LSTM) cells. We use the corpora of Penn Tree Bank (PTB) and AMI in English, and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate wordbased and character-based language models. For characterbased language models, we look into the cases where the inter-word space is treated or not treated as a token.

Categories:
19 Views

It has been argued that recurrent neural network language models are better in capturing long-range dependency than n-gram language models. In this paper, we attempt to verify this claim by investigating the prediction accuracy and the perplexity of these language models as a function of word position, i.e., the position of a word in a sentence. It is expected that as word position increases, the advantage of using recurrent neural network language models over n-gram language models will become more and more evident.

Categories:
15 Views

Pages