IALP 2016

Welcome to IALP 2016 - November 21-23, 2016, Tainan, Taiwan

The International Conference on Asian Language Processing (IALP) is a series of conferences with a unique focus on Asian language processing. The conference aims to advance the science and technology of all aspects of Asian language processing by providing a forum where researchers from different fields of language study all over the world can meet.

Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words

Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method, in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are employed to recognize and transliterate proper nouns in metadata.

IALP-Oral-SONG-submitted.pdf

cross-language record linkage (360)

Categories:: Knowledge and Data Engineering

5 Views

Improving the Effectiveness of POI Search by Associated Information Summarization

The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. Since new points of interest (POIs) constantly emerge in the real world, mining POIs shared by users on the Web to enrich existing POI databases has been a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the fundamentals of constructing a POI database; the descriptions of POIs, i.e., the services and products that POIs provide, are especially essential for POI search.

IALP_107_1122.pdf

IALP_Text Mining Session (943)

Categories:: Knowledge and Data Engineering

11 Views

Named Entity Recognition on Indonesian Microblog Messages

This paper describes a model for named-entity recognition on Indonesian microblog messages, a task that is useful for higher-level tasks and text mining applications on Indonesian microblogs. We view the task as a sequence labeling problem and use a machine learning approach. We also propose various word-level and orthographic features, including ones that are specific to the Indonesian language. Finally, in our experiment, we compare our model with a baseline model previously proposed for Indonesian formal documents rather than microblog messages.

IALP2016 - Named Entity Recognition on Indonesian Microblog Messages.pdf

IALP2016 - Named Entity Recognition on Indonesian Microblog Messages.pdf (69)

Categories:: Emerging: Big Data

17 Views

Gotong Royong in NLP Research: A Mobile Tool for Collaborative Text Annotation in Indonesia

The absence of manually annotated training data presents an obstacle for the development of machine-learning based NLP tools in Indonesia. Existing annotation tools lack a mobile-friendly interface, which is a problem in Indonesia, where most users access the internet using their smartphones. In this paper we propose the first mobile collaborative data annotation tool and evaluate it in an experiment involving 15 Indonesian students who annotated 1500 data records using their smartphones. Users confirmed

presentation_IALP2016_46.pdf

presentation_IALP2016_46.pdf (398)

Categories:: Audio and Acoustic Signal Processing

70 Views

An Initial Study of Indonesian Semantic Role Labeling and Its Application on Event Extraction

Semantic role labeling (SRL) is the task of assigning semantic role labels to sentence elements. This paper describes the initial development of an Indonesian semantic role labeling system and its application to extracting event information from Tweets. We compare two feature types when designing the SRL systems: Word-to-Word and Phrase-to-Phrase. Our experiments showed that the Word-to-Word feature approach outperforms the Phrase-to-Phrase approach. Applying the SRL system to an event extraction problem resulted in an overlap-based accuracy of 0.94 for actor identification.

presentation_IALP2016_Ade.pdf

presentation_IALP2016_Ade.pdf (846)

Categories:: Speech Processing

12 Views

Web Content Extraction Based on Maximum Continuous Sum of Text Density

Different websites generally have different web page structures, which can heavily affect extraction quality when web content is collected automatically. The maximum continuous sum of text density (MCSTD) method can extract web content from different web pages efficiently and effectively.
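The abstract does not define text density, so the following is only a minimal sketch of the general idea under stated assumptions: each line of HTML gets a hypothetical score (visible characters minus a fixed threshold, so link-only lines score negative), and the contiguous run of lines with the largest total score is taken as the content block. The functions `line_scores`, `max_continuous_sum`, and `extract_content`, and the `threshold` parameter, are illustrative names and are not taken from the paper.

```python
import re
from typing import List, Tuple

TAG = re.compile(r"<[^>]*>")  # crude tag stripper, sufficient for a sketch


def line_scores(lines: List[str], threshold: int = 20) -> List[int]:
    # Assumed density: visible characters on the line minus a fixed threshold,
    # so short navigation lines score negative and long text lines positive.
    return [len(TAG.sub("", ln).strip()) - threshold for ln in lines]


def max_continuous_sum(scores: List[int]) -> Tuple[int, int, int]:
    """Kadane's algorithm. Returns (best_sum, start, end) with end exclusive."""
    best_sum, best_start, best_end = 0, 0, 0
    cur_sum, cur_start = 0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the run here.
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end


def extract_content(html: str, threshold: int = 20) -> str:
    lines = html.splitlines()
    _, start, end = max_continuous_sum(line_scores(lines, threshold))
    return "\n".join(TAG.sub("", ln).strip() for ln in lines[start:end])


if __name__ == "__main__":
    page = (
        "<div><a href='/'>Home</a></div>\n"
        "<p>This is the first long paragraph of the main article body.</p>\n"
        "<p>And here is a second long paragraph with more article text.</p>\n"
        "<div><a href='/about'>About</a></div>"
    )
    print(extract_content(page))
```

Subtracting a threshold is what makes the maximum-sum search meaningful: without negative scores, the best run would always be the whole page.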

Kai Sun – IALP 2016.pptx

Kai Sun – IALP 2016.pptx (66)

Categories:: Knowledge and Data Engineering

15 Views

Web Content Extraction Based on Maximum Continuous Sum of Text Density

Kai Sun – IALP 2016.pptx

Kai Sun – IALP 2016.pptx (70)

Categories:: Knowledge and Data Engineering

19 Views

Aicyber’s System for IALP 2016 Shared Task: Character-enhanced Word Vectors and Boosted Neural Networks

ialp_ppt3.pdf

IALP2016-114-PDF (327)

Categories:: Knowledge and Data Engineering

20 Views

Recurrent Neural Network-based Language Models with Variation in Net Topology, Language, and Granularity

In this paper, we study language models based on recurrent neural networks on three databases in two languages. We implement basic recurrent neural networks (RNN) and refined RNNs with long short-term memory (LSTM) cells. We use the Penn Treebank (PTB) and AMI corpora in English, and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate word-based and character-based language models. For character-based language models, we look into the cases where the inter-word space is treated or not treated as a token.
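The three granularities mentioned for ASBC can be illustrated with a small tokenization sketch. This is not the authors' preprocessing; it assumes the input is an already word-segmented sentence, and the function name `char_tokens` and the `<sp>` marker for the inter-word space are hypothetical choices made for this example.

```python
from typing import List


def char_tokens(words: List[str], keep_space: bool, space_token: str = "<sp>") -> List[str]:
    """Split a word-segmented sentence into character tokens.

    If keep_space is True, an explicit token is emitted at every word boundary,
    so the model also has to predict where words end.
    """
    tokens: List[str] = []
    for i, word in enumerate(words):
        if keep_space and i > 0:
            tokens.append(space_token)
        tokens.extend(word)  # iterating a str yields its characters
    return tokens


if __name__ == "__main__":
    sentence = ["我們", "研究", "語言", "模型"]  # word-based tokens
    print(sentence)
    print(char_tokens(sentence, keep_space=False))
    print(char_tokens(sentence, keep_space=True))
```

The word-based model sees 4 tokens for this sentence, the character-based model without space tokens sees 8, and the variant with space tokens sees 11, so perplexities across the three settings are computed over different token sequences.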

40_RNN.pdf

40_RNN (767)

Categories:: Knowledge and Data Engineering

21 Views

Verifying the Long-range Dependency of RNN Language Models

It has been argued that recurrent neural network language models are better at capturing long-range dependencies than n-gram language models. In this paper, we attempt to verify this claim by investigating the prediction accuracy and the perplexity of these language models as a function of word position, i.e., the position of a word in a sentence. It is expected that as word position increases, the advantage of recurrent neural network language models over n-gram language models will become more and more evident.
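Perplexity as a function of word position can be computed by grouping per-token log-probabilities by their position in the sentence. The sketch below assumes a language model has already produced natural-log probabilities for each token; the function name `perplexity_by_position` and the numbers in the usage example are made up for illustration and are not results from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List


def perplexity_by_position(sentences: List[List[float]]) -> Dict[int, float]:
    """Map each 1-based word position to the perplexity over all tokens there.

    `sentences` holds, for every sentence, the natural-log probability the
    model assigned to each token given its history.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for log_probs in sentences:
        for pos, lp in enumerate(log_probs, start=1):
            totals[pos] += lp
            counts[pos] += 1
    # Perplexity is exp of the negative mean log-probability.
    return {pos: math.exp(-totals[pos] / counts[pos]) for pos in sorted(totals)}


if __name__ == "__main__":
    # Illustrative values only: two short "sentences" of token log-probabilities.
    toy = [
        [math.log(0.5), math.log(0.25)],
        [math.log(0.5)],
    ]
    for pos, ppl in perplexity_by_position(toy).items():
        print(pos, round(ppl, 3))
```

Running the same function on the outputs of an n-gram model and an RNN model and comparing the two curves position by position is one way to examine the claim described above; later positions have fewer contributing sentences, so their estimates are noisier.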

long_range_dependency_RNN.pdf

41_ngram_rnn (773)

Categories:: Knowledge and Data Engineering

21 Views
