Welcome to IALP 2016 - November 21-23, 2016, Tainan, Taiwan
The International Conference on Asian Language Processing (IALP) is a series of conferences with a unique focus on Asian language processing. The conference aims to advance the science and technology of all aspects of Asian language processing by providing a forum where researchers from different fields of language study all over the world can meet.
Proper Noun Recognition in Cross-Language Record Linkage by Exploiting Transliterated Words
Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the accuracy of proper noun recognition, we propose a back-transliteration method in which transliterated words in the target language are back-transliterated to their original words in the source language. The acquired words and their transliterations are then used to recognize and transliterate proper nouns in metadata.
Improving the Effectiveness of POI Search by Associated Information Summarization
The demand for map services has risen significantly in recent years due to the popularity of mobile devices and wireless networks. Since new points of interest (POIs) are always emerging in the real world, mining POIs shared by users on the Web to enrich existing POI databases has been a challenging problem. However, crawling address-bearing pages and extracting POI relations are only the fundamentals of constructing a POI database; the descriptions of POIs, i.e., the services and products that POIs provide, are especially essential for POI search.
This paper describes a model for named-entity recognition on Indonesian microblog messages, a task that is useful for higher-level tasks and text mining applications on Indonesian microblogs. We view the task as a sequence labeling problem and use a machine learning approach. We also propose various word-level and orthographic features, including ones that are specific to the Indonesian language. Finally, in our experiment, we compared our model with a baseline model previously proposed for Indonesian formal documents rather than microblog messages.
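As a rough illustration of what word-level and orthographic features for sequence labeling can look like, the sketch below builds one feature dictionary per token. The function names and the specific features (capitalization, mentions, hashtags, affixes, neighbouring words) are assumptions chosen for illustration; the abstract does not list the paper's actual feature set or its Indonesian-specific features.

```python
import re

URL_RE = re.compile(r"https?://")


def word_features(tokens, i):
    """Return an illustrative feature dict for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    return {
        # Word-level features
        "lower": tok.lower(),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        # Orthographic features
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        # Microblog-specific surface cues
        "is_mention": tok.startswith("@"),
        "is_hashtag": tok.startswith("#"),
        "is_url": bool(URL_RE.match(tok)),
        # Context features from neighbouring tokens
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Build one feature dict per token, ready for a sequence labeler."""
    return [word_features(tokens, i) for i in range(len(tokens))]
```

A list of such dictionaries, paired with one label per token, is the input shape that many sequence labeling toolkits accept.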
Gotong Royong in NLP Research: A Mobile Tool for Collaborative Text Annotation in Indonesia
The absence of manually annotated training data presents an obstacle to the development of machine-learning-based NLP tools in Indonesia. Existing annotation tools lack a mobile-friendly interface, which is a problem in Indonesia, where most users access the internet using their smartphones. In this paper, we propose the first mobile collaborative data annotation tool and evaluate it in an experiment involving 15 Indonesian students who annotated 1500 data records using their smartphones. Users confirmed
An Initial Study of Indonesian Semantic Role Labeling and Its Application on Event Extraction
Semantic role labeling (SRL) is the task of assigning semantic role labels to sentence elements. This paper describes the initial development of an Indonesian semantic role labeling system and its application to extracting event information from Tweets. We compare two feature types when designing the SRL systems: Word-to-Word and Phrase-to-Phrase. Our experiments showed that the Word-to-Word feature approach outperforms the Phrase-to-Phrase approach. Applying the SRL system to an event extraction problem resulted in an overlap-based accuracy of 0.94 for actor identification.
Web Content Extraction Based on Maximum Continuous Sum of Text Density
Different websites generally have different web page structures, which can heavily affect extraction quality when web content is collected automatically. The maximum continuous sum of text density (MCSTD) method can extract web content from different web pages efficiently and effectively.
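The abstract does not spell out how MCSTD is computed, so the following is only a minimal sketch of one plausible reading: score each line of HTML by a text density, subtract a threshold, and keep the contiguous run of lines with the largest total score (the classic maximum subarray problem, solved here with Kadane's algorithm). The density definition, the `threshold` value, and all function names are assumptions, not the paper's method.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")


def text_density(line):
    """Visible characters per tag on one line (an assumed density definition)."""
    tags = TAG_RE.findall(line)
    text = TAG_RE.sub("", line).strip()
    return len(text) / (len(tags) + 1)


def max_continuous_sum(scores):
    """Kadane's algorithm: return (start, end, total) of the best run, end exclusive."""
    best_sum, best_start, best_end = float("-inf"), 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, score in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the run here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_start, best_end, best_sum


def extract_content(html_lines, threshold=10.0):
    """Return the tag-stripped text of the densest contiguous block of lines."""
    # Subtracting the (assumed) threshold makes low-density lines negative,
    # so navigation bars and footers pull the running sum down.
    scores = [text_density(line) - threshold for line in html_lines]
    start, end, _ = max_continuous_sum(scores)
    return [TAG_RE.sub("", line).strip() for line in html_lines[start:end]]
```

On a page whose article paragraphs sit between link-heavy navigation lines, the selected run covers the paragraphs and skips the navigation, because link-heavy lines have many tags and little text.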
Aicyber’s System for IALP 2016 Shared Task: Character-enhanced Word Vectors and Boosted Neural Networks
Recurrent Neural Network-based Language Models with Variation in Net Topology, Language, and Granularity
In this paper, we study language models based on recurrent neural networks on three databases in two languages. We implement basic recurrent neural networks (RNNs) and refined RNNs with long short-term memory (LSTM) cells. We use the Penn Treebank (PTB) and AMI corpora in English, and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate word-based and character-based language models. For character-based language models, we look into the cases where the inter-word space is treated or not treated as a token.
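The two character-based settings differ only in how a word-segmented sentence is turned into a token sequence. The sketch below shows that difference under stated assumptions: the input is a sentence whose words are separated by whitespace, and the function name and the `<sp>` marker are hypothetical, not taken from the paper.

```python
def char_tokens(segmented_sentence, keep_space=True, space_token="<sp>"):
    """Split a whitespace-segmented sentence into character tokens.

    With keep_space=True, each word boundary becomes an explicit token;
    with keep_space=False, word boundaries are dropped entirely.
    """
    tokens = []
    for k, word in enumerate(segmented_sentence.split()):
        if k > 0 and keep_space:
            tokens.append(space_token)  # mark the inter-word space
        tokens.extend(word)  # one token per character
    return tokens
```

For the segmented input `"我 喜歡 語言"`, the first setting yields seven tokens including two space markers, while the second yields only the five characters.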
It has been argued that recurrent neural network language models are better at capturing long-range dependencies than n-gram language models. In this paper, we attempt to verify this claim by investigating the prediction accuracy and the perplexity of these language models as a function of word position, i.e., the position of a word in a sentence. It is expected that as word position increases, the advantage of recurrent neural network language models over n-gram language models will become more and more evident.
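Perplexity as a function of word position can be computed by grouping per-word log-probabilities by their index in the sentence and exponentiating the negative mean of each group. The sketch below assumes the language model's per-word natural-log probabilities are already available as lists; the function name and input layout are illustrative, not the paper's.

```python
import math
from collections import defaultdict


def perplexity_by_position(sentence_log_probs):
    """Map each word position to its perplexity.

    sentence_log_probs: one list per sentence, holding the natural-log
    probability the model assigned to each word, in order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for log_probs in sentence_log_probs:
        for pos, log_prob in enumerate(log_probs):
            totals[pos] += log_prob
            counts[pos] += 1
    # Perplexity at a position is exp of the negative mean log-probability.
    return {pos: math.exp(-totals[pos] / counts[pos]) for pos in sorted(totals)}
```

Running this once on the log-probabilities from each model gives two curves over position that can be compared directly. Later positions are covered by fewer sentences, so their estimates are noisier.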