Named Entity Recognition on Indonesian Microblog Messages
- Submitted by:
- Natanael Taufik
- Last updated:
- 22 November 2016 - 7:42am
- Document Type:
- Presentation Slides
- Document Year:
- 2016
- Presenters:
- Natanael Taufik
- Paper Code:
- 20
This paper describes a model for named-entity recognition (NER) on Indonesian microblog messages, a task that is useful for higher-level tasks and text mining applications on Indonesian microblogs. We treat the task as a sequence labeling problem and solve it with a machine learning approach. We also propose various word-level and orthographic features, including some that are specific to the Indonesian language. In our experiment, we compare our model with a baseline model previously proposed for Indonesian formal documents rather than for microblog messages. Our contribution is two-fold: (1) we developed an NER tool for Indonesian microblog messages, a task that had not been addressed before, and (2) we developed an NER corpus of around 600 Indonesian microblog messages, available for future development.
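To make the sequence labeling setup more concrete, the following is a minimal sketch of how word-level and orthographic features might be extracted per token before being passed to a sequence labeler. The abstract does not list the actual features, so every feature name here, the `PERSON_CUES` honorific list, the example message, and the BIO label scheme are assumptions chosen for illustration, not the authors' feature set or data.

```python
import re

# Hypothetical cue words (Indonesian honorifics); the paper's actual
# language-specific features are not given in the abstract.
PERSON_CUES = {"pak", "bu", "bapak", "ibu"}
URL_RE = re.compile(r"^https?://\S+$")


def token_features(tokens, i):
    """Return an illustrative feature dict for the token at position i."""
    tok = tokens[i]
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"
    return {
        # Word-level features
        "lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "prev_lower": prev_tok,
        "next_lower": next_tok,
        # Orthographic features
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        # Microblog-specific surface forms
        "is_mention": tok.startswith("@"),
        "is_hashtag": tok.startswith("#"),
        "is_url": bool(URL_RE.match(tok)),
        # Assumed language-specific cue: honorific before a person name
        "prev_is_person_cue": prev_tok in PERSON_CUES,
    }


def sentence_features(tokens):
    """Extract features for every token in one message."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example message with assumed BIO labels (one label per token).
tokens = ["Pak", "Jokowi", "tiba", "di", "Bandung", "#liburan"]
labels = ["O", "B-PER", "O", "O", "B-LOC", "O"]
feats = sentence_features(tokens)
```

Each message becomes a list of feature dicts aligned with a list of labels, which is the input shape most sequence labelers (for example, a conditional random field) expect. Which learner the authors used is not stated in the abstract.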