Importance Weighted Feature Selection Strategy for Text Classification
- Submitted by: Baoli Li
- Last updated: 27 November 2016 - 10:44am
- Document Type: Presentation Slides
- Document Year: 2016
- Presenters: Baoli LI
- Paper Code: 113
Feature selection, which aims at obtaining a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. Traditional feature selection metrics, such as Chi-square and information gain, fail to consider how important a feature is within a document: features are treated equally no matter how much effective semantic information they carry. Metrics computed this way are therefore likely to introduce considerable noise. In this study, we extend the work of Li et al. [1] on the document frequency metric and propose a general importance weighted feature selection strategy for text classification, in which the importance value of a feature in a document is derived from its relative frequency in that document. Extensive experiments with two state-of-the-art feature selection metrics (Chi-square and information gain) on three text classification datasets demonstrate the effectiveness of the proposed strategy.
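To make the idea concrete, here is a minimal sketch (not the authors' implementation) of how a standard Chi-square score might be importance-weighted: each document contributes a feature's relative term frequency rather than a binary occurrence count. The function name `importance_weighted_chi2`, the max-over-classes aggregation, and the toy data are illustrative assumptions under this reading of the strategy.

```python
from collections import defaultdict

def importance_weighted_chi2(docs, labels):
    """Importance-weighted Chi-square feature scores (illustrative sketch).

    Instead of counting each (feature, class) co-occurrence as 1, a document
    contributes the feature's relative frequency in that document, so terms
    that dominate a document weigh more than terms mentioned in passing.
    This is one plausible reading of the strategy, not the paper's formula.
    """
    classes = set(labels)
    n_docs = len(docs)

    fc = defaultdict(float)       # weighted (feature, class) co-occurrence
    f_total = defaultdict(float)  # weighted occurrence of each feature
    c_total = defaultdict(float)  # number of documents per class

    for tokens, y in zip(docs, labels):
        c_total[y] += 1
        tf = defaultdict(int)
        for t in tokens:
            tf[t] += 1
        for t, cnt in tf.items():
            w = cnt / len(tokens)         # relative frequency = importance weight
            fc[(t, y)] += w
            f_total[t] += w

    scores = {}
    for t in f_total:
        best = 0.0
        for c in classes:
            a = fc[(t, c)]                    # weighted: has t, in class c
            b = f_total[t] - a                # weighted: has t, other classes
            c_ = c_total[c] - a               # weighted: lacks t, in class c
            d = (n_docs - c_total[c]) - b     # weighted: lacks t, other classes
            num = n_docs * (a * d - b * c_) ** 2
            den = (a + b) * (c_ + d) * (a + c_) * (b + d) or 1e-12
            best = max(best, num / den)
        scores[t] = best                      # score = max over classes
    return scores


# Toy usage: rank features by the weighted score.
docs = [["cheap", "pills", "cheap"], ["meeting", "agenda"], ["cheap", "offer"]]
labels = ["spam", "ham", "spam"]
ranked = sorted(importance_weighted_chi2(docs, labels).items(),
                key=lambda kv: -kv[1])
print(ranked[:3])
```

The same weighting can be plugged into information gain by replacing the binary document counts in its probability estimates with the accumulated relative-frequency weights.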