Importance Weighted Feature Selection Strategy for Text Classification

Citation Author(s):
Baoli Li
Submitted by:
Baoli Li
Last updated:
27 November 2016 - 10:44am
Document Type:
Presentation Slides
Document Year:
2016
Event:
Presenters:
Baoli Li
Paper Code:
113
Feature selection, which aims at obtaining a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. Traditional feature selection metrics, such as Chi-square and information gain, fail to consider how important a feature is within a document: features are treated equally no matter how much effective semantic information they carry. Metrics calculated in this way are therefore likely to introduce considerable noise. In this study, we extend the work of Li et al. [1] on the document frequency metric and propose a general importance-weighted feature selection strategy for text classification, in which the importance value of a feature in a document is derived from its relative frequency in that document. Extensive experiments with two state-of-the-art feature selection metrics (Chi-square and information gain) on three text classification datasets demonstrate the effectiveness of the proposed strategy.
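The exact weighting and aggregation details of the proposed strategy are not spelled out on this page, so the following is only a minimal sketch of the general idea: instead of counting a document as 1 when it contains a term (as in standard Chi-square feature selection), each document contributes the term's relative frequency as a graded, importance-weighted presence. The function and variable names below are illustrative, not taken from the paper.

from collections import defaultdict, Counter

def importance_weighted_chi2(docs, labels):
    """Chi-square feature scores with importance-weighted presence.

    docs   : list of token lists, one per document
    labels : list of class labels aligned with docs

    A document contributes the term's relative frequency in that
    document (a value in [0, 1]) rather than a binary 0/1 count,
    so terms that dominate a document weigh more than terms
    mentioned only in passing.
    """
    N = len(docs)
    class_size = Counter(labels)
    weighted_df = defaultdict(float)   # (term, class) -> weighted doc frequency
    term_weight = defaultdict(float)   # term -> weighted doc frequency over all docs

    for tokens, c in zip(docs, labels):
        if not tokens:
            continue
        tf = Counter(tokens)
        n = len(tokens)
        for t, f in tf.items():
            w = f / n                  # importance = relative frequency in the document
            weighted_df[(t, c)] += w
            term_weight[t] += w

    scores = {}
    for t, total_w in term_weight.items():
        best = 0.0
        for c, n_c in class_size.items():
            A = weighted_df[(t, c)]    # weighted presence of t inside class c
            B = total_w - A            # weighted presence of t outside class c
            C = n_c - A                # "absence" mass of t inside class c
            D = (N - n_c) - B          # "absence" mass of t outside class c
            num = N * (A * D - B * C) ** 2
            den = (A + B) * (C + D) * (A + C) * (B + D)
            if den > 0:
                best = max(best, num / den)
        scores[t] = best               # keep the maximum score over all classes
    return scores

# Usage sketch: rank terms by score and keep the top k as the feature subset.
# docs = [["cheap", "pills", "now"], ["meeting", "at", "noon"], ...]
# labels = ["spam", "ham", ...]
# top_k = sorted(importance_weighted_chi2(docs, labels).items(),
#                key=lambda kv: kv[1], reverse=True)[:1000]

The same substitution, weighted presence in place of binary document counts, can be applied to information gain or any other document-frequency-based selection metric.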
