PSEUDO-SUPERVISED APPROACH FOR TEXT CLUSTERING BASED ON CONSENSUS ANALYSIS
- Submitted by:
- Peixin Chen
- Last updated:
- 13 April 2018 - 4:05am
- Document Type:
- Poster
- Document Year:
- 2018
- Presenters:
- Peixin Chen
- Paper Code:
- 1332
In recent years, neural networks (NNs) have achieved remarkable performance improvements in text classification, owing to
their ability to encode discriminative features by incorporating label information into model training. Inspired
by this success, we propose a pseudo-supervised neural network approach for text clustering.
The network is trained in a supervised fashion on pseudo-labels, which are the cluster assignments obtained
by pre-clustering unsupervised document representations. To improve the quality of the pseudo-labels, a consensus analysis
is used to select the training samples for the network. Experimental results demonstrate that the proposed approach
significantly improves clustering performance.
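
The pipeline described in the abstract (pre-cluster, keep only samples on which the pre-clusterings agree, train a classifier on the resulting pseudo-labels, then relabel everything) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the document representations, the pre-clustering algorithm, the consensus rule, or the network architecture. Here, 2-D toy points stand in for document representations, two k-means runs with different seeds stand in for the pre-clusterings, "all runs agree" stands in for the consensus analysis, and a single-layer softmax classifier stands in for the neural network. All function names are hypothetical.

```python
import itertools
import math
import random


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def align(reference, other, k):
    """Relabel `other` with the permutation that best matches `reference`."""
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(r == perm[o] for r, o in zip(reference, other)))
    return [best[o] for o in other]


def consensus_indices(labelings):
    """Assumed consensus rule: keep samples labeled identically by every run."""
    return [i for i, votes in enumerate(zip(*labelings)) if len(set(votes)) == 1]


def scores(w, x):
    # Each row holds one weight per feature plus a trailing bias term.
    return [sum(wi * xi for wi, xi in zip(row, x)) + row[-1] for row in w]


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(xs, ys, k, epochs=200, lr=0.1):
    """Single-layer softmax classifier trained by SGD on pseudo-labels."""
    dim = len(xs[0])
    w = [[0.0] * (dim + 1) for _ in range(k)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            probs = softmax(scores(w, x))
            for c in range(k):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    w[c][d] -= lr * grad * x[d]
                w[c][dim] -= lr * grad
    return w


def predict(w, x):
    s = scores(w, x)
    return s.index(max(s))


# Toy stand-in for unsupervised document representations: two 2-D blobs.
rng = random.Random(0)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
k = 2

# Step 1: pre-cluster twice and bring the label names into agreement.
run_a = kmeans(points, k, seed=1)
run_b = align(run_a, kmeans(points, k, seed=2), k)

# Step 2: consensus analysis selects the training samples.
selected = consensus_indices([run_a, run_b])

# Step 3: supervised training on pseudo-labels, then relabel every sample.
weights = train_softmax([points[i] for i in selected],
                        [run_a[i] for i in selected], k)
final_labels = [predict(weights, p) for p in points]
print(len(selected), "of", len(points), "samples selected by consensus")
```

The label alignment step matters because cluster indices are arbitrary: two runs can find the same partition yet name the clusters differently, so agreement is only meaningful after the labels are matched. The brute-force permutation search used here is practical only for small numbers of clusters.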