Python Program for Text Clustering using Bisecting k-means

sowmyagowri/Text-Clustering

Text-Clustering

Normalized Mutual Information (NMI) Score: 0.6934
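For readers unfamiliar with the metric, here is a minimal sketch of how an NMI score can be computed with scikit-learn. The README does not say which tool produced the figure above, so using `normalized_mutual_info_score` is an assumption, and the label arrays are hypothetical toy data, not the project's results.

```python
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical toy labels, for illustration only.
true_labels = [0, 0, 1, 1, 2, 2]
# Same grouping as true_labels, but with different cluster ids.
predicted_labels = [2, 2, 0, 0, 1, 1]

# NMI compares the groupings, not the label values, so a pure
# renaming of clusters still counts as a perfect match.
nmi = normalized_mutual_info_score(true_labels, predicted_labels)
print(f"NMI = {nmi:.4f}")
```

NMI ranges from 0 (the two labelings share no information) to 1 (identical groupings up to renaming).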

Approach:

  1. The input data, 8580 text records in sparse format, is first read into a CSR matrix.
  2. The CSR matrix is scaled by IDF, L2-normalized, and converted to a dense ndarray.
  3. The array is then partitioned into the desired number of clusters using the bisecting k-means approach.
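The steps above can be sketched roughly as follows. This is not the repository's code: the function names are hypothetical, the random count matrix stands in for the real 8580-record input, `TfidfTransformer` uses scikit-learn's smoothed IDF (which may differ from the variant used here), and choosing the cluster with the largest sum of squared errors to split next is one common bisecting rule, assumed for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer


def preprocess(counts):
    """Scale a CSR term-count matrix by IDF, L2-normalize each row, return a dense array."""
    tfidf = TfidfTransformer(norm="l2", use_idf=True)
    return tfidf.fit_transform(counts).toarray()


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def bisecting_kmeans(X, k, seed=0):
    """Repeatedly split one cluster in two with 2-means until k clusters exist."""
    clusters = [np.arange(X.shape[0])]  # start with every row in one cluster
    while len(clusters) < k:
        # Only clusters with at least two points can be split.
        splittable = [i for i, idx in enumerate(clusters) if len(idx) >= 2]
        if not splittable:
            break
        # Assumed rule: split the cluster with the largest SSE.
        worst = max(splittable, key=lambda i: sse(X[clusters[i]]))
        idx = clusters.pop(worst)
        halves = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
        clusters.append(idx[halves == 0])
        clusters.append(idx[halves == 1])
    labels = np.empty(X.shape[0], dtype=int)
    for label, idx in enumerate(clusters):
        labels[idx] = label
    return labels


# Hypothetical stand-in data: 40 "documents" over 15 "terms".
rng = np.random.default_rng(0)
dense_counts = rng.integers(0, 4, size=(40, 15))
dense_counts[:, 0] += 1  # make sure no row is all zeros
counts = csr_matrix(dense_counts)

X = preprocess(counts)
labels = bisecting_kmeans(X, k=5)
print(np.bincount(labels))  # cluster sizes
```

scikit-learn 1.1 and later also provides `sklearn.cluster.BisectingKMeans`, which could replace the hand-written loop if a library implementation is acceptable.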

The Calinski-Harabasz score (Caliński, T., & Harabasz, J. (1974). “A dendrite method for cluster analysis”. Communications in Statistics-theory and Methods 3: 1-27.) was calculated for the resulting clusters on the given dataset, for values of k from 3 to 21 in steps of 2.

The score is plotted on the y-axis against the values of k on the x-axis.
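A sweep like this could be sketched as below, under stated assumptions: `score_range`, `kmeans_labels`, and `plot_scores` are hypothetical names, synthetic blobs replace the real dataset, and plain `KMeans` stands in for the bisecting clusterer so the block stays self-contained. `calinski_harabasz_score` is the current scikit-learn name for this metric.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score


def score_range(X, cluster_fn, ks=range(3, 22, 2)):
    """Return {k: Calinski-Harabasz score} for k = 3, 5, ..., 21 by default."""
    return {k: calinski_harabasz_score(X, cluster_fn(X, k)) for k in ks}


def kmeans_labels(X, k):
    # Stand-in clusterer; any function returning one label per row would work here.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)


def plot_scores(scores):
    """Plot the score (y-axis) against k (x-axis); needs matplotlib installed."""
    import matplotlib.pyplot as plt

    plt.plot(list(scores), list(scores.values()), marker="o")
    plt.xlabel("k (number of clusters)")
    plt.ylabel("Calinski-Harabasz score")
    plt.show()


# Synthetic data, for illustration only.
X, _ = make_blobs(n_samples=300, centers=5, n_features=8, random_state=0)
scores = score_range(X, kmeans_labels)
for k, score in scores.items():
    print(f"k={k:2d}  CH={score:.1f}")
```

Calling `plot_scores(scores)` draws the curve. Higher scores indicate denser, better-separated clusters, so a peak in the curve is a reasonable candidate for k.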

[Plot: Calinski-Harabasz score vs. k]
