Seeding that is several orders of magnitude faster than K-means++, using a binary tree. The cost in quantization error is insignificant.

Tree K-means is an implementation of the seeding step that runs before Lloyd's algorithm.
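To show where seeding fits, here is a minimal NumPy sketch of Lloyd's algorithm that takes the seeds as input. It is not code from this repository: the function name lloyd, the n_iter parameter, and the choice of "sum of squared distances to the nearest center" as the quantization error are assumptions made for illustration.

```python
import numpy as np


def lloyd(X, centers, n_iter=20):
    """Refine seed `centers` with Lloyd's algorithm.

    Hypothetical helper, not part of this repository.
    Returns (centers, labels, error), where error is the sum of squared
    distances from each point to its assigned center.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)  # copy so the seeds are not modified
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    # Final assignment and quantization error for the refined centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    error = d2[np.arange(len(X)), labels].sum()
    return centers, labels, error
```

Any seeding method, K-means++ or the tree-based one, only has to supply the initial centers argument; the refinement loop is unchanged.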

The Tree K-means seeding runs in O(log(n)kd) time, compared with O(nkd) for K-means++, where n is the number of points, k the number of clusters, and d the dimensionality. It does not require significant memory overhead. This is a large computational saving, because seeding is extremely expensive on large datasets.
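For reference, the O(nkd) baseline can be sketched as follows. This is standard K-means++ seeding written in NumPy, not the tree-based method from this repository; the function name kmeanspp_seed and its rng parameter are assumptions for illustration, and the sketch assumes X contains at least k distinct points.

```python
import numpy as np


def kmeanspp_seed(X, k, rng=None):
    """Pick k seeds with standard K-means++ (D^2 sampling).

    Hypothetical helper, not part of this repository.
    Assumes X holds at least k distinct points.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(rng)
    n, d = X.shape
    centers = np.empty((k, d), dtype=float)
    # First seed: a point chosen uniformly at random.
    centers[0] = X[rng.integers(n)]
    # d2[i] = squared distance from point i to its nearest chosen seed.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for j in range(1, k):
        # Sample the next seed with probability proportional to d2.
        idx = rng.choice(n, p=d2 / d2.sum())
        centers[j] = X[idx]
        # One full pass over all n points per seed: O(nd) work, O(nkd) in total.
        d2 = np.minimum(d2, ((X - centers[j]) ** 2).sum(axis=1))
    return centers
```

The pass over all n points for each of the k seeds is the cost that the tree-based seeding is meant to avoid.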

Written in Cython with a NumPy dependency and compiled using distutils. You can change the dataset in the tree_k_means.pyx file and modify the Cython compiler directives in setup.py.

To compile, run python setup.py build_ext --inplace on the command line from within the repository directory; this builds the package in place.

I also built the package and committed it as an example; you can see it in the build folder.

anuar12/tree-k-means
