Generalized K-Means Clustering

This project generalizes the Spark MLlib batch K-means clusterer (v1.1.0) and the Spark MLlib streaming K-means clusterer (v1.2.0). Most practical variants of K-means clustering are implemented or can be implemented with this package.

If you find a novel variant of k-means clustering that is provably superior in some manner, implement it using the package and send a pull request along with the paper analyzing the variant!

This code has been tested on data sets of tens of millions of points in a 700+ dimensional space using a variety of distance functions. Thanks to the excellent core Spark implementation, it rocks!
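To illustrate what "generalizing" K-means over distance functions means, here is a minimal single-machine sketch in plain Python. It is not this package's API: every name in it (`generalized_kmeans`, `squared_euclidean`, `generalized_kl`) is hypothetical, and it assumes the distance is a Bregman divergence, for which the arithmetic mean of a cluster is the optimal center.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def squared_euclidean(p: Point, c: Point) -> float:
    """Squared Euclidean distance, the divergence used by classic K-means."""
    return sum((x - y) ** 2 for x, y in zip(p, c))


def generalized_kl(p: Point, c: Point) -> float:
    """Generalized Kullback-Leibler divergence; assumes strictly positive coordinates."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(p, c))


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def generalized_kmeans(
    points: Sequence[Point],
    k: int,
    distance: Distance,
    iterations: int = 20,
    seed: int = 0,
) -> List[List[float]]:
    """Lloyd-style iteration with a pluggable distance function (hypothetical helper)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable, so the iteration has converged
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(generalized_kmeans(data, 2, squared_euclidean))
    print(generalized_kmeans(data, 2, generalized_kl))
```

Only the `distance` argument changes between the two calls; the assignment and update steps stay the same. Distances that are not Bregman divergences would need a different center update than the mean.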
