DistML provides a supplement to MLlib to support model parallelism on Spark.

VinceShieh/DistML

 
 


DistML (Distributed Machine Learning platform)

DistML is a machine learning tool that allows training very large models on Spark. It is fully compatible with Spark (tested on 1.2 and above).

Reference paper: Large Scale Distributed Deep Networks

Runtime view:

DistML provides several algorithms (LR, LDA, Word2Vec, ALS) to demonstrate its scalability. However, you may need to write your own algorithms based on the DistML APIs (Model, Session, Matrix, DataStore, ...). Generally, it is simple to port existing algorithms to DistML; here we take LR as an example: How to implement logistic regression on DistML.
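To give a feel for the kind of computation an algorithm ported to DistML performs per data partition, here is a minimal sketch of a logistic-regression SGD step in plain Java. The class and method names are illustrative only, not the actual DistML API; in DistML, the weight vector would be fetched from and pushed back to the parameter servers through a Session rather than held locally.

```java
// Hypothetical sketch of the per-partition work in distributed logistic
// regression: compute the prediction error for a sample and apply a
// gradient step to the weights. In DistML, weights would live on the
// parameter servers, not in a local array.
public class LrSketch {
    // Logistic (sigmoid) function.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Dot product of weights and features.
    static double dot(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    // One SGD step on a single sample (label y in {0, 1}).
    static double[] sgdStep(double[] w, double[] x, double y, double lr) {
        double err = sigmoid(dot(w, x)) - y;  // prediction error
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) out[i] = w[i] - lr * err * x[i];
        return out;
    }

    public static void main(String[] args) {
        double[] w = {0.0, 0.0};
        double[] x = {1.0, 1.0};
        w = sgdStep(w, x, 1.0, 0.1);  // weights move toward the positive label
        System.out.println(java.util.Arrays.toString(w));
    }
}
```

In the model-parallel setting described by the reference paper, each worker runs steps like this on its partition while the parameter servers aggregate the weight updates.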

User Guide

  1. Download and build DistML.
  2. Typical options.
  3. Run Sample - LR.
  4. Run Sample - MLR.
  5. Run Sample - LDA.
  6. Run Sample - Word2Vec.
  7. Run Sample - ALS.
  8. Benchmarks.
  9. FAQ.

API Document

  1. Source Tree.
  2. DistML API.

Contributors

He Yunlong (Intel)
Sun Yongjie (Intel)
Liu Lantao (Intern, Graduated)
Hao Ruixiang (Intern, Graduated)


Languages

  • Java 55.0%
  • Scala 44.9%
  • Shell 0.1%