DistML is a machine learning tool that allows training very large models on Spark. It is fully compatible with Spark (tested on 1.2 and above).
Reference paper: Large Scale Distributed Deep Networks
Runtime view:
DistML provides several algorithms (LR, LDA, Word2Vec, ALS) to demonstrate its scalability. You may still need to write your own algorithms on top of the DistML APIs (Model, Session, Matrix, DataStore...). In general, it is simple to port existing algorithms to DistML. Here we take LR as an example: How to implement logistic regression on DistML.
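To give a feel for the pattern before reading the LR guide, here is a minimal sketch in plain Python of logistic regression trained in the parameter-server style described in the reference paper: each worker pulls the current parameters, computes a gradient on its own data shard, and pushes the update back. It does not use the DistML API. `ParameterServer`, `pull`, `push`, `shard_gradient` and `train` are hypothetical names chosen for illustration, and the workers run sequentially in one process rather than on Spark.

```python
import math
import random


class ParameterServer:
    """Hypothetical stand-in for a parameter server; not a DistML class."""

    def __init__(self, dim):
        self.weights = [0.0] * dim

    def pull(self):
        # Workers fetch a copy of the current parameters.
        return list(self.weights)

    def push(self, gradient, learning_rate):
        # Workers send a gradient; the server applies the update.
        for i, g in enumerate(gradient):
            self.weights[i] -= learning_rate * g


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shard_gradient(weights, shard):
    """Average logistic-loss gradient over one data shard."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    return [g / len(shard) for g in grad]


def train(shards, dim, epochs=200, learning_rate=0.5):
    server = ParameterServer(dim)
    for _ in range(epochs):
        for shard in shards:  # each shard plays the role of one worker
            weights = server.pull()
            server.push(shard_gradient(weights, shard), learning_rate)
    return server.weights


# Toy data: label is 1 when x > 0; the first feature is a constant bias term.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
data = [([1.0, x], 1.0 if x > 0 else 0.0) for x in xs]
shards = [data[i::4] for i in range(4)]
weights = train(shards, dim=2)
```

In a real deployment the shards would be partitions of a distributed dataset and the pushes would arrive concurrently; the sketch only shows the pull, compute, push cycle that a model has to fit into.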
- Download and build DistML.
- Typical options.
- Run Sample - LR.
- Run Sample - MLR.
- Run Sample - LDA.
- Run Sample - Word2Vec.
- Run Sample - ALS.
- Benchmarks.
- FAQ.
He Yunlong (Intel)
Sun Yongjie (Intel)
Liu Lantao (Intern, Graduated)
Hao Ruixiang (Intern, Graduated)