Very drafty!!! TODO: add more links from notes and structure it better
-
tiny-sklearn Minimal implementation of sklearn. Good for getting the general idea. https://github.com/qinhanmin2014/tiny-sklearn
-
timeseriescv Time-based cross-validation splits for sklearn https://github.com/sam31415/timeseriescv
-
trees for sklearn https://github.com/scikit-garden/scikit-garden
-
sklearn extensions https://github.com/koaning/scikit-lego
-
searchgrid Syntactic sugar for sklearn grid search https://github.com/jnothman/searchgrid
-
hyperopt for sklearn https://github.com/hyperopt/hyperopt-sklearn
-
sklearn pipeline extensions https://github.com/jem1031/pandas-pipelines-custom-transformers
https://github.com/Kgoetsch/sklearn_pipeline_enhancements
-
sklearn extensions* https://github.com/scikit-learn-contrib/scikit-learn-extra
-
sklearn SLEPs (scikit-learn enhancement proposals) https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/
-
sklearn extensions https://github.com/rasbt/mlxtend
-
Kedro by QuantumBlack, cookiecutter-data-science https://github.com/quantumblacklabs/kedro
https://github.com/drivendata/cookiecutter-data-science
-
DVC - data versioning https://github.com/iterative/dvc
-
Hangar - data versioning https://github.com/tensorwerk/hangar-py
-
SQL on GPU https://blazingsql.com/
-
NEURAXLE, TO_CHECK Let your pipeline steps have hyperparameter spaces https://github.com/Neuraxio/Neuraxle
-
LALE Sklearn-like pipelines with stronger support for hyperparameter search and type checking. Interoperability. By IBM. https://github.com/IBM/lale
-
FKLEARN Functional sklearn by Nubank. Feature names. Spatio-temporal split during validation. https://github.com/nubank/fklearn
-
DABL Simple but powerful ML. Really nice EDA. By amueller from sklearn. https://github.com/amueller/dabl
-
hyperparameter_hunter https://github.com/HunterMcGushion/hyperparameter_hunter
-
visualization for automl https://github.com/HDI-Project/ATMSeer
-
Ludwig Very high-level deep learning toolbox, by Uber https://github.com/uber/ludwig
-
get.ML Closed-source, trial, C++, incremental updates https://www.get.ml/
-
Sets handling for pandas https://github.com/Florents-Tselai/pandas-sets
-
Pandas summary https://github.com/mouradmourafiq/pandas-summary
-
Pandas column type detection https://github.com/tahaceritli/ptype-dmkd
-
Pandas EDA https://github.com/pandas-profiling/pandas-profiling
-
CleverCSV Dealing with messy CSV https://github.com/alan-turing-institute/CleverCSV
-
dirty categorical variables https://github.com/dirty-cat/dirty_cat/
-
great-expectations Testing pipelines and pandas https://github.com/great-expectations/great_expectations
-
defensive pandas https://github.com/engarde-dev/engarde
-
Intake Data directories. Compare with Kedro-intake https://github.com/intake/intake
-
data cleaning https://github.com/ericmjl/pyjanitor
-
Metaflow By Netflix, AWS https://metaflow.org/
-
serverless pipelines https://nuclio.io/
-
feature store https://github.com/gojek/feast
-
Matrix-profile Really interesting approach https://github.com/target/matrixprofile-ts
-
skits Sklearn-inspired time series https://github.com/EthanRosenthal/skits
-
fast.ai timeseries https://github.com/tcapelle/TimeSeries_fastai
-
Manifold ML debugging, by Uber https://github.com/uber/manifold
-
What-If Tool ML debugging, by Google https://github.com/pair-code/what-if-tool
-
UltraNest Compare models (Bayesian nested sampling) https://github.com/JohannesBuchner/UltraNest
-
Build AI apps fast https://www.streamlit.io/
https://github.com/plotly/plotly_express
https://github.com/gyli/PyWaffle
https://github.com/holoviz/hvplot - pandas
https://github.com/zarr-developers/zarr-python
https://github.com/scikit-hep/awkward-array
https://github.com/ibis-project/ibis
-
Add structure to your commits https://github.com/commitizen/cz-cli
-
git and jupyter https://github.com/jupyter/nbdime
-
parametrization of Jupyter notebooks https://github.com/nteract/papermill
-
turn Jupyter notebooks into dashboards https://github.com/voila-dashboards/voila
-
Git guide (in Polish) https://rogerdudler.github.io/git-guide/index.pl.html
-
Black python code formatter https://github.com/psf/black
-
tabulate ... https://github.com/gregbanks/python-tabulate
-
Really good intro https://mpatacchiola.github.io/blog/
-
ReAgent RL by Facebook, PyTorch-based https://github.com/facebookresearch/ReAgent
-
PPLM Text generation by Uber https://eng.uber.com/pplm/
-
fast-bert Inspired by fast.ai https://github.com/kaushaltrivedi/fast-bert
-
CausalML By Uber. Uplift modeling. https://github.com/uber/causalml
-
Churn - MARKETING https://github.com/CamDavidsonPilon/lifetimes
-
Notes for professional book series https://books.goalkicker.com/
-
Deep-quant. Fundamental data https://github.com/euclidjda/deep-quant
-
Google Dataset Search https://datasetsearch.research.google.com/