Hierarchical classification module based on scikit-learn's interfaces and conventions.
See the GitHub Pages hosted documentation here.
To install this package, use pip inside your desired virtualenv, e.g.:
pip install sklearn-hierarchical-classification
See examples/ for usage examples.
To run the included unit tests, install the test dependencies and then invoke them using nose:
pip install -e '.[test]'
pip install nose
nosetests
Support for interactive development is built into the HierarchicalClassifier class. This enables progress bars (using the excellent tqdm library) in various places during training, and may otherwise give more visibility into the classifier, which is useful during interactive use. To enable this, make sure widget extensions are enabled by running:
jupyter nbextension enable --py --sys-prefix widgetsnbextension
You can then instantiate a classifier with the progress_wrapper parameter set to tqdm_notebook:
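from sklearn import svm
from sklearn_hierarchical_classification.classifier import HierarchicalClassifier
from tqdm import tqdm_notebook

# class_hierarchy is assumed to be defined already (see examples/)
clf = HierarchicalClassifier(
    base_estimator=svm.LinearSVC(),
    class_hierarchy=class_hierarchy,
    progress_wrapper=tqdm_notebook,
)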
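The snippet above takes a class_hierarchy argument without showing what one looks like. As a rough sketch only, the block below assumes the hierarchy can be written as a plain dict mapping each parent label to a list of its child labels; the ROOT value, the labels, and the two helper functions are hypothetical and are not part of the package. Check examples/ for the format and root marker the classifier actually expects.

```python
# Hypothetical root label, used here for illustration only. When building a
# hierarchy for the real classifier, use whatever root marker the package expects.
ROOT = "<ROOT>"

# Assumed format: each parent label maps to a list of its child labels.
class_hierarchy = {
    ROOT: ["A", "B"],
    "A": ["1", "7"],
    "B": ["C", "9"],
    "C": ["3", "8"],
}


def leaf_labels(hierarchy, node):
    """Return the leaf labels under `node`, in depth-first order."""
    children = hierarchy.get(node, [])
    if not children:
        # A label with no children is itself a leaf.
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaf_labels(hierarchy, child))
    return leaves


def path_from_root(hierarchy, target, node=ROOT):
    """Return the list of labels from `node` down to `target`, or None if absent."""
    if node == target:
        return [node]
    for child in hierarchy.get(node, []):
        sub_path = path_from_root(hierarchy, target, child)
        if sub_path is not None:
            return [node] + sub_path
    return None


if __name__ == "__main__":
    print(leaf_labels(class_hierarchy, ROOT))
    print(path_from_root(class_hierarchy, "8"))
```

Helpers like these can be handy for sanity-checking a hand-written hierarchy (for example, confirming every training label is reachable from the root) before passing it to the classifier.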
Auto-generated documentation is provided via Sphinx. To build and view it:
$ cd docs/
$ make html
$ open build/html/index.html
Documentation is published to GitHub Pages from the gh-pages branch.
If you are a contributor and need to update documentation, a good starting point for getting set up is this tutorial.
This module is heavily influenced by the following previous work and papers:
- "Functional Annotation of Genes Using Hierarchical Text Categorization" - Kiritchenko et al. 2005
- "Classifying web documents in a hierarchy of categories: a comprehensive study" - Ceci and Malerba 2007
- "A survey of hierarchical classification across different application domains" - Silla et al. 2011
- "A Survey of Automated Hierarchical Classification of Patents" - Gomez et al. 2014
- "Evaluation Measures for Hierarchical Classification: a unified view and novel approaches" - Kosmopoulos et al. 2013
- "Bayesian Aggregation for Hierarchical Classification" - Barutcuoglu et al. 2008
- "Kaggle LSHTC4 Winning Solution" - Puurula et al. 2014
- "Feature-Weighted Linear Stacking" - Sill et al. 2009