This repository provides an implementation of the paper "Non-Parametric Calibration for Classification" (Jonathan Wenger, Hedvig Kjellström, Rudolph Triebel) published at AISTATS 2020. All results presented in our work were produced with this code.
Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. We provide a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process, and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. In the experiments included in this repository, we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the-art neural network architectures.
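To make the notion of calibration concrete, below is a minimal sketch of the expected calibration error (ECE), a standard metric for the mismatch between a classifier's confidence and its accuracy. The function is illustrative only and is not the implementation used in this package.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over confidence bins.

    confidences: array of predicted top-class probabilities in (0, 1].
    correct: boolean array, True where the top-class prediction was right.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between average confidence and empirical accuracy in this bin,
            # weighted by the fraction of samples falling into the bin.
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

A perfectly calibrated classifier has an ECE of zero: within each bin, its stated confidence matches its actual accuracy.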
The code was developed in Python 3.6 under Ubuntu 18.04. You can install this Python 3 package using `pip` (or `pip3`):

```shell
pip install setuptools numpy scipy scikit-learn cython
pip install git+https://github.com/JonathanWenger/pycalib.git
```
Note that some dependencies need to be installed separately, since a subset of the experiments relies on scikit-garden. You can also clone this repository to run the scripts reproducing the experiments in the paper via:
```shell
pip install setuptools numpy scipy scikit-learn cython
pip install -e git+git://github.com/scikit-garden/scikit-garden.git#egg=scikit-garden
git clone https://github.com/JonathanWenger/pycalib
cd pycalib
python setup.py install
```
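A quick sanity check that the installation succeeded is to import the package:

```shell
python -c "import pycalib"
```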
For tips on getting started and how to use this package, please refer to the documentation.
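A rough usage sketch is shown below. The module path `pycalib.calibration_methods`, the class name `GPCalibration`, and its scikit-learn-style `fit`/`predict_proba` interface are assumptions made for illustration; consult the documentation for the actual API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# NOTE: module path, class name and constructor arguments below are assumed
# for illustration; check the package documentation for the real interface.
from pycalib.calibration_methods import GPCalibration

# Train any classifier that outputs confidence estimates.
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Fit GP calibration on held-out confidence estimates, then use it to
# produce calibrated class probabilities.
gpc = GPCalibration(n_classes=3)  # constructor arguments are assumed
gpc.fit(clf.predict_proba(X_cal), y_cal)
calibrated_probs = gpc.predict_proba(clf.predict_proba(X_cal))
```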
We performed calibration experiments for binary and multi-class benchmark datasets from computer vision for a range of classifiers and calibration methods. We found that GP calibration performed particularly well on large-scale architectures and challenging data sets.
The experiments can be reproduced by using the scripts in `benchmark` and `figures`. The datasets we used and how to obtain them are listed below.
- PCam: Due to the size of the data, only a script replicating the experiments is provided. The data can be downloaded from the PCam repository.
- KITTI: The repository includes 64-dimensional features extracted from KITTI sequences, compressed in the zip file `datasets/kitti/kitti_data.zip` (a quick way to inspect the archive is sketched after this list).
- MNIST: A script will automatically download the MNIST dataset if needed.
- CIFAR-100: When the CIFAR-100 experiment is run, there is an option to automatically download the dataset.
- ImageNet 2012: Due to the size of the data, only a script replicating the experiments is provided. The ImageNet validation data can be obtained from the ImageNet website.
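For a quick look at the bundled KITTI features before running the benchmark scripts, here is a minimal sketch that assumes only the archive path given above; the member names inside the archive are not assumed.

```python
import zipfile

# List the members of the packaged KITTI feature archive.
with zipfile.ZipFile("datasets/kitti/kitti_data.zip") as archive:
    for info in archive.infolist():
        print(f"{info.filename}: {info.file_size} bytes")
```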
If you use this repository in your research, please cite the following paper:
J. Wenger, H. Kjellström, and R. Triebel. Non-parametric calibration for classification. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
```
@InProceedings{wenger2020calibration,
  title     = {Non-Parametric Calibration for Classification},
  author    = {Jonathan Wenger and Hedvig Kjellstr{\"o}m and Rudolph Triebel},
  booktitle = {Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2020},
  series    = {Proceedings of Machine Learning Research},
  keywords  = {calibration, non-parametric, gaussian processes, classification},
  url       = {https://github.com/JonathanWenger/pycalib}
}
```
This work is released under the MIT License.
Please submit an issue on GitHub to report bugs or request changes.