active_learning_benchmarks

Benchmarking methods to select examples to relabel in active learning for data labeled by multiple annotators

Code to reproduce results from the paper:

ActiveLab: Active Learning with Re-Labeling by Multiple Annotators

This repository benchmarks algorithms that compute an active learning score quantifying how valuable it is to collect additional labels for specific examples in a classification dataset. We consider settings with multiple data annotators, where each example can be labeled more than once if needed to ensure a high-quality consensus label.

This repository is intended only for scientific purposes. To apply the ActiveLab algorithm in your own active learning loops with multiannotator data, you should instead use the implementation from the official cleanlab library.
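For reference, below is a minimal sketch of how such active learning scores might be obtained via cleanlab's multiannotator module. The function name, signature, and toy data are assumptions for illustration; consult the cleanlab documentation for the authoritative usage in your installed version.

import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_active_learning_scores  # assumed API

# Toy multiannotator labels: one row per labeled example, one column per
# annotator, NaN where that annotator did not label the example.
labels_multiannotator = pd.DataFrame(
    {
        "annotator_1": [0, 1, np.nan, 1],
        "annotator_2": [0, np.nan, 1, 0],
        "annotator_3": [np.nan, 1, 1, np.nan],
    }
)

# Out-of-sample predicted class probabilities from any classifier trained on
# the current consensus labels (4 labeled + 3 unlabeled examples, 2 classes).
pred_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.4, 0.6], [0.6, 0.4]])
pred_probs_unlabeled = np.array([[0.5, 0.5], [0.8, 0.2], [0.55, 0.45]])

scores_labeled, scores_unlabeled = get_active_learning_scores(
    labels_multiannotator, pred_probs, pred_probs_unlabeled
)

# Lower scores flag the examples whose (re)labeling is most valuable next.
combined = np.concatenate([scores_labeled, scores_unlabeled])
print(np.argsort(combined)[:3])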

Install Dependencies

To run the model training and benchmarks, you need to install the following dependencies:

pip install -r requirements.txt
pip install cleanlab

Benchmarks

Three sets of benchmarks are conducted with three different datasets (a sketch of a single benchmark round follows the list):

1. CIFAR-10H: image classification with 5000 examples in total; 1000 examples have annotator labels at the start, and 500 new labels are collected each round.
2. Wall Robot: tabular classification with 2000 examples in total; 500 examples have annotator labels at the start, and 100 new labels are collected each round.
3. Wall Robot Complete: tabular classification with 2000 examples in total; all 2000 examples have annotator labels at round 0, and 100 new labels are collected each round.
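Each benchmark follows the same round structure: derive consensus labels from the current annotator labels, train a classifier, score all examples with an active learning method, and collect a fixed batch of new labels for the highest-priority examples. The sketch below illustrates one such round; the get_consensus, train_model, compute_scores, and collect_labels helpers are hypothetical placeholders, not functions from this repository.

import numpy as np

def run_round(X, labels_multiannotator, X_unlabeled, batch_size,
              get_consensus, train_model, compute_scores, collect_labels):
    """One hypothetical active-learning round: consensus, train, score, label."""
    # 1. Derive consensus labels from the current annotator labels.
    consensus = get_consensus(labels_multiannotator)

    # 2. Train a classifier and get out-of-sample predicted probabilities
    #    for both the labeled and unlabeled pools.
    pred_probs, pred_probs_unlabeled = train_model(X, consensus, X_unlabeled)

    # 3. Score every example; lower score = more valuable to (re)label.
    scores, scores_unlabeled = compute_scores(
        labels_multiannotator, pred_probs, pred_probs_unlabeled
    )

    # 4. Pick the batch_size highest-priority examples across both pools.
    all_scores = np.concatenate([scores, scores_unlabeled])
    chosen = np.argsort(all_scores)[:batch_size]

    # 5. Query annotators for new labels on the chosen examples and merge them
    #    into the multiannotator label matrix for the next round.
    return collect_labels(labels_multiannotator, chosen)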

The datasets used in the benchmark are downloaded from:

Additional Benchmarks

Two supplementary benchmarks were conducted on the Wall Robot dataset:

1. Single Annotator vs Multiannotator: compares labeling new data against relabeling existing datapoints.
2. Methods for Single Label: benchmarks the performance of various methods in the scenario where each example has only one label.

Results

The results/ folder for each dataset contains .npy files with the saved results (model accuracy and consensus-label accuracy) from each run of the benchmark. These files are used to visualize the results in the plot_results.ipynb notebooks.
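As a minimal sketch, saved results like these can be loaded and plotted as below. The file name is a hypothetical placeholder; see plot_results.ipynb for the actual file names and plotting code used in this repository.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file name; the actual .npy files live under results/ per dataset.
model_accuracy = np.load("results/model_accuracy_activelab.npy")  # shape: (num_rounds,)

rounds = np.arange(len(model_accuracy))
plt.plot(rounds, model_accuracy, marker="o", label="ActiveLab")
plt.xlabel("Round of label collection")
plt.ylabel("Model accuracy")
plt.legend()
plt.show()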