QuickDraw Dataset Cross-validated Predicted Probabilities

@cgnorthcutt released this 05 May 22:33 (commit c84dfe8)

We release the cross-validated predicted probabilities for the QuickDraw dataset. These probabilities were computed using 4-fold cross-validation for all 50,426,266 examples and 345 classes. The resulting predicted probabilities matrix (pyx, a NumPy matrix) has shape 50426266 x 345. The resulting file is about 33GB in np.float16 format.
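As a rough sanity check on the stated shape and dtype, the size of the raw matrix can be worked out directly. This is a minimal sketch that assumes only the figures quoted above and that np.float16 takes 2 bytes per value; the small .npy header is ignored.

```python
# Figures quoted in the release notes.
n_examples = 50_426_266
n_classes = 345

# np.float16 stores each value in 2 bytes.
bytes_per_value = 2

total_bytes = n_examples * n_classes * bytes_per_value

print(f"{total_bytes:,} bytes")
print(f"{total_bytes / 1e9:.1f} GB (decimal units)")
print(f"{total_bytes / 2**30:.1f} GiB (binary units)")
```

This works out to roughly 34.8 GB in decimal units or 32.4 GiB in binary units, in line with the quoted file size.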

Note, pyx is short for prob(y = label | data example x).
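To make the notation concrete, here is a small sketch on a made-up 3 x 4 matrix. The values, the `labels` array, and the variable names are assumptions for illustration only; the layout (one row per example, one column per class) is taken from the shape stated above.

```python
import numpy as np

# Toy stand-in for pyx: 3 examples (rows) x 4 classes (columns), made-up values.
pyx = np.array(
    [
        [0.70, 0.10, 0.10, 0.10],
        [0.05, 0.05, 0.80, 0.10],
        [0.25, 0.25, 0.25, 0.25],
    ],
    dtype=np.float16,
)

# Hypothetical given labels for the three examples.
labels = np.array([0, 1, 3])

# Probability assigned to each example's given label: pyx[i, labels[i]].
prob_of_given_label = pyx[np.arange(len(labels)), labels]

# Most likely class for each example (ties resolve to the lowest index).
predicted = pyx.argmax(axis=1)

print(prob_of_given_label)
print(predicted)
```

A low probability for the given label combined with a different predicted class, as in the second row, is the kind of disagreement this matrix lets you look for.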

Download the QuickDraw Cross-validated Predicted Probabilities as a NumPy matrix.

Make sure pigz and wget are installed:

# on macOS
brew install wget pigz
# on Ubuntu
sudo apt-get install wget pigz

Download the pyx file parts:

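# Requires bash or zsh for the {a..k} brace expansion.
base_url="https://github.com/cgnorthcutt/label-errors/releases/download/"
base_filename="quickdraw-pyx-v1/quickdraw_pyx.tar.gz-parta"
# Fetches parts "aa" through "ak"; --continue resumes interrupted downloads.
for part in {a..k}; do
    wget --continue "${base_url}${base_filename}${part}"
done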
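Because the archive is split into 11 parts (suffixes aa through ak, per the loop above), it can be worth confirming that every part is present before decompressing. The following is a minimal sketch; the helper names `expected_parts` and `missing_parts` are hypothetical, and it assumes wget saved each part under its URL basename in the current directory.

```python
import os
import string

# Local filename prefix, taken from the download loop above.
PREFIX = "quickdraw_pyx.tar.gz-parta"


def expected_parts(prefix=PREFIX, last="k"):
    """Return the part filenames for suffix letters 'a' through `last`."""
    end = string.ascii_lowercase.index(last) + 1
    return [prefix + letter for letter in string.ascii_lowercase[:end]]


def missing_parts(directory="."):
    """Return the expected part filenames that are not found in `directory`."""
    return [
        name
        for name in expected_parts()
        if not os.path.exists(os.path.join(directory, name))
    ]


missing = missing_parts()
print(missing if missing else "all 11 parts present")
```

This only checks that the files exist, not that each download completed; re-running the wget loop with --continue covers partial files.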

Decompress the tar.gz file parts into the final pyx numpy matrix:

cat quickdraw_pyx.tar.gz-part?? | unpigz | tar -xvf - -C .
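Once extracted, the matrix is far larger than the RAM on most machines, so memory-mapping is one reasonable way to read it. The sketch below assumes the extracted file is named quickdraw_pyx.npy (the name used in the compression command in this release) and uses NumPy's `np.load` with `mmap_mode="r"`; the helper name `open_pyx` is hypothetical.

```python
import os

import numpy as np


def open_pyx(path):
    """Memory-map a .npy file so rows are read from disk only when indexed."""
    return np.load(path, mmap_mode="r")


# Assumed output name of the extraction step; adjust if your file differs.
PYX_PATH = "quickdraw_pyx.npy"

if os.path.exists(PYX_PATH):
    pyx = open_pyx(PYX_PATH)
    print(pyx.shape, pyx.dtype)

    # Copy a small slice into memory as float32 before doing arithmetic,
    # since float16 sums lose precision quickly.
    head = np.asarray(pyx[:5], dtype=np.float32)
    # Should be close to 1 if each row is a probability distribution.
    print(head.sum(axis=1))
```

Slicing the memory-mapped array reads only the requested rows, so work can proceed in batches without loading the whole file.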

Ancillary details

To compress the pyx probabilities file prior to uploading, we used the following command:

tar -I pigz -cvf - quickdraw_pyx.npy | split --bytes=1800M - "quickdraw_pyx.tar.gz-part"