PRIMO Similarity Search

Setup

This repository includes a Dockerfile that reproduces our development environment. To use it, you need a GPU-equipped server or workstation and the ability to download and install Docker and nvidia-docker.

Note that this environment does not include the image dataset (OpenImages V4) used for training and experiments. This dataset is publicly available, but requires over 1 terabyte of storage space, and a significant amount of time to download. Scripts to manage the download are included in this repository (see Downloading Datasets below).

For convenience, we have pre-processed the images with VGG16-FC2 to extract feature vectors. These feature vectors take up about 60 gigabytes and are available for download from the primo-openimages repository.
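Because the feature vectors need about 60 gigabytes (and the full image dataset over 1 terabyte), it may be worth checking free disk space before starting a download. The following is a minimal sketch, not part of this repository: the function name `has_free_space` is hypothetical, and it treats a gigabyte as 1024 * 1024 kilobytes.

```shell
# Hypothetical helper (not part of this repository).
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB gigabytes free.
has_free_space() {
    dir=$1
    required_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

For example, `has_free_space /path/to/download/dir 60 || echo "not enough space"` would warn before fetching the feature vectors.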

Once you have installed Docker and nvidia-docker, run the following command in this directory to build the Docker image:

docker build -t primo .
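If the build fails immediately, a missing prerequisite is a likely cause. The sketch below shows one way to confirm that both tools are on your PATH before building; the function name `missing_tools` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository).
# Prints the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Usage (shown as a comment so that nothing is built by accident):
#   if [ -z "$(missing_tools docker nvidia-docker)" ]; then
#       docker build -t primo .
#   fi
```

An empty result means every listed tool was found; otherwise the output names what still needs to be installed.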

Then run the following command to start the container, which launches a Jupyter notebook server on port 8888 (pass -p PORT to use a different port):

sudo bash docker.sh -d /path/to/primo-openimages

Replace /path/to/primo-openimages with the path to your local copy of the primo-openimages repository.
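To make the -d and -p options concrete, the sketch below assembles the launch command for a given data directory and port after checking that the directory exists. The function name `primo_launch_cmd` is hypothetical, and the assumption that -p may follow -d on the same command line has not been verified against docker.sh.

```shell
# Hypothetical helper (not part of this repository).
# Prints the launch command for DATA_DIR and an optional PORT (default 8888).
# Assumption: docker.sh accepts -d and -p together in this order.
primo_launch_cmd() {
    data_dir=$1
    port=${2:-8888}
    if [ ! -d "$data_dir" ]; then
        echo "error: $data_dir is not a directory" >&2
        return 1
    fi
    printf 'sudo bash docker.sh -d %s -p %s\n' "$data_dir" "$port"
}
```

The function only prints the command, so you can inspect it before running it yourself. Paths containing spaces would need quoting when the printed command is pasted into a shell.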

Downloading Datasets

The primo-openimages repository contains the VGG16-FC2 feature vectors for the images used in our experiments. These are sufficient to train the encoder and perform wetlab experiments, but if you wish to view the images themselves you will need to download the original files.

If you just want to download or view a single image, you can use its unique identifier to look up its URL, using this index.

If you want to download all of the images and organize them into the same sets that we used for our experiments, open and run this notebook.

The code used to extract the feature vectors can be found in this notebook.

