Move image retrieval code here #1

thequicksort · 2021-06-07T18:30:11Z

This repository hosts the feature vector representations of the image data set used for similarity search. The resulting HDF5 files are orders of magnitude more compact than storing the raw images. As such, we should move the scripts/notebooks for downloading raw images to this repository.

Here are the repositories that use this:

High Level

Low Level

Open Images

migrate download/feature extraction code from Cas9 Similarity Search

Hybridization Similarity Search

delete notebooks/01_datasets/01_download.ipynb
delete notebooks/01_datasets/02_extract_features.ipynb
create interface for accessing feature vectors from Open Images

Cas9 Similarity Search

migrate notebooks/01_datasets/01_download.ipynb to Open Images
migrate notebooks/01_datasets/02_extract_features.ipynb to Open Images
copy docker.sh and Dockerfile to Open Images
create interface for accessing feature vectors from Open Images


thequicksort · 2021-06-09T19:08:20Z

Make the image feature vector download a separate step from checking out the repository or starting the Docker image.
Allow the user to point to the location of the feature vectors (e.g. a different location on disk, a location in the Docker container).

Q: Why?
A: Because users might want to utilize different parts of the pipeline, like sequencing analysis, that shouldn't require downloading the gigabytes of feature vector data.
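
One way the configurable location could look is sketched below. This is only an illustration, not code from any of the repositories: the environment variable name `FEATURE_VECTOR_DIR`, the default directory, and both function names are assumptions made up for the example. The point is that resolving the path never triggers a download, and the existence check runs only in the parts of the pipeline that actually need the vectors.

```python
import os
from pathlib import Path

# Hypothetical names: neither the environment variable nor the default
# directory comes from the repositories discussed in this issue.
ENV_VAR = "FEATURE_VECTOR_DIR"
DEFAULT_DIR = Path("data") / "feature_vectors"


def resolve_feature_dir(cli_value=None, environ=None):
    """Return the feature vector directory without touching disk or network.

    Precedence: explicit argument, then environment variable, then default.
    """
    environ = os.environ if environ is None else environ
    raw = cli_value or environ.get(ENV_VAR) or DEFAULT_DIR
    return Path(raw).expanduser()


def require_feature_dir(path):
    """Raise a clear error only when the vectors are actually needed."""
    path = Path(path)
    if not path.is_dir():
        raise FileNotFoundError(
            f"No feature vectors at {path}; download them separately "
            f"or set {ENV_VAR} to an existing directory."
        )
    return path
```

Code such as the sequencing analysis would never call `require_feature_dir`, so it would run without the feature vector data being present.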

thequicksort · 2021-06-14T19:25:02Z

Open question: How do we want the similarity search repositories to access the feature vectors? What should we recommend to users checking out the repository who want to reproduce our results (perhaps even without downloading all the images from scratch)?

1 - Git submodules
2 - Manually specify locations (requires extra steps from the user: checking out the repository, running git lfs, etc.)
3 - As part of this pipeline, publish to external bucket
4 - Other approaches?
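
To make option 3 a bit more concrete, here is a rough sketch of what the consuming side might look like if the HDF5 files were published at some URL. Nothing here is decided: the function name `fetch_if_missing` is hypothetical, the URL would be whatever the bucket ends up being, and only the Python standard library is used. It downloads a file once and reuses the local copy afterwards.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists; return dest as a Path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already cached locally, nothing to download
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final path.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

A user reproducing the results would call this once per published file and would not need to download the raw images at all. Integrity checks (for example a checksum published next to each file) are left out of the sketch.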

thequicksort · 2021-06-14T19:27:12Z

High-level overview of the migration proposal:

thequicksort assigned callistabee Jun 7, 2021
