This little project is being developed during the 2022 Happywhale competition.
As many participants noticed, the individuals in the images can be tiny compared to the full image size, and the pictures contain many elements that are uninformative, if not misleading.
As such, a participant released a notebook, Happywhale: Cropped Dataset [YOLOv5] ✂️, leveraging an old dataset made for a previous competition involving whale tails.
The problem is that the resulting dataset contains numerous failure cases and slight inaccuracies that could harm the performance of a model trained on the cropped dataset.
In this repo, you will find the source code for an app aimed at crowdsourcing a bounding-box dataset for this competition.
On the app, users can either annotate images or review other annotations.
Use the mouse to place two points defining the bounding box. Note that the final bounding box only includes the pixels within the visual boundaries (the pixels used to draw the boundaries themselves are not part of it).
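The exclusive-boundary convention above can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper name and the one-pixel inset are assumptions.

```python
def bbox_from_clicks(p1, p2):
    """Turn two clicked points into a bounding box (x_min, y_min, x_max, y_max).

    Hypothetical sketch: the pixels used to draw the visual boundary are
    excluded, so the box is inset by one pixel on every side.
    """
    (x1, y1), (x2, y2) = p1, p2
    x_min, x_max = min(x1, x2), max(x1, x2)
    y_min, y_max = min(y1, y2), max(y1, y2)
    # Shift inward by one pixel so the drawn boundary itself is not included.
    return x_min + 1, y_min + 1, x_max - 1, y_max - 1
```

The two clicks can be given in any order; the helper sorts the coordinates before applying the inset.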
You can accept or reject proposed annotations.
On the home page you can find buttons to directly download either the raw annotations with review information or the final dataset made of manually reviewed samples. You can also download the dataset directly in a Kaggle notebook by copying the command lines at the bottom of the page.
- The dataset doesn't start from scratch: some annotations were obtained by merging numerous public manual annotation datasets. Here is the notebook used to create them: 🐳&🐬 - 👨🔬 Merging public bounding box datasets
- The annotations don't start from scratch either: some automatic annotations are already entered and only need a manual review. They were obtained by taking the dataset from Happywhale: Cropped Dataset [YOLOv5] ✂️ and filtering it using the methods described in 🐳&🐬 - Filter YOLOv5 failure cases
- Only the training images are in the app to comply with the competition's rules
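The merging step mentioned above can be illustrated with a minimal, hypothetical sketch: boxes from different sources that overlap strongly (measured by intersection-over-union) are grouped together and averaged into one consensus box. The function names and the threshold are assumptions; the actual notebooks may use different logic.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x_min, y_min, x_max, y_max) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def merge_annotations(boxes, iou_threshold=0.5):
    """Greedy consensus merging (illustrative only).

    Each box joins the first existing group it overlaps with by at least
    `iou_threshold`; otherwise it starts a new group. Each group is then
    collapsed into one box by averaging coordinates.
    """
    groups = []
    for box in boxes:
        for group in groups:
            if iou(box, group[0]) >= iou_threshold:
                group.append(box)
                break
        else:
            groups.append([box])
    # Average each group coordinate-wise into a single consensus box.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```

With this convention, two near-identical boxes from different sources collapse into their average, while a distant box survives as its own annotation.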