Official code implementation for the paper "DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection" (AAAI 2021).
The code is developed based on the architecture of zylo117/Yet-Another-EfficientDet-Pytorch. We also follow some data pre-processing and model evaluation methods in BigRedT/no_frills_hoi_det and vt-vl-lab/iCAN. We sincerely thank the authors for the excellent work.
- Training and Test for V-COCO dataset
- Training and Test for HICO-DET dataset
- Demonstration on images
- Demonstration on videos
- More efficient voting strategy for inference using GPU
The code was tested with Python 3.6, PyTorch 1.5.1, torchvision 0.6.1, CUDA 10.2, and Ubuntu 18.04.
- Clone this repository:
git clone https://github.com/MVIG-SJTU/DIRV.git
- Install PyTorch and torchvision:
pip install torch==1.5.1 torchvision==0.6.1
- Install other necessary packages:
pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors
Download the V-COCO dataset following the official instructions.
You can find the file new_prior_mask.pkl here. Each element in it is the prior probability that a verb (e.g. eat) is associated with an object category (e.g. apple). You should also download the combined training and validation set annotations instances_trainval2014.json here, and put the file in datasets/vcoco/coco/annotations.
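As a rough illustration of how such a prior could be used, the sketch below loads a pickled mask and multiplies per-verb scores by the prior for one object category. The helper names `load_prior_mask` and `apply_prior`, the `latin1` encoding fallback, and the layout of one list of verb priors per object category are assumptions made for illustration. They are not taken from this repository, so inspect the loaded object before relying on any particular indexing.

```python
import pickle


def load_prior_mask(path):
    """Load a pickled prior mask from disk (hypothetical helper)."""
    with open(path, "rb") as f:
        # Assumption: encoding="latin1" lets Python 3 read the file
        # even if it was pickled under Python 2.
        return pickle.load(f, encoding="latin1")


def apply_prior(verb_scores, prior_row):
    """Scale each verb score by its prior for a single object category.

    Assumption: prior_row holds one probability per verb, in the same
    order as verb_scores.
    """
    return [score * prior for score, prior in zip(verb_scores, prior_row)]


# Toy data only: two verbs scored against one object category.
toy_scores = [0.9, 0.5]
toy_prior = [1.0, 0.0]  # the second verb is never paired with this object
print(apply_prior(toy_scores, toy_prior))
```

With a prior of zero, an implausible verb-object pair is suppressed no matter how high its raw score is.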
Download the HICO-DET dataset from the official website.
We convert the annotations of the HICO-DET dataset to JSON format following BigRedT/no_frills_hoi_det. You can directly download the processed annotations from here.
The file hico_processed/hico-det_verb_count.json records the number of training samples in each category. These counts are used as weights when calculating the loss.
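The exact weighting formula is not stated here. The sketch below shows one common way to turn per-category sample counts into loss weights, where rarer categories receive larger weights. The inverse-log-frequency formula, the helper names `load_counts` and `count_weights`, and the assumption that the JSON file maps category names to integer counts are all illustrative choices and may differ from what train.py actually does.

```python
import json
import math


def load_counts(path):
    """Read a JSON file of counts (assumed layout: {category: count})."""
    with open(path) as f:
        return json.load(f)


def count_weights(counts):
    """Turn sample counts into loss weights with a mean of 1.

    Assumption: weight is proportional to 1 / log(count + 2), so rare
    categories are weighted up without exploding for very small counts.
    """
    raw = {name: 1.0 / math.log(n + 2.0) for name, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {name: w / mean for name, w in raw.items()}


# Toy counts only; these numbers are not from the dataset.
toy_counts = {"ride": 1000, "lick": 10}
print(count_weights(toy_counts))
```

Normalizing to a mean of 1 keeps the overall loss scale comparable to the unweighted case, which makes learning rates easier to carry over.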
Make sure to put the files in the following structure:
|-- datasets
| |-- vcoco
| | |-- data
| | | |-- splits
| | | |-- vcoco
| | |
| | |-- coco
| | | |-- images
| | | |-- annotations
| | |-- new_prior_mask.pkl
| |-- hico_20160224_det
| | |-- images
| | |-- hico_processed
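Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch of such a check. The function name `missing_paths` is hypothetical, and the listed paths are taken from the directory tree and file locations described in this section.

```python
from pathlib import Path

# Directories from the tree above, relative to the datasets/ root.
EXPECTED_DIRS = [
    "vcoco/data/splits",
    "vcoco/data/vcoco",
    "vcoco/coco/images",
    "vcoco/coco/annotations",
    "hico_20160224_det/images",
    "hico_20160224_det/hico_processed",
]

# Files mentioned in this section, relative to the datasets/ root.
EXPECTED_FILES = [
    "vcoco/new_prior_mask.pkl",
    "vcoco/coco/annotations/instances_trainval2014.json",
    "hico_20160224_det/hico_processed/hico-det_verb_count.json",
]


def missing_paths(root="datasets"):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

Running the script from the repository root prints one line per missing entry and prints nothing when the layout is complete.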
CUDA_VISIBLE_DEVICES=0 python demo.py --image_path /path/to/a/single/image
Coming soon.
You can download the pre-trained weights for the V-COCO dataset (vcoco_best.pth) and the HICO-DET dataset (hico-det_best.pth) here.
Download the pre-trained weights of our backbone (efficientdet-d3_vcoco.pth and efficientdet-d3_hico-det.pth) here, and save them in the weights/ directory.
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py -p vcoco --batch_size 32 --load_weights weights/efficientdet-d3_vcoco.pth
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -p hico-det --batch_size 48 --load_weights weights/efficientdet-d3_hico-det.pth
You may also adjust the saving directory and the number of GPUs in projects/vcoco.yaml and projects/hico-det.yaml, or create your own projects in projects/.
CUDA_VISIBLE_DEVICES=0 python test_vcoco.py -w $path to the checkpoint$
CUDA_VISIBLE_DEVICES=0 python test_hico-det.py -w $path to the checkpoint$
Then follow the evaluation procedure in vt-vl-lab/iCAN to evaluate the results on the HICO-DET dataset.
If you find our paper or code useful for your research, please cite the following paper:
@inproceedings{fang2020dirv,
title={DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection},
author={Fang, Hao-Shu and Xie, Yichen and Shao, Dian and Lu, Cewu},
year={2021},
booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)}
}