- Download the tar.gz file from [here] with code `q5v5`.
- Run the following commands to unzip the file and create a symbolic link to the extracted files.
  ```
  tar zxvf AVA_compress.tar.gz -C /some/path/
  cd /path/to/AlphAction/
  mkdir data
  ln -s /some/path/AVA data/AVA
  ```
- Download Annotations. Download the AVA Actions annotations from the official dataset website. Organize those annotation files in the following structure:
  ```
  AVA/
  |_ annotations/
  |  |_ ava_action_list_v2.2.pbtxt
  |  |_ ava_action_list_v2.2_for_activitynet_2019.pbtxt
  |  |_ ava_include_timestamps_v2.2.txt
  |  |_ ava_train_excluded_timestamps_v2.2.csv
  |  |_ ava_val_excluded_timestamps_v2.2.csv
  |  |_ ava_train_v2.2.csv
  |  |_ ava_val_v2.2.csv
  ```
- Download Videos. Download the list of training/validation file names from the CVDF repository and download all videos following the links provided there. Place the list file and the video files as follows (a minimal download sketch is given after the directory tree):
  ```
  AVA/
  |_ annotations/
  |  |_ ava_file_names_trainval_v2.1.txt
  |_ movies/
  |  |_ trainval/
  |  |  |_ <MOVIE-ID-1>.mp4
  |  |  |_ ...
  |  |  |_ <MOVIE-ID-N>.mp4
  ```
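  Below is a minimal download sketch, assuming every movie named in `ava_file_names_trainval_v2.1.txt` is mirrored under a single base URL; the base URL (the CVDF S3 mirror) and the local paths are assumptions and should be checked against the CVDF repository instructions before use.

  ```python
  # Hedged sketch: download every movie listed in the file-name list.
  # BASE_URL and the local paths are assumptions, not part of this repository.
  import urllib.request
  from pathlib import Path

  BASE_URL = "https://s3.amazonaws.com/ava-dataset/trainval/"   # assumed CVDF mirror
  LIST_FILE = Path("/path/to/AVA/annotations/ava_file_names_trainval_v2.1.txt")
  OUT_DIR = Path("/path/to/AVA/movies/trainval")
  OUT_DIR.mkdir(parents=True, exist_ok=True)

  for name in LIST_FILE.read_text().split():
      target = OUT_DIR / name
      if target.exists():          # skip files left by an earlier, interrupted run
          continue
      print("downloading", name)
      urllib.request.urlretrieve(BASE_URL + name, target)
  ```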
- Create Symbolic Link. Create a symbolic link that references the AVA dataset directory by running the following commands.
  ```
  cd /path/to/AlphAction
  mkdir data
  ln -s /path/to/AVA data/AVA
  ```
- Preprocess Videos. Run the following commands to process the raw movies.
  ```
  python tools/process_ava_videos.py \
    --movie_root data/AVA/movies/trainval \
    --clip_root data/AVA/clips/trainval \
    --kframe_root data/AVA/keyframes/trainval \
    --process_num $[`nproc`/2]
  ```
  This script extracts video clips and key frames from the raw movies. Each video clip lasts exactly one second; clips are cut from second 895 through second 1805 of each movie. All video clips are scaled so that the shorter side is no larger than 360 pixels and are transcoded to 25 fps. The first frame of each video clip is extracted as the key frame, following the key-frame definition of the AVA dataset. (Key frames are only used to detect persons and objects.) An illustrative extraction sketch is given after the directory tree below. The output video clips and key frames will be saved as follows:
  ```
  AVA/
  |_ clips/
  |  |_ trainval/
  |  |  |_ <MOVIE-ID-1>
  |  |  |  |_ [895~1805].mp4
  |  |  |_ ...
  |  |  |_ <MOVIE-ID-N>
  |  |  |  |_ [895~1805].mp4
  |_ keyframes/
  |  |_ trainval/
  |  |  |_ <MOVIE-ID-1>
  |  |  |  |_ [895~1805].jpg
  |  |  |_ ...
  |  |  |_ <MOVIE-ID-N>
  |  |  |  |_ [895~1805].jpg
  ```
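  The actual preprocessing is implemented in `tools/process_ava_videos.py`; the snippet below is only an illustrative sketch of what happens for a single clip, assuming `ffmpeg` is installed and on the PATH. For brevity it always rescales the shorter side to exactly 360, whereas the description above only requires it to be no larger than 360. The movie id and file names are placeholders.

  ```python
  # Illustrative sketch only; the real logic lives in tools/process_ava_videos.py.
  import subprocess
  from pathlib import Path

  def extract_clip_and_keyframe(movie_path, clip_dir, kframe_dir, sec):
      """Cut the one-second clip starting at `sec` and save its first frame as the key frame."""
      clip_dir.mkdir(parents=True, exist_ok=True)
      kframe_dir.mkdir(parents=True, exist_ok=True)
      clip_path = clip_dir / f"{sec}.mp4"
      # Re-encode at 25 fps and scale so that the shorter side becomes 360.
      subprocess.run([
          "ffmpeg", "-y", "-ss", str(sec), "-t", "1", "-i", str(movie_path),
          "-r", "25",
          "-vf", "scale='if(gt(iw,ih),-2,360)':'if(gt(iw,ih),360,-2)'",
          str(clip_path),
      ], check=True)
      # The key frame is the first frame of the one-second clip.
      subprocess.run([
          "ffmpeg", "-y", "-i", str(clip_path), "-vframes", "1",
          str(kframe_dir / f"{sec}.jpg"),
      ], check=True)

  # Example: the clip covering second 895 of one (hypothetical) movie;
  # the full script loops over seconds 895..1805 of every movie.
  extract_clip_and_keyframe(
      Path("data/AVA/movies/trainval/MOVIE-ID-1.mp4"),
      Path("data/AVA/clips/trainval/MOVIE-ID-1"),
      Path("data/AVA/keyframes/trainval/MOVIE-ID-1"),
      sec=895,
  )
  ```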
- Convert Annotations. Our code uses COCO-style annotations, so the official csv annotations have to be converted into COCO json format by running the following commands.
  ```
  python tools/csv2COCO.py \
    --csv_path data/AVA/annotations/ava_train_v2.2.csv \
    --movie_list data/AVA/annotations/ava_file_names_trainval_v2.1.txt \
    --img_root data/AVA/keyframes/trainval
  python tools/csv2COCO.py \
    --csv_path data/AVA/annotations/ava_val_v2.2.csv \
    --movie_list data/AVA/annotations/ava_file_names_trainval_v2.1.txt \
    --img_root data/AVA/keyframes/trainval
  ```
  The converted json files will be stored in the `AVA/annotations` directory as follows. `*_min.json` means that the json file is written without space indentation (see the sketch after the directory tree below). Alternatively, you can simply download our json files here (train, val).
  ```
  AVA/
  |_ annotations/
  |  |_ ava_train_v2.2.json
  |  |_ ava_train_v2.2_min.json
  |  |_ ava_val_v2.2.json
  |  |_ ava_val_v2.2_min.json
  ```
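  As a minimal sketch of the only difference between the two variants, assuming standard COCO-style keys and the file names above, a `*_min.json` file is simply the same dictionary re-serialised without indentation:

  ```python
  # Minimal sketch: a *_min.json file is the same COCO-style dict dumped without indentation.
  # File names follow the tree above; run from the AlphAction root so data/AVA resolves.
  import json

  with open("data/AVA/annotations/ava_train_v2.2.json") as f:
      coco = json.load(f)   # COCO-style keys, e.g. "images", "annotations", "categories"

  with open("data/AVA/annotations/ava_train_v2.2_min.json", "w") as f:
      json.dump(coco, f)    # no indent argument -> compact single-line output
  ```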
- Detect Persons and Objects. The predicted person boxes for the AVA validation set can be downloaded [here]. Note that we only use ground-truth person boxes for training. The object box files are also available for download (train, val). These files should be placed at the following locations (a quick sanity-check sketch follows the tree below).
  ```
  AVA/
  |_ boxes/
  |  |_ ava_val_det_person_bbox.json
  |  |_ ava_train_det_object_bbox.json
  |  |_ ava_val_det_object_bbox.json
  ```
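  The following is a small sanity check, assuming the files were placed as in the tree above; it only verifies that each box file exists and parses as JSON, without assuming anything about the record format defined by the repository.

  ```python
  # Verify that the downloaded box files are in place and are valid JSON.
  import json
  from pathlib import Path

  for name in ("ava_val_det_person_bbox.json",
               "ava_train_det_object_bbox.json",
               "ava_val_det_object_bbox.json"):
      path = Path("data/AVA/boxes") / name
      with open(path) as f:
          boxes = json.load(f)
      print(f"{name}: OK, {len(boxes)} top-level entries")
  ```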
  For the person detector, we first trained it on the MSCOCO keypoint dataset and then fine-tuned it on the AVA dataset. The final model weights are available [here].
  For the object detector, we use the model provided in the maskrcnn-benchmark repository, which is trained on the MSCOCO dataset. Person boxes are removed from the predicted results.