Detecting Concepts and Generating Captions from Medical Images: Contributions of the VCMI Team to ImageCLEFmedical 2022 Caption
This is the official repository for the VCMI Team's submission to the ImageCLEFmedical Caption 2022 task.
You can find the paper here.
For more information please contact [email protected].
You can find the package requirements in the requirements.txt file.
For pycocoevalcap, make sure that your locale is set to en_US.UTF-8; otherwise, computing METEOR will throw an error. To change your locale, run sudo update-locale LC_ALL=en_US.UTF-8 and reboot your computer.
The dataset directory is expected to have the following structure:

ImageClefMedical/dataset/
├── train/
├── valid/
├── test/
├── caption_prediction_train.csv
├── caption_prediction_valid.csv
├── concept_detection_train.csv
├── concept_detection_valid.csv
└── concepts.csv
Run python preprocessing/get_topconcepts.py. This script will create the following files:
- concepts_top100.csv
- concept_detection_train_top100.csv and concept_detection_valid_top100.csv
- caption_prediction_train_top100.csv and caption_prediction_valid_top100.csv
The concepts_top100.csv file corresponds to concepts.csv, filtered to contain only the top-K (here, top-100) most frequent concepts. The other files correspond to their original counterparts but with the concepts outside the top-K removed (images that end up without any valid concept are also removed).
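For reference, the core of this filtering is conceptually simple. The sketch below is not the actual get_topconcepts.py; the column names ("ID", "cuis"), the ";" separator, and the tab-separated layout are assumptions, so check the real script for details:

```python
# Illustrative sketch of top-K concept filtering; file layout and column
# names are assumptions, see preprocessing/get_topconcepts.py for the real logic.
from collections import Counter
import pandas as pd

TOP_K = 100

df = pd.read_csv("dataset/concept_detection_train.csv", sep="\t")

# Count how often each concept appears across all training images.
counts = Counter(c for cuis in df["cuis"] for c in cuis.split(";"))
top_concepts = {c for c, _ in counts.most_common(TOP_K)}

# Keep only top-K concepts per image and drop images left without any concept.
df["cuis"] = df["cuis"].apply(
    lambda cuis: ";".join(c for c in cuis.split(";") if c in top_concepts)
)
df = df[df["cuis"] != ""]
df.to_csv("dataset/concept_detection_train_top100.csv", sep="\t", index=False)
```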
Run python preprocessing/convert_to_coco.py <your_file_here>. This script will convert the given file into COCO format and save it as <your_file_here>_coco.json.
You should call this script for both caption_prediction_train.csv and caption_prediction_valid.csv files (and/or their top-K versions).
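For context, pycocoevalcap consumes the standard COCO captions layout, roughly as sketched below; the exact ids and any extra fields written by convert_to_coco.py may differ:

```python
# Standard COCO-captions layout; a sketch, not the exact output of convert_to_coco.py.
coco_annotations = {
    "images": [
        {"id": 0, "file_name": "example_image.jpg"},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "caption": "Chest X-ray showing ..."},
    ],
}
```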
Run python preprocessing/gen_test_images_csv.py --datadir dataset/test to generate a csv file listing all the test images. This file is needed to generate predictions for the test set.
To train the multilabel model, run python concept_detection/multilabel/train.py. If you want to specify the number of top-K concepts to use, just add --nr_concepts <number_of_concepts_to_consider>.
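For example, to train on the top-100 subset: python concept_detection/multilabel/train.py --nr_concepts 100.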
To make predictions, use the predict.py script. This script has two important arguments: images_csv and nr_concepts. The first specifies which images you want to generate predictions for. For example, you might want to generate predictions only for the images that contain at least one of the top-100 concepts. The nr_concepts argument must be set in accordance with what the model was trained with: if, for example, you trained your model to consider only the top-100 concepts, then nr_concepts should be 100. For inference, two situations arise:
- if you want to generate predictions for the subset of images that contain top-100 concepts:
python predict.py --nr_concepts 100 --images_csv dataset/concept_detection_valid_top100.csv
- if, although your model was trained with 100 concepts, you want to generate predictions for all validation images:
python predict.py --nr_concepts 100 --images_csv dataset/concept_detection_valid.csv
You can also use this script to generate the submission files for the test set: python predict.py --images_csv dataset/test_images.csv. Don't forget to generate test_images.csv before running inference (see the Preprocessing section).
in progress...
For the semantic model, a semantic_types.csv file is needed. This file can be generated by running python concept_detection/semantic/umls/get_semantic_types.py. Note that this script uses the UMLS REST API, so you first need to obtain an API key here and then copy that key into the apikey variable in get_semantic_types.py.
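If you are curious how such a lookup typically works, the snippet below queries the UTS content endpoint for a single CUI. It is only a rough sketch of what get_semantic_types.py might do; the endpoint, authentication via an apiKey query parameter, and JSON field names are assumptions based on the public UMLS REST API:

```python
# Rough sketch of a UMLS semantic-type lookup; not the actual get_semantic_types.py.
# Endpoint and response fields follow the public UTS REST API and may differ
# from what the script uses (older setups authenticate with a ticket flow).
import requests

apikey = "<your_umls_api_key>"
cui = "C0040405"  # example CUI

resp = requests.get(
    f"https://uts-ws.nlm.nih.gov/rest/content/current/CUI/{cui}",
    params={"apiKey": apikey},
    timeout=30,
)
resp.raise_for_status()
for sem_type in resp.json()["result"]["semanticTypes"]:
    print(cui, sem_type["name"])
```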
Next, the concepts need to be mapped to their corresponding semantic types. This can be done using the convert2semantic.py script. Change the --concepts_csv arg to concepts_top100.csv if you are working with the top-100 subset.
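For example (the exact location of convert2semantic.py inside concept_detection/semantic/ and the path of the concepts csv are assumptions, adjust them to your setup): python concept_detection/semantic/convert2semantic.py --concepts_csv dataset/concepts_top100.csv.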
To train the whole semantic pipeline (9 models), you can follow the example in semantic_baseline_train.sh and run it with ./semantic_baseline_train.sh. You can also train each model individually by calling the train.py script directly.
The predict.py script will load all 9 models and generate a csv file for each. To use the script, just run python concept_detection/semantic/predict.py --models_dir <path_to_previously_trained_models>. This script works similarly to the predict.py script of the multilabel baseline; check the instructions in the multilabel section to better understand its options.
Finally, use the aggregate.py script to aggregate all predictions into a single csv file that can be given to evaluator.py. Point the --preds_dir arg to the directory where the 9 csv prediction files are stored.
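For example (assuming aggregate.py lives alongside predict.py in concept_detection/semantic/): python concept_detection/semantic/aggregate.py --preds_dir <path_to_predictions_dir>.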
To compute the F1-score, use the evaluator.py file, specifying the ground_truth_path and submission_file_path. Keep in mind that the number of images in both files should match: if you generated predictions for the top-100 subset, your ground_truth_path should be dataset/concept_detection_valid_top100.csv, while if you generated predictions for the whole validation set, it should be dataset/concept_detection_valid.csv.
To train the baseline Vision Encoder-Decoder Transformer model, just run python captioning/baseline_without_concepts/train.py.
To evaluate your trained model on the validation set, run python captioning/baseline_without_concepts/generation <checkpoint_to_trained_model>.
To compute the evaluation scores, run python captioning/eval-coco.py val_preds.json (the val_preds.json file is generated in the previous step).
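For reference, pycocoevalcap reads predictions in the standard COCO results layout, roughly as sketched below; the exact fields written to val_preds.json by the generation script may differ slightly:

```python
# Standard COCO "results" layout consumed by pycocoevalcap; a sketch, not the
# exact output of the generation script.
val_preds = [
    {"image_id": 0, "caption": "Chest X-ray showing ..."},
]
```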
To install the OSCAR package, run pip install -e captioning/Oscar.
To train the modified OSCAR model, run python captioning/Oscar/oscar/run_captioning.py --do_train --model_name_or_path <path_to_pretrained_checkpoint>. This pretrained checkpoint can be obtained directly from the OSCAR repo; in the paper we used the coco_captioning_base_xe.zip checkpoint from here.
To evaluate your trained model, run python captioning/Oscar/oscar/run_captioning.py --do_eval --eval_model_dir <path_to_trained_checkpoint>.