This is the official implementation of our paper "Satellite Video Multi-Label Scene Classification With Spatial and Temporal Feature Cooperative Encoding: A Benchmark Dataset and Method".
The paper introduces the first publicly available, large-scale dataset for satellite video multi-label scene classification.
It covers 18 classes of static and dynamic ground content across 3549 videos and 141960 frames. We also propose a baseline method, STFCE.
Our Dataset:
We hope this work opens a new research direction and promotes the applications of satellite video.
Train and val dataset: DATASET.zip
Train and val frame features extracted by Inception network: Train and Val frame features.zip
STFCE models: best model
All our experiments were run on 4 Tesla V100 GPUs. We exported our conda and pip environment configurations to two files: conda_env.yml and requirements.txt.
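Since the experiments depend on GPUs, it can help to confirm that the driver sees them before setting up the environment. The following is a minimal sketch, not part of this repository: the function name `list_nvidia_gpus` is hypothetical, and it assumes an NVIDIA driver that provides the `nvidia-smi` tool. It uses only the Python standard library, so it does not depend on the deep learning framework installed later.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, report no GPUs.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            universal_newlines=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (e.g. driver not loaded); treat as no GPUs.
        return []
    return [line.strip() for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        for index, name in enumerate(gpus):
            print("GPU {}: {}".format(index, name))
    else:
        print("No NVIDIA GPU visible; check the driver before training.")
```

An empty result only means that `nvidia-smi` found nothing; whether the framework inside the conda environment can use the GPUs still has to be checked after installation.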
- Use the following commands to reproduce the environment, and make sure your GPUs are available:
conda env create -f conda_env.yml
pip install -r requirements.txt
- Download the dataset, the train-val frame features, and our pretrained models using the links above, then put them in the code root directory. We use the extracted frame features to train and test our model. To test our model:
python eval.py --eval_data_pattern="val.tfrecord" --model=LstmModel --train_dir=stfce_model --frame_features=True --feature_names="rgb" --feature_sizes="1024" --batch_size=1024 --base_learning_rate=0.0002 --lstm_random_sequence=True --run_once=True --top_k=18 --num_classes=18
- Train our model:
python train.py --train_data_pattern=train.tfrecord --model=LstmModel --train_dir=stfce_model --frame_features=True --feature_names="rgb" --feature_sizes="1024" --batch_size=80 --base_learning_rate=0.0002 --lstm_random_sequence=True --max_step=1000 --num_classes=18 --export_model_steps=100 --num_epochs=36
- We also support other models; please refer to Gated NetVLAD, Gated NetFV...
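The evaluation command above carries many flags, so scripting it can make repeated runs less error-prone. The following is a minimal sketch under stated assumptions: `EVAL_FLAGS` and `build_command` are hypothetical names that do not exist in this repository, the flag values are copied from the evaluation command in this README, and the batch size override in the usage part is only an illustration. The script prints the command and does not execute it.

```python
import shlex

# Flags copied from the evaluation command in this README.
EVAL_FLAGS = {
    "eval_data_pattern": "val.tfrecord",
    "model": "LstmModel",
    "train_dir": "stfce_model",
    "frame_features": True,
    "feature_names": "rgb",
    "feature_sizes": "1024",
    "batch_size": 1024,
    "base_learning_rate": 0.0002,
    "lstm_random_sequence": True,
    "run_once": True,
    "top_k": 18,
    "num_classes": 18,
}


def build_command(script, flags, overrides=None):
    """Build an argument list such as ['python', 'eval.py', '--model=LstmModel', ...]."""
    merged = dict(flags)
    merged.update(overrides or {})  # overrides replace the default values
    return ["python", script] + [
        "--{}={}".format(name, value) for name, value in merged.items()
    ]


if __name__ == "__main__":
    # Hypothetical override: a smaller batch size for a GPU with less memory.
    command = build_command("eval.py", EVAL_FLAGS, {"batch_size": 256})
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be passed to `subprocess.run` once the dataset, features, and pretrained model are in place. The same pattern applies to the training command with its own flag dictionary.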
If you find this project useful in your research, please consider citing:
@ARTICLE{10471306,
author={Guo, Weilong and Li, Shengyang and Chen, Feixiang and Sun, Yuhan and Gu, Yanfeng},
journal={IEEE Transactions on Image Processing},
title={Satellite Video Multi-Label Scene Classification With Spatial and Temporal Feature Cooperative Encoding: A Benchmark Dataset and Method},
year={2024},
volume={33},
pages={2238-2251},
doi={10.1109/TIP.2024.3374100}}