This repository contains the implementation of *Learning to predict crop type from heterogeneous sparse labels using meta-learning*, published at the EarthVision workshop at CVPR 2021.
The main entrypoints into the pipeline are the scripts. Specifically:
- `scripts/export.py` exports data (locally, or to Google Drive, depending on what is being exported)
- `scripts/process.py` processes the raw data
- `scripts/engineer.py` combines the earth observation data with the labels to create `(x, y)` training data
- `scripts/maml.py` trains the MAML model (an illustrative sketch of the MAML update follows this list)
- `scripts/test.py` tests the trained MAML model by finetuning it on the test datasets
- `scripts/ensemble.py` takes weights saved by `test.py` and ensembles them to create maps
- `scripts/pretrain.py` trains a model on all data, for a transfer learning baseline
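For a sense of what `scripts/maml.py` does, below is a minimal first-order MAML sketch in PyTorch. Everything in it (the toy model, `sample_tasks`, the learning rates) is a hypothetical stand-in for illustration; the repository's actual model, tasks, and training loop differ.

```python
import copy
import torch
from torch import nn

# Hypothetical stand-ins: a tiny binary classifier and a dummy task sampler.
# The repository's real model and task construction differ.
model = nn.Sequential(nn.Linear(12, 32), nn.ReLU(), nn.Linear(32, 1))

def sample_tasks(num_tasks=8, n_support=10, n_query=10, num_features=12):
    for _ in range(num_tasks):
        yield (
            torch.randn(n_support, num_features), torch.randint(0, 2, (n_support,)).float(),
            torch.randn(n_query, num_features), torch.randint(0, 2, (n_query,)).float(),
        )

inner_lr, meta_lr, inner_steps = 0.1, 1e-3, 1
meta_optimizer = torch.optim.Adam(model.parameters(), lr=meta_lr)
loss_fn = nn.BCEWithLogitsLoss()

for support_x, support_y, query_x, query_y in sample_tasks():
    # Inner loop: adapt a copy of the model to the task's support set
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        loss_fn(adapted(support_x).squeeze(-1), support_y).backward()
        inner_opt.step()

    # Outer loop: evaluate the adapted copy on the query set, then copy its
    # gradients back onto the meta-model (first-order MAML approximation)
    meta_optimizer.zero_grad()
    loss_fn(adapted(query_x).squeeze(-1), query_y).backward()
    for meta_p, task_p in zip(model.parameters(), adapted.parameters()):
        meta_p.grad = task_p.grad.clone()
    meta_optimizer.step()
```

The first-order approximation shown here sidesteps second-order gradients; the full MAML objective also differentiates through the inner-loop updates.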
Two crop type maps, created using only a few positive labelled points, are available on Google Earth Engine:
- Coffee map for 2019-2020 season in Luís Eduardo Magalhães municipality, Brazil
- Common bean map for 2019-2020 season in Busia, Kenya
Note: not all datasets used are public, so results cannot be exactly replicated.
- Download the LEM+ dataset, and save it in `data/raw/lem_brazil`
- Export the GeoWiki labels by running `export_geowiki` in `scripts/export.py`
- Process all the labels by running `scripts/process.py`
- Export the Sentinel Earth Engine tif files by running the other functions in `scripts/export.py`
- Combine the labels and raw satellite imagery into `(X, y)` training data by running `scripts/engineer.py`
- Train the MAML model by running `scripts/maml.py`. The MAML model and training results will be saved in `data/maml_models/version_<VERSION>`, where `VERSION` increments for each MAML run.
- Finetune 10 MAML models with the following commands, bootstrapping the training data on each run, as sketched after the commands (adding `--test_mode {pretrained, random}` will train the baseline models):
python maml_test.py --version <VERSION> --dataset Togo --many_n --num_cv 10 # Finetune on the Togo data across varying sample sizes
python maml_test.py --version <VERSION> --dataset coffee --num_samples {-1, 40} --num_cv 10 # Finetune on the coffee dataset for all negative samples, or 20 positive and 20 negative samples
python maml_test.py --version <VERSION> --dataset common_beans --num_samples {-1, 64} --num_cv 10 # Finetune on the common beans dataset for all negative samples, or 32 positive and 32 negative samples
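Here, "bootstrapping the training data" is taken to mean resampling the finetuning set with replacement for each of the 10 runs. A minimal sketch of that idea, under that assumption (`bootstrap_sample`, `x_train`, `y_train`, and `finetune` are hypothetical names, not functions from this repository):

```python
import numpy as np

def bootstrap_sample(x: np.ndarray, y: np.ndarray, seed: int):
    """Resample (x, y) with replacement so each finetuning run sees a
    slightly different training set (illustrative only)."""
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, len(x), size=len(x))
    return x[indices], y[indices]

# e.g. ten runs, each finetuning on a different bootstrap of the same data:
# for run in range(10):
#     x_boot, y_boot = bootstrap_sample(x_train, y_train, seed=run)
#     finetune(model, x_boot, y_boot)  # hypothetical helper
```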
Anaconda running Python 3.6 is used as the package manager. To get set up with an environment, install Anaconda from the link above, and (from this directory) run `conda env create -f environment.yml`. This will create an environment named `landcover-mapping` with all the necessary packages to run the code. To activate this environment, run `conda activate landcover-mapping`.
Earth Engine is used instead of Sentinel Hub because it is free. To use it, once the conda environment has been activated, run `earthengine authenticate` and follow the instructions. To test that everything has worked, run `python -c "import ee; ee.Initialize()"`. Note that Earth Engine exports files to Google Drive by default (to the same Google account used to sign up to Earth Engine).
Running exports can be viewed (and individually cancelled) in the Tasks tab of the Earth Engine Code Editor. For additional support, the Google Earth Engine forum is super helpful.
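For illustration, a minimal export with the Earth Engine Python API looks roughly like the sketch below. The collection, region, dates, and folder name are placeholders rather than what `scripts/export.py` actually uses; once started, the task appears in the Tasks tab mentioned above.

```python
import ee

ee.Initialize()

# Placeholder region and date range; scripts/export.py defines the real ones.
region = ee.Geometry.Rectangle([33.9, 0.2, 34.3, 0.6])
image = (
    ee.ImageCollection("COPERNICUS/S2")
    .filterDate("2019-04-01", "2020-04-01")
    .filterBounds(region)
    .median()
)

# Export the composite to Google Drive as a GeoTIFF
task = ee.batch.Export.image.toDrive(
    image=image,
    description="example_export",
    folder="earth_engine_exports",
    region=region,
    scale=10,
    maxPixels=1e9,
)
task.start()
print(task.status())
```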
The following tests can be run against the pipeline:
- `pytest`: unit tests, written in the test folder
- `black .`: code formatting
If you find this code useful, please cite the following paper:
@InProceedings{Tseng_2021_CVPR,
    author    = {Tseng, Gabriel and Kerner, Hannah and Nakalembe, Catherine and Becker-Reshef, Inbal},
    title     = {Learning To Predict Crop Type From Heterogeneous Sparse Labels Using Meta-Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {1111-1120}
}