
UNICEF Giga: AI-enabled School Mapping


Table of Contents

  1. About Giga
  2. About
  3. Getting Started
  4. Code Design
  5. Contribution Guidelines
  6. Code of Conduct
  7. License
  8. Contact
  9. Acknowledgements

About Giga

Giga is a UNICEF-ITU global initiative to connect every school to the Internet and every young person to information, opportunity, and choice. By connecting all schools to the Internet, we ensure that every child has a fair shot at success in an increasingly digital world.

About

This work leverages deep learning and high-resolution satellite images for automated school mapping and is developed under Giga, a global initiative by UNICEF-ITU to connect every school to the internet by 2030.

Obtaining complete and accurate information on school locations is a critical first step to accelerating digital connectivity and driving progress towards SDG4: Quality Education. However, the GPS coordinates of schools are often inaccurate, incomplete, or even completely non-existent in many developing countries. In support of the Giga initiative, we leverage computer vision and remote sensing data to accelerate school mapping. This work aims to support government agencies and connectivity providers in improving school location data to better estimate the costs of digitally connecting schools and plan the strategic allocation of their financial resources.

Project Objective

  • Present a publicly available, end-to-end pipeline for automated school location detection from high-resolution satellite images.
  • Help governments improve the quality of school location information in their national register.
  • Identify new, previously unmapped schools in a way that is quick, efficient, and scalable.

System Flow Diagram

For each school and non-school location in our dataset, we downloaded 300 x 300 m, 500 x 500 px high-resolution satellite images from Maxar with a spatial resolution of 60 cm/px.
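The image dimensions are self-consistent: 500 px at 0.6 m/px spans 300 m. The sketch below, a hypothetical helper not taken from this repository, computes the square footprint of such a tile around a location, assuming the coordinates are already in a projected CRS measured in metres (for example, a UTM zone).

```python
def tile_bounds(x, y, size_px=500, res_m=0.6):
    """Return (minx, miny, maxx, maxy) of a square tile centred on (x, y).

    Hypothetical helper. Assumes x and y are in a projected CRS whose
    units are metres, so size_px * res_m is the tile width in metres.
    """
    half = size_px * res_m / 2  # 500 px * 0.6 m/px / 2 = 150 m
    return (x - half, y - half, x + half, y + half)


# Example: a school at easting 500000 m, northing 5000000 m
print(tile_bounds(500000.0, 5000000.0))
```

The same function covers other image configurations by changing size_px and res_m.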

GitHub Repositories

Built With

  • ML/DL Frameworks: scikit-learn, PyTorch
  • Programming Language: Python
  • Geospatial Libraries: GeoPandas, Rasterio, Fiona, GDAL

Getting Started

Setup

conda create -n <env_name> python==3.10.13
conda activate <env_name>
pip install -r requirements.txt

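Add the repository root to your Python path environment variable. Running this line from the repository root sets it for the current shell:

export PYTHONPATH=$(pwd)

To make the setting persistent, add the same line to ~/.profile with $(pwd) replaced by the absolute path of the repository, since $(pwd) in ~/.profile expands to the login shell's working directory, not to the repository.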

Add the conda environment to JupyterLab:

conda install ipykernel
ipython kernel install --user --name=<env_name>

Updating Building URLs on Leafmap

Navigate to your site-packages directory (e.g. /anaconda/envs/envname/lib/python3.10/site-packages) and edit the building URLs in leafmap/common.py as follows.
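If you are unsure where site-packages lives in your environment, Python can locate the file for you. A minimal sketch using the standard library's importlib.util.find_spec, which finds a top-level package without importing it; locate_package_file is a hypothetical helper, not part of this repository or of leafmap.

```python
import importlib.util
import os


def locate_package_file(package, filename):
    """Return the path to `filename` inside an installed top-level package, or None.

    Hypothetical helper. find_spec() locates the package without importing it;
    for a regular package, spec.origin points at its __init__.py.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # not installed, or a namespace package without __init__.py
    path = os.path.join(os.path.dirname(spec.origin), filename)
    return path if os.path.exists(path) else None


if __name__ == "__main__":
    # Prints the path to leafmap/common.py in the active environment, or None.
    print(locate_package_file("leafmap", "common.py"))
```

Run it with the conda environment activated so that it resolves the same leafmap installation the project uses.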

Microsoft Building Footprints

Find the function download_ms_buildings() and replace the building URL with the following:

https://minedbuildings.z5.web.core.windows.net/global-buildings/dataset-links.csv

Google Open Buildings

Find the function download_google_buildings() and replace the building URL with the following:

https://openbuildings-public-dot-gweb-research.uw.r.appspot.com/public/tiles.geojson

Install GDAL/OGR: Follow these instructions.

Code Design

This repository is divided into the following files and folders:

  • configs/: contains the configuration files (data configs, satellite image configs, model configs, etc.)
  • notebooks/: contains all Jupyter notebooks for exploratory data analysis.
  • utils/: contains utility methods for loading datasets, building models, and performing training routines.
  • src/: contains runnable scripts for automated data cleaning, model training, evaluation, and deployment.

Data Download

To download the relevant datasets, run either of the following:

  • notebooks/01_data_download.ipynb
  • python src/data_download.py:
usage: data_download.py [-h] [--config CONFIG] [--profile PROFILE]

Data Download
options:
  -h, --help         show this help message and exit
  --config CONFIG    Path to the configuration file
  --profile PROFILE  Path to the profile file

Sample usage

python src/data_download.py --config="configs/data_configs/data_config_ISO_AS.yaml" --profile="configs/profile.share"

Outputs

  • School files are saved to data/vectors/<project_name>/school/
  • Non-school files are saved to data/vectors/<project_name>/non_school/.

Data Preparation

The data cleaning script can be found in src/data_preprocess.py:

usage: data_preprocess.py [-h] [--config CONFIG] [--sat_config SAT_CONFIG] [--sat_creds SAT_CREDS] [--clean_neg CLEAN_NEG] [--sources SOURCES [SOURCES ...]] [--imb_ratio IMB_RATIO]


Data Cleaning Pipeline

options:
  -h, --help                      show this help message and exit
  --config CONFIG                 Path to the configuration file
  --sat_config SAT_CONFIG         Path to the satellite config file
  --sat_creds SAT_CREDS           Path to the satellite credentials file
  --clean_neg CLEAN_NEG           Clean negative samples (bool, default: False)
  --sources SOURCES [SOURCES ...] Sources (string, default: unicef, osm, overture)
  --imb_ratio IMB_RATIO           Imbalance ratio for negative samples (int, default: 2)

Cleaning School Samples

Run data cleaning for the positive samples.

Sample usage

python src/data_preprocess.py --config="configs/data_configs/data_config_ISO_AF.yaml" --sat_creds="configs/sat_configs/sat_creds.yaml" --sat_config="configs/sat_configs/sat_config_500x500_60cm.yaml" --clean_neg=False

Manual Data Cleaning ✨

Manually inspect and clean the satellite images using notebooks/03_sat_cleaning.ipynb.

This adds a validated column (or field) to the <iso_code>_clean.geojson file indicating which images to retain (0) and which to discard (-1) for model training.
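Only the rows marked 0 are meant to be used for training. A minimal sketch of that filter using only the standard library, assuming the column is stored as a validated property on each GeoJSON feature; retained_features and the example features (including their name property) are hypothetical.

```python
import json


def retained_features(geojson):
    """Return a FeatureCollection with only the features whose validated == 0.

    Hypothetical helper; assumes `validated` is a property of each feature.
    """
    kept = [
        feature
        for feature in geojson.get("features", [])
        if feature.get("properties", {}).get("validated") == 0
    ]
    return {"type": "FeatureCollection", "features": kept}


# Hypothetical in-memory example; in practice, json.load() the <iso_code>_clean.geojson file.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "a", "validated": 0}, "geometry": None},
        {"type": "Feature", "properties": {"name": "b", "validated": -1}, "geometry": None},
    ],
}
print(json.dumps([f["properties"]["name"] for f in retained_features(sample)["features"]]))  # ["a"]
```

With GeoPandas, which this project already uses, the equivalent filter on a GeoDataFrame is gdf[gdf["validated"] == 0].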

Outputs

  • Vector outputs are saved to data/vectors/<project_name>/school/clean/<iso_code>_clean.geojson.
  • Satellite images are saved to data/rasters/500x500_60cm/<project_name>/<iso_code>/school/

Cleaning Non-school Samples

Run data cleaning for the negative samples. This samples up to IMB_RATIO times (default: 2) the number of clean school data points.

Sample usage

python src/data_preprocess.py --config="configs/data_configs/data_config_ISO_AF.yaml" --sat_creds="configs/sat_configs/sat_creds.yaml" --sat_config="configs/sat_configs/sat_config_500x500_60cm.yaml" --clean_neg=True

Outputs

  • Vector outputs are saved to data/vectors/<project_name>/non_school/clean/<iso_code>_clean.geojson.
  • Satellite images are saved to data/rasters/500x500_60cm/<project_name>/<iso_code>/non_school/.
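The cap on non-school samples is controlled by the --imb_ratio option (default: 2). The sketch below illustrates the idea under the assumption that negatives are drawn uniformly at random without replacement; sample_negatives is a hypothetical name, and the script's actual selection logic may differ.

```python
import random


def sample_negatives(negatives, n_positives, imb_ratio=2, seed=42):
    """Randomly draw up to imb_ratio * n_positives items from negatives.

    Hypothetical helper. Sampling is uniform and without replacement; if fewer
    negatives are available than the cap, all of them are returned (shuffled).
    """
    n = min(len(negatives), imb_ratio * n_positives)
    rng = random.Random(seed)  # local RNG with a fixed seed for reproducibility
    return rng.sample(negatives, n)


candidates = list(range(1000))  # e.g. indices of candidate non-school locations
print(len(sample_negatives(candidates, n_positives=120)))  # 240
print(len(sample_negatives(candidates, n_positives=600)))  # 1000 (capped by availability)
```

Using a local random.Random instance keeps the draw reproducible without touching the global random state.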

Model Training

To train the computer vision models, run:

sh train.sh

Alternatively, you can run python src/train_cnn.py:

usage: train_cnn.py [-h] [--config MODEL_CONFIG] [--lr_finder LR_FINDER] [--iso ISO [ISO ...]]

Model Training
options:
  -h, --help              show this help message and exit
  --config MODEL_CONFIG   Path to the model configuration file
  --lr_finder LR_FINDER   Learning rate finder (bool, default: False)
  --iso ISO [ISO ...]     ISO 3166-1 alpha-3 codes

Sample usage

python src/train_model.py --config="configs/cnn_configs/convnext_small.yaml" --iso=MNG; 

Outputs

Model results will be saved to exp/<project_name>/<iso_code>_<model_name>/ (e.g. exp/GIGAv2/MNG_convnext_small/)

Model Ensemble

Open configs/best_models.yaml. Add an entry for your country of interest (using the country's ISO code), and specify the best model variant for each of ViT, Swin, and ConvNeXt in order of model performance, i.e. the first entry is the best-performing model.

MNG:
- "configs/vit_configs/vit_b_16.yaml"
- "configs/cnn_configs/convnext_base.yaml"
- "configs/vit_configs/swin_v2_b.yaml"

To evaluate the model ensemble, run 05_model_evaluation.ipynb.
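This README does not state how the ensemble combines its members. One common approach is soft voting: average each model's predicted school probability per tile, then threshold the mean. The sketch below assumes soft voting with a 0.5 threshold and is illustrative only, not the logic in 05_model_evaluation.ipynb; soft_vote and the probabilities are hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities tile by tile and threshold the mean.

    Hypothetical helper. prob_lists holds one list of P(school) per model,
    all aligned on the same tiles. Returns (mean_probabilities, predictions).
    """
    n_models = len(prob_lists)
    means = [sum(probs) / n_models for probs in zip(*prob_lists)]  # per-tile mean
    preds = [int(p >= threshold) for p in means]  # 1 = school, 0 = non-school
    return means, preds


# Hypothetical probabilities from three models on four tiles
vit = [0.9, 0.2, 0.6, 0.4]
convnext = [0.8, 0.1, 0.5, 0.5]
swin = [0.7, 0.3, 0.7, 0.3]
means, preds = soft_vote([vit, convnext, swin])
print(preds)  # [1, 0, 1, 0]
```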

CAM Evaluation

To determine the best CAM method, run src/cam_evaluate.py:

usage: cam_evaluate.py [-h] [--model_config MODEL_CONFIG] [--iso_code ISO_CODE]
                       [--percentile PERCENTILE]

CAM Evaluation

options:
  -h, --help                  show this help message and exit
  --model_config MODEL_CONFIG Model config file
  --iso_code ISO_CODE         ISO 3166-1 alpha-3 code
  --percentile PERCENTILE     Percentile (float, default: 90)

Sample usage

python src/cam_evaluate.py --iso_code="MNG" --model_config="configs/best_models.yaml"

Outputs

The output will be saved to exp/<project_name>/<iso_code>_<best_model_name>/cam_results.csv.
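The --percentile option (default: 90) indicates that a percentile of the CAM activations is involved; a common use is to binarise a heatmap by keeping only pixels at or above that percentile. The sketch below assumes exactly that, using the nearest-rank percentile on a heatmap stored as nested lists; both helpers are hypothetical, and how cam_evaluate.py applies the percentile is not documented here.

```python
import math


def percentile_threshold(values, q):
    """Nearest-rank q-th percentile of a non-empty list of numbers (0 < q <= 100)."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def binarize_cam(cam, q=90.0):
    """Return a 0/1 mask with the same shape as cam, keeping pixels >= the q-th percentile."""
    flat = [v for row in cam for v in row]
    t = percentile_threshold(flat, q)
    return [[int(v >= t) for v in row] for row in cam]


# Toy 2 x 5 heatmap with activations 0.0 ... 0.9
cam = [[0.0, 0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8, 0.9]]
print(binarize_cam(cam, q=90))
```

Higher percentiles keep smaller, more confident regions of the heatmap; lower ones keep more of the surrounding context.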

Download Nationwide Satellite Images

To download nationwide satellite images, run src/sat_batch_download.py.

usage: sat_batch_download.py [-h] [--data_config DATA_CONFIG] [--sat_config SAT_CONFIG] [--sat_creds SAT_CREDS] [--iso_code ISO_CODE] [--adm_level ADM_LEVEL] [--sum_threshold SUM_THRESHOLD] [--buffer_size BUFFER_SIZE] [--spacing SPACING]

Satellite Image Download

options:
  -h, --help                    show this help message and exit
  --data_config DATA_CONFIG     Path to the data configuration file
  --sat_config SAT_CONFIG       Path to the satellite configuration file
  --sat_creds SAT_CREDS         Path to the satellite credentials file
  --iso_code ISO_CODE           ISO 3166-1 alpha-3 code
  --adm_level ADM_LEVEL         Administrative level (string, default ADM2)
  --sum_threshold SUM_THRESHOLD Pixel sum threshold (int, default 5)
  --buffer_size BUFFER_SIZE     Buffer size (int, default 150)
  --spacing SPACING             Sliding window spacing (int, default 150)

Sample usage

python src/sat_batch_download.py --data_config="configs/data_configs/data_config_ISO_AS.yaml" --sat_config="configs/sat_configs/sat_config_500x500_60cm.yaml" --sat_creds="configs/sat_configs/sat_creds.yaml" --iso_code=MNG;

Outputs

The satellite images are saved to output/<iso_code>/images/.
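Both --buffer_size and --spacing default to 150, which matches half of the 300 m tile width described earlier, so adjacent tiles would overlap by half a tile. The sketch below shows one way to lay such a sliding-window grid over a bounding box, assuming both values are in metres and the coordinates are projected; sliding_window_tiles is hypothetical, and it ignores the administrative-level and pixel-sum options listed above.

```python
import math


def sliding_window_tiles(minx, miny, maxx, maxy, spacing=150.0, buffer_size=150.0):
    """Return square tile bounds (minx, miny, maxx, maxy) on a regular grid.

    Hypothetical helper. Tile centres start at (minx, miny) and step by
    `spacing` up to the box's max edges; each tile extends `buffer_size`
    from its centre, so a 150 m buffer yields 300 m x 300 m tiles.
    """
    nx = math.floor((maxx - minx) / spacing) + 1  # number of grid columns
    ny = math.floor((maxy - miny) / spacing) + 1  # number of grid rows
    tiles = []
    for j in range(ny):
        for i in range(nx):
            cx = minx + i * spacing  # index-based to avoid accumulating float error
            cy = miny + j * spacing
            tiles.append((cx - buffer_size, cy - buffer_size, cx + buffer_size, cy + buffer_size))
    return tiles


tiles = sliding_window_tiles(0.0, 0.0, 300.0, 150.0)
print(len(tiles))  # 6 (3 columns x 2 rows)
```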

Nationwide Model Deployment

For model prediction, run python src/sat_predict.py:

usage: sat_predict.py [-h] [--data_config DATA_CONFIG] [--model_config MODEL_CONFIG] [--sat_config SAT_CONFIG] [--sat_creds SAT_CREDS] [--shapename SHAPENAME] [--iso_code ISO_CODE]

Model Prediction

options:
  -h, --help                    show this help message and exit
  --data_config DATA_CONFIG     Data config file
  --model_config MODEL_CONFIG   Model config file
  --sat_config SAT_CONFIG       Maxar config file
  --sat_creds SAT_CREDS         Credentials file
  --shapename SHAPENAME         Model shapename
  --iso_code ISO_CODE           ISO 3166-1 alpha-3

Sample usage

python src/sat_predict.py --data_config="configs/data_configs/data_config_ISO_AF.yaml" --model_config="configs/best_models.yaml" --sat_config="configs/sat_configs/sat_config_500x500_60cm.yaml" --sat_creds="configs/sat_configs/sat_creds.yaml" --iso_code=RWA;

Outputs

The outputs are saved to output/<iso_code>/results/<project_name>/cams/<iso_code>_<best_model_name>_<cam_method>.geojson.
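The output locations in this README follow fixed naming patterns. The sketch below assembles two of them with pathlib so that scripts or notebooks can find results; the helper names and the gradcam method name in the example are hypothetical, and the patterns are copied from the Outputs sections above.

```python
from pathlib import PurePosixPath


def experiment_dir(project, iso_code, model_name):
    """exp/<project_name>/<iso_code>_<model_name>/ (hypothetical helper)."""
    return PurePosixPath("exp", project, f"{iso_code}_{model_name}")


def prediction_geojson(project, iso_code, best_model_name, cam_method):
    """output/<iso_code>/results/<project_name>/cams/<iso_code>_<best_model_name>_<cam_method>.geojson"""
    return PurePosixPath(
        "output", iso_code, "results", project, "cams",
        f"{iso_code}_{best_model_name}_{cam_method}.geojson",
    )


print(experiment_dir("GIGAv2", "MNG", "convnext_small"))  # exp/GIGAv2/MNG_convnext_small
```

PurePosixPath keeps forward slashes on every platform; wrap the result in pathlib.Path when you need to read or write the files.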

File Organization

The datasets are organized as follows:

data
├── rasters
│   └── maxar
│       ├── ISO
│       │   ├── school
│       │   │   ├── UNICEF-ISO-SCHOOL-00000001.tiff
│       │   │   └── ...
│       │   ├── non_school
│       │   │   ├── UNICEF-ISO-NON_SCHOOL-00000001.tiff
│       │   │   └── ...
│       │   └── ...
│       └── ...
└── vectors
    ├── school
    │   ├── unicef
    │   │   ├── ISO_unicef.geojson
    │   │   └── ...
    │   ├── osm
    │   │   ├── ISO_osm.geojson
    │   │   └── ...
    │   └── overture
    │       ├── ISO_overture.geojson
    │       └── ...
    └── non_school
        ├── osm
        │   ├── ISO_osm.geojson
        │   └── ...
        └── overture
            ├── ISO_overture.geojson
            └── ...
output
├── ISO
│   ├── geotiff
│   ├── images
│   ├── results
│   │   └── <project_name>
│   │       ├── cams
│   │       └── tiles
│   └── tiles
└── ...

Contribution Guidelines

Thank you for considering contributing to Giga! We value your input and aim to make the contribution process as accessible and transparent as possible. Whether you're interested in reporting bugs, discussing code, submitting fixes, proposing features, becoming a maintainer, or engaging with the Giga community, we welcome your involvement.

Click here for detailed Contribution Guidelines

Code of Conduct

At Giga, we're committed to maintaining an environment that's respectful, inclusive, and harassment-free for everyone involved in our project and community. We welcome contributors and participants from diverse backgrounds and pledge to uphold the standards.

Click here for detailed Code of Conduct

Contact

Applied Science AI-enabled School Mapping Team:

Giga Website: https://giga.global/contact-us/

Acknowledgments💜

Global high-resolution satellite images (60 cm/px) from Maxar were made available with the generous support of the US State Department. We are also grateful to Dell for providing us with access to High Performance Computing (HPC) clusters with NVIDIA GPU support.