add container build/deploy job and improve run pattern #31

Merged Jul 19, 2023 (52 commits)
Commits
5f32496
add container build/deploy job
Jun 20, 2023
4c1207b
upstream (openstates-core) is still 3.9, so we'll stick with that
Jun 20, 2023
d0dbb77
correct typo, update tippecanoe version and remove some packages befo…
Jun 20, 2023
2ea394b
better build of geo
Jun 20, 2023
15d7126
Remove post-redistricting warning
Jun 20, 2023
100c63b
remove building DATABASE_URL env
Jun 20, 2023
ae87c0a
increase zoom back to what it was
Jun 20, 2023
cada6aa
correct ids for territories
Jun 20, 2023
c8e90f7
add version file for pyenv
Jun 20, 2023
8c888d5
update dependencies
Jul 12, 2023
b044c91
one script to execute all steps
Jul 12, 2023
e4b22f8
add missing files
Jul 12, 2023
f0bd512
more functions into utils
Jul 12, 2023
0fd48b3
fix some import issues
Jul 12, 2023
4b337fa
set environment variable correctly
Jul 12, 2023
a374f95
update openstates dep
Jul 12, 2023
e9f7cbb
possibly working Mapbox API calls?
Jul 12, 2023
123f4d7
fix linting
Jul 12, 2023
2cf276d
actually call new function
Jul 12, 2023
5402c75
actually check for missing env vars and make bulk upload access more …
Jul 13, 2023
6978120
update docs to reflect new environment variables
Jul 13, 2023
48cee0e
working migrations
Jul 13, 2023
5eddae8
don't try to download files that don't exist
Jul 13, 2023
0e7caaa
add missing files
Jul 13, 2023
2f42c6c
remove un-needed files and don't check for shell when we have no shel…
Jul 13, 2023
b7a1cc9
remove git after build
Jul 13, 2023
2b958f7
add comment explaining removal
Jul 13, 2023
6b56d1c
correct docker command
Jul 13, 2023
77ee10b
follow Mapbox API examples a little more closely
Jul 13, 2023
eb50612
fix linting
Jul 13, 2023
adc57da
fix docker build
Jul 14, 2023
576b73c
update openstates
Jul 14, 2023
8074ec3
make sure we don't have transient AWS tokens messing up steps
Jul 14, 2023
e632b10
updates to documentation
Jul 17, 2023
8176420
handle env keys not being there
Jul 18, 2023
d8b85ad
small updates that don't matter
Jul 18, 2023
417b62a
don't re-generate tiles if they exist
Jul 18, 2023
daba6bd
remove db settings
Jul 18, 2023
c5aebb3
start working out how to remove old divisions
Jul 18, 2023
c011e4e
cleanup command works correctly, use absolute paths
Jul 18, 2023
de33fda
fix linting
Jul 18, 2023
819f42f
remove duplicate path formatting
Jul 18, 2023
225d15f
update exceptions and actually delete
Jul 18, 2023
c543e36
better handling of password for psql test and slightly better logging
Jul 18, 2023
bb8b35c
actually add bounds as required by docs
Jul 18, 2023
c920255
fix linting and slightly more typing
Jul 18, 2023
3f9dc18
correct names of tilesets
Jul 18, 2023
89c8788
fix small bugs for full territory support (TIGER doesn't provide sldu…
Jul 18, 2023
dbc6f7a
remove un-supported chambers from AS
Jul 18, 2023
50a7631
more logging during mapbox upload
Jul 18, 2023
1fd594b
update readme to give some tips on using environment files
Jul 19, 2023
098b03f
actually delete old divisions
Jul 19, 2023
6 changes: 0 additions & 6 deletions .github/workflows/ci.yml
@@ -26,12 +26,6 @@ jobs:
run: poetry run flake8 .
- name: black check
run: poetry run black --check --diff .
- name: ensure shell scripts are formatted
run: |
while IFS= read -r -d '' filename; do
echo "processing ${filename}"
docker run --rm -v "$(pwd):/mnt" koalaman/shellcheck:latest -x "${filename}"
done < <(find . -type f -name "*.sh" ! -path "*/.git/*" -print0)
- name: Yaml file linting
run: |
docker run --rm \
27 changes: 27 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,27 @@
name: Build and push Docker images
on:
  push:
    branches:
      - main
    tags:
      - '*'
jobs:
  publish:
    steps:
      - uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Docker Login
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: build docker image
        uses: docker/build-push-action@v3
        with:
          tags: "openstates/geo:latest,openstates/geo:${{ github.ref_name }}"
          platforms: amd64,arm64
          push: true
    runs-on: ubuntu-latest
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,3 +6,6 @@ final-geojson
endpoint/*.zip
endpoint/awslambda-psycopg2
.idea/
testing/divisiondata
.env-file
env-file
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.9.15
19 changes: 11 additions & 8 deletions Dockerfile
@@ -1,4 +1,4 @@
FROM python:3.10-slim
FROM python:3.9-slim

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
@@ -13,8 +13,8 @@ RUN apt-get update -qq \
libsqlite3-dev \
zlib1g-dev
RUN pip install --disable-pip-version-check --no-cache-dir wheel \
&& pip install --disable-pip-version-check --no-cache-dire crcmod poetry
RUN git clone https://github.com/mapbox/tippecanoe.git && \
&& pip install --disable-pip-version-check --no-cache-dir crcmod poetry
RUN git clone https://github.com/felt/tippecanoe.git && \
cd tippecanoe && \
make -j && \
make install
@@ -24,14 +24,17 @@ COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --only=main --no-root

COPY scripts /opt/openstates-district-maps
COPY djapp .
COPY manage.py .
COPY make-tiles.sh .
COPY utils utils/
COPY configs configs/
COPY djapp djapp/
COPY generate-geo-data.py .

RUN poetry install --only=main \
&& rm -r /root/.cache/pypoetry/cache /root/.cache/pypoetry/artifacts/ \
&& DEBIAN_FRONTEND=noninteractive apt-get remove -yqq build-essential libsqlite3-dev zlib1g-dev git \
&& DEBIAN_FRONTEND=noninteractive apt-get autoremove -yqq \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

CMD ["bash", "/opt/openstates-district-maps/make-tiles.sh"]
# We use --clean-source here to ensure we don't accidentally run against messy data somehow
CMD ["poetry", "run", "python", "generate-geo-data.py", "--run-migrations", "--upload-data", "--clean-source"]
73 changes: 41 additions & 32 deletions README.md
@@ -22,12 +22,22 @@ Generate and upload map tiles for the state-level legislative district maps on [

We download our shapefiles from [census.gov](https://www2.census.gov/geo/tiger).

The organization of files within TIGER's site means that we may have to change the layout of downloaded files from year to year (in `scripts/get-shapefiles.py`). As long as we consistently add proper files into `data/source_cache` for the rest of the scripts to process, changing the initial download location shouldn't matter.
The organization of files within TIGER's site means that we may have to change the layout of downloaded files from year to year (in `utils/tiger.py`). As long as we consistently add proper files into `data/source_cache` for the rest of the scripts to process, changing the initial download location shouldn't matter.

See Appendix A below on Geographic Data Sources for more context.

You'll probably want to remove any cached files in `./data/`. The download tool may try to re-use cached files from the wrong year if they still exist. (We don't manually remove these files because you may need to re-run the scripts, and skipping downloads is useful)

### National Boundary Update

`configs/settings.yml` holds the `BOUNDARY_YEAR` config. This setting supplies the year that is substituted into our US national boundary URL template:

```python
f"{TIGER_ROOT}/GENZ{boundary_year}/shp/cb_{boundary_year}_us_nation_5m.zip"
```

We should verify/update this setting to the most recently available boundary year whenever we run geo data.
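
As a rough illustration of how that template expands, here is a minimal sketch. It assumes `TIGER_ROOT` is the census.gov TIGER root linked earlier in this README; the helper name `national_boundary_url` is hypothetical and is not a function from this repository.

```python
# Assumption: TIGER_ROOT matches the census.gov TIGER root linked in the README.
TIGER_ROOT = "https://www2.census.gov/geo/tiger"


def national_boundary_url(boundary_year: str) -> str:
    """Expand the US national boundary template for one year (hypothetical helper)."""
    return f"{TIGER_ROOT}/GENZ{boundary_year}/shp/cb_{boundary_year}_us_nation_5m.zip"


if __name__ == "__main__":
    # BOUNDARY_YEAR is stored as a string in configs/settings.yml, e.g. "2022".
    print(national_boundary_url("2022"))
```

Requesting the resulting URL before a run is one quick way to confirm that the chosen boundary year has actually been published.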

### Note on file naming

You'll see many files with names like `sldu`, `sldl` or `cd` during this process. Here is a quick layout of what those file name abbreviations mean:
@@ -49,42 +59,43 @@ There are several steps, which typically need to be run in order:

- `poetry install`

2) Download SLD shapefiles:

- `poetry run ./scripts/get-shapefiles.py`
- Note that this script does not fail on individual download failures. If you see failures in the run, make sure they are expected (e.g. NE/DC lower should fail)

3) Convert to geojson with division IDs:

- `poetry run ./scripts/to-geojson.py`

4) Make sure `DATABASE_URL` is set correctly in `djapp/geo/settings.py` (pointing at either the `geo` database in production or to a local copy, e.g. `DATABASE_URL=postgis:/<user>:<password>@<db_host>/geo`)
2) Make sure environment variables are set correctly:

5) Migrate database to add needed tables:
- `DATABASE_URL`: pointing at either the `geo` database in production or to a local copy, e.g. `DATABASE_URL=postgis://<user>:<password>@<db_host>/geo`
- `MAPBOX_ACCESS_TOKEN`: an API token for Mapbox with permissions to upload tilesets
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`: AWS credentials to upload bulk versions of geo data

- `DATABASE_URL=... poetry run ./manage.py migrate`
3) Download and format geo data:

6) Import into database:

- `DATABASE_URL=... poetry run ./manage.py load_divisions`

7) Convert to mbtiles and upload:
- `poetry run python generate-geo-data.py --run-migrations --upload-data`
- Note that this script does not fail on individual download failures. If you see failures in the run, make sure they are expected (e.g. NE/DC lower should fail)

- `./scripts/make-tiles.py`
### Setting up environment variables

8) Currently, we have to manually upload the resulting tilesets to [Mapbox Studio](https://studio.mapbox.com/tilesets/).
There are plenty of ways to set environment variables, but a quick way to manage many of them is with an "environment file", e.g.:

- We'll need to upload `data/sld.mbtiles` and `data/cd.mbtiles`.
```bash
AWS_ACCESS_KEY_ID="user"
export AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY="test"
export AWS_SECRET_ACCESS_KEY
MAPBOX_ACCESS_TOKEN="token"
export MAPBOX_ACCESS_TOKEN
DATABASE_URL="postgis://openstates:openstates@localhost:5405/openstatesorg"
export DATABASE_URL
```

9) Create district boundary files and upload to S3
After that, we can easily load the file:

- `poetry run python scripts/upload-bulk-boundary-files.py`
```bash
. env-file
```
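
Before starting a long run it can help to confirm that the variables were actually loaded. The following is a minimal sketch of such a check, not the check this project performs: the variable names come from the step list above, while `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the README step above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "MAPBOX_ACCESS_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_VARS
) -> List[str]:
    """Return the required names that are unset or empty in env (hypothetical helper)."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set")
```

Treating an empty string as missing is a deliberate choice here, since an `export` of an unset shell variable would otherwise pass silently.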

### Running within Docker

Instead of setting up your local environment you can instead run using Docker. Using Docker Compose will still allow you to access all intermediate files from the processing, within your local `data` directory.

Build and run with Docker Compose. Similar to running without Docker, the `MAPBOX_ACCOUNT` and `MAPBOX_ACCESS_TOKEN` must be set in your local environment.
Build and run with Docker Compose. As when running without Docker, the environment variables must be set in your local environment.

```
docker-compose up make-tiles
@@ -99,22 +110,21 @@ button in the toolbar, and then select a district. Metadata should appear in the

## US Census


### Redistricting

"We hold the districts used for the 2018 election until we collect the postcensal congressional and state legislative district plans
for the 118th Congress and year 2022 state legislatures" [US Census CD/SLD note](https://www.census.gov/programs-surveys/geography/technical-documentation/user-note/cd-sld-note.html)
During the first major sessions after a Census (e.g. 2022 was the major session for _most_ jurisdictions after the 2020 Census), the TIGER data we rely on may be significantly "behind" reality, as this example note from 2022 indicates:

> "We hold the districts used for the 2018 election until we collect the postcensal congressional and state legislative district plans
> for the 118th Congress and year 2022 state legislatures" [US Census CD/SLD note](https://www.census.gov/programs-surveys/geography/technical-documentation/user-note/cd-sld-note.html)

As of 2022, TIGER was still the most consistent data source for district boundaries we were able to find.

### US Census: TIGER

Files in the TIGER data source are organized according to
[Federal Information Processing System (FIPS)](https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt) codes.
Each numeric code corresponds to a US state (or other levels). For example `01` represents Alabama.

As of 12/30/22 the [TIGER page states](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html):
"All legal boundaries and names are as of January 1, 2022. Released September 30, 2022." So it seems like post-redistricting
shapefiles are not available.

#### TIGER SLDL

[2022](https://www2.census.gov/geo/tiger/TIGER2022/SLDL/)
@@ -132,4 +142,3 @@ This contains data, including shapefiles, about State Legislative Districts in U
[2022](https://www2.census.gov/geo/tiger/TIGER2022/CD/)

This contains data, including shapefiles, about Congressional Districts.

7 changes: 7 additions & 0 deletions configs/jurisdictions/as.yml
@@ -1 +1,8 @@
name: American Samoa
os-id-prefix: "ocd-division/country:us/territory:as"
id-mappings:
  cd:
    sld-match: 'cd-60(\d+)'
ignored_chambers:
  - sldl
  - sldu
2 changes: 2 additions & 0 deletions configs/jurisdictions/dc.yml
@@ -8,3 +8,5 @@ id-mappings:
sld-match: 'ward-1100(\d)'
cd:
sld-match: 'cd-11([\d]+)'
ignored_chambers:
- sldl
7 changes: 7 additions & 0 deletions configs/jurisdictions/gu.yml
@@ -1 +1,8 @@
name: Guam
os-id-prefix: "ocd-division/country:us/territory:gu"
id-mappings:
  cd:
    sld-match: 'cd-66(\d+)'
ignored_chambers:
  - sldl
  - sldu
8 changes: 5 additions & 3 deletions configs/jurisdictions/mp.yml
@@ -1,6 +1,8 @@
name: Northern Mariana Islands
os-id-prefix: "ocd-division/country:us/territory:mp"
id-mappings:
sldu:
key: SLDUST
sld-match: 'sldu-1100([\d])'
cd:
sld-match: 'cd-69(\d+)'
ignored_chambers:
- sldl
- sldu
2 changes: 2 additions & 0 deletions configs/jurisdictions/ne.yml
@@ -7,3 +7,5 @@ id-mappings:
cd:
key: DISTRICT
sld-match: 'cd-310(\d)'
ignored_chambers:
- sldl
9 changes: 5 additions & 4 deletions configs/jurisdictions/vi.yml
@@ -1,7 +1,8 @@
name: Virgin Islands
os-id-prefix: "ocd-division/country:us/territory:vi"
id-mappings:
sldl:
key: SLDUST
sldu:
key: SLDUST
cd:
sld-match: 'cd-78(\d+)'
ignored_chambers:
- sldl
- sldu
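
To make the two config keys in these jurisdiction files more concrete, here is a minimal sketch of how `ignored_chambers` and an `sld-match` pattern could be applied. It is an assumption about usage, not the project's implementation: the dictionary is a trimmed copy of the `as.yml` values from this diff, the label format `cd-6098` is taken from the comment in `configs/settings.yml`, and both helper functions are hypothetical.

```python
import re
from typing import Optional

# Trimmed copy of the as.yml values shown in this diff.
AS_CONFIG = {
    "id-mappings": {"cd": {"sld-match": r"cd-60(\d+)"}},
    "ignored_chambers": ["sldl", "sldu"],
}


def should_process(config: dict, chamber: str) -> bool:
    """Skip chambers the jurisdiction lists under ignored_chambers (hypothetical)."""
    return chamber not in config.get("ignored_chambers", [])


def district_suffix(config: dict, chamber: str, label: str) -> Optional[str]:
    """Return the group captured by the chamber's sld-match pattern, if any (hypothetical)."""
    mapping = config.get("id-mappings", {}).get(chamber)
    if mapping is None:
        return None
    match = re.match(mapping["sld-match"], label)
    return match.group(1) if match else None


if __name__ == "__main__":
    for chamber in ("sldl", "sldu", "cd"):
        print(chamber, should_process(AS_CONFIG, chamber))
    print(district_suffix(AS_CONFIG, "cd", "cd-6098"))
```

Because the state FIPS prefix is baked into each pattern (`60` for American Samoa, `66` for Guam), a label from a different jurisdiction simply fails to match.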
11 changes: 10 additions & 1 deletion configs/settings.yml
@@ -1,4 +1,4 @@
BOUNDARY_YEAR: "2021"
BOUNDARY_YEAR: "2022"

SKIPPED_GEOIDS: []
# cd-6098: American Samoa
@@ -14,3 +14,12 @@ MTFCC_MAPPING:
FIPS_NAME_MAP:
"69": "COMMONWEALTH_OF_THE_NORTHERN_MARIANA_ISLANDS"
"78": "UNITED_STATES_VIRGIN_ISLANDS"

bucket: data.openstates.org
# you may not want to change this frequently. It is the congressional session marked when the files are created, not necessarily the current session
congress_session: 118

run_migrations: False
upload_data: False
create_tiles: True
clean_source: False
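
The four boolean settings above line up with the flags passed in the Dockerfile `CMD` (`--run-migrations`, `--upload-data`, `--clean-source`). How `generate-geo-data.py` actually merges the two is not shown in this diff, so the following is only a sketch under the assumption that the file values act as defaults and a flag can switch a setting on. `resolve_settings` is a hypothetical name.

```python
import argparse
from typing import Dict, List, Optional

# Defaults copied from the settings.yml hunk above.
DEFAULTS: Dict[str, bool] = {
    "run_migrations": False,
    "upload_data": False,
    "create_tiles": True,
    "clean_source": False,
}


def resolve_settings(argv: Optional[List[str]] = None) -> Dict[str, bool]:
    """Overlay command-line flags on the file defaults (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="sketch of flag handling")
    parser.add_argument("--run-migrations", action="store_true")
    parser.add_argument("--upload-data", action="store_true")
    parser.add_argument("--clean-source", action="store_true")
    args = parser.parse_args(argv)

    settings = dict(DEFAULTS)
    for key in ("run_migrations", "upload_data", "clean_source"):
        # A flag can only turn a setting on; when absent, the file default stands.
        if getattr(args, key):
            settings[key] = True
    return settings


if __name__ == "__main__":
    print(resolve_settings(["--run-migrations", "--upload-data"]))
```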
31 changes: 31 additions & 0 deletions djapp/geo/management/commands/clean_divisions.py
@@ -0,0 +1,31 @@
import glob
import json
from pathlib import Path
from django.core.management.base import BaseCommand
from ...models import Division

ROOTDIR = Path(__file__).parent.parent.parent.parent.parent.absolute()


class Command(BaseCommand):
    def handle(self, *args, **options):
        print("Checking for any divisions we should remove...")
        ocd_ids = []
        for filename in glob.glob(f"{ROOTDIR}/data/geojson/*.geojson"):
            # use a context manager so each file handle is closed promptly
            with open(filename, "r") as f:
                obj = json.load(f)
            ocd_ids.extend(div["properties"]["ocdid"] for div in obj["features"])
        print(f"Loaded {len(ocd_ids)} local divisions for comparison")
        # refuse to continue with no local data, otherwise every row would look stale
        if len(ocd_ids) < 1:
            raise Exception("No local divisions found")

        to_delete = list(Division.objects.exclude(id__in=ocd_ids))

        print(f"Found {len(to_delete)} divisions not stored locally")
        # sanity check: deleting more rows than we have locally suggests bad input
        if len(to_delete) > len(ocd_ids):
            raise Exception("Found more objects to delete than expected objects")

        # delete command to remove old divisions
        # don't try to delete when there aren't any additional divisions
        if to_delete:
            print(f"Deleting {len(to_delete)} divisions from DB")
            Division.objects.exclude(id__in=ocd_ids).delete()
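
The two guards in this command can be exercised without a database. The sketch below restates them over plain collections of IDs; `divisions_to_delete` is a hypothetical helper, and unlike the command it de-duplicates the local IDs by using sets, so the second threshold can differ slightly if the GeoJSON files repeat an ID.

```python
from typing import Iterable, Set


def divisions_to_delete(local_ids: Iterable[str], db_ids: Iterable[str]) -> Set[str]:
    """Return DB IDs absent from the local GeoJSON, with the same two guards (hypothetical)."""
    local = set(local_ids)
    if not local:
        # With no local data every DB row would look stale, so stop early.
        raise ValueError("No local divisions found")

    stale = set(db_ids) - local
    if len(stale) > len(local):
        # Deleting more rows than we hold locally suggests the input is wrong.
        raise ValueError("Found more objects to delete than expected objects")
    return stale


if __name__ == "__main__":
    print(divisions_to_delete(["a", "b"], ["a", "b", "c"]))
```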
7 changes: 5 additions & 2 deletions djapp/geo/management/commands/load_divisions.py
@@ -1,10 +1,11 @@
import glob
from pathlib import Path
from django.core.management.base import BaseCommand
from django.contrib.gis.gdal import DataSource
from django.contrib.gis.utils import LayerMapping, LayerMapError
from ...models import DivisionSet, Division


ROOTDIR = Path(__file__).parent.parent.parent.parent.parent.absolute()
GEOJSON_MAPPING = {
"id": "ocdid",
"state": "state",
@@ -25,7 +26,9 @@ def handle(self, *args, **options):
DivisionSet.objects.get_or_create(slug="sldu")
DivisionSet.objects.get_or_create(slug="cd")

filenames = options["filenames"] or sorted(glob.glob("data/geojson/*.geojson"))
filenames = options["filenames"] or sorted(
glob.glob(f"{ROOTDIR}/data/geojson/*.geojson")
)

for filename in filenames:
print(f"processing {filename}...")
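
The `ROOTDIR` change makes the glob independent of the current working directory. As a small sketch of what the five chained `.parent` calls resolve to, the example below uses `PurePosixPath` and an assumed checkout location of `/srv/app`; both the location and the `repo_root` helper are hypothetical.

```python
from pathlib import PurePosixPath


def repo_root(command_file: str) -> PurePosixPath:
    """Walk up five levels from a management command file (hypothetical helper)."""
    # parents[4] is equivalent to chaining .parent five times:
    # commands -> management -> geo -> djapp -> repository root
    return PurePosixPath(command_file).parents[4]


if __name__ == "__main__":
    # Assumed checkout location, for illustration only.
    root = repo_root("/srv/app/djapp/geo/management/commands/load_divisions.py")
    print(f"{root}/data/geojson/*.geojson")
```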
2 changes: 1 addition & 1 deletion djapp/urls.py
@@ -1,3 +1,3 @@
from django.urls import path
from django.urls import path # noqa: F401

urlpatterns = []
3 changes: 2 additions & 1 deletion docker-compose.yml
@@ -5,8 +5,9 @@ services:
build:
context: .
args:
- MAPBOX_ACCOUNT
- MAPBOX_ACCESS_TOKEN
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- DATABASE_URL
volumes:
- ./data:/opt/openstates-district-maps/data