Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed simple reverse geocoder in PostGIS database #1827

Open
spwoodcock opened this issue Oct 19, 2024 · 4 comments
Open

Embed simple reverse geocoder in PostGIS database #1827

spwoodcock opened this issue Oct 19, 2024 · 4 comments
Labels
backend Related to backend code effort:medium Likely a day or two enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed priority:low Backlog of tasks that will be addressed in time

Comments

@spwoodcock
Copy link
Member

spwoodcock commented Oct 19, 2024

Problem

  • Currently we rely on Nominatim (hosted by OSM).
  • This is fine, but it's an external API we rely on causing two issues:
    • Rate limiting if we call it too many times successively.
    • When we run backend or e2e tests we create projects, which generate the location strings via Nominatim. If we have many workflows running the tests, this can spam Nominatim a bit (local tests + CI tests).
  • We don't need super precise geocoding - only city and country.

Solution

  • First get a dataset for cities of the world over 1000 population. This could be via OSM, but there are already curated datasets such as this.
    • Clean the data / do some basic checking
    • Extract the coordinate field into lat & lon fields.
  • Create a table called geocoding with a computed/generated geom field (takes lat/lon and automatically generates a PostGIS geom):
    CREATE TABLE geocoding (
        city varchar,
        country varchar,
        geom geometry(Point,4326) GENERATED ALWAYS AS (ST_SetSRID(ST_MakePoint(lon, lat), 4326)) STORED,
        voronoi geometry(Polygon,4326)
    );
  • Insert CSV data into the table efficiently using psql \copy.
  • Create voronoi polygons based on the points - ideally we get a mosaic of the entire world, built from adjoining city boundaries.
    voronoi
  • Create a geospatial index on the voronois for performance.
  • Now we have our voronois we can have a pre-canned query to get the city/country for a given geometry centroid (point).

The final query could be something like (untested):

SELECT 
    g.city, 
    g.country
FROM 
    geocoding g
WHERE 
    ST_Contains(g.voronoi, ST_GeomFromText('POINT(-72.48496 -13.92862)', 4326));

Additional Considerations

  • Would love if anyone in the community wants to look into this!
  • Once we have a curated dataset of voronois this table can be exported in a format for easy import in a Postgres init script, so we have it available when users run FMTM.
@spwoodcock spwoodcock added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed priority:low Backlog of tasks that will be addressed in time backend Related to backend code effort:medium Likely a day or two labels Oct 19, 2024
@Sujanadh
Copy link
Collaborator

I think it would be best if we create our own package or separate repo for the geocoding so it will be efficient to use , scale it and benefits of using it in other projects too like Tasking Manager.

@spwoodcock
Copy link
Member Author

Good point, this would be a great pypi package!

Taking a quick browse there is already a similar implementation! https://github.com/thampiman/reverse-geocoder

Perhaps we could take this code and update it (or it might even work as it is, despite being so old).

Looks like they bundle a small C++ util? Not sure about this approach. Would love to leverage PostGIS.

@spwoodcock
Copy link
Member Author

Obviously this is a quite common issue to solve.

I love it when I write out an idea and somebody on the internet has already solved the task, to your exact spec 😂
https://github.com/richardpenman/reverse_geocode

We should probably just use that package 👍

@spwoodcock
Copy link
Member Author

spwoodcock commented Oct 21, 2024

On second thought perhaps we just use that as inspiration - I'm not too keen on the implementation!

The package downloads the JSON data and loads it into memory for the query - far from ideal.

We could write an alternative backed by PostGIS and consider a similar k-d tree index like this? (needs more research):

CREATE INDEX [indexname] ON [tablename] USING SPGIST ( [geometryfield] );

https://www.crunchydata.com/blog/the-many-spatial-indexes-of-postgis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Related to backend code effort:medium Likely a day or two enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed priority:low Backlog of tasks that will be addressed in time
Projects
None yet
Development

No branches or pull requests

2 participants