Consolidate geo matching of hospitals #14

mradamcox · 2020-05-14T18:48:32Z

We have added a geojson file to the repo that shows all of the hospital locations to date. I have tested it and can confirm that the data is complete as of this writing, but we still haven't fully integrated it into the workflow, so that's what this ticket is about.

Background: Each row in an input CSV contains the name of a hospital, and that name has to be matched to a lat/long, street address, etc., which are then written to the output CSV. The input CSVs do come with some of this information for some of the hospitals, but it is not reliable, so we disregard it.

Currently, the matching process uses two different files, geocode_cache.csv and pa_hospitals, and combines them in HospitalLocations() in geo_utils.py. process_csv then uses the result to match coordinates to the hospitals based on their name; see https://github.com/RTCovid/PADataIngestion/blob/master/operators/process_csv.py#L54 (also scroll down to lines 120 and 133).

Instead of that process, we can consolidate greatly by loading the geojson file, matching a name to each feature, and then taking all of the necessary information from the feature. We recently started using that matching process in the Validator class here:

PADataIngestion/validator.py, line 39 in bf1197f:

def validate_locations(self, input_csv):

That example also shows the simple pattern in place to handle misspellings or new names for hospitals: a "HospitalNameAliases" field is stored in the GeoJSON that can hold pipe-delimited alternate spellings, and it is parsed in the load_geojson function. Then, if a name doesn't immediately match one of the features, the list is iterated again, this time against the aliases.
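To make the alias pattern concrete, here is a minimal sketch of the two-pass lookup. It is not the code in validator.py: `load_features` is a hypothetical stand-in for load_geojson (whose real signature isn't shown here), `match_feature` and the "HospitalName" property are assumed names, and the sample data is made up. Only "HospitalNameAliases" and its pipe-delimited format come from the description above.

```python
import json

# Made-up sample data; "HospitalName" is an assumed property name.
SAMPLE = """{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]},
   "properties": {
     "HospitalName": "Example General Hospital",
     "HospitalNameAliases": "Example General|Exmaple General Hospital"}}
]}"""


def load_features(geojson_text):
    """Parse GeoJSON text and split each feature's pipe-delimited aliases."""
    features = json.loads(geojson_text)["features"]
    for feature in features:
        props = feature["properties"]
        raw = props.get("HospitalNameAliases") or ""
        # Store the parsed aliases under a hypothetical "aliases" key.
        props["aliases"] = [a.strip() for a in raw.split("|") if a.strip()]
    return features


def match_feature(name, features):
    """Return the feature matching `name`, or None if nothing matches."""
    # First pass: exact match on the primary name.
    for feature in features:
        if feature["properties"].get("HospitalName") == name:
            return feature
    # Second pass: iterate the list again, this time checking the aliases.
    for feature in features:
        if name in feature["properties"]["aliases"]:
            return feature
    return None


features = load_features(SAMPLE)
matched = match_feature("Exmaple General Hospital", features)
```

With this layout, handling a new misspelling should only require appending it to the "HospitalNameAliases" value in the GeoJSON, with no code change.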

Completing this ticket means revamping process_csv to use the new matching method, so that HospitalLocations (and therefore the CSV files mentioned above) is no longer needed.
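As a rough idea of what the revamped step could look like, the sketch below fills location columns on each input row straight from the matched feature. Everything named here is an assumption for illustration: `build_name_index`, `enrich_rows`, and the column and property names "HospitalName", "HospitalStreetAddress", "Latitude" and "Longitude" are hypothetical, and the real process_csv may be structured quite differently. It uses a single name-to-feature dictionary rather than iterating the list twice, and blanks the location columns for unmatched rows, since the input values are considered unreliable.

```python
import csv
import io


def build_name_index(features):
    """Map every primary name and alias to its feature (primary names win)."""
    index = {}
    for feature in features:
        index.setdefault(feature["properties"]["HospitalName"].strip(), feature)
    for feature in features:
        raw = feature["properties"].get("HospitalNameAliases") or ""
        for alias in raw.split("|"):
            if alias.strip():
                index.setdefault(alias.strip(), feature)
    return index


def enrich_rows(input_csv_text, features):
    """Return input rows with location columns overwritten from the GeoJSON."""
    index = build_name_index(features)
    rows = []
    for row in csv.DictReader(io.StringIO(input_csv_text)):
        feature = index.get(row["HospitalName"].strip())
        if feature is None:
            # No match: blank the columns rather than trust the input values.
            row["Longitude"] = row["Latitude"] = ""
            row["HospitalStreetAddress"] = ""
        else:
            # GeoJSON point coordinates are ordered longitude, latitude.
            lon, lat = feature["geometry"]["coordinates"][:2]
            row["Longitude"], row["Latitude"] = lon, lat
            row["HospitalStreetAddress"] = feature["properties"].get(
                "HospitalStreetAddress", ""
            )
        rows.append(row)
    return rows
```

The returned rows could then be written out with csv.DictWriter; unmatched names are worth logging so that new aliases can be added to the GeoJSON.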
