We have added a geojson file to the repo that shows all of the hospital locations to date. I have tested it and can confirm that the data is complete as of this writing, but we still haven't fully integrated it into the workflow, so that's what this ticket is about.
Background: Each row in an input CSV contains the name of a hospital, and that name has to be matched to a lat/long, street address, etc., which is then written to the output CSV. The input CSVs do come with some of this information for some of the hospitals, but it is not reliable, so we disregard it.
Currently, the matching process uses two different files, geocode_cache.csv and pa_hospitals, and combines them in HospitalLocations() in geo_utils.py. process_csv then uses the result to match coordinates to the hospitals based on their name. Here's an example of where that happens: https://github.com/RTCovid/PADataIngestion/blob/master/operators/process_csv.py#L54 (also scroll down to lines 120 and 133).
Instead of that process, we can consolidate greatly by loading the geojson file, matching each name to a feature, and then taking all of the necessary information from the feature. We recently started using that matching process in the Validator class here: PADataIngestion/validator.py, line 39 in bf1197f. That example also shows the simple pattern in place to handle misspellings or new names for hospitals: a "HospitalNameAliases" field is stored in the GeoJSON that can hold pipe-delimited alternate spellings, and it is parsed in the load_geojson function. Then, if a name doesn't immediately match one of the features, the list is iterated again.
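For anyone picking this up, here is a rough sketch of the alias pattern described above. It is not the actual Validator or load_geojson code: only the "HospitalNameAliases" field name comes from this ticket. The "HospitalName" property, the function names, the case-insensitive comparison, and the sample features are all assumptions made for illustration.

```python
import json

ALIAS_FIELD = "HospitalNameAliases"  # field name taken from the ticket
NAME_FIELD = "HospitalName"          # assumed property name, check the real GeoJSON

# Made-up sample data; coordinates are [longitude, latitude] as in GeoJSON.
SAMPLE_GEOJSON = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-75.5, 40.5]},
         "properties": {"HospitalName": "Sample Valley Medical Center",
                        "HospitalNameAliases": "Sample Valley Med Ctr|Sample Valey Medical Center"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]},
         "properties": {"HospitalName": "Example General Hospital",
                        "HospitalNameAliases": None}},
    ],
})


def load_features(geojson_text):
    """Parse GeoJSON text and split each feature's pipe-delimited aliases into a list."""
    features = json.loads(geojson_text)["features"]
    for feature in features:
        raw = feature["properties"].get(ALIAS_FIELD) or ""
        feature["aliases"] = [a.strip() for a in raw.split("|") if a.strip()]
    return features


def match_feature(name, features):
    """Return the feature matching a hospital name, or None if nothing matches."""
    wanted = name.strip().lower()  # case-insensitive matching is an assumption
    # First pass: compare against each feature's primary name.
    for feature in features:
        if (feature["properties"].get(NAME_FIELD) or "").strip().lower() == wanted:
            return feature
    # Second pass: iterate the list again, this time checking the aliases.
    for feature in features:
        if wanted in (alias.lower() for alias in feature["aliases"]):
            return feature
    return None
```

With this shape, a new spelling that shows up in an input CSV only needs to be appended to that hospital's alias field in the GeoJSON; no code change is required.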
Completing this ticket means revamping process_csv to use the new matching method, so that HospitalLocations (and therefore the CSV files mentioned above) is no longer needed.
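As a starting point, the revamped row handling could look roughly like the sketch below. This is not the current process_csv code. The column names, the enrich_rows function, and the shape of the lookup (hospital name mapped to a dict of location values, which in practice would be built from the matched GeoJSON features) are hypothetical. The point it illustrates is that location values already present in the input are overwritten rather than trusted.

```python
import csv
import io

# Assumed output column names; the real ones live in process_csv.
LOCATION_COLUMNS = ["Latitude", "Longitude", "Address"]


def enrich_rows(input_csv, lookup, name_column="HospitalName"):
    """Return CSV text whose location columns come only from `lookup`.

    `lookup` maps a hospital name to a dict keyed by LOCATION_COLUMNS.
    """
    reader = csv.DictReader(io.StringIO(input_csv))
    # Add any location columns the input does not already have.
    extra = [c for c in LOCATION_COLUMNS if c not in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + extra,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        location = lookup.get(row[name_column].strip(), {})
        # Overwrite whatever the input had: it is not reliable.
        for column in LOCATION_COLUMNS:
            row[column] = location.get(column, "")
        writer.writerow(row)
    return out.getvalue()
```

Rows whose hospital is not found come out with blank location columns here; whether the real pipeline should instead log or reject such rows is left open.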
Code referenced above: PADataIngestion/validator.py, line 39 in bf1197f