GeoJSON publication format: consider a single file that contains nodes and spans #207

Closed
duncandewhurst opened this issue Dec 1, 2022 · 9 comments · Fixed by #266
Labels
GeoJSON format This issue relates to the GeoJSON publication format Tooling This issue relates to tooling

@duncandewhurst
Collaborator

From Open-Telecoms-Data/cove-ofds#5 (comment):

I'm wondering whether revisiting the design choice to have separate files for spans and nodes might reduce the complexity of the conversion tooling.

cc @odscjames @lgs85

@duncandewhurst duncandewhurst added GeoJSON format This issue relates to the GeoJSON publication format Tooling This issue relates to tooling labels Dec 1, 2022
@odscjames
Collaborator

I'm interested in sitting down and seeing if we can come up with rules for "given this Feature, is it most likely to be a Node or a Span? I'll continue validating and processing the feature based on that guess". If we can crack that, then we could have 1, 2 or multiple GeoJSON files. People could choose to run JSON->GeoJSON and get any number of files, splitting by things like network, phase or any other field. We would also reduce the scope for errors when getting input files (e.g. Open-Telecoms-Data/lib-cove-ofds#21).

@duncandewhurst
Collaborator Author

If we can crack that, then we could have 1, 2 or multiple GeoJSON files. People could choose to run JSON->GeoJSON and get any number of files, splitting by things like network, phase or any other field

That sounds good, but only .id is required on both Node and Span so I don't think there's a reliable way that we can 'guess' at whether a feature is a node or a span. However, we could consider adding a field for this information in the GeoJSON publication format.

@odscjames
Collaborator

My first thought was that there would be something we could do, but now that I look again, each candidate rule has a problem:

  • Geometry (Point/LineString) - complicated by the fact there might be no Geometry
  • Fields we should expect to be there or missing (Span should have start / end, Node should not) - complicated by the fact fields are optional and additional fields are allowed

I think for reliable 100% rules, we will need to add a field.
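To make the ambiguity concrete, here is a minimal sketch of the kind of guessing rule discussed above. The function name `guess_feature_type` and the return values are hypothetical, not part of any OFDS tooling. It tries geometry first and then span-only fields, and it has to give up on a feature that carries only an id.

```python
from typing import Optional


def guess_feature_type(feature: dict) -> Optional[str]:
    """Best-effort guess: 'node', 'span', or None when undecidable.

    Hypothetical helper for illustration only.
    """
    geometry = feature.get("geometry") or {}
    geom_type = geometry.get("type")
    if geom_type == "Point":
        return "node"
    if geom_type == "LineString":
        return "span"

    # No usable geometry: fall back to fields that only spans should have.
    # This is still only a guess, because those fields are optional and
    # additional fields are allowed, so a node could carry the same names.
    props = feature.get("properties") or {}
    if "start" in props or "end" in props:
        return "span"

    return None


# A feature with only the required id gives the rule nothing to work with.
minimal = {"type": "Feature", "geometry": None, "properties": {"id": "1"}}
print(guess_feature_type(minimal))  # None
```

The `None` case is the reason an explicit field is needed for 100% reliable rules.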

@duncandewhurst duncandewhurst added this to the 0.3.0 milestone Jan 9, 2023
@duncandewhurst
Collaborator Author

Feedback from the World Bank's infrastructure map team is that GeoJSON is their preferred format for ingesting data because of the simplicity of it being a single file, so I think we do want to pursue making the GeoJSON format a single file.

@duncandewhurst
Collaborator Author

I think for reliable 100% rules, we will need to add a field.

Agreed, I think that this is the approach that we'll pursue.

@duncandewhurst duncandewhurst self-assigned this Apr 26, 2023
@duncandewhurst
Collaborator Author

duncandewhurst commented Apr 26, 2023

Looking into this further I'm unsure whether a single GeoJSON file is the right approach. A single file would present problems for users of at least QGIS (and any other software that uses GDAL) and ArcGIS:

  • In QGIS, if the features in a feature collection have different property schemas, the GDAL GeoJSON driver generates a union of the properties from all the features and applies that to every feature. Since nodes and spans in OFDS have different schemas, fields that belong only to spans in OFDS appear as attributes of nodes once the data is loaded into QGIS, and vice versa. For example, in QGIS, nodes end up with attributes like start, end and darkFibre, which are actually only attributes of spans in OFDS. Those attributes are set to NULL, but I think it could be quite confusing for users, and it leads to an attribute table with many columns.
  • In ArcGIS Pro, a feature class must be composed of features of the same feature type, e.g. only Point or only LineString. Users must specify the feature type when using the JSON to Features tool to create a feature class from a GeoJSON file. Only the features of the specified type will be loaded. Therefore users would need to load the data twice, once as a Point feature class to get the nodes and once as a LineString feature class to get the spans. Similar restrictions apply to other ArcGIS products and tools, e.g. ArcPy FeatureSets and the ArcGIS Maps SDK for JavaScript. The only ArcGIS product for which I could find any information about the handling of properties in GeoJSON was ArcGIS Velocity, which requires features to have consistent properties, but, based on the above, I suspect there may be issues similar to those found in the GDAL driver.
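The attribute-table effect described for QGIS can be imitated in a few lines. This sketch does not call GDAL; the helper name `union_attribute_table` and the sample property values are assumptions made for illustration. It only reproduces the "union of properties, missing values set to NULL" behaviour as described above.

```python
def union_attribute_table(features):
    """Build one flat table whose columns are the union of all property names."""
    columns = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in columns:
                columns.append(key)
    # Every feature gets every column; absent values become None (NULL).
    rows = [
        {col: (feature.get("properties") or {}).get(col) for col in columns}
        for feature in features
    ]
    return columns, rows


# Illustrative values only: one node and one span in the same collection.
node = {"type": "Feature", "properties": {"id": "n1"}}
span = {
    "type": "Feature",
    "properties": {"id": "s1", "start": "n1", "end": "n2", "darkFibre": True},
}

columns, rows = union_attribute_table([node, span])
print(columns)  # ['id', 'start', 'end', 'darkFibre']
print(rows[0])  # {'id': 'n1', 'start': None, 'end': None, 'darkFibre': None}
```

The node row ends up with start, end and darkFibre columns that it never had in the source data.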

One thing that feels wrong about the current GeoJSON publication format is the lack of a means to specify which file contains the nodes feature collection and which contains the spans feature collection. In the example data, we handle this by naming the files (nodes.geojson or spans.geojson) and in CoVE we handle it by providing separate upload widgets for the nodes and spans files. However, neither of those approaches is ideal, since publishers might want to name their files differently and CoVE needs to handle the case of users uploading the wrong file. As an alternative to a single GeoJSON file, we could improve the current situation by adding a foreign member at the top level of each feature collection to specify whether it contains nodes or spans, e.g.

{
  "type": "FeatureCollection",
  "featureType": "nodes",
  "features": []
}
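A consumer could then route a file by that member rather than by its filename or upload widget. The following is a rough sketch under two assumptions: the function name `collection_kind` is hypothetical, and "spans" is assumed to be the counterpart value to the "nodes" value shown above.

```python
import json

# Assumed vocabulary: only "nodes" appears in the example above.
ALLOWED_KINDS = ("nodes", "spans")


def collection_kind(text: str) -> str:
    """Return the top-level featureType of a GeoJSON feature collection."""
    data = json.loads(text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("not a GeoJSON FeatureCollection")
    kind = data.get("featureType")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"missing or unknown featureType: {kind!r}")
    return kind


example = '{"type": "FeatureCollection", "featureType": "nodes", "features": []}'
print(collection_kind(example))  # nodes
```

A file without the member raises an error instead of being silently treated as nodes or spans.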

@odscjames @stevesong what do you think?


The following are my initial notes on the changes that we'd need to make to have a single GeoJSON file, before I came across the issues documented above. We have two decisions to make about the field that we need to add to indicate whether a feature is a node or span:

Where to add the field

There are two options:

  1. As a child of the properties member of the feature object.
  2. As a foreign member at the top level of the feature object.

Based on the following note from the GeoJSON specification, I think option 1. is preferable:

(...) support for foreign members can vary across implementations, and no normative processing model for foreign members is defined. Accordingly, implementations that rely too heavily on the use of foreign members might experience reduced interoperability with other implementations.

If we pursue option 1, we'll need to add a propertyNames regex to the Node and Span definitions to prevent publishers from adding additional fields with clashing names.
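One possible shape for that restriction, as a sketch only: the schema fragment below is hypothetical, the pattern is my assumption rather than anything agreed in this thread, and it reserves the name featureType (proposed below). The check is done with Python's `re` module to keep the example free of third-party libraries; a real JSON Schema validator would apply `propertyNames` itself.

```python
import re

# Hypothetical fragment for the Node (or Span) definition. The negative
# lookahead rejects exactly the name "featureType" and accepts all others.
NODE_DEFINITION_FRAGMENT = {
    "type": "object",
    "propertyNames": {"pattern": "^(?!featureType$)"},
}


def clashing_names(obj: dict, definition: dict) -> list:
    """List property names in obj that the propertyNames pattern rejects."""
    pattern = re.compile(definition["propertyNames"]["pattern"])
    return [name for name in obj if not pattern.search(name)]


print(clashing_names({"id": "1", "featureType": "x"}, NODE_DEFINITION_FRAGMENT))
# ['featureType']
```

Names that merely start with the reserved word, such as featureTypes, are still allowed by this pattern.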

What to name the field

If we add the field as a child of the properties member, we can't use type because there is already a type field for nodes. Therefore, I propose featureType.

@odscjames
Collaborator

I haven't looked at QGIS etc as much as you so don't have more to add there. That is very interesting.

The other option is to carry on with the plan to add a field on each feature object and then allow people flexibility to work with one or more files as they choose.

CoVE/tooling could output 2 files ("nodes" and "spans") by default, but give users the option to choose 1 file or even more files (so that no single file is too big, or to get separate files filtered by a variable they choose, like status).

Cove/Tooling could accept 1 to many files as input and any file could go in any input so users wouldn't have to worry about putting the right file in the right place.

If someone ended up with one file when they needed it split into 2 (because of QGIS etc) there could be a tool to split the file for them.
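Such a splitting tool could be quite small once every feature carries the new field. This is a minimal sketch, assuming the field is `properties.featureType` as proposed earlier in the thread; the function name `split_by_feature_type` and the values "node" and "span" are hypothetical.

```python
from collections import defaultdict


def split_by_feature_type(collection: dict) -> dict:
    """Split one feature collection into one collection per featureType."""
    groups = defaultdict(list)
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        kind = props.get("featureType")
        if kind is None:
            raise ValueError(f"feature {props.get('id')!r} has no featureType")
        groups[kind].append(feature)
    return {
        kind: {"type": "FeatureCollection", "features": features}
        for kind, features in groups.items()
    }


# Illustrative input: the "node"/"span" values are assumed, not agreed.
mixed = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"id": "n1", "featureType": "node"}},
        {"type": "Feature", "geometry": None,
         "properties": {"id": "s1", "featureType": "span"}},
    ],
}

parts = split_by_feature_type(mixed)
print(sorted(parts))  # ['node', 'span']
```

The same grouping loop run over several input files would cover the "any file in any input" case, since it never looks at which file a feature came from.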

We don't need to do all this tooling straight away; we can start by leaving everything working on 2 files, but putting the new field on the feature level instead of on the file level enables us to do this later.

@duncandewhurst
Collaborator Author

Good points :-) I agree that adding the field at the feature level is the best option for future flexibility. We can then update the documentation to make separate files a 'should' rather than a 'must' and update CoVE to rely on the new field rather than which file is uploaded in which box. Sound good?

@odscjames
Collaborator

Yes!
