GeoJSON publication format: consider a single file that contains nodes and spans #207

duncandewhurst · 2022-12-01T21:57:04Z

From Open-Telecoms-Data/cove-ofds#5 (comment):

I'm wondering whether revisiting the design choice to have separate files for spans and nodes might reduce the complexity of the conversion tooling.

cc @odscjames @lgs85

odscjames · 2022-12-02T09:20:48Z

I'm interested in sitting down and seeing if we can come up with rules for "given this Feature, is it most likely to be a Node or a Span? I'll continue validating and processing the feature based on that guess". If we can crack that, then we could have 1, 2 or multiple GeoJSON files. People could choose to run JSON->GeoJSON and get any number of files, splitting by things like network, phase or any other field. We would also reduce the scope for errors when getting input files (e.g. Open-Telecoms-Data/lib-cove-ofds#21).

duncandewhurst · 2022-12-04T20:33:27Z

> If we can crack that, then we could have 1, 2 or multiple GeoJSON files. People could choose to run JSON->GeoJSON and get any number of files, splitting by things like network, phase or any other field

That sounds good, but only .id is required on both Node and Span so I don't think there's a reliable way that we can 'guess' at whether a feature is a node or a span. However, we could consider adding a field for this information in the GeoJSON publication format.

odscjames · 2022-12-06T09:08:37Z

My first thought was that there would be something we could do, but now that I look again, each option has a complication:

Geometry (Point/LineString) - complicated by the fact there might be no Geometry
Fields we should expect to be there or missing (Span should have start / end, Node should not) - complicated by the fact fields are optional and additional fields are allowed

I think for reliable 100% rules, we will need to add a field.
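To illustrate why guessing falls short, here is a minimal sketch of the two rules above. The function name `guess_feature_type` and the return values `"node"` and `"span"` are hypothetical and not part of any OFDS tooling; the sketch only shows that a feature with no geometry and no span-specific fields cannot be classified.

```python
from typing import Optional


def guess_feature_type(feature: dict) -> Optional[str]:
    """Return "node", "span", or None when the feature is ambiguous."""
    geometry = feature.get("geometry") or {}
    properties = feature.get("properties") or {}

    # Rule 1: geometry type, which only helps when a geometry is present.
    if geometry.get("type") == "Point":
        return "node"
    if geometry.get("type") == "LineString":
        return "span"

    # Rule 2: span-only fields. These are optional, so their absence proves
    # nothing, and additional fields are allowed, so their presence is only a hint.
    if "start" in properties or "end" in properties:
        return "span"

    # Nothing left to go on.
    return None


# A feature with only an id and no geometry cannot be classified.
bare = {"type": "Feature", "geometry": None, "properties": {"id": "1"}}
print(guess_feature_type(bare))
```

Since only `.id` is required on both Node and Span, the `bare` feature is valid for either, which is the case an explicit field would resolve.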

duncandewhurst · 2023-03-22T22:45:39Z

Feedback from the World Bank's infrastructure map team is that GeoJSON is their preferred format for ingesting data because of the simplicity of it being a single file so I think we do want to pursue making the GeoJSON format a single file.

duncandewhurst · 2023-04-02T22:45:49Z

> I think for reliable 100% rules, we will need to add a field.

Agreed, I think that this is the approach that we'll pursue.

duncandewhurst · 2023-04-26T22:31:11Z

Looking into this further I'm unsure whether a single GeoJSON file is the right approach. A single file would present problems for users of at least QGIS (and any other software that uses GDAL) and ArcGIS:

In QGIS, if the features in a feature collection have different property schemas, the GDAL GeoJSON driver generates a union of the properties from all the features and applies that to all the features. Since nodes and spans in OFDS have different schemas, fields which only belong to spans in OFDS appear as attributes of nodes once the data is loaded into QGIS, and vice versa. For example, in QGIS, nodes end up with attributes like start, end and darkFibre, which are actually only attributes of spans in OFDS. Those attributes are set to NULL, but I think it could be quite confusing for users, and it leads to an attribute table with many columns.
In ArcGIS Pro, a feature class must be composed of features of the same feature type, e.g. only Point or only LineString. Users must specify the feature type when using the JSON to Features tool to create a feature class from a GeoJSON file. Only the features of the specified type will be loaded. Therefore, users would need to load the data twice: once as a Point feature class to get the nodes and once as a LineString feature class to get the spans. Similar restrictions apply to other ArcGIS products and tools, e.g. ArcPy FeatureSets and the ArcGIS Maps SDK for JavaScript. The only ArcGIS product for which I could find any information about the handling of properties in GeoJSON was ArcGIS Velocity, which requires features to have consistent properties, but, based on the above, I suspect there may be issues similar to those found in the GDAL driver.
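The union-of-properties behaviour described for QGIS can be mimicked in a few lines. This is a pure-Python simulation of the effect, not a call into GDAL; the function name `union_schema_table` and the simplified sample features (for example, `start` and `end` shown as plain identifiers) are assumptions made for illustration.

```python
def union_schema_table(features):
    """Build one attribute table whose columns are the union of all property names."""
    # Collect every property name seen on any feature, in first-seen order.
    columns = []
    for feature in features:
        for name in feature["properties"]:
            if name not in columns:
                columns.append(name)
    # Every feature gets every column; values it never had become None (NULL).
    rows = [
        {name: feature["properties"].get(name) for name in columns}
        for feature in features
    ]
    return columns, rows


features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        "properties": {"id": "n1", "name": "Node 1"},
    },
    {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [[0.0, 0.0], [1.0, 1.0]]},
        "properties": {"id": "s1", "start": "n1", "end": "n2", "darkFibre": True},
    },
]

columns, rows = union_schema_table(features)
print(columns)
print(rows[0])  # the node row now carries start, end and darkFibre as None
```

With the full Node and Span schemas the combined table would be much wider, which is the usability concern raised above.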

One thing that feels wrong about the current GeoJSON publication format is the lack of a means to specify which file contains the nodes feature collection and which contains the spans feature collection. In the example data, we handle this by naming the file (nodes.geojson or spans.geojson) and in CoVE we handle it by providing separate upload widgets for the nodes and spans files. However, neither of those approaches is ideal, since publishers might want to name their files differently and CoVE needs to handle the case of users uploading the wrong file. As an alternative to a single GeoJSON file, we could improve the current situation by adding a foreign member at the top level of each feature collection to specify whether it contains nodes or spans, e.g.

{
  "type": "FeatureCollection",
  "featureType": "nodes",
  "features": []
}

@odscjames @stevesong what do you think?

The following are my initial notes on the changes that we'd need to make to have a single GeoJSON file, before I came across the issues documented above. We have two decisions to make about the field that we need to add to indicate whether a feature is a node or span:

Where to add the field

There are two options:

1. As a child of the properties member of the feature object.
2. As a foreign member at the top level of the feature object.

Based on the following note from the GeoJSON specification, I think option 1 is preferable:

(...) support for foreign members can vary across implementations, and no normative processing model for foreign members is defined. Accordingly, implementations that rely too heavily on the use of foreign members might experience reduced interoperability with other implementations.

If we pursue option 1, we'll need to add a propertyNames regex to the Node and Span definitions to prevent publishers from adding additional fields with clashing names.

What to name the field

If we add the field as a child of the properties member, we can't use type because there is already a type field for nodes. Therefore, I propose featureType.
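As a rough sketch of the propertyNames restriction mentioned above: the schema fragment and the pattern `^(?!featureType$)` below are hypothetical, not the wording adopted in the OFDS schema, and the check is done by hand with Python's `re` module rather than with a JSON Schema validator.

```python
import re

# Hypothetical fragment for the Node (or Span) definition: any property name
# is allowed except one that is exactly "featureType".
NODE_SCHEMA_FRAGMENT = {
    "type": "object",
    "propertyNames": {"pattern": "^(?!featureType$)"},
}


def clashing_names(obj: dict, schema: dict) -> list:
    """Return the property names in obj that the propertyNames pattern rejects."""
    pattern = re.compile(schema["propertyNames"]["pattern"])
    # JSON Schema patterns are unanchored, so search() is used rather than fullmatch().
    return [name for name in obj if not pattern.search(name)]


# A publisher-added field named featureType would clash with the new field.
print(clashing_names({"id": "n1", "featureType": "custom"}, NODE_SCHEMA_FRAGMENT))
```

Names that merely contain the reserved word, such as a hypothetical `featureTypeNotes`, still pass, because the lookahead only rejects an exact match.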

odscjames · 2023-04-27T07:22:06Z

I haven't looked at QGIS etc as much as you so don't have more to add there. That is very interesting.

The other option is to carry on with the plan to add a field on each feature object and then allow people flexibility to work with one or more files as they choose.

Cove/tooling by default could output 2 files "nodes"/"spans" but give the users options to choose 1 file or even more files (so they don't get any single file that's too big, or so they get separate files filtered by a variable they choose, like status).

Cove/Tooling could accept 1 to many files as input and any file could go in any input so users wouldn't have to worry about putting the right file in the right place.

If someone ended up with one file when they needed it split into 2 (because of QGIS etc) there could be a tool to split the file for them.

We don't need to do all this tooling straight away; we can start by leaving everything working on 2 files, but putting the new field on the feature level instead of on the file level enables us to do this later.
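A minimal sketch of what such tooling could look like, assuming the field lives at `properties.featureType` and takes the values `"node"` and `"span"` (both assumptions for this example). The function name `regroup_by_feature_type` is hypothetical. It accepts any number of feature collections, in any order, and returns one collection per feature type, which covers both the "any file in any input" case and the "split one file into two" case.

```python
from collections import defaultdict


def regroup_by_feature_type(collections):
    """Merge any number of FeatureCollections and split them into nodes and spans."""
    grouped = defaultdict(list)
    for collection in collections:
        for feature in collection.get("features", []):
            feature_type = (feature.get("properties") or {}).get("featureType")
            if feature_type not in ("node", "span"):
                # Without the field there is no reliable way to classify the feature.
                raise ValueError(f"missing or unknown featureType: {feature_type!r}")
            grouped[feature_type].append(feature)
    return {
        name: {"type": "FeatureCollection", "features": grouped[name]}
        for name in ("node", "span")
    }


# One mixed file in, two single-type collections out.
mixed = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"id": "n1", "featureType": "node"}},
        {"type": "Feature", "geometry": None,
         "properties": {"id": "s1", "featureType": "span"}},
    ],
}
result = regroup_by_feature_type([mixed])
print(len(result["node"]["features"]), len(result["span"]["features"]))
```

Because the classification reads only the feature itself, the same function works whether users supply one file, two files, or many.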

duncandewhurst · 2023-05-01T21:13:06Z

Good points :-) I agree that adding the field at the feature level is the best option for future flexibility. We can then update the documentation to make separate files a 'should' rather than a 'must' and update CoVE to rely on the new field rather than which file is uploaded in which box. Sound good?

odscjames · 2023-05-03T14:08:15Z

Yes!

duncandewhurst added the GeoJSON format and Tooling labels Dec 1, 2022

duncandewhurst added this to the 0.3.0 milestone Jan 9, 2023

duncandewhurst self-assigned this Apr 26, 2023

duncandewhurst mentioned this issue May 8, 2023

Add featureType field to GeoJSON publication format #266

Merged

22 tasks

duncandewhurst closed this as completed in #266 May 17, 2023

duncandewhurst mentioned this issue Jun 1, 2023

GeoJSON: Update CoVE to rely on the featureType field rather than which file is uploaded in which box Open-Telecoms-Data/cove-ofds#96

Closed
