Hello! In the end I think I managed to create a working JSON file, but I decided to open this issue anyway to detail possible inconsistencies in the Croissant Format Specification.
Our dataset consists of images and two corresponding masks, all in different directories contained in a single zip file.
I decided that the best way to specify records is by creating three different record sets that can be joined using references. Images and masks share the same filename, which can be used to match them. I managed to successfully extract both the images and their filenames.
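To make the intended join concrete, here is a minimal plain-Python sketch of matching the three record sets on their shared filename. It does not use mlcroissant at all; the record layout, the "path" key, and the function name join_on_filename are assumptions made purely for illustration.

```python
def join_on_filename(images, masks_a, masks_b):
    """Join three lists of records on their shared "filename" value."""
    # Index each mask set by filename for constant-time lookups.
    a_by_name = {record["filename"]: record for record in masks_a}
    b_by_name = {record["filename"]: record for record in masks_b}

    joined = []
    for image in images:
        name = image["filename"]
        # Keep only images that have a counterpart in both mask sets.
        if name in a_by_name and name in b_by_name:
            joined.append({
                "filename": name,
                "image": image["path"],
                "mask_a": a_by_name[name]["path"],
                "mask_b": b_by_name[name]["path"],
            })
    return joined


# Hypothetical records, as they might look after extracting the zip file.
images = [
    {"filename": "0001.png", "path": "images/0001.png"},
    {"filename": "0002.png", "path": "images/0002.png"},
]
masks_a = [{"filename": "0001.png", "path": "masks_a/0001.png"}]
masks_b = [{"filename": "0001.png", "path": "masks_b/0001.png"}]

rows = join_on_filename(images, masks_a, masks_b)
```

This is the relationship the references between record sets are meant to express declaratively.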
My problem is that simply including references="images/filename" or references="#{images/filename}" in the fields corresponding to the masks' filenames causes AttributeError: 'str' object has no attribute 'uuid'. Is it possible to convert filename strings into objects that have the "uuid" attribute using the mlcroissant Python API? I did not find any reference for the API aside from the notebooks in recipes, which unfortunately do not cover the references functionality.
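I then tried manually adding the keys and references to the JSON generated by the Python library, according to the specification found here, by adding:
"key": { "@id": "images/filename" } to the images fileset, which causes ValidationError: "images/filename" should have an attribute "@type": "https://schema.org/Text". Got http://mlcommons.org/croissant/Field instead. This is a known issue (#651 and #655, "Implement key in mlcroissant").
"references": { "@idfield": "images/filename" } to the filename fields of both masks, which causes ValidationError: Source should have one of the following properties http://mlcommons.org/croissant/field or http://mlcommons.org/croissant/fileObject or http://mlcommons.org/croissant/fileSet.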
I also tried including "references": {"@id": "images/filename"}, as suggested in the specs here. This in turn causes AttributeError: '_MISSING_TYPE' object has no attribute 'uuid'.
What worked in the end was adding "references": {"field": {"@id": "images/filename"}}, which was suggested in #651, but is not specified anywhere in the Croissant Format Specification.
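For anyone who wants to apply the same workaround programmatically, here is a rough sketch that patches the generated JSON with the standard json module only, not the mlcroissant API. The helper name add_field_reference, the mask field ids, and the trimmed metadata layout ("recordSet" and "field" lists whose entries carry an "@id") are assumptions for illustration; a real file contains many more keys and may identify fields differently.

```python
import json


def add_field_reference(metadata, field_id, target_field_id):
    """Add "references": {"field": {"@id": target}} to the field with field_id.

    Returns True if a matching field was found and patched, False otherwise.
    """
    for record_set in metadata.get("recordSet", []):
        for field in record_set.get("field", []):
            if field.get("@id") == field_id:
                field["references"] = {"field": {"@id": target_field_id}}
                return True
    return False


# Hypothetical, heavily trimmed metadata; normally loaded with json.load().
metadata = {
    "recordSet": [
        {"@id": "images", "field": [{"@id": "images/filename"}]},
        {"@id": "masks_a", "field": [{"@id": "masks_a/filename"}]},
        {"@id": "masks_b", "field": [{"@id": "masks_b/filename"}]},
    ]
}

# Point both mask filename fields at the images' filename field.
for mask_field in ("masks_a/filename", "masks_b/filename"):
    add_field_reference(metadata, mask_field, "images/filename")

print(json.dumps(metadata["recordSet"][1], indent=2))
```

The patched dictionary can then be written back with json.dump and loaded with mlcroissant as usual. Whether this form keeps working depends on the library version, since it is not described in the specification.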
I stumbled onto this issue via a search engine. I can confirm that I ran into this exact situation.
None of the options in the docs describe this way of specifying the references property.
I am just getting started with the croissant library and am trying to make a Croissant JSON file for a toy(ish) dataset to explore the possibilities.