Property Datatypes #45

Open
VladimirAlexiev opened this issue Feb 1, 2022 · 4 comments
Labels
NDR This is an issue which needs fixing on the transformation/graph creation

@VladimirAlexiev

Currently UNCEFACT uses only two literal datatypes: xsd:string (791 props) and xsd:token (159 props).

UNCEFACT prop names are made according to ISO/IEC 11179 Metadata Registry (MDR), part 5:2015 Naming and identification principles. The last word of prop names (let's call it "kind") suggests many other datatypes.

Surely trade involves some numbers and some dates?!?

I checked that all props with kind Id are xsd:token (good).
This query counts xsd:string props by "kind":

PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?kind (count(*) as ?c) {
  ?prop schema:rangeIncludes xsd:string
  bind(replace(str(?prop),".*([A-Z][a-z]*)","$1") as ?kind)
  filter(regex(?kind,"^[A-Z]"))
} group by ?kind order by ?kind
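
To make the "kind" extraction easier to follow, here is a minimal Python sketch that mirrors the replace() and filter() steps of the query with the standard `re` module. The namespace `NS` and the small `props` list are assumptions for illustration only; the real counts come from running the query against the vocabulary.

```python
import re
from collections import Counter

# ASSUMPTION: placeholder namespace, used only for this illustration.
NS = "https://example.org/uncefact#"

def kind_of(iri):
    """Return the last capitalised word of an IRI, or None if there is none."""
    # Same pattern as the SPARQL replace(): the greedy ".*" pushes the
    # capture group to the last uppercase letter in the string.
    kind = re.sub(r".*([A-Z][a-z]*)", r"\1", iri)
    # Same check as the SPARQL filter(): when nothing matched, the IRI comes
    # back unchanged and starts with a lowercase scheme such as "https".
    return kind if re.match(r"[A-Z]", kind) else None

# Property names taken from this issue; the list is only a tiny sample.
props = [
    NS + "nilCarriageValueIndicator",
    NS + "usedToDateQuotaQuantity",
    NS + "occurrenceDateTime",
    NS + "lineCountNumeric",
]

kinds = [k for k in map(kind_of, props) if k is not None]
counts = Counter(kinds)
for kind, c in sorted(counts.items()):
    print(kind, c)
```

Note that a name such as occurrenceDateTime yields the kind "Time", not "DateTime", because only the last capitalised word is kept.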

Count and tentative proposed changes:

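| kind | c | change to |
|---|---|---|
| "Access" | 1 | |
| "Agency" | 1 | |
| "Amount" | 89 | numeric |
| "Basis" | 2 | |
| "Box" | 1 | |
| "Charge" | 1 | |
| "Code" | 154 | xsd:token |
| "Conditions" | 1 | |
| "Criteria" | 1 | |
| "Date" | 3 | xsd:date |
| "Description" | 21 | |
| "Dimension" | 1 | |
| "Five" | 1 | |
| "Four" | 1 | |
| "Indicator" | 73 | xsd:boolean |
| "Information" | 21 | |
| "Instructions" | 2 | |
| "Limit" | 2 | |
| "List" | 2 | |
| "Means" | 1 | |
| "Measure" | 66 | |
| "Name" | 47 | |
| "Number" | 4 | numeric |
| "Numeric" | 15 | IndexNumeric, SequenceNumeric -> xsd:integer |
| "Object" | 7 | |
| "Of" | 2 | |
| "One" | 1 | |
| "Pattern" | 1 | |
| "Percent" | 16 | numeric |
| "Phrase" | 1 | |
| "Point" | 1 | |
| "Procedure" | 1 | |
| "Quantity" | 91 | numeric |
| "Rate" | 4 | |
| "Reason" | 7 | |
| "Reference" | 6 | |
| "Remark" | 2 | |
| "Remarks" | 1 | |
| "Restriction" | 3 | |
| "Result" | 1 | |
| "Status" | 1 | |
| "Three" | 1 | |
| "Time" | 79 | xsd:dateTime |
| "Title" | 1 | |
| "Two" | 1 | |
| "Type" | 9 | |
| "Use" | 1 | |
| "Value" | 1 | |
| "Zone" | 1 | |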

Examples:

  • Numeric candidates:
    uncefact:usedToDateQuotaQuantity, uncefact:usedSignalSourceQuantity, uncefact:taxBasisTotalAmount, uncefact:taxBasisAllowanceRate
  • date or dateTime candidates:
    uncefact:occurrenceDateTime
  • xsd:boolean candidates:
    uncefact:nilCarriageValueIndicator, uncefact:nilCustomsValueIndicator, uncefact:nilInsuranceValueIndicator
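
The tentative "change to" column could be applied mechanically along these lines. This is only a sketch: the table says just "numeric" for several kinds, so `xsd:decimal` below is a placeholder assumption, and the helper name `proposed_range` is hypothetical.

```python
import re

# ASSUMPTION: the table only says "numeric"; xsd:decimal is a placeholder
# chosen for this sketch, not a decision made in this issue.
NUMERIC = "xsd:decimal"

# Kinds with a non-empty "change to" entry in the table.
PROPOSED = {
    "Amount": NUMERIC,
    "Code": "xsd:token",
    "Date": "xsd:date",
    "Indicator": "xsd:boolean",
    "Number": NUMERIC,
    "Numeric": "xsd:integer",  # table names IndexNumeric, SequenceNumeric
    "Percent": NUMERIC,
    "Quantity": NUMERIC,
    "Time": "xsd:dateTime",
}

def proposed_range(local_name, default="xsd:string"):
    """Look up the proposed datatype from the last capitalised word."""
    words = re.findall(r"[A-Z][a-z]*", local_name)
    kind = words[-1] if words else None
    # Kinds without a proposal keep the current xsd:string range.
    return PROPOSED.get(kind, default)

for name in ["nilCustomsValueIndicator", "occurrenceDateTime",
             "taxBasisTotalAmount", "taxBasisAllowanceRate"]:
    print(name, "->", proposed_range(name))
```

"Rate" has no entry in the table, so taxBasisAllowanceRate falls back to xsd:string here even though it is listed among the numeric candidates; that row probably needs a decision too.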
@Fak3

Fak3 commented Feb 1, 2022

Edi3 issue: edi3/edi3-json-ld-ndr#51

nissimsan added the NDR label ("This is an issue which needs fixing on the transformation/graph creation") on Mar 4, 2022
@nissimsan
Contributor

It's very hard to disagree on this! :)

@kshychko , we did some work on this - can you double check if this was fixed already, pls?

@nissimsan
Contributor

@Fak3 , how are you doing? We'd love to have you back and attend the calls!!! ❤️

@VladimirAlexiev
Author

VladimirAlexiev commented Sep 25, 2023

Currently, all Id properties are rendered as token (good!) and all other data properties as string (not good):

grep rangeIncludes uncefact.ttl |sort|uniq -c|sort -rn|less
    791         schema:rangeIncludes            xsd:string ;  
    159         schema:rangeIncludes            xsd:token ;   

In particular:

  • Indicator properties should be xsd:boolean but are currently xsd:string, e.g.:
    uncefact:wasteReportingExemptionIndicator
            schema:rangeIncludes            xsd:string ;
  • Numeric properties should be xsd:integer but are also currently xsd:string, e.g.:
    uncefact:lineCountNumeric
            rdfs:comment                    "The count of the number of lines in this exchanged document." ;
            schema:rangeIncludes            xsd:string ;
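
A check like this could be scripted to catch regressions. The sketch below is a naive line-based scan, not a real Turtle parser, and assumes the layout shown in the two snippets (subject at the start of a line, predicates on following lines). The `EXPECTED` table and the `mismatches` helper are hypothetical and cover only the two kinds mentioned in this comment.

```python
import re

# The two snippets quoted in this comment, used as sample input.
TTL = '''\
uncefact:wasteReportingExemptionIndicator
        schema:rangeIncludes            xsd:string ;
uncefact:lineCountNumeric
        rdfs:comment                    "The count of the number of lines in this exchanged document." ;
        schema:rangeIncludes            xsd:string ;
'''

# Expected datatype per kind, limited to the two cases named above.
EXPECTED = {"Indicator": "xsd:boolean", "Numeric": "xsd:integer"}

def mismatches(ttl):
    """Return (property, actual, expected) for each wrongly typed property."""
    found = []
    subject = None
    for line in ttl.splitlines():
        # A subject is assumed to start at column 0 with the uncefact: prefix.
        m = re.match(r"uncefact:(\w+)", line)
        if m:
            subject = m.group(1)
        r = re.search(r"schema:rangeIncludes\s+(\S+)", line)
        if r and subject:
            words = re.findall(r"[A-Z][a-z]*", subject)
            want = EXPECTED.get(words[-1]) if words else None
            if want and r.group(1) != want:
                found.append((subject, r.group(1), want))
    return found

for prop, actual, expected in mismatches(TTL):
    print(f"{prop}: {actual} should be {expected}")
```

For anything beyond a quick check, a proper RDF library or a SPARQL query over the graph would be more robust than scanning text.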
