Specs approach for AMOF files #17
Update through commit 65cdf5c:
In general, these product-specific spec checks work, e.g.
and
In the second case, the variables that checksit reports as not existing do not exist in the file but should not necessarily be in the file - while they are valid variables for that data product, the The |
make_specs.py fixed in commit d527361, with updated spec file for the surface-met product. |
Added section to check.py which, if
Doesn't currently add in the global attributes spec file because that has not been created correctly/in a useful manner. Also still need to work on product-specific attributes. I've not tested this against a non-netCDF file, but I have tested against a non-AMOF netCDF file, which is correctly ignored in the AMOF-checking process. |
One issue with global attributes spec was the |
In the general global attributes, there are two that have vocab checks -
To help toward this, commit 6c5b8a0 introduces the idea of a wildcard to cvs.py. The wildcard can be used when specifying the vocab check; however, at the moment it will only work if the wildcard is the last argument in the check. This means that |
Commit 1d75c78 allows the wildcard to appear anywhere in a spec's vocab check.
A (hopefully not too confusing) generic example to help (mostly me) understand, considering the very generic dictionary
{'a' : {'b' : 'c', 'd' : {'e' : 'f', 'g' : 'h'} },
 'i' : {'b' : 'j', 'd' : {'e' : 'k', 'g' : 'l'} },
 'm' : {'b' : 'n', 'd' : {'e' : 'o', 'g' : 'p'} } }
The following checks get the (also) following results:
|
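As a rough illustration of how a wildcard at any position could be resolved against a nested dictionary like the one above, here is a minimal sketch. The function name `lookup`, the `"*"` token and the list-of-keys path format are all assumptions made for this example; they are not taken from cvs.py, and the outputs shown are those of this sketch rather than the results from the original comment.

```python
WILDCARD = "*"  # hypothetical token; the real wildcard syntax in cvs.py may differ


def lookup(tree, path):
    """Return every value reachable by following `path` through nested dicts.

    A wildcard at any position fans out over all keys at that level.
    """
    if not path:
        return [tree]
    if not isinstance(tree, dict):
        return []
    head, rest = path[0], path[1:]
    keys = list(tree) if head == WILDCARD else [head]
    results = []
    for key in keys:
        if key in tree:
            results.extend(lookup(tree[key], rest))
    return results


vocab = {
    "a": {"b": "c", "d": {"e": "f", "g": "h"}},
    "i": {"b": "j", "d": {"e": "k", "g": "l"}},
    "m": {"b": "n", "d": {"e": "o", "g": "p"}},
}

print(lookup(vocab, ["a", "d", "e"]))  # ['f']
print(lookup(vocab, ["*", "b"]))       # ['c', 'j', 'n']
print(lookup(vocab, ["*", "d", "g"]))  # ['h', 'l', 'p']
```

With this shape, a wildcard in the first position collects one value per top-level entry, which is the case that was not possible while the wildcard had to be the last argument.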
Final issue that needs dealing with for AMOF spec checks (I think) - how to deal with product-specific variables that are not in the netCDF file? The AMOF standard allows for these variables to not be there, but at the moment checksit expects all possible variables to be present. |
Great work @joshua-hampton, I wonder if we should have a way of specifying that any component (i.e. a dimension, variable or an attribute) could be optional. If so, we just need a nice clean way of identifying them. |
Couple of thoughts on how we could do that:
|
Another thought - global attributes |
Idea on applying optional status to things in spec checks (commit c4fe779). When making spec files for data products, variables and dimensions being checked have |
@joshua-hampton: sounds good 👍 |
Commit 31a0209 - Added "warnings" as a returned parameter from all functions in generic.py. When optional product-specific dimensions or variables are missing, a warning is added and passed back up the chain through specs.py to check.py, where warnings are printed out along with errors/file compliance, e.g.
I've also added a flag to the
|
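The errors-plus-warnings pattern described above might look something like the sketch below. Everything here is hypothetical: `check_names_exist`, `run_spec`, the `required`/`optional` rule keys and the message wording are invented for illustration and do not reflect the actual signatures in generic.py, specs.py or check.py.

```python
def check_names_exist(present, required=(), optional=()):
    """Return (errors, warnings) for names missing from `present`."""
    errors = [
        f"required '{name}' not found in file"
        for name in required
        if name not in present
    ]
    warnings = [
        f"optional '{name}' not found in file"
        for name in optional
        if name not in present
    ]
    return errors, warnings


def run_spec(present, rules):
    """Run every rule and collect errors and warnings for the caller to print."""
    all_errors, all_warnings = [], []
    for rule in rules:
        errors, warnings = check_names_exist(present, **rule)
        all_errors.extend(errors)
        all_warnings.extend(warnings)
    return all_errors, all_warnings


# Hypothetical file contents and spec rules
file_variables = {"time", "air_temperature"}
rules = [
    {"required": ["time", "air_pressure"]},
    {"optional": ["wind_speed"]},
]

errors, warnings = run_spec(file_variables, rules)
print("compliant" if not errors else "not compliant")
for message in errors + warnings:
    print(message)
```

Only errors affect compliance in this sketch; a missing optional variable is reported but does not fail the file.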
@joshua-hampton: I think all structural rules like this should be referred to Barb. |
Testing this with a file that isn't mine has produced some results of note:
|
@joshua-hampton: I really like your ideas for giving people hints on what they might have meant. |
I've found another issue, when doing a rule check on a global attribute. Specifically, this one from
In one netCDF file, the attribute was spelt incorrectly, and |
Played around a bit with the idea of giving hints on potentially misspelt items (da93da7). If a variable/attribute/dimension specified in the spec file is not found in the netCDF file, a function is called that takes the name of what should be there, produces a set of possible "close matches" defined as up to two errors, and then looks through these close matches to see if any of them match anything in the file. For example, if a file had the variable
There are a few limitations on this method. Firstly, I've assumed that the first character will be correct. This is mostly because you could otherwise get
Secondly, if there are similar variable/attribute/dimension names intended to be in the file, they can come up as suggestions. For example, the
The functions that are doing this are currently in |
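For reference, the "close matches within two errors, first character assumed correct" idea can be sketched as follows. This is not the code from da93da7: the function names, the alphabet, and the choice to count a deletion, insertion, replacement or adjacent swap as one error are all assumptions made for this example.

```python
import string

# Assumed alphabet for names: lowercase letters, digits and underscore
ALPHABET = string.ascii_lowercase + string.digits + "_"


def one_edit(word):
    """All strings one error away from `word`, keeping its first character fixed.

    Assumes `word` is non-empty.
    """
    head, tail = word[0], word[1:]
    splits = [(tail[:i], tail[i:]) for i in range(len(tail) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    replaces = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return {head + t for t in deletes | swaps | replaces | inserts}


def close_matches(word):
    """All strings within two errors of `word`, excluding `word` itself."""
    first = one_edit(word)
    second = {w2 for w1 in first for w2 in one_edit(w1)}
    return (first | second) - {word}


def suggest(expected, names_in_file):
    """Names actually in the file that look like misspellings of `expected`."""
    return sorted(close_matches(expected) & set(names_in_file))


print(suggest("latitude", ["lattitude", "longitude", "time"]))  # ['lattitude']
```

Because the candidate set grows quickly with name length, a sketch like this is only practical for fairly short names. It also shares the second limitation noted above: a legitimately different name that happens to be within two errors of the expected one would be offered as a suggestion.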
Commit 843444c - changed specs to check values of variable attributes. Errors such as
are now caught. |
Also found a bug: when logging in "compact" mode, checksit would attempt to run a template check regardless of whether one was required (e.g. AMOF netCDFs only using specs). This is fixed in 15183b0. |
CLI option to skip spellchecking added in 51f7f8e. Creating a pull request to main with latest changes |
Something that still needs doing: how to work with newer releases of the NCAS-GENERAL standard. This will require:
|
As with issue #16, I have also made some progress on the idea of using specs to check AMOF files. In this approach, I have created a number of spec files, one that looks at land variables, and one that does common global attributes.
To accommodate this, I made some changes to generic.py, where the functions run by the spec checker live:
- check_global_attrs: added regex checks
- check_var_exists and check_dim_exists. Quite self-explanatory

This approach doesn't (currently) check the values of variable attributes, just that they exist. This is better suited for checking valid_min (compare with discussion in issue #16), but not for checking units, for example.

I think this approach will also work well when all variables have the same attributes; however, that's not the case (e.g. quality control variables have flag_values and flag_meanings, which don't exist in other variables). I guess that the specs can be written to do checks on QC variables separately to other variables.

Example of running check against these spec files: