Uncontrolled deletion of metadata when metadata values do not comply with values of selection lists #4588

Closed
andre-hohmann opened this issue Aug 13, 2021 · 12 comments · Fixed by #4959

@andre-hohmann
Collaborator

Problem

In Kitodo.Production 3.x, selection lists are fortunately easy to apply. However, during migration we noticed that most of the metadata in the <dmdSec> of a process is deleted if:

  1. a metadata element with a value list stores a value that is not in that list
  2. the metadata editor is closed via "Save & exit" or the metadata is saved

At least, after validation, an error message states which metadata elements are affected.

Solution

If a metadata field contains a value that is not in its value list, the metadata in the <dmdSec> should not be deleted.
Instead, an error message should always be shown stating that a non-conforming value is present, similar to the behaviour with undefined metadata keys.

@matthias-ronge
Collaborator

The behaviour described is intentional. If a metadata key is defined in the ruleset and it has options, then the ruleset defines the permitted range of values here. A value that does not correspond to this value range is therefore invalid. It would be the same if a date field contained a 30th of February.

This means that when you use options, options must be defined for all legal values. You can still use restriction to define that an option should not be selected at one point. That seems to be what you want here. If you do not want this restriction, you must leave the field as a string and not use any options.

Background: Data should be convertible according to the rules of their data type. In the case of enumeration types, every possible value should internally be mappable to its index on that enumeration. If you have an enumeration type with the values “tiny, small, medium, large, huge, giant” then this can be mapped to indices like 1, 2, 3, 4, 5, 6. But if the data reads “average size” now, which index is it, if “average size” is not in the list? The value is illegal as per the data type definition. This would be something that would need to be fixed during import, here, replace “average size” with “medium”.
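The mapping described above can be sketched in a few lines. This is only an illustration in Python, not code from Kitodo.Production; the names `Size` and `size_index` are hypothetical and exist only for this example.

```python
from enum import IntEnum


class Size(IntEnum):
    """Enumeration type from the example; each value has a fixed index."""
    TINY = 1
    SMALL = 2
    MEDIUM = 3
    LARGE = 4
    HUGE = 5
    GIANT = 6


def size_index(value: str) -> int:
    """Map a stored text value to its index in the enumeration.

    Raises ValueError if the text is not one of the defined options,
    because such a value has no index to map to.
    """
    key = value.strip().upper()
    try:
        return Size[key].value
    except KeyError:
        raise ValueError(f"{value!r} is not a permitted option") from None
```

With this sketch, `size_index("medium")` yields an index, while `size_index("average size")` raises an error: the text has no counterpart in the enumeration, so there is nothing to map it to.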

@matthias-ronge
Collaborator

Thinking further: If a combination of both would be required (free-text permitted, plus selection using options), then we would need an additional display type. This is possible with JSF as a so-called combo-box. You have a drop-down selection, but you can also enter free text. But that would be a new feature, because nothing like this has ever existed before.

The ruleset syntax could look like this:

<key id="example">
    <label>Example</label>
    <codomain type="string"/> <!-- explicitly setting the codomain to 'string'
                                   to enable free-text input as well -->
    <option value="one"/>
    <option value="two"/>
    <option value="three"/>
</key>

@andre-hohmann
Collaborator Author

Thanks for the explanation. If I understood you correctly, I do not completely agree with you.

We have two use cases that lead to wrong values in metadata keys with value lists:

  1. Migration: In the migrated metadata, some values are not documented and must therefore be added to the value list of the metadata key.
  2. KitodoScript: A wrong value was added with the KitodoScript "addData".

In both cases the goal is to correct the metadata or to supplement the value list. It is reasonable to do this manually and not automatically. Thanks to the validation warning, it is absolutely clear which value is missing.

However, I do not understand why (nearly) all metadata in the <dmdSec> has to be deleted when one value does not comply with the value list of the metadata key.

@Kathrin-Huber
Contributor

> Thinking further: If a combination of both would be required (free-text permitted, plus selection using options), then we would need an additional display type. This is possible with JSF as a so-called combo-box. You have a drop-down selection, but you can also enter free text. But that would be a new feature, because nothing like this has ever existed before.
>
> The ruleset syntax could look like this:
>
> <key id="example">
>     <label>Example</label>
>     <codomain type="string"/> <!-- explicitly setting the codomain to 'string'
>                                    to enable free-text input as well -->
>     <option value="one"/>
>     <option value="two"/>
>     <option value="three"/>
> </key>

Isn't this used in the folder configuration on Project edit? When editing the mets filegroup?

@andre-hohmann
Collaborator Author

You can think about adding free text to a list of option values, but I want to remind you that the issue was created because nearly all metadata is deleted from the METS file when the field contains a wrong value.

It is not the intention of the issue to allow free-text fields, but to prevent the deletion of metadata.

@matthias-ronge
Collaborator

> Isn't this used in the folder configuration on Project edit? When editing the mets filegroup?

Yes, this is what I meant.

> It is not the intention of the issue to allow free-text fields, but to prevent the deletion of metadata.

Data is managed in a program in so-called data types. A data type basically describes what a value can be, for example a number, a text, or a list. (Programmers speak of int, String, or enum; these mean just that.) Data types allow different operations: for example, you can add numbers (2 + 2 = 4), but you can’t do that for text or lists. You can sort lists according to their internal order (for example, small is smaller than large, regardless of the alphabetical order), which you cannot do with text.

<option> elements in the ruleset create a data type of the list type. Values that don’t have an option are undefined and have no internal representation.
Imagine, temperature is stated as “comfortably warm”. What is −10°C + comfortably warm? That doesn't work because it can't be converted to a number. That's kind of how programs work.
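A small sketch may make the difference between the data types concrete. It is written in Python purely for illustration and is not how Kitodo.Production is implemented; `SIZES` and `add_degrees` are hypothetical names.

```python
# Internal order of a list type: position in the list, not the alphabet.
SIZES = ["tiny", "small", "medium", "large", "huge", "giant"]

values = ["large", "tiny", "medium"]

# Text can only be sorted alphabetically.
alphabetical = sorted(values)

# A list type can be sorted by the internal order of its options.
by_internal_order = sorted(values, key=SIZES.index)


def add_degrees(a: str, b: str) -> float:
    """Add two temperatures given as text.

    float() raises ValueError for text that is not a number, which is
    the situation of the 'comfortably warm' example.
    """
    return float(a) + float(b)
```

Here `alphabetical` puts "large" first, while `by_internal_order` puts "tiny" first. Calling `add_degrees("-10", "comfortably warm")` fails with a `ValueError`, because the second value cannot be converted to a number.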

So the easiest way to prevent data from being deleted is to replace the list with text, that is, to comment out all options in the ruleset. The selection box then disappears and only a text input remains, so you must enter the value manually. That means you have to write "ger" for the language, for example.

The combo box described above allows both: the data type is text, but you can still have options. Internally it uses both; the value is stored as text, while a list supplies the selectable options. It exists for the user's convenience, which is why I thought of it.
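As a rough model of that idea, assuming nothing about how a real JSF combo box or the Kitodo.Production editor is built, the field could hold a plain text value next to a list of suggested options. The class `ComboBoxField` and its methods are hypothetical names used only in this Python sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComboBoxField:
    """Hypothetical combo-box model: a text value plus suggested options."""
    options: List[str] = field(default_factory=list)
    value: str = ""

    def set_value(self, text: str) -> None:
        # The data type is text, so any input is accepted and kept as-is.
        self.value = text

    def is_suggested(self) -> bool:
        # The options only drive the drop-down; they do not restrict the value.
        return self.value in self.options
```

In this model, a value such as "deu" that is missing from the options is still stored; it is merely not one of the suggestions.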

@andre-hohmann
Collaborator Author

andre-hohmann commented Oct 6, 2021

Thanks a lot for the detailed answer. I will apply the change in the ruleset, but I am afraid that this knowledge will not reach all users.

However, I want to emphasize that I know of many possible reactions to wrong data/values:

  • nothing happens at all
  • an error message appears but nothing further happens
  • an error message appears and saving of the object is blocked until the mistake is corrected
  • only the wrong data/value is silently deleted
  • only the wrong data/value is deleted with an additional information about the deletion
  • ...

But I have not experienced a behaviour that deletes all data/values of an object just because one value does not comply with a strictly defined option list. Even if the probability that this behaviour occurs is very low, it will have very unpleasant consequences.
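The reactions listed above can be written down as explicit policies. The following Python sketch is only an illustration under assumed names (`Policy`, `save`, and the dictionary layout are hypothetical); it does not describe what Kitodo.Production actually does on save.

```python
from enum import Enum


class Policy(Enum):
    IGNORE = "nothing happens at all"
    WARN = "error message, nothing further happens"
    BLOCK = "error message, saving is blocked"
    DROP_SILENTLY = "only the wrong value is deleted"
    DROP_AND_REPORT = "only the wrong value is deleted, with a message"


def save(metadata: dict, permitted: dict, policy: Policy):
    """Return (data_to_store, messages).

    metadata maps key -> value; permitted maps key -> set of allowed values
    for keys that have a selection list. data_to_store is None when saving
    is blocked. Valid values are never removed under any policy.
    """
    invalid = {k for k, v in metadata.items()
               if k in permitted and v not in permitted[k]}
    messages = [f"{k}: {metadata[k]!r} is not in the selection list"
                for k in sorted(invalid)]
    if not invalid or policy is Policy.IGNORE:
        return dict(metadata), []
    if policy is Policy.WARN:
        return dict(metadata), messages
    if policy is Policy.BLOCK:
        return None, messages
    # Remaining policies drop only the non-conforming entries.
    kept = {k: v for k, v in metadata.items() if k not in invalid}
    if policy is Policy.DROP_SILENTLY:
        return kept, []
    return kept, [m + " (value removed)" for m in messages]
```

For example, with `{"title": "Atlas", "language": "deu"}` and a selection list of `{"ger", "eng"}` for `language`, every policy in this sketch keeps `title`; the policies differ only in what happens to `language` and in whether a message is produced.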

To come to an end and to focus on other things, I will adjust our ruleset. When the described behaviour no longer appears, I will close the issue.

@henning-gerhardt
Collaborator

If a combo box is a usable solution here, then it should be implemented if it is not available yet. But I would prefer a general solution that prevents storing invalid metadata, or at least does not corrupt the whole metadata file on saving. I say this as a person who must restore such cases from the backup system.

@matthias-ronge
Collaborator

> But I have not experienced a behaviour that deletes all data/values of an object just because one value does not comply with a strictly defined option list.

I didn't understand that. Are valid values also deleted? That really shouldn't happen! I thought only invalid values are deleted.

@andre-hohmann
Collaborator Author

Yes, nearly the whole dmdSec was empty after saving, but I did not try it often.

I will still try the adjustments of the ruleset, but will not close the issue afterwards. I assume we agree that the behaviour needs further investigation.

@matthias-ronge
Collaborator

Can that be an occurrence of #4061 (comment)?

@andre-hohmann
Collaborator Author
