Problems with the new import configuration #5219

Closed
BartChris opened this issue Jul 6, 2022 · 8 comments

@BartChris
Collaborator

BartChris commented Jul 6, 2022

I have some problems transferring the current XML-based configuration to the new import configuration (#5038). Right now I cannot get some catalogs to work that worked before. It seems that some of the features that originally (#3374) allowed the use of arbitrary search interfaces got lost. It would be good if those features could be ported to the new interface as well, so that catalogs which worked before can be used again.

I have some questions in this context:

  • The XML-based configuration had a way to specify custom URL parameters. This made it possible to add parameters to the URL in addition to those given by the user query. How can something like that be done in the new interface?
<urlParameters>
    <param name="format" value="oai_mods" />
</urlParameters>
  • What does Query delimiter actually stand for and what has to go there?

[screenshot]

  • Is it ensured that the mapping files are actually applied in the right order if I want to chain multiple transformations?

[screenshot]

  • How can a prefix for the identifier be specified?

<identifierParameter prefix="prefix:" value="id" />

@solth
Member

solth commented Jul 7, 2022

  • The XML-based configuration had a way to specify custom URL parameters. This made it possible to add parameters to the URL in addition to those given by the user query. How can something like that be done in the new interface?
<urlParameters>
    <param name="format" value="oai_mods" />
</urlParameters>

When the urlParameters element was first added to the kitodo_opac.xml catalog configurations, it was only intended for standard URL parameters like version, operation and recordSchema for SRU interfaces and verb and metadataPrefix for OAI interfaces. For this reason, I removed the option to add arbitrary URL parameters in the ImportConfiguration object and instead replaced it with a defined set of parameters for the individual search interface types (SRU, OAI etc.).

But I see now that having the option to add custom URL parameters is not only useful but even required in some cases, so I think we should try to re-enable this option.
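
For illustration only, the behaviour under discussion can be sketched as follows: a fixed set of interface parameters is merged with freely configured custom parameters and the parameters derived from the user query. This is not Kitodo code; the function name `build_search_url`, the host, and the `id` query parameter are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, standard_params, custom_params, query_params):
    """Combine standard, custom, and query parameters into one URL.

    Hypothetical helper for illustration; not part of Kitodo.
    """
    params = {}
    params.update(standard_params)  # e.g. verb and metadataPrefix for OAI
    params.update(custom_params)    # e.g. format=oai_mods from <urlParameters>
    params.update(query_params)     # parameters derived from the user query
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "https://catalog.example.org/search",  # placeholder host (assumption)
    {},                                    # no standard parameters in this example
    {"format": "oai_mods"},                # the custom parameter from the question
    {"id": "12345"},                       # hypothetical user query
)
print(url)
```

In this sketch a later dictionary overrides an earlier one on a name clash, so the user query wins over a custom parameter of the same name. Whether that precedence is desirable is a design choice.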

@solth
Member

solth commented Jul 7, 2022

  • What does Query delimiter actually stand for and what has to go there?

[screenshot]

The query delimiter is an optional character used to enclose the query part of the URL. This was necessary for some SRU interfaces such as "LfULG - DiGAS", where the query has to be enclosed in " characters for the interface to process it successfully.

In the kitodo_opac.xml configuration file, this optional delimiter could be configured using the following element:
<queryDelimiter>"</queryDelimiter>
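
A minimal sketch of the effect of such a delimiter, assuming it is simply placed before and after the query string before URL encoding. The function name `build_query` and the `title` search field are hypothetical, and the exact span that Kitodo encloses may differ.

```python
from urllib.parse import quote


def build_query(field, value, delimiter=""):
    """Return a URL-encoded query, optionally enclosed in a delimiter.

    Hypothetical helper for illustration; not part of Kitodo.
    """
    raw = f"{delimiter}{field}={value}{delimiter}"
    # safe="" also encodes "=" and the delimiter itself.
    return quote(raw, safe="")


plain = build_query("title", "Example")
delimited = build_query("title", "Example", delimiter='"')
print(plain)
print(delimited)
```

Leaving the delimiter empty yields the plain query, which matches the description of the setting as optional.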

@solth
Member

solth commented Jul 7, 2022

  • Is it ensured that the mapping files are actually applied in the right order if I want to chain multiple transformations?

[screenshot]

Yes, the mapping files are saved in a list internally, preserving the order in which they have been assigned to the import configuration and ensuring they are applied in the correct order to the imported metadata file. In fact, an exception will be thrown if you try to save an ImportConfiguration where

  • the input metadata format of the first of a sequence of mapping files does not correspond to the metadata format of the ImportConfiguration, or
  • the output metadata format of the last file in the sequence of assigned mapping files is not "Kitodo", or
  • the output format of one mapping file in the sequence of mapping files does not correspond to the input format of the next mapping file
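
The three conditions above can be sketched as a small validation routine. This is an illustration under stated assumptions, not the actual Kitodo implementation: the `MappingFile` class, the function name, the file names, and the format labels "MARC" and "MODS" are hypothetical, and skipping an empty list is an assumption as well.

```python
from dataclasses import dataclass


@dataclass
class MappingFile:
    """Hypothetical stand-in for a mapping file with its formats."""
    name: str
    input_format: str
    output_format: str


def validate_mapping_chain(config_format, mapping_files):
    """Raise ValueError if the ordered mapping files do not form a valid chain."""
    if not mapping_files:
        return  # assumption: nothing to validate without mapping files
    if mapping_files[0].input_format != config_format:
        raise ValueError("first mapping file does not accept the configured format")
    if mapping_files[-1].output_format != "Kitodo":
        raise ValueError("last mapping file does not produce the Kitodo format")
    # Each output format must match the input format of the next file.
    for current, following in zip(mapping_files, mapping_files[1:]):
        if current.output_format != following.input_format:
            raise ValueError(
                f"{current.name} outputs {current.output_format}, "
                f"but {following.name} expects {following.input_format}"
            )


chain = [
    MappingFile("marc2mods.xsl", "MARC", "MODS"),
    MappingFile("mods2kitodo.xsl", "MODS", "Kitodo"),
]
validate_mapping_chain("MARC", chain)  # passes silently
```

Passing the same two files in reverse order raises a `ValueError`, because the first file would then expect MODS rather than the configured MARC input.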

@solth
Member

solth commented Jul 7, 2022

  • How can a prefix for the identifier be specified?

<identifierParameter prefix="prefix:" value="id" />

You are right, I missed this feature when implementing the new ImportConfiguration class. I will try to re-add it before the next release.
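
For reference, the effect of the old element can be sketched as follows, assuming that `value` names the URL parameter and `prefix` is prepended to the identifier entered by the user. The function name and the identifier `12345` are hypothetical.

```python
from urllib.parse import urlencode


def identifier_parameter(param_name, identifier, prefix=""):
    """Build the URL-encoded identifier parameter with an optional prefix.

    Hypothetical helper for illustration; not part of Kitodo.
    """
    return urlencode({param_name: f"{prefix}{identifier}"})


encoded = identifier_parameter("id", "12345", prefix="prefix:")
print(encoded)
```

Note that `urlencode` percent-encodes the colon of the prefix, so the server receives `prefix:12345` after decoding.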

@BartChris
Collaborator Author

BartChris commented Jul 7, 2022

Thank you for your prompt replies. My question about the order of the mapping files came from my test with the SRU interface of the ZDB:
https://services.dnb.de/sru/zdb?version=1.1&operation=searchRetrieve&query=zdbid%3D2825456-9&recordSchema=MARC21-xml

In the XML-based configuration I specified two mapping files:

XML config:

[screenshot]

This worked. I cannot make it work with the new configuration. The URL is constructed correctly, but at some point a parser exception is thrown:
ConfigException / XPathException / SAXParseException: Premature end of file.

I therefore assume that something is not working correctly with the mappings.

@BartChris
Collaborator Author

BartChris commented Jul 7, 2022

I checked again, and the problem seems to be related to the missing ordering of the entries in the database table importconfiguration_x_mappingfile.

I changed the following by hand in mappingfile:

[screenshot]

to:

[screenshot]

so that the MARC converter comes first. Now everything works.

As a result, the entries in importconfiguration_x_mappingfile list the MARC mapping first, and the transformation works correctly:

[screenshot]

We probably need an "order"-column in importconfiguration_x_mappingfile to ensure that the mappings are applied in the correct order at runtime.
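
A minimal sketch of that idea, assuming each row of the join table carried an explicit position. The column name `sorting`, the function name, and the file names are hypothetical. Rows are sorted by the position value, so the order in which the database returns them no longer matters.

```python
def ordered_mapping_files(rows):
    """Return mapping file names sorted by their explicit position.

    Hypothetical helper for illustration; not part of Kitodo.
    """
    return [row["mappingfile"] for row in sorted(rows, key=lambda r: r["sorting"])]


# Rows as they might come back from the database, in arbitrary order.
rows = [
    {"mappingfile": "mods2kitodo.xsl", "sorting": 1},
    {"mappingfile": "marc2mods.xsl", "sorting": 0},
]
ordered = ordered_mapping_files(rows)
print(ordered)
```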

@BartChris
Collaborator Author

BartChris commented Jul 12, 2022

@solth Another question:

What is the purpose of the XPath configuration for the parent?

[screenshot]

I assumed that the definition in the ruleset (setting the higherlevelIdentifier) controls which element in the XML returned from the catalogue is treated as the parent element's identifier.

[screenshot]

What is the specific purpose of giving an XPath to the parent element here? Is this used in a different context than the higherlevelIdentifier?

@solth
Member

solth commented Jul 15, 2022

What is the specific purpose of giving an XPath to the parent element here? Is this used in a different context than the higherlevelIdentifier?

The Parent element - XPath setting is used to extract the catalog ID of the parent record of the imported record from the imported XML document.

The metadata configured as higherLevelIdentifier in the ruleset defines the internal metadata field in which the parent ID is saved.
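
The division of labour between the two settings can be sketched like this: the XPath locates the parent's catalog ID in the imported XML, and the ruleset decides under which internal metadata key that ID is stored. The sample record, the XPath, and the key `ParentCatalogID` are hypothetical, and Python's `ElementTree` only supports a subset of XPath, so this is not how Kitodo evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical imported record; real catalog records look different.
RECORD = """<record>
  <id>child-1</id>
  <relatedItem type="host">
    <identifier>parent-9</identifier>
  </relatedItem>
</record>"""


def extract_parent_id(xml_text, parent_xpath):
    """Return the text of the element matched by parent_xpath, or None."""
    root = ET.fromstring(xml_text)
    element = root.find(parent_xpath)
    if element is None or element.text is None:
        return None
    return element.text.strip()


# Step 1: the "Parent element - XPath" setting finds the ID in the XML.
parent_id = extract_parent_id(RECORD, "./relatedItem[@type='host']/identifier")

# Step 2: the ruleset names the internal field that stores the ID.
parent_field = "ParentCatalogID"  # hypothetical metadata key
metadata = {parent_field: parent_id}
print(metadata)
```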
