Request Entity Too Large #6
@zazi, thanks for the bug report. Could reproduce. The DNB endpoint is in general relatively broken. I believe I saw this error before:
It is really odd, because even a daily slice (using the -daily flag) is too much. If, in theory, all records had a single timestamp, there would be no way at all to retrieve the records in a windowed fashion, which in turn means that the endpoint is not fully OAI compliant. The next thing I would try is:
We wrote oaicrawl for the zvdd.de OAI endpoint, because it calls itself OAI despite being broken. oaicrawl is a much blunter tool: it fetches all identifiers (ListIdentifiers) and then requests the records one by one (GetRecord). Let's see what happens with DNB:
Digging into it a bit more, the endpoint returns an error page titled "Error": "Your request matches to many records (>100000). The result size is 13413063. Please try to restrict the request-period." Now, let me rant on a bit. Why does OAI have so-called "resumption-tokens" at all? Datacite, base (Bielefeld) and other huge repositories work just fine by paging through the data (tens of millions of records) for days. It's a DNB problem; it would be best if they used their own resources to solve it.
Thanks a lot @miku for your very fast reply. I was also about to try oaicrawl for this, but then I thought that it might be a bit too much to fetch this rather large authorities set one by one from DNB, so I skipped this approach. Furthermore, as far as I understand the arguments of oaicrawl, I cannot define a concrete set there, can I?
While writing the draft of an answer to DNB and reading their OAI docs again, I came across a possible solution: Does this sound like a solution to you, @miku? PS: the DNB OAI documentation also says "Depending on the OAI repository these can be either defined to the day (YYYY-MM-DD) or to the second (YYYY-MM-DDThh:mm:ssZ)", so working with hourly slices might be possible.
delivers at least some results (incl. a resumption token)
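The hourly slices suggested above could be derived roughly as follows. This is only a sketch, not something metha does today: the function name `hourly_windows` and the example date are made up for illustration, and it assumes that the endpoint accepts second-level granularity and treats both `from` and `until` as inclusive bounds.

```python
from datetime import date, datetime, timedelta, timezone


def hourly_windows(day):
    """Yield 24 (from, until) string pairs covering one UTC day.

    Hypothetical helper: formats bounds at second granularity
    (YYYY-MM-DDThh:mm:ssZ) and ends each window one second before
    the next one starts, assuming inclusive bounds.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    for hour in range(24):
        lower = start + timedelta(hours=hour)
        upper = lower + timedelta(hours=1) - timedelta(seconds=1)
        yield lower.strftime(fmt), upper.strftime(fmt)


# Arbitrary example day, chosen only for illustration.
windows = list(hourly_windows(date(2019, 1, 1)))
```

Each pair could then be passed as the `from` and `until` arguments of a single harvesting request.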
Yes, oaicrawl was more of a one-shot for a particular endpoint and has a minimal feature set.
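For reference, the two-step approach (ListIdentifiers first, then one GetRecord per identifier) boils down to two kinds of requests, sketched below. This is not oaicrawl's actual code: the helper names, the base URL and the identifier are placeholders. The OAI-PMH protocol itself does allow an optional `set` argument on ListIdentifiers, so restricting to a set is possible at the protocol level.

```python
from urllib.parse import urlencode


def list_identifiers_url(base, prefix, set_spec=None):
    """Build a ListIdentifiers request URL (hypothetical helper)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": prefix}
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base + "?" + urlencode(params)


def get_record_url(base, identifier, prefix):
    """Build a GetRecord request URL for a single identifier."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": prefix,
    }
    return base + "?" + urlencode(params)


# Placeholder endpoint, set name and identifier, not the DNB values.
BASE = "https://example.org/oai"
ids_url = list_identifiers_url(BASE, "oai_dc", "demo")
rec_url = get_record_url(BASE, "oai:example.org:1", "oai_dc")
```

With millions of records this means millions of GetRecord requests, which is why the approach is so blunt.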
I can try to do the same.
Yes, sure, this is an option. This is also a limitation of metha that I would like to get rid of one day (it was not essential for the use cases so far, so it is not implemented): it only has monthly and daily slices, not arbitrary precision.
OK, we've sent a request to DNB asking whether they can increase the result size limit. On the other hand, we would appreciate it if you could implement the proposed fallback functionality for when a 413 is thrown, i.e., temporarily decrease the interval to hourly (and then go back to daily).
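The requested fallback could look roughly like the sketch below. It is not metha code: `RequestTooLarge` is a hypothetical stand-in for an HTTP 413 response, and `fetch` is an injected callable (assumed to return a list of records for a given window) so that the logic can be shown without any network access.

```python
from datetime import datetime, timedelta


class RequestTooLarge(Exception):
    """Hypothetical stand-in for an HTTP 413 from the endpoint."""


def harvest(start, end, fetch, step=timedelta(hours=1)):
    """Fetch the window [start, end); on RequestTooLarge retry in smaller steps.

    The caller keeps iterating over daily windows, so only the failing
    day is split; the next day is attempted as a whole again.
    """
    try:
        return fetch(start, end)
    except RequestTooLarge:
        if end - start <= step:
            raise  # cannot shrink further at this step size
        records = []
        cursor = start
        while cursor < end:
            upper = min(cursor + step, end)
            records.extend(fetch(cursor, upper))
            cursor = upper
        return records
```

If a single hour still exceeds the limit, the exception propagates; a smaller `step` would be needed in that case.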
While trying to harvest authority data from the DNB OAI endpoint, I'm getting the following error:
Any chance to fix this?
The metha-sync call is the following: