-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
utils/image: add polygon_for_parent w/ a test #577
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #577 +/- ##
==========================================
- Coverage 84.14% 83.98% -0.16%
==========================================
Files 49 49
Lines 2776 2792 +16
Branches 541 546 +5
==========================================
+ Hits 2336 2345 +9
- Misses 342 344 +2
- Partials 98 103 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point. I have already stalled #492 (which does the same thing), because like I said this should be addressed at the PAGE model implicitly, rather than by requiring extra calls in every processor. (I thought we agreed on that?)
We have been discussing GT validation of coordinates again today, as well as possible applications for ocrd-sanitize here, that's why I wanted to understand how to use that function. I see this as a step towards the eventual goal of PAGE-level validation/cl;ipping, the adaptions to enable better error reporting in GdsCollector is in the same vein. I just momentarily wasn't aware of #492. Can't we not try to bring the coordinate logic from tesserocr into OCR-D/core first, while working towards PAGE-level methods? |
But the exception was merely an experiment. We don't want that in production code. So not only would we have to add the call to nearly every processor, it would also have to catch the exception each time. That's just too much effort IMO. OCR-D/ocrd_tesserocr#145 already demonstrates that ocrd_tesserocr correctly would have to catch the exception, ignoring the candidate. (I just rewrote with return type Thinking about this from the PAGE model is a different picture, though. Functions like I think you might draw 2 different conclusions from that:
|
After today's tech call, the picture has become even larger. #576 offers a new alley for making implementors aware of these problems – by feeding PAGE validation failures (along with call stack context if necessary) into dedicated loggers. But most of these failures very likely would result in subsequent crashes of the follow-up processors anyway (because we cannot expect to make all of them robust against all sorts of input invalidities, even in the long term). Plus we already face the problem that processors can fail (due to exception or exit) on single pages (often only after computing a long series of pages correctly). This is not tolerable for a system aiming at productive use. So is option 2 above. Therefore, we already need a mechanism of catching a processor's exceptions and falling back to (say) the input page as output page. That very mechanism could then be used to let exceptions fall through when requested. This could be controlled on the command line with a new option (say |
No description provided.