Repair fix coords #43
I used this PR in my test for OCR-D/ocrd_tesserocr#149, and it seems to work.
@bertsky, you are pushing new commits to existing pull requests faster than I can test them. I suggest waiting a little with new commits until the existing pull request has been merged, and then opening a new one (unless there is an urgent need for the new commit, of course).
@kba (or whoever has merge rights), can this PR be merged? I'd like to have a working
Sorry, there are so many related issues at the moment, and the shared problem is always our Shapely code. Also, I don't have the time right now to go back and forth on each of them. So I felt like rushing it – this time.
Right now I am still testing myself. Also, after delegating to the PAGE validator in core, I found another bug there...
This attempts to fix problems caused by invalid polygons from `ocrd-segment-repair` (both in `sanitize` and `plausibilize` mode).

This taught me another lesson about what can go wrong with Shapely / numpy / PAGE interaction. To sum up:

1. `simplify` with ever increasing tolerance until valid. EDIT2 The problem is that the result of the algorithm implemented in Shapely/GEOS depends on the starting point it picked. In pathological cases, no simplification whatsoever can be achieved. (The only thing that then helps is re-ordering...)
2. `union` or `intersection` can create collections of shapes. EDIT There are actually 2 cases here:
3. `union` or `intersection` can create non-integer points, which when rounded for PAGE serialization can become invalid paths. Unfortunately, Shapely always calculates in floating point internally. So all we can do is rounding and then ensuring validity (as in 1).

Related:
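
For readers who want to see the three points above in action, here is a minimal sketch. It is not the code in this PR: the helper names `round_coords`, `ensure_valid` and `largest_polygon`, the `max_tolerance` parameter, the convex-hull fallback and the "keep the largest part" policy are all assumptions made for illustration. Only standard Shapely calls are used (`simplify`, `is_valid`, `convex_hull`, `intersection`, `geoms`).

```python
from shapely.geometry import Polygon, box


def round_coords(poly):
    """Round exterior coordinates to integers, as PAGE points require.

    Interior rings (holes) are dropped in this sketch.
    """
    return Polygon([(int(round(x)), int(round(y)))
                    for x, y in poly.exterior.coords])


def ensure_valid(poly, max_tolerance=64.0):
    """Try simplify() with a doubling tolerance until the result is valid.

    Simplification is not guaranteed to succeed, so this sketch falls back
    to the convex hull (an assumed policy). Both steps only select among
    existing vertices, so integer coordinates stay integer.
    """
    if poly.is_valid:
        return poly
    tolerance = 1.0
    while tolerance <= max_tolerance:
        candidate = poly.simplify(tolerance)
        if (candidate.geom_type == 'Polygon'
                and candidate.is_valid and not candidate.is_empty):
            return candidate
        tolerance *= 2
    return poly.convex_hull


def largest_polygon(geom):
    """Reduce a union/intersection result to a single Polygon, or None.

    Keeping only the largest polygonal part is an assumed policy.
    """
    if geom.is_empty:
        return None
    if geom.geom_type == 'Polygon':
        return geom
    parts = [g for g in getattr(geom, 'geoms', [])
             if g.geom_type == 'Polygon']
    return max(parts, key=lambda g: g.area) if parts else None


# Point 2: intersecting a U shape with a horizontal bar yields two pieces.
u_shape = Polygon([(0, 0), (7, 0), (7, 4), (5, 4),
                   (5, 1), (3, 1), (3, 4), (0, 4)])
bar = box(-1, 2, 8, 3)
pieces = u_shape.intersection(bar)
kept = largest_polygon(pieces)

# Point 3: a narrow notch is valid in floating point, but rounding
# collapses it into a zero-width spike, which is an invalid ring.
notched = Polygon([(0, 0), (10, 0), (10, 10), (5.4, 10),
                   (5, 0.4), (4.6, 10), (0, 10)])
rounded = round_coords(notched)

# Point 1: repair the rounded polygon.
fixed = ensure_valid(rounded)
```

Note that the sketch does not address the EDIT2 caveat: if `simplify` gets nowhere because of the ring's starting point, the loop simply runs out and the convex hull is returned, which may be a much coarser outline than re-ordering the points would give.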