Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kylel/2022 09/span group utils #135

Merged
merged 3 commits into from
Sep 7, 2022
Merged

Conversation

kyleclo
Copy link
Collaborator

@kyleclo kyleclo commented Sep 7, 2022

minor PR: added tests for a pretty important functionality that was being used -- how to combine Spans that are next to each other into a single big Span. the key thing that was undocumented was that Boxes for the underlying Spans actually disappear after this merging functionality, which the tests now capture.

dont think this is intended behavior we want to support in future, but for now, this is how this utility is being used

Copy link
Contributor

@cmwilhelm cmwilhelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me as a way of documenting existing behavior and catching behavioral regressions if/when changes are made to mmda in the future.

Question though -- this code path gets invoked when one does document.annotate(fieldname=[some, boxgroups]), right? So you're saying that in the automatic translation to spangroups, we will outright lose bbox information despite that being the form of the information originally?

If so, that strikes me likely as being a bug rather than something we're actively depending on right now. I suppose we can at least recover the bboxes doing something like document.fieldname.tokens.boxes?

@geli-gel
Copy link
Contributor

geli-gel commented Sep 7, 2022

@cmwilhelm It looks like we don't lose original bbox information because it's still there in the SpanGroup's BoxGroup

@kyleclo
Copy link
Collaborator Author

kyleclo commented Sep 7, 2022

@geli-gel 's right. the only place this is lost is in the Span's Box, not in the SpanGroup's BoxGroup.

i dont think this is a bug; it's just poor/opaque design. ive created an issue to visit this later: #137

@kyleclo kyleclo merged commit 0987b60 into main Sep 7, 2022
@kyleclo kyleclo deleted the kylel/2022-09/span-group-utils branch September 7, 2022 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants