Include symbols btwn disjoints #277

Merged 2 commits on Sep 4, 2023

geli-gel
Contributor

Implements stringify.py's optional include_symbols_between_disjoint_spans: matched_words are now found from the start and end of the whole spangroup rather than by directly overlapping words with the individual spans. This will help with https://github.com/allenai/scholar/issues/36976, where we need to find mention text in the body text even when the mention has disjoint spans, so the stringified result has to include the "in-between" characters that those spans skip.
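A minimal sketch of the difference between the two matching strategies, for illustration only. It does not use the real stringify.py code: `Span`, `Word`, `tokenize`, `join_words`, and both `stringify_*` functions are hypothetical names, and the tokenizer and the spacing rule (no space between words that touch in the source text) are assumptions made so the example is self-contained.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass(frozen=True)
class Word:
    span: Span
    text: str


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open intervals share at least one character."""
    return a.start < b.end and b.start < a.end


def tokenize(text: str) -> List[Word]:
    """Hypothetical tokenizer: word characters (plus a trailing '.') or one symbol."""
    return [
        Word(Span(m.start(), m.end()), m.group())
        for m in re.finditer(r"\w+\.?|[^\w\s]", text)
    ]


def join_words(words: List[Word]) -> str:
    """Join words in order; add a space only where the source had a gap."""
    out = ""
    prev_end = None
    for w in sorted(words, key=lambda w: w.span.start):
        if prev_end is not None and w.span.start > prev_end:
            out += " "
        out += w.text
        prev_end = w.span.end
    return out


def stringify_overlapping(words: List[Word], spans: List[Span]) -> str:
    """Keep only words that overlap one of the spans directly."""
    return join_words([w for w in words if any(overlaps(w.span, s) for s in spans)])


def stringify_between_disjoint(words: List[Word], spans: List[Span]) -> str:
    """Keep every word between the first span's start and the last span's end."""
    envelope = Span(min(s.start for s in spans), max(s.end for s in spans))
    return join_words([w for w in words if overlaps(w.span, envelope)])


text = "DOC (Takase et al., 2018), and"
words = tokenize(text)

# Two disjoint spans for one citation mention: the names and the year.
names_start = text.index("Takase et al.")
year_start = text.index("2018")
mention = [
    Span(names_start, names_start + len("Takase et al.")),
    Span(year_start, year_start + len("2018")),
]

print(stringify_overlapping(words, mention))       # Takase et al. 2018
print(stringify_between_disjoint(words, mention))  # Takase et al., 2018
```

The comma sits between the two spans and overlaps neither, so only the start-to-end lookup picks it up.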

Example results:

stringified citation_mention spangroup:
Takase et al. 2018
stringified including in-between symbols citation_mention spangroup:
Takase et al., 2018

its sentence: 
stringified sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified including in-between symbols sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified citation_mention spangroup:
Yang et al. 2018
stringified including in-between symbols citation_mention spangroup:
Yang et al., 2018

its sentence: 
stringified sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified including in-between symbols sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified citation_mention spangroup:
Marcus et al. 1993
stringified including in-between symbols citation_mention spangroup:
Marcus et al., 1993

its sentence: 
stringified sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified including in-between symbols sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified citation_mention spangroup:
Merity et al. 2017
stringified including in-between symbols citation_mention spangroup:
Merity et al., 2017

its sentence: 
stringified sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified including in-between symbols sentence spangroup:
We evaluate our method on the current state of the art model, DOC (Takase et al., 2018), and the previous state of the art model, MoS (Yang et al., 2018), on the Penn Treebank (Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) language modeling datasets.
stringified citation_mention spangroup:
Merity et al. 2018
stringified including in-between symbols citation_mention spangroup:
Merity et al., 2018

its sentence: 
stringified sentence spangroup:
In addition, we present results for finetuned (Merity et al., 2018) models, with and without the Partial Shuffle.
stringified including in-between symbols sentence spangroup:
In addition, we present results for finetuned (Merity et al., 2018) models, with and without the Partial Shuffle.

Member

soldni left a comment

Looks good to me!

soldni merged commit 47d16c5 into main on Sep 4, 2023
5 checks passed
soldni deleted the include_symbols_btwn_disjoints branch on September 4, 2023, 03:54