naive string matching model #666
Back to #666: everything is available now. Under The sequence would be as follows:
When annotations are added, we will have sentence-level segments with "ref-spans" and the added "entity-spans" for the software annotations. About the corresponding added data introduced in #665:
We talked about creating a model using naive string matching. Its primary use is to identify areas likely to be "mention-rich," given our finding from the manual annotation that mentions tend to cluster together in papers. The expectation is that "go" lists of known software, adjusted for well-known ambiguous phrases, can find those mention-rich chunks for further annotation. Expect decent recall, but very low precision!
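A minimal sketch of that idea, for discussion only: match a "go" list against text chunks, skip names flagged as ambiguous, and rank chunks by match count. The list contents, the `AMBIGUOUS` set, and all function names here are hypothetical placeholders, not the contents of the actual software lists or an agreed design.

```python
import re

# Hypothetical "go" list and ambiguity list; real lists would be loaded from files.
GO_LIST = ["SPSS", "R", "ImageJ", "Excel", "Stata"]
AMBIGUOUS = {"R"}  # assumed example of a name too ambiguous to match naively


def build_pattern(names, ambiguous):
    """Compile one case-sensitive regex matching any unambiguous name as a whole token."""
    kept = sorted((n for n in names if n not in ambiguous), key=len, reverse=True)
    if not kept:
        raise ValueError("no usable names left after removing ambiguous ones")
    alternation = "|".join(re.escape(n) for n in kept)
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")


def find_mentions(text, pattern):
    """Return (start, end, matched_text) for every match in text."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


def rank_chunks(chunks, pattern):
    """Return (chunk_index, match_count) pairs, richest first, skipping empty chunks."""
    counts = [(i, len(find_mentions(c, pattern))) for i, c in enumerate(chunks)]
    return sorted((t for t in counts if t[1] > 0), key=lambda t: (-t[1], t[0]))


if __name__ == "__main__":
    chunks = [
        "We used SPSS and Excel. ImageJ measured area.",
        "R values were high.",
        "Data were analysed in Stata.",
    ]
    print(rank_chunks(chunks, build_pattern(GO_LIST, AMBIGUOUS)))
```

Matching is case-sensitive and token-bounded here to limit the worst false positives; loosening either would trade even more precision for recall.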
To that end, commit a02b847 moved the software_lists I was playing around with into
data/software_lists/
@kermitt2 is going to use those to implement a matching model, resulting in JSON files with entity_spans with resp="naive_string_match" or something similar. It might be interesting to compare that effort against our gold standard annotations (after removing the specific strings for software names from that set), and against the trained model.
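To make the comparison concrete, here is a rough sketch of what the matcher output and a span-level comparison against gold annotations could look like. The span fields (`start`, `end`, `rawForm`, `type`) are assumptions for illustration, not the actual JSON schema; only the `resp` value comes from the description above. Exact-offset matching is one possible scoring choice among several.

```python
import json
import re


def naive_entity_spans(text, names, resp="naive_string_match"):
    """Return one entity-span dict per whole-token match of a listed name."""
    spans = []
    for name in names:
        for m in re.finditer(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text):
            spans.append({
                "start": m.start(),
                "end": m.end(),
                "rawForm": m.group(0),   # hypothetical field name
                "type": "software",      # hypothetical field name
                "resp": resp,
            })
    return sorted(spans, key=lambda s: (s["start"], s["end"]))


def precision_recall(predicted, gold):
    """Exact-offset precision and recall of predicted spans against gold spans."""
    pred = {(s["start"], s["end"]) for s in predicted}
    ref = {(s["start"], s["end"]) for s in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Each Word was counted, then plotted in Excel and ImageJ."
    predicted = naive_entity_spans(text, ["Word", "Excel", "ImageJ"])
    # Made-up gold spans for this toy sentence: only Excel and ImageJ are software here.
    gold = [
        {"start": text.index(n), "end": text.index(n) + len(n)}
        for n in ("Excel", "ImageJ")
    ]
    print(json.dumps(predicted, indent=2))
    print(precision_recall(predicted, gold))
```

In this toy case "Word" is a false positive, which is the kind of low-precision behaviour expected from a naive matcher. For a fair comparison, names taken from the gold set would be removed from the lists first, as noted above.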