Splitting up TEI XML files for TagWorks #580
Comments
For the machine learning sequence labelling, I use the loose notion of a "paragraph" as input (it's not sentence-based; I got better results by extending to a complete paragraph), and more concretely the following TEI sections:
I also process the content of the annex (if any) which is not under the |
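A minimal sketch of collecting paragraph-level units from a TEI file, using only Python's standard library. The sample document, the `extract_paragraphs` helper, and the assumption that the annex appears as `<div type="annex">` under `<back>` are illustrative, not taken from the project's code, so check them against your own output files.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI P5 documents
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature TEI document, for illustration only
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head>Methods</head>
        <p>We used <ref type="bibr">[1]</ref> SciPy 1.2. It was fast.</p>
        <p>Second paragraph.</p>
      </div>
    </body>
    <back>
      <div type="annex"><div><p>Annex paragraph.</p></div></div>
    </back>
  </text>
</TEI>"""


def paragraph_text(p):
    """Flatten a <p> element (including inline children) to normalized text."""
    return " ".join("".join(p.itertext()).split())


def extract_paragraphs(tei_xml):
    """Return (section, text) pairs for body paragraphs, then annex paragraphs."""
    root = ET.fromstring(tei_xml)
    units = []
    for p in root.findall(".//tei:body//tei:p", TEI_NS):
        units.append(("body", paragraph_text(p)))
    # Assumption: the annex is a <div type="annex"> directly under <back>
    for p in root.findall(".//tei:back/tei:div[@type='annex']//tei:p", TEI_NS):
        units.append(("annex", paragraph_text(p)))
    return units


if __name__ == "__main__":
    for section, text in extract_paragraphs(SAMPLE_TEI):
        print(section, "|", text)
```

Flattening with `itertext()` keeps the text of inline elements such as citation markers inside the paragraph, which matters if the paragraph is the labelling unit.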
I don't actually know TagWorks, but it looks promising! I keep a small list of such tools, so I'm sharing it here for reference:
We're exploring TagWorks for scaling up tagging; they have a sensible user interface and handle recruitment via crowdsourcing, etc. Their interface lets the user reveal additional context before and after the sentence (as well as asking about certainty and highlighting parts of the sentence, such as software name, version, etc.).
I'm looking at the TEI XML output from grobid, very cool stuff, I love the biblio recognition! My thinking is to have sentences from the `<body>` as codeable units. Any thoughts on how to break up the `<body>`?
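For illustration, here is one rough way sentence-level units with surrounding context could be produced from a paragraph string. This is only a sketch: the regex splitter is a naive stand-in for a real sentence segmenter, and `split_sentences`, `codeable_units`, and the `window` parameter are hypothetical names, not part of any tool mentioned in this thread.

```python
import re

# Naive boundary: '.', '!' or '?' followed by whitespace and an uppercase letter.
# It will wrongly split after abbreviations such as "et al. The ...".
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(paragraph):
    """Split a paragraph string into sentences using the naive boundary rule."""
    return [s for s in _SENT_BOUNDARY.split(paragraph.strip()) if s]


def codeable_units(paragraph, window=1):
    """Pair each sentence with up to `window` sentences of context on each side."""
    sents = split_sentences(paragraph)
    units = []
    for i, sentence in enumerate(sents):
        units.append({
            "sentence": sentence,
            "before": sents[max(0, i - window):i],
            "after": sents[i + 1:i + 1 + window],
        })
    return units


if __name__ == "__main__":
    text = ("We used SciPy 1.2 for fitting. Results were stable. "
            "See Fig. 2 for details.")
    for unit in codeable_units(text):
        print(unit)
```

Keeping the context inside the paragraph boundary means a sentence never borrows context from an unrelated section; a dedicated segmenter would be a safer choice for scientific text with many abbreviations.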