Splitting up TEI XML files for TagWorks #580

jameshowison · 2019-01-09T18:03:52Z

We're exploring TagWorks as a way to scale up tagging. They have a sensible user interface and handle recruitment via crowdsourcing, etc. Their interface lets the user reveal additional context before and after the sentence, and it can also ask about certainty and have the user highlight parts of the sentence, such as the software name, version, etc.

I'm looking at the TEI XML output from GROBID. Very cool stuff, and I love the biblio recognition! My thinking is to use sentences from the <body> as the codable units. Any thoughts on how to break up the <body>?

kermitt2 · 2019-01-09T19:23:00Z

For the machine learning sequence labelling, I use the loose notion of a "paragraph" as input (it's not sentence-based; I got better results by extending to a complete paragraph). More concretely, I use the following TEI sections:

title: <title level="a">
abstract: <abstract>
keywords: <keywords>
paragraph: <p>
item: <item> (if any; normally always under <p> when generated by GROBID)
figure/table caption: <figDesc>

I also process the content of the annex (if any), which is not under <body> in TEI but under <back>.
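
As a rough illustration of the sections listed above, here is a minimal Python sketch that pulls those units out of a TEI document with the standard library. It is not GROBID's own code. The function name `extract_units`, the labels, and the tiny `SAMPLE` document are all made up for the example, and it assumes the article title sits under <titleStmt> so that titles of cited works in the bibliography are not picked up.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI = "{" + TEI_URI + "}"
NS = {"tei": TEI_URI}

# Tags from the list above; the labels on the right are illustrative only.
UNIT_TAGS = {
    TEI + "abstract": "abstract",
    TEI + "keywords": "keywords",
    TEI + "p": "paragraph",
    TEI + "item": "item",
    TEI + "figDesc": "caption",
}
REGION_TAGS = {TEI + "teiHeader", TEI + "body", TEI + "back"}


def text_of(elem):
    """Text of an element and its descendants, with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def extract_units(tei_xml):
    """Return (region, label, text) tuples in document order."""
    root = ET.fromstring(tei_xml)
    units = []

    # Assumption: the article title is the level="a" title under <titleStmt>.
    title = root.find(".//tei:titleStmt/tei:title[@level='a']", NS)
    if title is not None and text_of(title):
        units.append(("teiHeader", "title", text_of(title)))

    def walk(elem, region):
        if elem.tag in REGION_TAGS:
            region = elem.tag[len(TEI):]
        label = UNIT_TAGS.get(elem.tag)
        if label is not None:
            text = text_of(elem)
            if text:
                units.append((region, label, text))
            return  # stop here: nested <p>/<item> text is already included
        for child in elem:
            walk(child, region)

    walk(root, None)
    return units


# Hypothetical, hand-written TEI fragment used only to exercise the sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt>
      <title level="a" type="main">A Study of Software Mentions</title>
    </titleStmt></fileDesc>
    <profileDesc>
      <abstract><p>We study mentions.</p></abstract>
      <textClass><keywords>
        <term>software</term>
        <term>citation</term>
      </keywords></textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div><head>Methods</head>
        <p>We used R <ref type="bibr">[1]</ref> for analysis.</p>
        <figure><head>Figure 1</head><figDesc>Pipeline overview.</figDesc></figure>
      </div>
    </body>
    <back>
      <div type="annex"><p>Extra details.</p></div>
      <div type="references"><listBibl><biblStruct><analytic>
        <title level="a">Cited paper</title>
      </analytic></biblStruct></listBibl></div>
    </back>
  </text>
</TEI>"""

for region, label, text in extract_units(SAMPLE):
    print(region, label, text, sep="\t")
```

Each tuple keeps the region (`teiHeader`, `body`, or `back`) so annex paragraphs can be told apart from body paragraphs. If sentence-level units are still wanted, a sentence splitter could be run inside each paragraph, with the neighbouring sentences of the same paragraph kept as the context to reveal.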

kermitt2 · 2019-01-09T19:26:58Z

I don't actually know TagWorks, but it looks promising!

I keep a small list of such tools, so I'm sharing it here for reference:

https://aws.amazon.com/sagemaker/groundtruth/?nc1=h_ls (lets you recruit annotators through Amazon Mechanical Turk)
https://www.tagtog.net/
https://github.com/varal7/ieturk (simple UI for Amazon Mechanical Turk)

jameshowison mentioned this issue Jan 9, 2019

Inter-Annotator Agreement is low #538

Open
