This is an error case worth recording, because it causes trouble with the sentence segmentation.
The document is not CC-BY; it is referenced here: https://dx.doi.org/10.1063/1.1874292
Here is the delinquent paragraph:
With version 0.8.0 and the current master, the process fails:
ERROR [2024-06-11 06:22:00,602] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs.
! java.lang.StringIndexOutOfBoundsException: begin 592, end 595, length 594
! at java.base/java.lang.String.checkBoundsBeginEnd(String.java:4606)
! at java.base/java.lang.String.substring(String.java:2709)
! at org.grobid.core.document.TEIFormatter.segmentIntoSentences(TEIFormatter.java:1900)
! at org.grobid.core.document.TEIFormatter.toTEITextPiece(TEIFormatter.java:1468)
! at org.grobid.core.document.TEIFormatter.toTEIBody(TEIFormatter.java:1015)
! at org.grobid.core.engines.FullTextParser.toTEI(FullTextParser.java:2648)
! ... 83 common frames omitted
! Causing: org.grobid.core.exceptions.GrobidException: [GENERAL] An exception occurred while running Grobid.
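The bounds in the trace (begin 592, end 595, length 594) can be reproduced outside GROBID. The following is a minimal sketch, not GROBID code: the class `SubstringBoundsDemo` and the method `failureMessage` are hypothetical names introduced for illustration, and `String.repeat` assumes Java 11 or later. It only shows that `String.substring` rejects an end offset larger than the string length; the exact wording of the exception message varies across JDK versions.

```java
public class SubstringBoundsDemo {

    // Hypothetical helper: builds a string of the given length and returns the
    // exception message produced by substring(begin, end), or null if the call succeeds.
    static String failureMessage(int length, int begin, int end) {
        String text = "x".repeat(length);
        try {
            text.substring(begin, end);
            return null;
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same numbers as in the stack trace: the end offset (595) exceeds the length (594).
        System.out.println(failureMessage(594, 592, 595));
        // A range that stays inside the text succeeds, so null is printed.
        System.out.println(failureMessage(594, 592, 594));
    }
}
```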
There are two problems in the sentence segmentation code.
First, `String local_text_chunk = text.substring(pos+posInSentence, theSentences.get(i).end);` may crash when the sentence end offset goes beyond the text length.
Second, the block guarded by `if (pos+posInSentence <= theSentences.get(i).end) {` is skipped entirely in certain cases, so all the accumulated nodes are dropped. See below:
```xml
<div xmlns="http://www.tei-c.org/ns/1.0">
<head>C. dc field dependence of R "T , B rf , B dc , f…</head>
<p>
<s>As mentioned in Ref.</s>
<s>31, properly annealed, bulk Nb TM-TE-mode cavities show large additional rf losses by frozen-in flux with, e.g., at 4.2 K and 2 GHz, R H Ӎ 2 ⍀ H dc / mT for RRRӍ 30, which is described in Eq. ͑3.9͒ by  Ӎ 1 and  Ͻ 10 for RRRտ 200.</s>
<s>Those large rf losses by the normal conducting cores of slow AF do not increase with rf field level.</s>
<s>,
<ref type="bibr" target="#b30">31</ref>
</s>
</p>
</div>
```
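One way to avoid the crash in the `substring` call is to clamp the sentence end offset to the text length before extracting the chunk. The sketch below is an assumption about how such a guard could look, not the fix adopted in GROBID: the class `SafeChunk` and the method `chunk` are hypothetical, and the parameters stand in for `text`, `pos+posInSentence`, and `theSentences.get(i).end` from the issue.

```java
public class SafeChunk {

    // Hypothetical helper: returns text[start, min(sentenceEnd, text.length())),
    // or an empty string when the clamped range is empty or invalid.
    static String chunk(String text, int start, int sentenceEnd) {
        int end = Math.min(sentenceEnd, text.length());
        if (start < 0 || start >= end) {
            return "";
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "x".repeat(594); // requires Java 11+
        // Offsets from the stack trace: the end is clamped from 595 to 594.
        System.out.println(chunk(text, 592, 595).length());
    }
}
```

Clamping only prevents the exception. It does not explain why the sentence offsets exceed the text length, and it does not address the second problem, where the accumulated nodes are dropped.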
The `substring` call discussed above is in grobid/grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java, line 2028 in 694f0ed.