createTRAINING batch command #1149

ap-mps · 2024-08-02T14:46:25Z

When running this command, I noticed that for a certain PDF in the input directory, no files are generated for the header model.

Why is that? More generally, are there criteria that determine which models get output files for a given input PDF?

kermitt2 · 2024-08-02T16:14:56Z

Hello!

Normally it means that the PDF is image-only (Grobid does not include OCR; it has to be applied as a pre-processing step). Other possible explanations are an encrypted or a corrupted PDF. Finally, it is also possible that the segmentation model, which is applied first, did not detect any header. In that last case, the segmentation training file for this PDF has to be corrected and added to the segmentation training data first, and the segmentation model updated.
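To narrow down which of these causes applies before running the batch, a rough pre-check of the input directory can help. The following is a minimal sketch, not part of Grobid: the function names `triage_pdf_bytes` and `triage_directory` and the result labels are hypothetical, and the checks are crude raw-byte heuristics rather than real PDF parsing. In particular, a PDF that stores its font objects inside compressed object streams may be wrongly flagged as image-only, so treat the result only as a hint for which files to open and inspect by hand.

```python
from pathlib import Path


def triage_pdf_bytes(data: bytes) -> str:
    """Classify raw PDF bytes with crude heuristics (hypothetical helper).

    This does NOT parse the PDF; it only looks for well-known markers.
    """
    # A PDF header is expected near the start of the file.
    if b"%PDF-" not in data[:1024]:
        return "not-a-pdf-or-corrupted"
    # An end-of-file marker is expected near the end of the file.
    if b"%%EOF" not in data[-2048:]:
        return "possibly-truncated"
    # Encrypted PDFs reference an /Encrypt dictionary in the trailer.
    if b"/Encrypt" in data:
        return "possibly-encrypted"
    # No visible font resource suggests there may be no text layer.
    # Assumption: fonts are not hidden in compressed object streams.
    if b"/Font" not in data:
        return "possibly-image-only"
    return "no-obvious-problem"


def triage_directory(input_dir: str) -> dict[str, str]:
    """Apply the triage to every *.pdf file directly under input_dir."""
    return {
        path.name: triage_pdf_bytes(path.read_bytes())
        for path in sorted(Path(input_dir).glob("*.pdf"))
    }
```

Calling `triage_directory` on the input directory returns one label per file name. A file labelled `no-obvious-problem` that still gets no header training file is a candidate for the last case above, where the segmentation model did not detect a header.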

lfoppiano added the question label Sep 22, 2024
