Python module to interface with Java Loomchild sentence segmenter

bitextor/loomchild-segment-py

loomchild-segment

A Python module for interfacing with the Java sentence splitter Loomchild. This package is intended for use in Bifixer and/or Bitextor.

Building and using this package requires Maven and Java as system dependencies.

Installation

This package can be installed with pip from PyPI:

pip install loomchild-segment

Usage

Splitting a text into sentences:

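from loomchild.segmenter import LoomchildSegmenter

# Example values; replace with your own language code and text
lang = "en"
input_line = "This is a sentence. This is another one."
input_text = "First line. It has two sentences.\nSecond line."

segmenter = LoomchildSegmenter(lang)

# Segmenting a single line:
segments = segmenter.get_segmentation(input_line)
print("\n".join(segments))

# Segmenting a document (i.e. multiple line breaks in the input):
segments = segmenter.get_document_segmentation(input_text)
print("\n".join(segments))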
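To segment input that arrives line by line (for example, from a file), the single-line call can be wrapped in a small generator. The following is a minimal sketch, not part of the package: `segment_lines` is a hypothetical helper name, and it only assumes that the object passed in exposes `get_segmentation` returning an iterable of strings, as in the usage above.

```python
from typing import Iterable, Iterator


def segment_lines(segmenter, lines: Iterable[str]) -> Iterator[str]:
    """Yield sentences for each non-empty input line.

    `segmenter` is any object with a get_segmentation(str) method,
    e.g. a LoomchildSegmenter instance. Hypothetical helper, not part
    of loomchild-segment.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of passing them to the segmenter
        yield from segmenter.get_segmentation(line)
```

With a real segmenter, this could be called as segment_lines(LoomchildSegmenter("en"), open("input.txt", encoding="utf-8")), where "en" and input.txt are placeholder values.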

A command-line tool, py-segment, is provided to work with base64-encoded documents:

cat b64encoded_input | py-segment -l $LANG > b64encoded_output
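The base64 input for the command above can be prepared with the Python standard library. This is a sketch under assumptions this README does not state: it assumes the tool reads one base64-encoded document per line and that documents are UTF-8 text. `encode_document` and `decode_document` are hypothetical helper names, not part of the package.

```python
import base64


def encode_document(text: str) -> str:
    """Encode one document as a single base64 line (UTF-8 assumed)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_document(b64_line: str) -> str:
    """Decode one base64 line back into text (UTF-8 assumed)."""
    return base64.b64decode(b64_line.strip()).decode("utf-8")
```

Writing encode_document(doc) for each document, one per line, produces a file in the assumed input format; decode_document can be applied to each output line to read the result. Because base64 output contains no line breaks, documents with internal newlines still occupy a single line.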
