Releases: Crivella/ocr_translate

Release v0.6.0

19 Aug 22:11
4170bed

The main change is the introduction of a plugin manager to install the plugins+dependencies on demand.
This makes the release versions (both Windows EXE and Docker image) much smaller, and allows users to decide which functionalities they want to use.

IMPORTANT

From version 0.6 onward python and pip need to be installed on the system (3.10 or 3.11).
See more below in the Changes section.

  • Windows: https://www.python.org/downloads/windows/

    • NOTE: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without having to make any assumptions
  • Linux: Use your package manager (e.g. sudo apt install python3 python3-pip)
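
A quick way to confirm the prerequisites above before starting the server is a small check like the following. This is a minimal sketch and not part of the release: the function names are hypothetical, and it only tests the interpreter that runs it and whether a pip executable is visible on PATH.

```python
import shutil
import sys

# Versions named in the release notes as supported.
SUPPORTED = {(3, 10), (3, 11)}


def python_is_supported(version=None):
    """Return True if the (major, minor) pair is a supported Python version."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED


def pip_on_path():
    """Return True if a pip executable can be found on PATH."""
    return shutil.which("pip") is not None or shutil.which("pip3") is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("pip found on PATH:", pip_on_path())
```

If the second line prints False on Windows, the "Add Python to PATH" box was most likely left unchecked during installation.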

While I think everything should work fine for the most part, I assume there are some edge cases I have not considered
that might make the way I've handled the plugin manager not work for everyone.
Feel free to report such issues and I'll try to fix them as soon as possible.
(If too many problems arise I might consider redesigning the plugin manager itself.)

Changes

  • Removed the frozen executable from the release files in favor of an Automatic1111-style batch script
    • Even with the plugin manager, installing dependencies that require actual compilation by invoking pip from within the frozen executable caused problems that were non-trivial to fix.
      For this reason I decided to axe the PyInstaller frozen EXE altogether and go with a batch script that will:
      • Allow users to more easily set environment variables (a few of the most relevant ones are already set as empty in the script)
      • Create or reuse a virtual environment in a folder venv in the same directory as the script
      • Install the minimum required packages in it to run the server
      • Run the server
  • Added a plugin manager to install/uninstall plugins on demand
    • The installed plugins can be controlled via the new version of the firefox extension or directly using the
      manage_plugins/ endpoint.

    • The plugins will be installed under $OCT_BASE_DIR/plugins, which by default is under your user profile (e.g. C:\Users\<USERNAME>\.ocr_translate on Windows).
      If you are short on space under C:\, consider setting the OCT_BASE_DIR environment variable to a different location.

    • The plugin data is stored in a JSON file inside the project (plugins_data.json)

    • Version/Scope/Extras of a package to be installed can be controlled via environment variables

      OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS]
      

      (e.g. to change torch to version A.B.C you would set OCT_PKG_TORCH_VERSION="A.B.C").
      If the package name contains a -, replace it with _min_ in the variable name.

    • Removed the env variable AUTOCREATE_VALIDATED_MODELS and the related server initialization.
      Now models are created/activated or deactivated via the plugin manager, when the respective plugin is installed/uninstalled.

  • Streamlined docker image to also use the run_server.py script for initialization.
  • Added plugin for ollama (https://github.com/ollama/ollama) for translation using LLMs
    • Note ollama needs to be run/installed separately and the plugin will just make calls to the server.
    • Use the OCT_OLLAMA_ENDPOINT environment variable to specify the endpoint of the ollama server
      (see the plugin page for more details)
  • Added plugin for PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well
    with Chinese).
    • The default versions installed by the plugin_manager of paddlepaddle (2.5.2 on Linux and 2.6.1 on Windows)
      might not work for every system as there can be underlying failures in the C++ code that the plugin uses.
      The version installed can be controlled using the environment variable OCT_PKG_PADDLEPADDLE_VERSION.
  • Added possibility to specify extra DJANGO_ALLOWED_HOSTS and a server bind address via environment variables. (Fixes #30)
  • Manual model is not implemented as an entrypoint anymore (will work also without recreating models).
  • OCR models can now use a tokenizer and a processor from different models.
  • Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
  • New endpoint run_tsl_xua made to work with XUnity.AutoTranslator (https://github.com/bbepis/XUnity.AutoTranslator)
  • Improved API return codes
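
The OCT_PKG_<package_name>_[VERSION|SCOPE|EXTRAS] naming rule described above can be sketched as a small helper. This is only an illustration: pkg_env_name is a hypothetical function, not the plugin manager's own code, and it assumes the _min_ replacement is uppercased together with the rest of the package name. Check the plugin manager source if the exact casing matters on your system.

```python
import os

# Fields listed in the release notes.
FIELDS = ("VERSION", "SCOPE", "EXTRAS")


def pkg_env_name(package, field):
    """Build the environment variable name for a package override.

    Assumption: '-' becomes '_min_' and the whole name is then uppercased.
    """
    if field not in FIELDS:
        raise ValueError(f"field must be one of {FIELDS}")
    name = package.replace("-", "_min_").upper()
    return f"OCT_PKG_{name}_{field}"


if __name__ == "__main__":
    # Look up an override for torch, if the user has set one.
    var = pkg_env_name("torch", "VERSION")
    print(var, "=", os.environ.get(var, "<not set>"))
```

With this helper, pkg_env_name("paddlepaddle", "VERSION") gives OCT_PKG_PADDLEPADDLE_VERSION, the variable mentioned for the PaddleOCR plugin.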

Migrating from an older version

As usual, the database will be upgraded automatically to the new version.
For safety, it is suggested to make a copy of it (by default under %USERPROFILE%/.ocr_translate) in case you need to downgrade.

Already downloaded models can be reused, but the new structure is slightly different. Before, you would have something like:

  • %USERPROFILE%/.ocr_translate/
    • <huggingface_models>
    • .easyocr/
      • <easyocr_models>
    • tesseract/
      • <tesseract_models>

Now by default you will have:

  • %USERPROFILE%/.ocr_translate/ (or whatever OCT_BASE_DIR is set to)
    • models/
      • huggingface/ (or whatever TRANSFORMERS_CACHE is set to)
        • <huggingface_models>
      • easyocr/ (or whatever EASYOCR_PREFIX is set to)
        • <easyocr_models>
      • tesseract/ (or whatever TESSERACT_PREFIX is set to)
        • <tesseract_models>
      • paddleocr/ (or whatever PADDLEOCR_PREFIX is set to)
        • <paddleocr_models>

You can move them manually to mimic the new structure, or delete them and let the server re-download them.
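
If you prefer to script the move, something along these lines could work. It is a rough sketch under several assumptions: plan_moves and apply_moves are hypothetical helpers, the default folder names from the listing above are used (none of the *_PREFIX variables are set), and only the easyocr and tesseract folders are handled, since the huggingface models sit loose in the old base folder and are harder to tell apart from other files. Back up the database before trying it.

```python
import os
import shutil
from pathlib import Path

# Old folder name -> new location, both relative to the base directory.
# Assumes the default layout shown in the release notes.
OLD_TO_NEW = {
    ".easyocr": "models/easyocr",
    "tesseract": "models/tesseract",
}


def plan_moves(base):
    """Return (source, destination) pairs for old folders that still exist."""
    base = Path(base)
    moves = []
    for old, new in OLD_TO_NEW.items():
        src, dst = base / old, base / new
        # Skip folders that are missing or whose target is already present.
        if src.is_dir() and not dst.exists():
            moves.append((src, dst))
    return moves


def apply_moves(moves):
    """Perform the planned moves, creating parent folders as needed."""
    for src, dst in moves:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))


if __name__ == "__main__":
    default_base = Path.home() / ".ocr_translate"
    base_dir = os.environ.get("OCT_BASE_DIR", str(default_base))
    # Only print the plan; call apply_moves(...) yourself once it looks right.
    for src, dst in plan_moves(base_dir):
        print(f"{src} -> {dst}")
```

Running the script as-is only lists what would be moved, so nothing is changed until apply_moves is called explicitly.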

Plugins will be stored under OCT_BASE_DIR/plugins (OCT_BASE_DIR defaults to %USERPROFILE%/.ocr_translate)

  • OCT_BASE_DIR/ (default to %USERPROFILE%/.ocr_translate)
    • plugins.json (list of installed plugins)
    • plugins/
      • <plugin_data>/
        The installed Python packages, divided by scope depending on whether they are meant to be used for CPU/GPU/BOTH

This folder can go up to several GB when installing torch (huggingface and easyocr) for GPU, so make sure you have enough space.

Fixes

v0.5.1

17 Dec 07:45
25360e3

What's Changed

It is now possible to upload manual translations by editing the textboxes (requires extension >=0.2.2). The extension can also now actually control the advanced options if you like to tinker.
The admin interface of the server has been improved, and a superuser is automatically created to access it, so that you can add other models without having to edit plugins or source files.

  • Implemented endpoint for manual translation
  • Added autocorrect capability to Trie
  • Added endpoint for sending allowed options given the loaded models
  • Improved admin interface to allow users to more easily add models to the database
  • Changed handshake endpoint behavior to send more information required by the extension
  • Improved run_server script for better modularity and reporting
  • Minor fixes

v0.4.0

29 Oct 00:58
b3f8ae4

What's Changed

It is now possible to use OCR models that work on a single line.
Previously the pipeline would pass the entire box to the OCR model, which made models trained on single lines produce nonsensical results.
Models can now be created with ocr_mode set to merged[default] or single.
If set to single, the non-merged bounding boxes will be passed to the model.
The text results will afterward be stitched together by reasonably ordering the boxes by line/column chunks.

  • Modified the API: OCRBoxModel._box_detection should now return a list of dictionaries, each containing 'merged': tuple[int, int, int, int], the merged bounding box, and 'single': list[tuple[int, int, int, int]], the list of single bounding boxes that have been merged into merged.
  • Modified the database models:
    • OCRModel: Added ocr_mode field with possible values: merged[default] single.
    • BBox: Foreign key from_ocr renamed to from_ocr_merged
    • BBox: Added foreign key from_ocr_single
    • BBox: Added foreign key to_merged (point to the merged BBox generated by merging THIS + other boxes)
    • OCRRun: Foreign key result renamed to result_merged (denote the output was from a merged real/mock run)
    • OCRRun: Added foreign key result_single (denote the output was from a single run)
  • Fixed a bug related to Issue #11 where the %userprofile%/.ocr_translate folder was not being properly created by the EXE release if it did not exist.
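
To make the new return shape and the stitching step more concrete, here is a simplified sketch. It is not the project's algorithm: stitch_single_results is a hypothetical function, the coordinate order (left, top, right, bottom) with y growing downward is an assumption, and the grouping is a plain "same line if the tops are within a tolerance" rule rather than the line/column chunking used by the server.

```python
# Example of the structure _box_detection is described as returning:
# one dictionary per merged box, with the single boxes it was built from.
# ASSUMPTION: boxes are (left, top, right, bottom), y grows downward.
detection = [
    {
        "merged": (0, 0, 100, 40),
        "single": [(0, 20, 100, 40), (0, 0, 100, 20)],
    }
]


def stitch_single_results(boxes, texts, line_tolerance=10):
    """Join per-box OCR texts in reading order (top to bottom, left to right).

    Boxes whose top edges differ by at most line_tolerance pixels from the
    first box of a line are treated as belonging to that line.
    """
    items = sorted(zip(boxes, texts), key=lambda it: (it[0][1], it[0][0]))
    lines = []  # each entry: (top of first box, [(left, text), ...])
    for (left, top, _right, _bottom), text in items:
        if lines and abs(top - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((left, text))
        else:
            lines.append((top, [(left, text)]))
    return " ".join(text for _, chunk in lines for _, text in sorted(chunk))


if __name__ == "__main__":
    singles = detection[0]["single"]
    # Texts as a single-line OCR model might return them, one per single box.
    print(stitch_single_results(singles, ["world", "hello"]))
```

Vertical text or multi-column layouts would need a different ordering rule than the one sketched here.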

v0.3.2

09 Oct 04:33
06f9bd9

IMPORTANT:

Due to this bug, run_server.py does not create the %userprofile%/.ocr_translate folder automatically.
If this is your first install of the server, please manually create the %userprofile%/.ocr_translate folder (you can type %userprofile% in File Explorer and create a folder named .ocr_translate).
This will be fixed in the next release of the code.

What's Changed

  • All features for box/ocr/tsl have been moved to plugins in separate packages (they are still bundled together with the EXE release)
  • Improved pre-parsing of OCRed text before translation for languages with latin alphabet
    • Introduced a way to remove ghost characters generated at the beginning/end of every string (e.g. tesseract would produce random characters at the beginning/end of many strings, probably due to speech bubble edges included in the box).
    • Introduced Trie capability (only for languages with a list/freq file ... for now only English)
      • Can use the trie to detect if an incorrect word ("helloworld") should be split into multiple valid words (["hello", "world"])
    • Added English word list/freq file.
  • In the easyocr plugin, the boxes are now merged with higher tolerances to reduce the occurrence of multiple boxes in a single speech bubble (this would make the translation much worse, since boxes are translated one by one)
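
The trie-based word splitting mentioned above can be illustrated with a small, self-contained example. This is a sketch of the general technique, not the project's Trie class: the class and function names are hypothetical, and word frequencies (which the release notes say come from a list/freq file) are ignored in favor of simply trying the longest prefix first.

```python
_END = object()  # sentinel key marking the end of a word in a trie node


class Trie:
    """Minimal trie built from nested dictionaries."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def prefixes(self, text, start):
        """Yield every end index e such that text[start:e] is a known word."""
        node = self.root
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                return
            if _END in node:
                yield i + 1


def split_words(text, trie):
    """Split text into known words, or return None if no full split exists."""
    memo = {len(text): []}  # the empty remainder splits into no words

    def solve(start):
        if start in memo:
            return memo[start]
        result = None
        # Try the longest matching prefix first, backtracking if needed.
        for end in sorted(trie.prefixes(text, start), reverse=True):
            rest = solve(end)
            if rest is not None:
                result = [text[start:end]] + rest
                break
        memo[start] = result
        return result

    return solve(0)


if __name__ == "__main__":
    trie = Trie(["hello", "world", "hell", "low"])
    print(split_words("helloworld", trie))
```

A frequency-aware version would rank candidate splits by how common the resulting words are instead of preferring the longest prefix.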

NOTE: There is also a plugin to run translation with Google Translate. It is not included in the EXE release, as it does not fit the main idea of the tool being something that runs entirely on your system, but if there is a request for it I can include it in future release bundles (you would still need to select it to use it).

v0.2.0

17 Sep 05:56
b28a143

What's Changed

  • Restructured the code to make it pluginable.

  • No change should be noticeable from a user experience point of view, but now it should be much easier to contribute to the code (new functionalities can be introduced by writing a plugin without having to modify this codebase).

  • The model entries in the database now require an entrypoint field to identify which model should be used to load it.

  • The functionality related to easyocr, tesseract and huggingface models has been moved to the ocr_translate/plugins folder, where they are now plugins (kept in the main codebase as an example of how a plugin can work).

v0.1.4

30 Jul 20:44
af254c5

What's Changed

  • Added default_option for languages and models (priority lang < model < func_parameter).
  • Added option restore_dash_newlines to the pre-tokenizer to handle long words broken to a new line with a dash
  • PyInstaller is now used with the onedir option instead of onefile (to avoid leaving uncleaned temp files and for faster startup)
  • The packaged server now runs in DEBUG (django) mode by default with log level INFO, to allow users to see when the server is working and what it is doing.
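
The priority order lang < model < func_parameter for default options can be pictured as layered dictionaries, where later layers override earlier ones. The snippet below is only an illustration of that ordering: resolve_options is a hypothetical helper and the option name some_other_option and all values are made up, not taken from the server code.

```python
def resolve_options(lang_defaults, model_defaults, func_parameters):
    """Merge option layers so that func_parameters > model > lang."""
    merged = {}
    # Apply the layers from lowest to highest priority.
    for layer in (lang_defaults, model_defaults, func_parameters):
        merged.update(layer)
    return merged


if __name__ == "__main__":
    lang = {"restore_dash_newlines": False, "some_other_option": "lang"}
    model = {"some_other_option": "model"}
    call = {"restore_dash_newlines": True}
    print(resolve_options(lang, model, call))
```

Here the model-level value wins over the language-level one for some_other_option, and the value passed with the call wins for restore_dash_newlines.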

v0.1.3

27 Jul 22:09
317dac1

v0.1.3

  • Added extensive test suite
  • Added EXE releases for windows