Releases: Crivella/ocr_translate
Release v0.6.0
The main change is the introduction of a plugin manager that installs plugins and their dependencies on demand.
This makes the release versions (both Windows EXE and Docker image) much smaller, and allows users to decide which functionalities they want to use.
IMPORTANT
From version 0.6 onward, Python (3.10 or 3.11) and pip need to be installed on the system.
See more below in the Changes section.
- Windows: https://www.python.org/downloads/windows/
  - NOTE: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without it having to make any assumptions
- Linux: use your package manager (e.g. sudo apt install python3 python3-pip)
While I think everything should work fine for the most part, I assume there are some edge cases I have not considered
that might make the way I've handled the plugin manager not work for everyone.
Feel free to report such issues and I'll try to fix them as soon as possible.
(If too many problems arise I might consider redesigning the plugin manager itself.)
Changes
- Removed the frozen executable from the release files in favor of an Automatic1111-style batch script
  - Even with the plugin manager, installing dependencies that require actual compilation by invoking pip from within the frozen executable caused problems that were non-trivial to fix.
    For this reason I decided to axe the PyInstaller frozen EXE altogether and go with a batch script that will:
    - Allow users to more easily set environment variables (a few of the most relevant ones are already set as empty in the script)
    - Create or reuse a virtual environment in a folder venv in the same directory as the script
    - Install the minimum required packages in it to run the server
    - Run the server
- Added a plugin manager to install/uninstall plugins on demand
  - The installed plugins can be controlled via the new version of the Firefox extension or directly using the manage_plugins/ endpoint.
  - The plugins will be installed under $OCT_BASE_DIR/plugins, which by default is under your user profile (e.g. C:\Users\<USERNAME>\.ocr_translate on Windows).
    If you have trouble with space under C:\, consider setting the OCT_BASE_DIR environment variable to a different location.
  - The plugin data is stored in a JSON file inside the project: plugins_data.json
  - The Version/Scope/Extras of a package to be installed can be controlled via the environment variables OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS]
    (e.g. to change torch to version A.B.C you would set OCT_PKG_TORCH_VERSION="A.B.C").
    If the package name contains a -, it should be replaced with _min_ in the package name.
  - Removed the env variable AUTOCREATE_VALIDATED_MODELS and the related server initialization.
    Models are now created/activated or deactivated via the plugin manager when the respective plugin is installed/uninstalled.
- Streamlined docker image to also use the run_server.py script for initialization.
- Added plugin for ollama (https://github.com/ollama/ollama) for translation using LLMs
- Note ollama needs to be run/installed separately and the plugin will just make calls to the server.
- Use the OCT_OLLAMA_ENDPOINT environment variable to specify the endpoint of the ollama server
(see the plugin page for more details)
- Added plugin for PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well with Chinese).
  - The default versions of paddlepaddle installed by the plugin_manager (2.5.2 on Linux and 2.6.1 on Windows) might not work on every system, as there can be underlying failures in the C++ code that the plugin uses.
    The version installed can be controlled using the environment variable OCT_PKG_PADDLEPADDLE_VERSION.
- Added possibility to specify extra DJANGO_ALLOWED_HOSTS and a server bind address via environment variables. (Fixes #30)
- Manual model is not implemented as an entrypoint anymore (will work also without recreating models).
- OCR models can now use a tokenizer and a processor from different models.
- Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
- New endpoint run_tsl_xua made to work with XUnity.AutoTranslator (https://github.com/bbepis/XUnity.AutoTranslator)
- Improved API return codes
Migrating from an older version
As usual, the database will be upgraded automatically to the new version.
For safety, it is suggested to make a copy of it (by default under %USERPROFILE%/.ocr_translate) in case you need to downgrade.
Already downloaded models can be reused, but the new structure is slightly different. Before, you would have something like:
- %USERPROFILE%/.ocr_translate/
  - <huggingface_models>
  - .easyocr/
    - <easyocr_models>
  - tesseract/
    - <tesseract_models>
Now by default you will have:
- %USERPROFILE%/.ocr_translate/ (or whatever OCT_BASE_DIR is set to)
  - models/
    - huggingface/ (or whatever TRANSFORMERS_CACHE is set to)
      - <huggingface_models>
    - easyocr/ (or whatever EASYOCR_PREFIX is set to)
      - <easyocr_models>
    - tesseract/ (or whatever TESSERACT_PREFIX is set to)
      - <tesseract_models>
    - paddleocr/ (or whatever PADDLEOCR_PREFIX is set to)
      - <paddleocr_models>
You can move them manually to mimic the new structure, or delete them and let the server re-download them.
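If you prefer to script the move, the sketch below relocates the old easyocr and tesseract folders into the new layout. It is not part of the project: `migrate_models` is a hypothetical helper, it assumes the default folder names shown above, and it deliberately leaves the huggingface models alone, since in the old layout they sit directly in the base folder next to other files. Make a backup before running it with `dry_run=False`.

```python
import shutil
from pathlib import Path

# Old folder name -> new location relative to the base dir (default names assumed).
MOVES = {
    ".easyocr": Path("models") / "easyocr",
    "tesseract": Path("models") / "tesseract",
}


def migrate_models(base_dir: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Move old model folders into models/; return the (source, target) pairs."""
    planned = []
    for old_name, new_rel in MOVES.items():
        src = base_dir / old_name
        dst = base_dir / new_rel
        if not src.is_dir() or dst.exists():
            continue  # nothing to move, or the target is already there
        planned.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return planned
```

Calling `migrate_models(Path.home() / ".ocr_translate")` only reports what would be moved; pass `dry_run=False` to actually move the folders.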
Plugins will be stored under OCT_BASE_DIR/plugins (OCT_BASE_DIR defaults to %USERPROFILE%/.ocr_translate):
- OCT_BASE_DIR/ (defaults to %USERPROFILE%/.ocr_translate)
  - plugins.json (list of installed plugins)
  - plugins/
    - <plugin_data>/: the installed Python packages, divided by scope depending on whether they are meant to be used for CPU/GPU/BOTH
This folder can grow to several GB when installing torch (huggingface and easyocr) for GPU, so make sure you have enough space.
Fixes
v0.5.1
What's Changed
It is now possible to upload manual translations by editing the textboxes (requires extension >=0.2.2). The extension can now also actually control the advanced options if you like to tinker.
The admin interface of the server has been improved, and a superuser is automatically created to access it, so you can add other models without having to edit plugins or source files.
- Implemented endpoint for manual translation
- Added autocorrect capability to Trie
- Added endpoint for sending allowed options given the loaded models
- Improved admin interface to allow users to more easily add models to the database
- Changed handshake endpoint behavior to send more information required by the extension
- Improved run_server script for better modularity and reporting
- Minor fixes
v0.4.0
What's Changed
It is now possible to use OCR models that work on a single line.
Before, the pipeline would pass the entire BOX to the OCR model, which would make models trained on single lines spit out nonsensical results.
Now models can be created with ocr_mode set to merged [default] or single.
If set to single, the non-merged bounding boxes will be passed to the model.
The text results will afterward be stitched together by reasonably ordering the boxes by line/column chunks.
- Modified the API: OCRBoxModel._box_detection should now return a list of dictionaries containing 'merged': tuple[int, int, int, int], the merged bounding box, and 'single': list[tuple[int, int, int, int]], a list of the single bounding boxes that have been merged into merged.
- Modified the database models:
  - OCRModel: Added ocr_mode field with possible values: merged [default], single.
  - BBox: Foreign key from_ocr renamed to from_ocr_merged
  - BBox: Added foreign key from_ocr_single
  - BBox: Added foreign key to_merged (points to the merged BBox generated by merging THIS + other boxes)
  - OCRRun: Foreign key result renamed to result_merged (denotes the output was from a merged real/mock run)
  - OCRRun: Added foreign key result_single (denotes the output was from a single run)
- Fixed a bug related to Issue #11 where the %userprofile%/.ocr_translate folder was not being properly created by the EXE release if it did not exist.
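To make the new return format of `_box_detection` concrete, here is a minimal sketch of one entry of that list. Two things are assumptions and not taken from the project: the coordinate order `(x_min, y_min, x_max, y_max)`, and the idea that the merged box is simply the rectangle enclosing all single boxes (a plugin may merge boxes differently). `merge_boxes` is a hypothetical helper.

```python
Box = tuple[int, int, int, int]  # assumed order: (x_min, y_min, x_max, y_max)


def merge_boxes(single: list[Box]) -> dict:
    """Build one {'merged': ..., 'single': ...} entry from single boxes."""
    if not single:
        raise ValueError("need at least one box")
    merged = (
        min(b[0] for b in single),
        min(b[1] for b in single),
        max(b[2] for b in single),
        max(b[3] for b in single),
    )
    return {"merged": merged, "single": list(single)}


# Two text lines inside one speech bubble become a single list entry.
result = [merge_boxes([(10, 10, 50, 20), (12, 22, 48, 32)])]
print(result[0]["merged"])  # (10, 10, 50, 32)
```

With ocr_mode set to merged the OCR model would see the `merged` box, while with single it would see each box in `single` separately.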
v0.3.2
IMPORTANT:
Due to a bug, run_server.py is not creating the %userprofile%/.ocr_translate folder automatically.
If this is your first install of the server, please manually create the %userprofile%/.ocr_translate folder (you can type %userprofile% in File Explorer and create a folder named .ocr_translate).
This will be fixed in the next release of the code.
What's Changed
- All features for box/ocr/tsl have been moved to plugins in separate packages (they are still bundled together with the EXE release)
- Improved pre-parsing of OCRed text before translation for languages with a Latin alphabet
- Introduced a way to remove ghost characters generated at the beginning/end of every string (e.g. tesseract would produce random characters at the beginning/end of many strings, probably due to speech bubble edges included in the box).
- Introduced Trie capability (only for languages with a list/freq file ... for now only English)
- Can use the trie to detect if an incorrect word ("helloworld") should be split into multiple valid words (["hello", "world"])
- Added English word list/freq file.
- In the plugin for easyocr, the boxes are now merged with higher tolerances to reduce the occurrence of multiple boxes in a single speech bubble (this would make translation much worse, since boxes are translated one by one)
NOTE: There is also a plugin to run translation with Google Translate. It is not included in the EXE release, as it does not fit the main idea of the tool being something that runs entirely on your system, but if there are requests for it I can include it in subsequent release bundles (you would still need to select it to use it).
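The word-splitting idea from the Trie items above can be illustrated with a small, self-contained sketch. This is not the project's implementation: the `Trie` class and `split_words` function are hypothetical, the word list is a toy one, and word frequencies (which the real list/freq file provides) are ignored here.

```python
_END = ""  # end-of-word marker; never collides with a single character


class Trie:
    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def prefixes(self, text, start):
        """Yield every end index e such that text[start:e] is a known word."""
        node = self.root
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                return
            if _END in node:
                yield i + 1


def split_words(trie, text):
    """Return one split of text into known words, or None if there is none."""
    best = {len(text): []}  # best[i] holds a valid split of text[i:]
    for start in range(len(text) - 1, -1, -1):
        for end in trie.prefixes(text, start):
            if end in best:
                best[start] = [text[start:end]] + best[end]
                break
    return best.get(0)


trie = Trie(["hello", "world", "hell", "low"])
print(split_words(trie, "helloworld"))  # ['hello', 'world']
```

The search works backwards from the end of the string, so a prefix such as "hell" is only accepted if the remainder can also be split into known words.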
v0.2.0
What's Changed
- Restructured the code to make it pluginable.
- No change should be noticeable from a user experience point of view, but it should now be much easier to contribute to the code (new functionalities can be introduced by writing a plugin without having to modify this codebase).
- The model entries in the database now require an entrypoint field to identify which model should be used to load it.
- The functionality related to easyocr, tesseract and huggingface models has been moved to the ocr_translate/plugins folder, and these are now plugins (kept in the main codebase to leave an example of how a plugin can work).
v0.1.4
What's Changed
- Added default_option for languages and models (priority lang < model < func_parameter).
- Added option restore_dash_newlines to the pre-tokenizer to handle long words broken onto a new line with a dash
- PyInstaller is now used with the onedir option instead of onefile (to avoid leaving uncleaned temp files and for faster startup)
- The packaged server now runs in DEBUG (django) mode by default with log level INFO to allow users to see when the server is working and what it is doing.
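As a rough illustration of what the restore_dash_newlines option is meant to handle, the sketch below rejoins a word that was broken across two lines with a dash. The regular expression is an assumption about the behavior, not the project's actual pre-tokenizer code; only the option name comes from the notes above.

```python
import re


def restore_dash_newlines(text: str) -> str:
    """Rejoin words split across lines as 'transla-' + newline + 'tion'."""
    # A dash directly followed by a newline, between two word characters,
    # is treated as a line-break hyphen and removed together with the newline.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


print(restore_dash_newlines("a long transla-\ntion example"))  # a long translation example
```

A simple rule like this also joins genuinely hyphenated compounds that happen to break at the hyphen, which is one reason to keep such behavior behind an option.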