Feature 0.6 (#33)

# Release v0.6.0

The main change is the introduction of a plugin manager to install the plugins and their dependencies on demand. This makes the release versions (both Windows EXE and docker image) much smaller, and allows users to decide which functionalities they want to use.

## IMPORTANT

From version 0.6 onward `python` and `pip` need to be installed on the system. See more below in the [Changes](#changes) section.

- Windows: https://www.python.org/downloads/windows/
  - **NOTE**: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without having to make any assumptions
- Linux: Use your package manager (e.g. `sudo apt install python3 python3-pip`)

## Changes

- Removed the frozen executable from the release files in favor of an Automatic1111-style batch script
  - Even with the plugin manager, invoking pip from within the frozen executable to install dependencies that require actual compilation caused problems that were non-trivial to fix.\
    For this reason I decided to drop the PyInstaller frozen EXE altogether and go with a batch script that will:
    - Allow users to set environment variables more easily (a few of the most relevant ones are already set as empty in the script)
    - Create or reuse a virtual environment in a folder `venv` in the same directory as the script
    - Install the minimum required packages in it to run the server
    - Run the server
- Added a plugin manager to install/uninstall plugins on demand
  - The installed plugins can be controlled via the new version of the Firefox extension or directly using the `manage_plugins/` endpoint.
  - The plugins will be installed under `$OCT_BASE_DIR/plugins`, which by default is under your user profile (e.g. `C:\Users\username\.ocr_translate` on Windows).\
    If you are short on space under `C:\`, consider setting the `OCT_BASE_DIR` environment variable to a different location.
  - The plugin data is stored in a JSON file inside the project [plugins_data.json](blob/v0.6.0/ocr_translate/plugins_data.json)
  - Version/Scope/Extras of a package to be installed can be controlled via environment variables `OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS]` (e.g. to change torch to version A.B.C you would set `OCT_PKG_TORCH_VERSION="A.B.C"`). If the package name contains a `-`, replace it with `_min_`.
  - Removed env variable `AUTOCREATE_VALIDATED_MODELS` and the related server initialization. Models are now created/activated or deactivated via the plugin manager when the respective plugin is installed/uninstalled.
- Streamlined docker image to also use the `run_server.py` script for initialization.
- Added plugin for `ollama` (https://github.com/ollama/ollama) for translation using LLMs
  - Note that ollama needs to be installed and run separately; the plugin only makes calls to its server.
  - Use the `OCT_OLLAMA_ENDPOINT` environment variable to specify the endpoint of the ollama server ([see the plugin page for more details](https://github.com/Crivella/ocr_translate-ollama))
- Added plugin for `PaddleOCR` (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well with Chinese).
  - The default versions of `paddlepaddle` installed by the `plugin_manager` (`2.5.2` on Linux and `2.6.1` on Windows) might not work on every system, as there can be underlying failures in the C++ code that the plugin uses. The version installed can be controlled using the environment variable `OCT_PKG_PADDLEPADDLE_VERSION`.
- Added possibility to specify extra `DJANGO_ALLOWED_HOSTS` and a server bind address via environment variables. (Fixes #30)
- The manual model is no longer implemented as an entrypoint (it also works without recreating models).
- OCR models can now use a `tokenizer` and a `processor` from different models.
- Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
- New endpoint `run_tsl_xua` made to work with `XUnity.AutoTranslator` (https://github.com/bbepis/XUnity.AutoTranslator)
- Improved API return codes

## Fixes

- Fix #26
- Fix #30
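
The `OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS]` naming rule from the release notes can be illustrated with a small helper. This is only a sketch, not the plugin manager's actual code: the function name `pkg_env_vars` is hypothetical, and it assumes that `-` is replaced with `_min_` first and that the whole result is then upper-cased (the notes do not say whether the replacement itself is upper-cased).

```python
def pkg_env_vars(package_name: str) -> dict[str, str]:
    """Map a package name to the three override variable names (hypothetical helper)."""
    # ASSUMPTION: '-' is replaced with '_min_' before the name is upper-cased,
    # so the replacement text ends up upper-cased as well.
    key = package_name.replace("-", "_min_").upper()
    return {
        suffix: f"OCT_PKG_{key}_{suffix}"
        for suffix in ("VERSION", "SCOPE", "EXTRAS")
    }


# The two names below are the ones quoted in the release notes.
print(pkg_env_vars("torch")["VERSION"])
print(pkg_env_vars("paddlepaddle")["VERSION"])
```

For names without a `-`, such as `torch` or `paddlepaddle`, the result does not depend on that assumption.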
Crivella · Aug 19, 2024 · 4170bed
1 parent faf9c27
commit 4170bed

Showing 76 changed files with 8,296 additions and 3,911 deletions.
diff --git a/.github/workflows/ci-docs.yml b/.github/workflows/ci-docs.yml
@@ -46,7 +46,7 @@ jobs:
  id: linkcheck
  run: |
  make -C docs html linkcheck 2>&1 | tee check.log
- echo "broken=$(grep '(line\s*[0-9]*)\(\s\)broken\(\s\)' check.log)" >> $GITHUB_OUTPUT
+ echo "broken=$(grep -E 'line\s+[0-9]+\)\s+broken\s+' check.log)" >> $GITHUB_OUTPUT
  env:
  SPHINXOPTS: -nW --keep-going
 

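To see which log lines the new link-check expression is meant to select, the same pattern can be tried with Python's `re` module, where the literal `)` has to be written as `\)`. The two log lines below are made up for illustration and only assume the general `line N) status URL` shape of a link-check report; they are not taken from a real run.

```python
import re

# Same expression as in the workflow step, with the literal ')' escaped.
BROKEN = re.compile(r"line\s+[0-9]+\)\s+broken\s+")

# Hypothetical log lines (assumed shape, not real output).
sample_log = [
    "(index: line   12) broken    https://example.invalid/ - 404 Client Error",
    "(index: line   14) ok        https://example.org/",
]

# Keep only the lines that report a broken link.
broken_lines = [line for line in sample_log if BROKEN.search(line)]
print(broken_lines)
```

Only the first sample line is kept, because the second reports `ok` rather than `broken`.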
diff --git a/.gitignore b/.gitignore
@@ -3,7 +3,7 @@
 *.pot
 *.pyc
 __pycache__
-*.sqlite3
+*.sqlite3*
 media
 
 # Backup files #

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,58 @@
 
 List of changes between versions
 
+## 0.6.0
+
+### IMPORTANT
+
+From version 0.6 onward `python` and `pip` need to be installed on the system.
+See more below in the [Changes](#changes) section.
+
+- Windows: https://www.python.org/downloads/windows/
+ - **NOTE**: make sure to check the box that says "Add Python to PATH" so that pip can be found by the server script without having to make any assumptions
+
+- Linux: Use your package manager (e.g. `sudo apt install python3 python3-pip`)
+
+### Changes
+
+- Removed the frozen executable from the release files in favor of an Automatic1111-style batch script
+ - Even with the plugin manager, invoking pip from within the frozen executable to install dependencies that require actual compilation caused problems that were non-trivial to fix.\
+ For this reason I decided to drop the PyInstaller frozen EXE altogether and go with a batch script that will:
+ - Allow users to set environment variables more easily (a few of the most relevant ones are already set as empty in the script)
+ - Create or reuse a virtual environment in a folder `venv` in the same directory as the script
+ - Install the minimum required packages in it to run the server
+ - Run the server
+- Added a plugin manager to install/uninstall plugins on demand
+ - The installed plugins can be controlled via the new version of the Firefox extension or directly using the
+ `manage_plugins/` endpoint.
+ - The plugins will be installed under `$OCT_BASE_DIR/plugins`, which by default is under your user profile (e.g. `C:\Users\username\.ocr_translate` on Windows). \
+ If you are short on space under `C:\`, consider setting the `OCT_BASE_DIR` environment variable to a different location.
+ - The plugin data is stored in a JSON file inside the project [plugins_data.json](blob/v0.6.0/ocr_translate/plugins_data.json)
+ - Version/Scope/Extras of a package to be installed can be controlled via environment variables
+
+ OCT_PKG_<package_name(uppercase)>_[VERSION|SCOPE|EXTRAS]
+
+ (e.g. to change torch to version A.B.C you would set `OCT_PKG_TORCH_VERSION="A.B.C"`).
+ If the package name contains a `-`, replace it with `_min_`.
+ - Removed env variable `AUTOCREATE_VALIDATED_MODELS` and the related server initialization.
+ Now models are created/activated or deactivated via the plugin manager, when the respective plugin is installed/uninstalled.
+- Streamlined docker image to also use the `run_server.py` script for initialization.
+- Added plugin for `ollama` (https://github.com/ollama/ollama) for translation using LLMs
+ - Note ollama needs to be run/installed separately and the plugin will just make calls to the server.
+ - Use the `OCT_OLLAMA_ENDPOINT` environment variable to specify the endpoint of the ollama server
+ ([see the plugin page for more details](https://github.com/Crivella/ocr_translate-ollama))
+- Added plugin for `PaddleOCR` (https://github.com/PaddlePaddle/PaddleOCR) (Box and OCR) (seems to work very well
+ with Chinese).
+ - The default versions of `paddlepaddle` installed by the `plugin_manager` (`2.5.2` on Linux and `2.6.1` on Windows)
+ might not work on every system, as there can be underlying failures in the C++ code that the plugin uses.
+ The version installed can be controlled using the environment variable `OCT_PKG_PADDLEPADDLE_VERSION`.
+- Added possibility to specify extra `DJANGO_ALLOWED_HOSTS` and a server bind address via environment variables. (Fixes #30)
+- The manual model is no longer implemented as an entrypoint (it also works without recreating models).
+- OCR models can now use a `tokenizer` and a `processor` from different models.
+- Added caching of the languages and allowed box/ocr/tsl models for faster response times on the handshake endpoint.
+- New endpoint `run_tsl_xua` made to work with `XUnity.AutoTranslator` (https://github.com/bbepis/XUnity.AutoTranslator)
+- Improved API return codes
+
 ## 0.5.1
 
 - Implemented endpoint for manual translation

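The batch-script steps listed in the 0.6.0 changelog entry (create or reuse `venv` next to the script, install the minimum packages, run the server) can be sketched as a plan of commands. This is an illustration under assumptions, not the shipped script: the helper names `venv_python` and `plan_startup` are hypothetical, and treating `django-ocr_translate` as the minimum package set and `run_server.py` as the entry point of the batch script is a guess based on names that appear elsewhere in this commit.

```python
import os
import sys
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def plan_startup(script_dir: Path, packages: list[str]) -> list[list[str]]:
    """List the commands for: create or reuse venv, install packages, run server."""
    venv_dir = script_dir / "venv"
    python = venv_python(venv_dir)
    steps: list[list[str]] = []
    # Reuse the environment when its interpreter already exists.
    if not python.exists():
        steps.append([sys.executable, "-m", "venv", str(venv_dir)])
    steps.append([str(python), "-m", "pip", "install", *packages])
    # ASSUMPTION: the server is started through run_server.py.
    steps.append([str(python), str(script_dir / "run_server.py")])
    return steps


# Print the plan instead of executing it, so the sketch has no side effects.
for command in plan_startup(Path.cwd(), ["django-ocr_translate"]):
    print(" ".join(command))
```

Returning the commands rather than running them keeps the sketch safe to try; each entry could be passed to `subprocess.run(command, check=True)` to actually perform the step.
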
diff --git a/Dockerfile-cpu → Dockerfile b/Dockerfile-cpu → Dockerfile
@@ -5,17 +5,14 @@ RUN virtualenv /venv/
 
 RUN mkdir -p /src
 
-COPY requirements-torch-cpu.txt /src/
-COPY requirements.txt /src/
-COPY plugins.txt /src/
-# This might have to be removed when pushing the image?
-COPY .pip_cache-cpu /pip_cache
+COPY ocr_translate /src/ocr_translate
+COPY pyproject.toml /src/
+COPY README.md /src/
 
 RUN mkdir -p /pip_cache
-RUN /venv/bin/pip install -r /src/requirements-torch-cpu.txt --cache-dir /pip_cache
-RUN /venv/bin/pip install -r /src/requirements.txt --cache-dir /pip_cache
-RUN /venv/bin/pip install -r /src/plugins.txt --cache-dir /pip_cache
-RUN /venv/bin/pip install gunicorn --cache-dir /pip_cache
+RUN --mount=type=cache,target=/pip_cache /venv/bin/pip install --cache-dir /pip_cache /src/
+# RUN --mount=type=cache,target=/pip_cache /venv/bin/pip install --cache-dir /pip_cache django-ocr_translate
+RUN --mount=type=cache,target=/pip_cache /venv/bin/pip install gunicorn --cache-dir /pip_cache
 
 FROM python:3.10.12-slim-bookworm
 
@@ -37,8 +34,8 @@ RUN mkdir -p /opt/app/static
 RUN mkdir -p /opt/app/media
 
 COPY start-server.sh /opt/app/
-COPY manage.py /opt/app/
-COPY ocr_translate /opt/app/ocr_translate/
+COPY run_server.py /opt/app/
+# COPY ocr_translate /opt/app/ocr_translate/
 COPY mysite /opt/app/mysite/
 COPY staticfiles /opt/app/static/
 COPY media /opt/app/media/
@@ -56,11 +53,8 @@ ENV \
  GID=1000 \
  LOAD_ON_START="true" \
  AUTOCREATE_LANGUAGES="true" \
- AUTOCREATE_VALIDATED_MODELS="true" \
- TRANSFORMERS_CACHE="/models" \
- TRANSFORMERS_OFFLINE="0" \
  DEVICE="cpu" \
- NUM_WEB_WORKERS="1" \
+ OCT_GUNICORN_NUM_WORKERS="1" \
  NUM_MAIN_WORKERS="4" \
  NUM_BOX_WORKERS="1" \
  NUM_OCR_WORKERS="1" \
@@ -70,14 +64,13 @@ ENV \
  DJANGO_SUPERUSER_USERNAME="" \
  DJANGO_SUPERUSER_PASSWORD="" \
  DATABASE_ENGINE="django.db.backends.sqlite3" \
- DATABASE_NAME="/data/db.sqlite3" \
+ DATABASE_NAME="/db_data/db.sqlite3" \
  DATABASE_HOST="" \
  DATABASE_PORT="" \
  DATABASE_USER="" \
  DATABASE_PASSWORD=""
 
-VOLUME [ "/models" ]
-VOLUME [ "/data" ]
+VOLUME /plugin_data /models /db_data
 
 WORKDIR /opt/app
 

diff --git a/Dockerfile-gpu b/Dockerfile-gpu