English text looks strange despite having Tesseract #109

Open
Hamid1376 opened this issue Aug 7, 2024 · 1 comment

Comments

@Hamid1376

First of all, thank you for creating this amazing software; it saves me a lot of time when translating manga.
But for translating comics: I installed Tesseract-OCR, but I can't see any option for switching to English OCR, and any comic I put in comes out as gibberish. Sorry if this is not a real issue; I'm just not very tech-savvy.

@VoxelCubes
Owner

You will additionally need to enable Tesseract in your current profile, in the preprocessor settings. Don't forget to hit Apply after making changes to the profile.

Tesseract is optional and isn't very good with ALL CAPS text, so it's off by default. There is also the problem of relying on the text detector to figure out what language a bubble is in. It can only detect Japanese and English, but it still recognizes Latin-script text, so it usually labels Spanish as English.
I will add a language override in the next release so you can tell it which language to use, ignoring the detected language. That way Spanish, and maybe Chinese, should become supported through Tesseract. (I'll work on that in September.)
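A rough sketch of the routing described above may make the behavior easier to follow. Everything here is hypothetical and not the project's actual code: the function `pick_ocr`, the `"default"` engine label, and the mapping table are made up for illustration. It assumes the detector only ever returns `"japanese"` or `"english"` (with any Latin-script text labeled `"english"`), that Japanese always stays on the default OCR model, and that the planned override simply replaces the detected language. The Tesseract codes (`eng`, `spa`, `chi_sim`, `jpn`) are real traineddata names.

```python
from typing import Optional

# Hypothetical mapping from language names to Tesseract traineddata codes.
TESSERACT_CODES = {
    "english": "eng",
    "spanish": "spa",
    "chinese": "chi_sim",
    "japanese": "jpn",
}


def pick_ocr(detected: str, use_tesseract: bool,
             override: Optional[str] = None) -> tuple[str, str]:
    """Return (engine, language) for one text bubble (hypothetical sketch).

    detected: the text detector's label, assumed to be only "japanese" or
        "english"; Latin-script text such as Spanish is labeled "english".
    use_tesseract: the profile's preprocessor toggle (off by default).
    override: a user-chosen language that replaces the detected one.
    """
    # The override, when given, wins over whatever the detector said.
    language = override if override is not None else detected
    # Non-Japanese text goes to Tesseract only if it is enabled in the profile.
    if use_tesseract and language != "japanese":
        return ("tesseract", TESSERACT_CODES[language])
    # Otherwise everything falls through to the default model, which is
    # why English comics come out garbled while Tesseract is disabled.
    return ("default", language)
```

For example, `pick_ocr("english", use_tesseract=False)` stays on the default model, while `pick_ocr("english", True, override="spanish")` sends the bubble to Tesseract with `spa`.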

Civvic is also experimenting with visual LLMs (both local and API-based) that are remarkably good at OCR, which will open the door to much better OCR in the future.

Until then, you can also correct the OCR output manually in review mode, which is on by default. That's new in the latest version.

Good luck, glad to hear it's been helpful.

@VoxelCubes changed the title from "Can't use English Software" to "English text looks strange despite having Tesseract" on Aug 7, 2024