Language detection strategies for OCR processing #174

Open
alliomeria opened this issue Apr 20, 2022 · 0 comments
Labels: AI - Machine Learning, Discovery, enhancement

alliomeria commented Apr 20, 2022

What is needed?

A logical, iterative approach for detecting the language to use during OCR processing.

How could this work?

(*Adapted from Archipelago Slack discussion, 2022-04-20)

  1. Generate OCR based on user input (passed via an ADO's metadata).
  2. If NLP is selected, detect the language from the plain OCR text.
  3. If the detected language matches the one passed by the user, continue; if not, re-run OCR using the detected language and log the mismatch.
    • Possible option: Add a checkbox that says "Trust language detection" (in case someone really wants to avoid this step altogether)
  4. Take the old OCR (if its language was correct) or the new OCR (if it was wrong), together with the detected language, and check the list of NLP-enabled languages. If the language is listed, pass the text on to NLP (polyglot). If it is not, simply skip NLP.
  5. For all cases, the original language passed and the one detected will be part of the resulting Solr doc.
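
To make the flow above easier to discuss, here is a rough Python sketch of the five steps. It is not Archipelago code: `process_page`, `OcrResult`, the three injected callables (`run_ocr`, `detect_language`, `run_nlp`) and the Solr-style field names are all hypothetical placeholders. It also assumes that "similar" means an exact match between language codes (real codes may need normalizing first), and that unchecking "Trust language detection" means keeping the user-supplied language without re-running OCR.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional, Set

log = logging.getLogger("ocr_language")


@dataclass
class OcrResult:
    text: str
    language_supplied: str
    language_detected: Optional[str] = None
    reocr_performed: bool = False
    nlp_output: Optional[dict] = None

    def to_index_fields(self) -> dict:
        # Step 5: both languages always end up in the indexed document.
        # These field names are invented for illustration only.
        return {
            "ocr_text": self.text,
            "language_supplied": self.language_supplied,
            "language_detected": self.language_detected,
        }


def process_page(
    image: bytes,
    supplied_lang: str,
    run_ocr: Callable[[bytes, str], str],
    detect_language: Callable[[str], str],
    run_nlp: Callable[[str, str], dict],
    nlp_languages: Set[str],
    nlp_selected: bool = True,
    trust_detection: bool = True,
) -> OcrResult:
    # Step 1: OCR with the language passed via the ADO's metadata.
    result = OcrResult(text=run_ocr(image, supplied_lang),
                       language_supplied=supplied_lang)
    if not nlp_selected:
        return result

    # Step 2: detect the language from the plain OCR text.
    detected = detect_language(result.text)
    result.language_detected = detected

    # Step 3: on a mismatch, re-OCR with the detected language and log it,
    # unless the user opted out of trusting the detection.
    final_lang = supplied_lang
    if detected != supplied_lang and trust_detection:
        log.warning("OCR language mismatch: supplied=%s detected=%s",
                    supplied_lang, detected)
        result.text = run_ocr(image, detected)
        result.reocr_performed = True
        final_lang = detected

    # Step 4: only run NLP when the language is NLP-enabled; otherwise skip.
    if final_lang in nlp_languages:
        result.nlp_output = run_nlp(result.text, final_lang)
    return result
```

Passing the OCR, detection and NLP steps in as callables keeps the sketch independent of whichever tools end up being used, and makes the branching easy to test with stubs.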

To Dos:

@DiegoPino & @mbennett-uoe, does this cover the highlight reel?

alliomeria added the enhancement, Discovery, and AI - Machine Learning labels on Apr 20, 2022

2 participants