
Why can't we do multilanguage forced alignment without loading a language-specific alignment model? #893

Open
empz opened this issue Oct 1, 2024 · 2 comments


empz commented Oct 1, 2024

I don't know much about ML, but I was able to follow the tutorial below to do forced alignment on a multilingual transcription. The only requirement is to romanize the transcript, which I did with the uroman package.
https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html

According to that tutorial, it uses the Wav2Vec2 model to do this, and I successfully aligned multiple languages. There's an extra step involved in mapping the aligned words back to the original (non-romanized) words, but that's pretty much it.
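
For reference, here's a minimal sketch of that approach using the `MMS_FA` bundle from torchaudio (>= 2.1), as in the tutorial. The audio file name and the romanized words below are placeholders; the transcript is assumed to already be lowercased and romanized with uroman.

```python
import torch
import torchaudio
from torchaudio.pipelines import MMS_FA as bundle

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Multilingual forced-alignment model plus its tokenizer and aligner.
model = bundle.get_model().to(device)
tokenizer = bundle.get_tokenizer()
aligner = bundle.get_aligner()

def align(waveform, romanized_words):
    """Return the emission matrix and per-word token spans (frame indices)."""
    with torch.inference_mode():
        emission, _ = model(waveform.to(device))
        token_spans = aligner(emission[0], tokenizer(romanized_words))
    return emission, token_spans

# Placeholder input: audio must be resampled to the bundle's 16 kHz rate,
# and the transcript must already be romanized (e.g. with uroman).
waveform, sr = torchaudio.load("speech.wav")
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
words = "kon nichiwa sekai".split()

emission, spans = align(waveform, words)

# Convert emission-frame indices to seconds.
seconds_per_frame = waveform.size(1) / emission.size(1) / bundle.sample_rate
for word, word_spans in zip(words, spans):
    start = word_spans[0].start * seconds_per_frame
    end = word_spans[-1].end * seconds_per_frame
    print(f"{word}: {start:.2f}s - {end:.2f}s")
```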

Thoughts?

@andriken

Which model did you use, and can you tell me how to do this? I want to do it for Japanese, because none of the Japanese wav2vec2 models I found work; the English one works best. It would be helpful if you could share how you used the multilingual one.

@MahmoudAshraf97 (Contributor)
