
Why can't we do multilanguage forced alignment without loading a language-specific alignment model? #893

Open
empz opened this issue Oct 1, 2024 · 2 comments


empz commented Oct 1, 2024

I don't know much about ML, but I was able to follow the tutorial below to do forced alignment on a multilingual transcription. The only requirement is to romanize the transcript, which I did with the uroman package.
https://pytorch.org/audio/stable/tutorials/forced_alignment_for_multilingual_data_tutorial.html

According to that tutorial, it uses the Wav2Vec2 model to do this, and I successfully aligned multiple languages. There's an extra step involved in mapping the aligned words back to the original (non-romanized) words, but that's pretty much it.
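
For reference, here's a minimal sketch of that approach using the `MMS_FA` bundle from torchaudio (>= 2.1), as in the tutorial. The audio file name and the romanized words below are placeholders; the transcript is assumed to already be lowercased and romanized with uroman.

```python
import torch
import torchaudio
from torchaudio.pipelines import MMS_FA as bundle

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Multilingual forced-alignment model plus its tokenizer and aligner.
model = bundle.get_model().to(device)
tokenizer = bundle.get_tokenizer()
aligner = bundle.get_aligner()

def align(waveform, romanized_words):
    """Return the emission matrix and per-word token spans (frame indices)."""
    with torch.inference_mode():
        emission, _ = model(waveform.to(device))
        token_spans = aligner(emission[0], tokenizer(romanized_words))
    return emission, token_spans

# Placeholder input: audio must be resampled to the bundle's 16 kHz rate,
# and the transcript must already be romanized (e.g. with uroman).
waveform, sr = torchaudio.load("speech.wav")
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
words = "kon nichiwa sekai".split()

emission, spans = align(waveform, words)

# Convert emission-frame indices to seconds.
seconds_per_frame = waveform.size(1) / emission.size(1) / bundle.sample_rate
for word, word_spans in zip(words, spans):
    start = word_spans[0].start * seconds_per_frame
    end = word_spans[-1].end * seconds_per_frame
    print(f"{word}: {start:.2f}s - {end:.2f}s")
```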

Thoughts?

@andriken

Which model did you use, and can you tell me how to do this? I want to do it for Japanese, because none of the Japanese wav2vec2 models I found work; the English one works best. It would be helpful if you could share how you used the multilingual one.

@MahmoudAshraf97 (Contributor)
