Finetune XTTS for new languages #3992

anhnh2002 · 2024-09-08T08:18:10Z

Hello everyone, below is my code for fine-tuning XTTS for a new language. It works well in my case with over 100 hours of audio.

https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages

jamestech-cmyk · 2024-09-08T11:40:36Z

Hello, man. I'm very pleased with your contribution. Can you provide your trained models? I want to check if they are working well.

anhnh2002 · 2024-09-08T12:49:37Z

Due to copyright issues, I am currently unable to share the model's weights with you. I apologize for the inconvenience.

jamestech-cmyk · 2024-09-08T12:54:40Z

How long did it take you to train 100 hours of audio, and can you tell me your current computer configuration?

anhnh2002 · 2024-09-08T12:57:23Z

It took over 8 hours to train on 100 hours of audio with a single A100 (40 GB).

mohataher · 2024-10-10T22:34:41Z

Understandable. However, could you share a short audio sample of what the model has produced?

anhnh2002 · 2024-10-13T11:13:13Z

Please find the relevant file at the following Google Drive link:

View File

rose07 · 2024-10-15T03:06:48Z

https://tts.byylook.com/ai/text-to-speech

developeranalyser · 2024-10-18T11:32:53Z

Hi, what loss did you reach, and after how many steps?

Is it possible to train the XTTSv2 model on only about 10 hours of audio and still get good results?

I actually trained the model with your code and reached a loss of 0.5, but when I used the model the output was very bad and nothing was audible. I used the google/fleurs dataset for Farsi.
First I expanded the vocab, then trained the DVAE, and then trained the model for 10,000 steps.
Why do you think I am getting such bad results?

Thank you very much

anhnh2002 · 2024-10-18T11:41:48Z

First, I recommend not training the DVAE, because you have a small amount of data. I also think 10 hours is not enough; the model will overfit to your data. The loss I got was about 0.8.
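Since the advice here depends on how many hours of audio are available, it can help to measure the dataset before choosing what to train. The sketch below is only an illustration: `total_hours` is a hypothetical helper, not part of the linked repository, and it assumes the clips are uncompressed PCM `.wav` files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path


def total_hours(wav_dir: str) -> float:
    """Sum the duration of every .wav file under wav_dir, in hours.

    Hypothetical helper for illustration; assumes PCM WAV files.
    """
    seconds = 0.0
    for path in Path(wav_dir).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0
```

Calling `total_hours("path/to/wavs")` on the training folder (the path is a placeholder) gives a number to compare against the roughly 10 hours and 100 hours discussed in this thread.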

developeranalyser · 2024-10-18T18:19:29Z

Thanks for your good work and your reply.
I did that, and the loss is now:
| > avg_loader_time: 0.18475866317749023 (+0.00680994987487793)
| > avg_loss_text_ce: 0.036836352199316025 (-0.0016442164778709412)
| > avg_loss_mel_ce: 0.03139156103134155 (-0.001425366848707199)
| > avg_loss: 0.06822791695594788 (-0.003069579601287842)

However, at inference, even on one of the sentences the model was trained on, I get worse audio that is not in the trained language.
The sound that is produced is not close to the trained language at all.

result.zip
result.zip
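For readers comparing runs, the log lines quoted in this comment can be parsed into numbers. The sketch below is a minimal example that assumes the `| > name: value (delta)` layout shown above; `parse_metrics` is a hypothetical helper, not a function from the trainer. For this particular log, the two cross-entropy terms add up to the reported `avg_loss`.

```python
import re

# Log lines copied from the comment above.
LOG = """\
| > avg_loader_time: 0.18475866317749023 (+0.00680994987487793)
| > avg_loss_text_ce: 0.036836352199316025 (-0.0016442164778709412)
| > avg_loss_mel_ce: 0.03139156103134155 (-0.001425366848707199)
| > avg_loss: 0.06822791695594788 (-0.003069579601287842)
"""

NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
LINE = re.compile(rf"\|\s*>\s*(\w+):\s*({NUM})\s*\(({NUM})\)")


def parse_metrics(log: str) -> dict:
    """Map each metric name to a (value, change_since_last_report) pair."""
    return {
        m.group(1): (float(m.group(2)), float(m.group(3)))
        for m in LINE.finditer(log)
    }


metrics = parse_metrics(LOG)
component_sum = metrics["avg_loss_text_ce"][0] + metrics["avg_loss_mel_ce"][0]
# The two components match the reported total to within rounding.
print(abs(component_sum - metrics["avg_loss"][0]) < 1e-6)
```

A loss this low next to unusable audio is at least consistent with the overfitting concern raised earlier in the thread, although the log alone cannot confirm it.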

developeranalyser · 2024-10-18T18:43:03Z

How many epochs and steps are required to train on 100 hours of data? And it took a few hours, my friend.

kunibald413 · 2024-10-18T20:00:29Z

Hi, nice work!
You might want to open a pull request for it against the still-maintained fork of coqui-ai: https://github.com/idiap/coqui-ai-TTS

I'm not involved with it, just an idea.

anhnh2002 · 2024-10-19T05:58:51Z

2 epochs work well for me
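To relate epochs to the step counts asked about earlier, a rough conversion can be sketched as below. Every number in the example is an assumption chosen for illustration (average clip length, batch size); none of them is reported in this thread, and the sketch simply counts one step per batch.

```python
import math


def clips_from_hours(hours: float, avg_clip_seconds: float) -> int:
    """Estimate how many clips a dataset holds from its total duration."""
    return round(hours * 3600 / avg_clip_seconds)


def total_steps(num_clips: int, batch_size: int, epochs: int) -> int:
    """Count batches over the whole run, one step per batch."""
    steps_per_epoch = math.ceil(num_clips / batch_size)
    return steps_per_epoch * epochs


# Assumed values: 6-second average clips and a batch size of 8.
clips = clips_from_hours(100, avg_clip_seconds=6.0)
print(clips)                                       # 60000
print(total_steps(clips, batch_size=8, epochs=2))  # 15000
```

With different clip lengths or batch sizes the step count changes proportionally, so the same 2 epochs can correspond to very different step totals.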

developeranalyser · 2024-10-19T10:24:41Z

For a new language, do we also need to train the vocoder after training the model?

Also, if the loss decreases to below 1 but the model still reads the text incorrectly, what is your opinion? What do you advise me to do to solve this? It may resolve my main problem.
Thank you.

developeranalyser · 2024-10-19T10:54:03Z

I don't want to train the model on the whole language.

I want to teach it a limited set of sentences in a new language, for example 1000 sentences.
What is your opinion about this? Is it possible?

anhnh2002 · 2024-10-19T11:19:46Z

I think it's impossible to overfit the model with only 1000 sentences, especially for a new language. You'd need to extend the tokenizer and likely train a base model on a larger dataset of that language first.
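One way to see whether the tokenizer needs extending is to look for characters in the new language's text that the existing vocabulary does not contain at all. The sketch below is a rough heuristic under stated assumptions: `missing_symbols` is a hypothetical helper, and `vocab` is assumed to be a plain set of token strings that you have already loaded from the tokenizer file yourself. It does not use any XTTS API.

```python
from collections import Counter


def missing_symbols(texts, vocab):
    """Count non-space characters in texts that are not single-character tokens in vocab.

    Hypothetical helper; vocab is assumed to be a set of token strings.
    """
    counts = Counter(ch for line in texts for ch in line if not ch.isspace())
    return {ch: n for ch, n in counts.items() if ch not in vocab}


# Toy vocabulary covering only a few Latin letters (an assumption for the demo).
toy_vocab = set("helo")
print(missing_symbols(["سلام", "hello"], toy_vocab))
```

A large set of missing characters suggests the vocabulary should be extended before fine-tuning; an empty result does not prove the existing subword tokens fit the language well.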

developeranalyser · 2024-10-19T12:18:19Z

Thank you very much. So your opinion is that my problem is the small amount of data: I cannot get good results from a model trained on only a few sentences, and it must be trained on a large amount of data.
I expanded the vocab and trained the DVAE.
Honestly, I first wanted to test how the model turns out when trained on little data, and then run it on a lot of data.
Another question: what learning rate should I use so that the model does not lose what it has learned for other languages, while still learning the new language well and quickly on a lot of data?

Thank you for paying the zakat of your knowledge :)

developeranalyser · 2024-10-20T10:48:59Z

In short, is it not possible to teach a language with 10 letters and about 100 sentences, so that the model reads those 100 training sentences correctly?

anhnh2002 added the "feature request" label (feature requests for making TTS better) on Sep 8, 2024.
