
French Model 0.6

Pre-release
@lissyx released this 11 Dec 08:15
· 6 commits to master since this release
5a0f61b

Datasets:

  • Lingua Libre (~40h)
  • Common Voice FR (v5.1) (~490h, allowing up to 32 duplicates)
  • Training Speech (~180h)
  • African Accented French (~15h)
  • M-AILABS French (~315h)
  • Centre de Conférence Pierre Mendès France (~300h)

Total : ~1340h
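
As a quick sanity check on the total, the approximate hours listed above can be summed and broken down by share. This is only a sketch: the figures are the rounded values from the list, and the helper names (`total_hours`, `shares`) are illustrative, not part of any released tooling.

```python
# Approximate hours per dataset, copied from the list above (rounded values).
DATASET_HOURS = {
    "Lingua Libre": 40,
    "Common Voice FR (v5.1)": 490,
    "Training Speech": 180,
    "African Accented French": 15,
    "M-AILABS French": 315,
    "Centre de Conférence Pierre Mendès France": 300,
}


def total_hours(hours):
    """Sum the approximate hours over all datasets."""
    return sum(hours.values())


def shares(hours):
    """Return each dataset's fraction of the total training audio."""
    total = total_hours(hours)
    return {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    print(f"Total: ~{total_hours(DATASET_HOURS)}h")
    # Largest contributors first.
    for name, share in sorted(shares(DATASET_HOURS).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {share:.1%}")
```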

Parameters:

  • EPOCHS=32
  • LEARNING_RATE=0.0001
  • DROPOUT=0.3
  • BATCH_SIZE=64
  • LM_ALPHA=0.5919543900530122
  • LM_BETA=1.6082513974258137
Best params: lm_alpha=0.5919543900530122 and lm_beta=1.6082513974258137 with WER=0.29113864240896115
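
The release does not say how these variables reach the training script. As a rough sketch, assuming they map onto the standard `DeepSpeech.py` training flags (`--epochs`, `--learning_rate`, `--dropout_rate`, `--train_batch_size`, `--lm_alpha`, `--lm_beta`), the corresponding command line could be assembled as below. The mapping in `FLAG_FOR`, in particular `BATCH_SIZE` to `--train_batch_size`, is an assumption, and dataset and checkpoint flags are deliberately left out.

```python
import shlex

# Values copied from the parameter list above, kept as strings to avoid
# any float re-formatting.
PARAMS = {
    "EPOCHS": "32",
    "LEARNING_RATE": "0.0001",
    "DROPOUT": "0.3",
    "BATCH_SIZE": "64",
    "LM_ALPHA": "0.5919543900530122",
    "LM_BETA": "1.6082513974258137",
}

# ASSUMPTION: this variable-to-flag mapping is illustrative; the release
# notes do not document how the variables are consumed.
FLAG_FOR = {
    "EPOCHS": "--epochs",
    "LEARNING_RATE": "--learning_rate",
    "DROPOUT": "--dropout_rate",
    "BATCH_SIZE": "--train_batch_size",
    "LM_ALPHA": "--lm_alpha",
    "LM_BETA": "--lm_beta",
}


def build_command(params, script="DeepSpeech.py"):
    """Build an argv list for the training script without running it."""
    cmd = ["python", script]
    for name, value in params.items():
        cmd += [FLAG_FOR[name], value]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe rendering of the command (Python 3.8+).
    print(shlex.join(build_command(PARAMS)))
```

The sketch only builds the argument list; a real run would also need the training, dev, and test CSV paths, which this release does not specify in flag form.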

Language model: Wikipedia dump + dump of the Assemblée nationale (French National Assembly) debates.

License: MPL 2.0 https://github.com/common-voice/commonvoice-fr/blob/5699e59244d14bb14d5b7603b91c934b761c9194/DeepSpeech/LICENSE.txt

Works with DeepSpeech v0.7, v0.8, and v0.9.
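
A minimal inference sketch with the `deepspeech` Python package (0.7 or later) might look like the following. It assumes the release's acoustic model and scorer have been downloaded locally; the file paths passed to the hypothetical `transcribe` helper are placeholders, since the release does not name its assets here. The scorer package may already carry default alpha and beta values, in which case the `setScorerAlphaBeta` call simply makes the values reported for this release explicit.

```python
import wave


def transcribe(model_path, scorer_path, wav_path,
               lm_alpha=0.5919543900530122, lm_beta=1.6082513974258137):
    """Transcribe a 16-bit mono WAV file with a DeepSpeech model and scorer."""
    # Imported lazily so this module loads even without the packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)
    # Apply the language-model weights reported for this release.
    model.setScorerAlphaBeta(lm_alpha, lm_beta)

    with wave.open(wav_path, "rb") as wav:
        # The model expects 16-bit mono PCM at its own sample rate.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        if wav.getframerate() != model.sampleRate():
            raise ValueError(
                f"expected {model.sampleRate()} Hz audio, got {wav.getframerate()} Hz"
            )
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    return model.stt(audio)
```

Usage would be along the lines of `transcribe("model.pbmm", "model.scorer", "sample.wav")`, where all three file names are placeholders.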

Test set results:

Test on /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_test.csv - WER: 0.448976, CER: 0.242144, loss: 43.320114
Test on /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_test.csv - WER: 0.097462, CER: 0.027961, loss: 12.057733
Test on /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_test.csv - WER: 0.199902, CER: 0.059519, loss: 15.992792
Test on /mnt/extracted/data/cv-fr/clips/test.csv - WER: 0.301279, CER: 0.142777, loss: 37.710129
Test on /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_test.csv - WER: 0.589222, CER: 0.182179, loss: 7.118075
Test on /mnt/extracted/data/ccpmf/transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020/ccpmf_test.csv - WER: 0.486848, CER: 0.304395, loss: 89.443710
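
To compare the test sets more easily, the log lines above can be parsed into records and sorted by WER. This is a sketch that assumes the exact `Test on <path> - WER: ..., CER: ..., loss: ...` layout shown above; `parse_results` and `LINE_RE` are illustrative names.

```python
import re

# Log lines copied verbatim from the results above.
LOG = """\
Test on /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_test.csv - WER: 0.448976, CER: 0.242144, loss: 43.320114
Test on /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_test.csv - WER: 0.097462, CER: 0.027961, loss: 12.057733
Test on /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_test.csv - WER: 0.199902, CER: 0.059519, loss: 15.992792
Test on /mnt/extracted/data/cv-fr/clips/test.csv - WER: 0.301279, CER: 0.142777, loss: 37.710129
Test on /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_test.csv - WER: 0.589222, CER: 0.182179, loss: 7.118075
Test on /mnt/extracted/data/ccpmf/transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020/ccpmf_test.csv - WER: 0.486848, CER: 0.304395, loss: 89.443710
"""

LINE_RE = re.compile(
    r"Test on (?P<path>\S+) - "
    r"WER: (?P<wer>[\d.]+), CER: (?P<cer>[\d.]+), loss: (?P<loss>[\d.]+)"
)


def parse_results(text):
    """Extract one record per 'Test on ...' line found in the text."""
    return [
        {
            "path": m["path"],
            "wer": float(m["wer"]),
            "cer": float(m["cer"]),
            "loss": float(m["loss"]),
        }
        for m in LINE_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Print test sets from lowest to highest word error rate.
    for row in sorted(parse_results(LOG), key=lambda r: r["wer"]):
        name = row["path"].rsplit("/", 1)[-1]
        print(f"{name}: WER={row['wer']:.1%} CER={row['cer']:.1%}")
```

Note that the loss values are not comparable across test sets in the same way as WER and CER: Lingua Libre has the lowest loss but the highest WER in these results.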