The goal is to assign one of the following 12 labels to each command: yes, no, up, down, left, right, on, off, stop, go, silence, unknown.
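As a rough illustration of the label set, the sketch below maps a spoken word to one of the 12 labels. The helper name `to_label`, the `is_silence` flag, and the rule that any word outside the ten core commands becomes `unknown` are assumptions made for this example, not code from the repository.

```python
# The 12 target labels listed above.
LABELS = ["yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "silence", "unknown"]

# The ten actual command words (everything except "silence" and "unknown").
CORE_COMMANDS = set(LABELS[:10])


def to_label(word, is_silence=False):
    """Map a spoken word to one of the 12 labels (hypothetical helper).

    Assumption: clips with no speech are flagged by the caller via
    is_silence, and any word that is not a core command is "unknown".
    """
    if is_silence:
        return "silence"
    word = word.strip().lower()
    return word if word in CORE_COMMANDS else "unknown"
```

For example, `to_label("Yes")` gives `"yes"`, while a word such as `"bed"` falls back to `"unknown"`.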
I used PyTorch to train two NASNet-A convolutional neural networks. The first network was trained on mel-scaled spectrograms, the second on mel-frequency cepstral coefficients (MFCCs). I then averaged their predictions to make the final submission.
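The averaging step can be sketched in a few lines. This is a minimal, framework-free illustration rather than the repository's code: it assumes each network outputs 12 raw scores (logits), that these are turned into probabilities with a softmax, and that the two probability vectors are averaged with equal weight. The function names are hypothetical.

```python
import math

LABELS = ["yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "silence", "unknown"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_predictions(probs_a, probs_b):
    """Element-wise mean of two probability vectors of equal length."""
    if len(probs_a) != len(probs_b):
        raise ValueError("prediction vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


def predict_label(logits_spec, logits_mfcc):
    """Pick the label with the highest averaged probability.

    logits_spec: scores from the spectrogram network (assumed input).
    logits_mfcc: scores from the MFCC network (assumed input).
    """
    avg = average_predictions(softmax(logits_spec), softmax(logits_mfcc))
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best]
```

Averaging probabilities rather than picking one model's answer lets a confident prediction from one network outweigh a weak, conflicting prediction from the other.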
Examples of mel-scaled spectrograms for speech commands:
- Python 3.6
- PyTorch 0.4.0a0. If you want to use version 0.3, you need to modify the train.py and predict.py files:
  - remove `with torch.no_grad()` and pass `volatile=True` when you are creating Variables (while running `forward_pass` in validation mode).
  - remove `with torch.no_grad()` and pass `volatile=True` when you are creating a Variable.
- Libraries from requirements.txt
- Adjust config variables in config.py
- Execute run.sh file