Wav2Vec2 upgrade with Conv1D options #1758
Conversation
@minhthuc2502 Could you please have a look at this PR and merge it?
@minhthuc2502 Thank you for your suggestions. I agree that the changes were redundant. I've updated the code as you recommended and made the commits needed for all checks to pass. It appears that network conditions for build-and-push-docker-images are more favorable in the morning where I am located.
Hello, thank you for your updates. I'll merge this. I agree that there are some network problems with build-and-push-docker-images; for now I have rerun it manually. We'll have to fix this in the future.
This PR improves the efficiency of Wav2Vec2 inference within the CTranslate2 framework, in both speed and memory usage. Compared to the HuggingFace implementation, when processing 300 audio files the int8-quantized model shows an 11% speed increase and a 70% memory reduction on GPU, and a 5% speed increase and a 71% memory reduction on CPU. Additionally, using an N-gram language model with pyctcdecode can further improve speech recognition accuracy. My environment is an NVIDIA GeForce RTX 3090 24GB with CUDA 12.4, torch==2.12+cu12.1, and transformers==4.41.0. Special thanks go to the depthwise convolution process introduced in #1749.
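For context on the accuracy note above: beam search with pyctcdecode and an n-gram language model improves on plain greedy decoding of the CTC logits that Wav2Vec2 emits. Below is a minimal sketch of the greedy baseline it replaces; the toy vocabulary and scores are hypothetical, not taken from this PR.

```python
def greedy_ctc_decode(logits, vocab, blank_id=0):
    """Greedy CTC decoding: take the framewise argmax, collapse
    consecutive repeats, and drop blank tokens."""
    ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for i in ids:
        if i != blank_id and i != prev:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Toy vocabulary: index 0 is the CTC blank symbol.
vocab = ["_", "c", "a", "t"]
# Hypothetical framewise scores for 6 frames over 4 symbols.
logits = [
    [0.1, 0.9, 0.0, 0.0],  # argmax: c
    [0.1, 0.8, 0.1, 0.0],  # argmax: c (repeat, collapsed)
    [0.9, 0.0, 0.1, 0.0],  # argmax: blank
    [0.0, 0.0, 0.9, 0.1],  # argmax: a
    [0.9, 0.0, 0.0, 0.1],  # argmax: blank
    [0.0, 0.1, 0.0, 0.9],  # argmax: t
]
print(greedy_ctc_decode(logits, vocab))  # -> cat
```

A language-model-aware beam search scores many candidate paths against the n-gram LM instead of committing to the single argmax path, which is where the accuracy gain comes from.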