Wav2Vec2 upgrade with Conv1D options #1758
Conversation
@minhthuc2502 Could you please have a look at this PR and merge it?
@minhthuc2502 Thank you for your suggestions. I agree that the changes were redundant. I've updated the code as you recommended and made the commits needed for all checks to pass. It appears that network conditions for build-and-push-docker-images are more favorable in the morning where I am located.
Hello, thank you for your updates. I'll merge this. I agree that there are some network problems with build-and-push-docker-images; for now I have rerun it manually. We'll have to fix this in the future.
This PR improves the efficiency of Wav2Vec2 inference within the CTranslate2 framework, in both speed and memory usage. Compared to the HuggingFace implementation, when processing 300 audio files the int8-quantized model shows an 11% speed increase and a 70% memory reduction on GPU, and a 5% speed increase and a 71% memory reduction on CPU. Additionally, using an N-gram language model with pyctcdecode can further improve speech recognition accuracy. My environment is an NVIDIA GeForce RTX 3090 24GB with CUDA 12.4, torch==2.12+cu12.1, and transformers==4.41.0. Special thanks go to the depthwise convolution process introduced in #1749.
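For context on the accuracy note above: beam search with pyctcdecode and an n-gram language model improves on plain greedy decoding of the CTC logits that Wav2Vec2 emits. Below is a minimal sketch of the greedy baseline it replaces; the toy vocabulary and scores are hypothetical, not taken from this PR.

```python
def greedy_ctc_decode(logits, vocab, blank_id=0):
    """Greedy CTC decoding: take the framewise argmax, collapse
    consecutive repeats, and drop blank tokens."""
    ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for i in ids:
        if i != blank_id and i != prev:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Toy vocabulary: index 0 is the CTC blank symbol.
vocab = ["_", "c", "a", "t"]
# Hypothetical framewise scores for 6 frames over 4 symbols.
logits = [
    [0.1, 0.9, 0.0, 0.0],  # argmax: c
    [0.1, 0.8, 0.1, 0.0],  # argmax: c (repeat, collapsed)
    [0.9, 0.0, 0.1, 0.0],  # argmax: blank
    [0.0, 0.0, 0.9, 0.1],  # argmax: a
    [0.9, 0.0, 0.0, 0.1],  # argmax: blank
    [0.0, 0.1, 0.0, 0.9],  # argmax: t
]
print(greedy_ctc_decode(logits, vocab))  # -> cat
```

A language-model-aware beam search scores many candidate paths against the n-gram LM instead of committing to the single argmax path, which is where the accuracy gain comes from.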