24yo deep learning engineer cooking the anti-entropy machine.
I believe that open source is extremely important in AI and more generally in software.
- 🟩 Resolve Inference Selection Bug Affecting Transcription Quality: When several temperatures are tried, the current code returns the sample computed with the highest temperature. I suggested returning the sample with the best logprob instead. Even though this PR wasn't merged into the official repo, the idea has since been adopted and improved in the faster-whisper repository with this PR
- 🟪 Avoid computing higher temperatures on no_speech segments: In Whisper, the voice activity detection token is computed before the transcribed sentence is decoded. I noticed that when a segment was silent, the code still decoded the sentence several times at different temperatures, which is unnecessary.
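
A minimal sketch of both ideas together, not the actual Whisper or faster-whisper code. `DecodeResult`, `decode_with_fallback`, and the `decode` callable are hypothetical names for illustration, and the thresholds and temperature schedule are assumed defaults. The loop stops early on a silent segment and otherwise returns the candidate with the best average logprob instead of the last one tried.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class DecodeResult:
    """Hypothetical container for one decoding attempt."""
    text: str
    avg_logprob: float      # mean token log-probability of the sample
    no_speech_prob: float   # probability that the segment is silent
    temperature: float      # temperature used for this attempt


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold: float = -1.0,    # assumed default
    no_speech_threshold: float = 0.6,   # assumed default
) -> DecodeResult:
    """Try increasing temperatures and keep the best-scoring sample."""
    candidates: List[DecodeResult] = []
    for temperature in temperatures:
        result = decode(temperature)
        candidates.append(result)

        # Silent segment: retrying at higher temperatures is wasted work.
        if result.no_speech_prob > no_speech_threshold:
            break

        # Good enough: no fallback needed.
        if result.avg_logprob >= logprob_threshold:
            break

    # Return the best candidate, not simply the last (hottest) one.
    return max(candidates, key=lambda r: r.avg_logprob)
```

Here `decode` stands in for whatever runs the model at a given temperature, so the selection logic can be exercised with a stub and no model loaded.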