Add b2ai speaker verification functions #87
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main      #87      +/-   ##
==========================================
+ Coverage   63.04%   63.33%   +0.28%
==========================================
  Files          63       65       +2
  Lines        2073     2122      +49
==========================================
+ Hits         1307     1344      +37
- Misses        766      778      +12
==========================================
```

☔ View full report in Codecov by Sentry.
@fabiocat93 would you mind reviewing this? It's the last component of b2aiprep's process module that needs to be incorporated into senselab, as far as I can tell. It would also be great if we could do a minor release when we merge it, to facilitate integration into b2aiprep.
src/senselab/audio/tasks/speaker_verification/speaker_verification.py (several resolved review threads)
Hi @ibevers, I finally got the chance to review your code. I have left some comments.
…ing is not necessary since low pass filtering is done by default in torchaudio
@fabiocat93 I incorporated your feedback. Hopefully, this is mergeable now :)
```python
def verify_speaker(
    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"),
    model_training_sample_rate: int = 16000,  # spkrec-ecapa-voxceleb trained on 16kHz audio
```
This is not a user parameter, but a configuration variable you should get from the model configuration. Please see https://github.com/sensein/senselab/blob/main/src/senselab/audio/tasks/speech_enhancement/speechbrain.py
I'm pretty sure the training sample rate is not available from the model configuration for speechbrain/spkrec-ecapa-voxceleb. I checked the output of the instance methods of `model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")` (e.g. `model.get_model_info()`).

I also get an error when I run:

```python
def get_model_sample_rate(model, device=DeviceType.CPU):
    enhancer = SpeechBrainEnhancer._get_speechbrain_model(model=model, device=device)
    return enhancer.hparams.sample_rate

# Usage example
model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")
sample_rate = get_model_sample_rate(model)
print(f"The model's sample rate is: {sample_rate} Hz")
```

Error output:

```
'types.SimpleNamespace' object has no attribute 'sample_rate'
```

However, the above code works when I run it with speechbrain/sepformer-wham16k-enhancement.
You are so right. In the other use case, when we use ecapa-tdnn, we fixed `expected_sampling_rate` to 16 kHz in the code (https://github.com/sensein/senselab/blob/main/src/senselab/audio/tasks/speaker_embeddings/speechbrain.py), since all SpeechBrain models for speaker embeddings work at 16 kHz. It's not ideal, but a good workaround. I think you can hardcode 16 kHz in the code too, and remove it from the params. We want to remove any chance for the user to make silly mistakes. Thanks!! And nice catch!!
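A minimal sketch of what that refactor might look like (illustrative names only; `TRAINING_SAMPLE_RATE` and `needs_resampling` are made up for this sketch, not senselab's actual API):

```python
# Hypothetical sketch: the training sample rate becomes a module-level constant
# rather than a user-facing parameter, so callers cannot pass a wrong value.
TRAINING_SAMPLE_RATE = 16000  # SpeechBrain speaker-embedding models expect 16 kHz

def needs_resampling(audio_sample_rate: int) -> bool:
    """Return True when input audio must be resampled before embedding."""
    return audio_sample_rate != TRAINING_SAMPLE_RATE

print(needs_resampling(44100))  # True
print(needs_resampling(16000))  # False
```

The constant stays an implementation detail, which is the point of the suggestion above.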
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
```python
    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"),
    model_training_sample_rate: int = 16000,  # spkrec-ecapa-voxceleb trained on 16kHz audio
    device: DeviceType = DeviceType.CPU,
```
Can you please make device's default always None and use our `_select_device_and_dtype` method? That way, if the user doesn't have any preference and a GPU is available, it will be used. In your code, you force the user to use a CPU no matter what.
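A rough sketch of the pattern being suggested, with from-scratch stand-ins for senselab's `DeviceType` and `_select_device_and_dtype` (this is not senselab's code; the real helper also selects a dtype):

```python
from enum import Enum
from typing import List, Optional

# Minimal stand-in for senselab's DeviceType enum (illustrative only).
class DeviceType(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"

def cuda_available() -> bool:
    # Placeholder for a real check such as torch.cuda.is_available();
    # hardcoded False so the sketch runs anywhere.
    return False

def select_device(preference: Optional[DeviceType],
                  compatible: List[DeviceType]) -> DeviceType:
    """Honor an explicit user preference; otherwise pick the best available device."""
    if preference is not None:
        if preference not in compatible:
            raise ValueError("The requested DeviceType is not compatible.")
        return preference
    if DeviceType.CUDA in compatible and cuda_available():
        return DeviceType.CUDA
    return DeviceType.CPU

# No preference given: with no GPU available, selection falls back to CPU.
print(select_device(None, [DeviceType.CUDA, DeviceType.CPU]))  # DeviceType.CPU
```

With `device=None` as the default, a GPU user gets CUDA automatically, while an incompatible request (e.g. MPS here) fails fast with a clear error.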
With MPS, I was getting:

```
ValueError: The requested DeviceType is either not available or compatible with this functionality.
src/senselab/utils/data_structures/device.py:60: ValueError
```

I excluded it from compatible_devices.
Yup, SpeechBrain models don't support MPS yet; excluding it from compatible_devices is the way to go. I don't know why I cannot see your last changes, though, and keep seeing `device: DeviceType = DeviceType.CPU`. Am I missing something?
Can you see it now?
```python
    Takes a list of audios and resamples each into the new sampling rate. Notably does not assume any
    specific structure of the audios (can vary in stereo vs. mono as well as their original sampling rate)
def resample_audios(
```
Thank you @ibevers, this is helpful. I did some further research, and I think we should go with an alternative implementation that is neither yours nor mine, but the one from transforms.Resample. transforms.Resample is not that different from functional.resample, but it precomputes and reuses the resampling kernel, so using it will result in more efficient computation when resampling multiple waveforms with the same resampling parameters. They both internally do the Butterworth filtering for anti-aliasing (which is why your method and my method are redundant in the same wrapping function) and then resample the signal. We can pass order and lowcut as params and compute the rolloff ourselves. I would appreciate it if you could refactor the code.

Reference: https://pytorch.org/audio/main/generated/torchaudio.transforms.Resample.html
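The kernel-reuse point can be illustrated with a toy resampler (a from-scratch numpy sketch, not torchaudio's implementation; the helper names are made up): the windowed-sinc kernel is built once and then applied to every waveform with the same parameters.

```python
import numpy as np

# Illustrative sketch only: precompute a windowed-sinc anti-aliasing kernel
# once, then reuse it across waveforms. This is the efficiency argument for
# transforms.Resample over calling functional.resample per waveform.
def make_kernel(orig_sr: int, new_sr: int, width: int = 32) -> np.ndarray:
    # Low-pass cutoff at the smaller Nyquist frequency, Hamming-windowed sinc.
    cutoff = min(orig_sr, new_sr) / 2.0
    t = np.arange(-width, width + 1) / orig_sr
    kernel = (2 * cutoff / orig_sr) * np.sinc(2 * cutoff * t)
    return kernel * np.hamming(len(kernel))

def resample(x: np.ndarray, orig_sr: int, new_sr: int,
             kernel: np.ndarray) -> np.ndarray:
    filtered = np.convolve(x, kernel, mode="same")  # anti-alias first
    # Nearest-sample pick at the new rate (crude; real code interpolates).
    idx = (np.arange(int(len(x) * new_sr / orig_sr)) * orig_sr / new_sr).astype(int)
    return filtered[idx]

kernel = make_kernel(48_000, 16_000)                         # built once
waves = [np.random.randn(48_000) for _ in range(3)]
out = [resample(w, 48_000, 16_000, kernel) for w in waves]   # kernel reused
print(out[0].shape)  # (16000,)
```

The per-waveform work is then just a convolution plus indexing; the kernel design cost is paid a single time, which is what transforms.Resample does internally.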
Just an FYI that transforms.Resample does not use Butterworth filtering; it uses sinc interpolation with a Hamming or Kaiser window. In at least my initial anti-aliasing tests with fixed sinusoids, it was not great at creating a good filter: it still passed some amount of the signal through. I can't find the notebook right this minute, but the general idea of the test is: create a sinusoid at 14 kHz sampled at 48 kHz, then resample down to a sampling rate of 16 kHz. You should not see any signal peak on an FFT at 2 kHz (the aliased signal). If you do, that means the anti-aliasing filter is not doing a good job; anything that far from Nyquist (8 kHz) should be completely filtered out.
Hence it may make sense to have multiple resamplers still.
Can we hash this out in #90 and leave the resampling the way it is for this PR? @fabiocat93 @satra
I see. I think that at this point we could just remove the torchaudio implementation and use yours from b2aiprep, @ibevers. This will solve two issues at once.
Done.
Almost! But we are very close... I feel so sorry!!
@fabiocat93 I made some changes and responded to some of your feedback.
99% done 👍
@fabiocat93 I made another round of updates. Is it mergeable now?