Add b2ai speaker verification functions #87
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main      #87      +/-   ##
==========================================
+ Coverage   63.04%   63.33%   +0.28%
==========================================
  Files          63       65       +2
  Lines        2073     2122      +49
==========================================
+ Hits         1307     1344      +37
- Misses        766      778      +12
==========================================
```

☔ View full report in Codecov by Sentry.
@fabiocat93 would you mind reviewing this? It's the last component of b2aiprep's process module that needs to be incorporated into senselab, as far as I can tell. It would also be great if we could do a minor release when we merge it, to facilitate integration into b2aiprep.
src/senselab/audio/tasks/speaker_verification/speaker_verification.py (several resolved review threads)
Hi @ibevers, I finally got the chance to review your code. I have left some comments.
…ing is not necessary since low pass filtering is done by default in torchaudio
@fabiocat93 I incorporated your feedback. Hopefully, this is mergeable now :)
```python
def verify_speaker(
    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"),
    model_training_sample_rate: int = 16000,  # spkrec-ecapa-voxceleb trained on 16kHz audio
```
This is not a user parameter, but a configuration variable you should get from the model configuration. Please see https://github.com/sensein/senselab/blob/main/src/senselab/audio/tasks/speech_enhancement/speechbrain.py
I'm pretty sure the training sample rate is not available from the model configuration for speechbrain/spkrec-ecapa-voxceleb. I checked the output of the instance methods of `model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")` (e.g. `model.get_model_info()`).

I also get an error when I run:

```python
def get_model_sample_rate(model, device=DeviceType.CPU):
    enhancer = SpeechBrainEnhancer._get_speechbrain_model(model=model, device=device)
    return enhancer.hparams.sample_rate

# Usage example
model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")
sample_rate = get_model_sample_rate(model)
print(f"The model's sample rate is: {sample_rate} Hz")
```

Error output:

```
'types.SimpleNamespace' object has no attribute 'sample_rate'
```

However, the above code works when I run it with speechbrain/sepformer-wham16k-enhancement.
You are so right. In the other use case, when we use ecapa-tdnn, we fixed `expected_sampling_rate` to 16 kHz in the code (https://github.com/sensein/senselab/blob/main/src/senselab/audio/tasks/speaker_embeddings/speechbrain.py), since all SpeechBrain models for speaker embeddings work at 16 kHz. It's not ideal, but a good workaround. I think you can hardcode 16 kHz in the code too, and remove it from the params. We want to remove any chance for the user to make silly mistakes. Thanks!! And nice catch!!
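A minimal sketch of what that refactor might look like (illustrative names only; `TRAINING_SAMPLE_RATE` and `needs_resampling` are made up for this sketch, not senselab's actual API):

```python
# Hypothetical sketch: the training sample rate becomes a module-level constant
# rather than a user-facing parameter, so callers cannot pass a wrong value.
TRAINING_SAMPLE_RATE = 16000  # SpeechBrain speaker-embedding models expect 16 kHz

def needs_resampling(audio_sample_rate: int) -> bool:
    """Return True when input audio must be resampled before embedding."""
    return audio_sample_rate != TRAINING_SAMPLE_RATE

print(needs_resampling(44100))  # True
print(needs_resampling(16000))  # False
```

The constant stays an implementation detail, which is the point of the suggestion above.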
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
```python
    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"),
    model_training_sample_rate: int = 16000,  # spkrec-ecapa-voxceleb trained on 16kHz audio
    device: DeviceType = DeviceType.CPU,
```
Can you please make device's default always None and use our `_select_device_and_dtype` method? That way, if the user doesn't have any preference and a GPU is available, it will be used. In your code, you force the user to use a CPU no matter what.
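A rough sketch of the pattern being suggested, with from-scratch stand-ins for senselab's `DeviceType` and `_select_device_and_dtype` (this is not senselab's code; the real helper also selects a dtype):

```python
from enum import Enum
from typing import List, Optional

# Minimal stand-in for senselab's DeviceType enum (illustrative only).
class DeviceType(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"

def cuda_available() -> bool:
    # Placeholder for a real check such as torch.cuda.is_available();
    # hardcoded False so the sketch runs anywhere.
    return False

def select_device(preference: Optional[DeviceType],
                  compatible: List[DeviceType]) -> DeviceType:
    """Honor an explicit user preference; otherwise pick the best available device."""
    if preference is not None:
        if preference not in compatible:
            raise ValueError("The requested DeviceType is not compatible.")
        return preference
    if DeviceType.CUDA in compatible and cuda_available():
        return DeviceType.CUDA
    return DeviceType.CPU

# No preference given: with no GPU available, selection falls back to CPU.
print(select_device(None, [DeviceType.CUDA, DeviceType.CPU]))  # DeviceType.CPU
```

With `device=None` as the default, a GPU user gets CUDA automatically, while an incompatible request (e.g. MPS here) fails fast with a clear error.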
With MPS, I was getting:

```
ValueError: The requested DeviceType is either not available or compatible with this functionality.
src/senselab/utils/data_structures/device.py:60: ValueError
```

I excluded it from compatible_devices.
Yup, SpeechBrain models don't support MPS yet; excluding it from compatible_devices is the way to go. I don't know why I cannot see your last changes, though, and keep seeing `device: DeviceType = DeviceType.CPU`. Am I missing something?
Can you see it now?
```python
    Takes a list of audios and resamples each into the new sampling rate. Notably does not assume any
    specific structure of the audios (can vary in stereo vs. mono as well as their original sampling rate)
def resample_audios(
```
Thank you @ibevers, this is helpful. I did some further research, and I think we should go with an alternative implementation that is neither yours nor mine, but the one from transforms.Resample. transforms.Resample is not that different from functional.resample, but it precomputes and reuses the resampling kernel, so using it will result in more efficient computation when resampling multiple waveforms with the same resampling parameters. They both internally do the Butterworth filtering for anti-aliasing (which is why your method and my method are redundant in the same wrapping function) and then resample the signal. We can pass order and lowcut as params and compute the rolloff ourselves. I would appreciate it if you could refactor the code.

Reference: https://pytorch.org/audio/main/generated/torchaudio.transforms.Resample.html
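The kernel-reuse point can be illustrated with a toy resampler (a from-scratch numpy sketch, not torchaudio's implementation; the helper names are made up): the windowed-sinc kernel is built once and then applied to every waveform with the same parameters.

```python
import numpy as np

# Illustrative sketch only: precompute a windowed-sinc anti-aliasing kernel
# once, then reuse it across waveforms. This is the efficiency argument for
# transforms.Resample over calling functional.resample per waveform.
def make_kernel(orig_sr: int, new_sr: int, width: int = 32) -> np.ndarray:
    # Low-pass cutoff at the smaller Nyquist frequency, Hamming-windowed sinc.
    cutoff = min(orig_sr, new_sr) / 2.0
    t = np.arange(-width, width + 1) / orig_sr
    kernel = (2 * cutoff / orig_sr) * np.sinc(2 * cutoff * t)
    return kernel * np.hamming(len(kernel))

def resample(x: np.ndarray, orig_sr: int, new_sr: int,
             kernel: np.ndarray) -> np.ndarray:
    filtered = np.convolve(x, kernel, mode="same")  # anti-alias first
    # Nearest-sample pick at the new rate (crude; real code interpolates).
    idx = (np.arange(int(len(x) * new_sr / orig_sr)) * orig_sr / new_sr).astype(int)
    return filtered[idx]

kernel = make_kernel(48_000, 16_000)                         # built once
waves = [np.random.randn(48_000) for _ in range(3)]
out = [resample(w, 48_000, 16_000, kernel) for w in waves]   # kernel reused
print(out[0].shape)  # (16000,)
```

The per-waveform work is then just a convolution plus indexing; the kernel design cost is paid a single time, which is what transforms.Resample does internally.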
Just an FYI that transforms.Resample does not use Butterworth filtering; it uses sinc interpolation with a Hamming or Kaiser window. In at least my initial anti-aliasing tests with fixed sinusoids, it was not great at creating a good filter: it still passed some amount of the signal through. I can't find the notebook right this minute, but the general idea of the test is: create a sinusoid at 14 kHz sampled at 48 kHz, then resample down to a sampling rate of 16 kHz. You should not see any signal peak on an FFT at 2 kHz (the aliased signal). If you do, that means the anti-aliasing filter is not doing a good job; anything that far from Nyquist (8 kHz) should be completely filtered out.
Hence it may make sense to have multiple resamplers still.
Can we hash this out in #90 and leave the resampling the way it is for this PR? @fabiocat93 @satra
I see. I think that at this point we could just remove the torchaudio implementation and use yours from b2aiprep, @ibevers. This will solve two issues at once.
Done.
Almost! But we are very close... I feel so sorry!!
@fabiocat93 I made some changes and responded to some of your feedback.
99% done 👍
@fabiocat93 I made another round of updates. Is it mergeable now?