
Add b2ai speaker verification functions #87

Merged · 18 commits · Jul 11, 2024

Conversation

@ibevers (Collaborator) commented Jul 5, 2024

No description provided.

@ibevers linked an issue Jul 5, 2024 that may be closed by this pull request
@ibevers (Collaborator, Author) commented Jul 5, 2024

  • update to not use big models in remote test
  • update tests to use the mono audio fixture

@codecov-commenter commented Jul 6, 2024

Codecov Report

Attention: Patch coverage is 54.71698% with 24 lines in your changes missing coverage. Please review.

Project coverage is 63.33%. Comparing base (f3c595f) to head (d890ea7).
Report is 1 commit behind head on main.

Files                                                    Patch %   Lines
...tasks/speaker_verification/speaker_verification.py    36.36%    14 Missing ⚠️
src/tests/audio/tasks/speaker_verification_test.py       37.50%    10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   63.04%   63.33%   +0.28%     
==========================================
  Files          63       65       +2     
  Lines        2073     2122      +49     
==========================================
+ Hits         1307     1344      +37     
- Misses        766      778      +12     


@ibevers marked this pull request as ready for review July 6, 2024 20:03
@ibevers added and then removed the enhancement and minor labels Jul 8, 2024
@ibevers (Collaborator, Author) commented Jul 8, 2024

@fabiocat93 would you mind reviewing this? It's the last component of b2aiprep's process module that needs to be incorporated into senselab, as far as I can tell. It would also be great if we could do a minor release when we merge it, to facilitate integration into b2aiprep.

@ibevers self-assigned this Jul 9, 2024
@ibevers requested a review from fabiocat93 July 9, 2024 18:00
@fabiocat93 (Collaborator) commented:

hi @ibevers, i finally got the chance to review your code. i have left some comments.

@fabiocat93 added the enhancement, release, and minor labels Jul 10, 2024
@ibevers (Collaborator, Author) commented Jul 11, 2024

@fabiocat93 I incorporated your feedback. Hopefully, this is mergeable now :)

def verify_speaker(
    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"),
    model_training_sample_rate: int = 16000,  # spkrec-ecapa-voxceleb trained on 16kHz audio
Collaborator:

this is not a user parameter, but a configuration variable you should get from the model configuration. please see https://github.com/sensein/senselab/blob/main/src/senselab/audio/tasks/speech_enhancement/speechbrain.py

Collaborator Author:

I'm pretty sure the training sample rate is not available from the model configuration for speechbrain/spkrec-ecapa-voxceleb.

I checked the output of the instance methods of model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main") (e.g. model.get_model_info()).

I also get an error when I run:

# Imports per the senselab layout referenced above (the SpeechBrainModel path is an assumption):
from senselab.audio.tasks.speech_enhancement.speechbrain import SpeechBrainEnhancer
from senselab.utils.data_structures.device import DeviceType
from senselab.utils.data_structures.model import SpeechBrainModel

def get_model_sample_rate(model: SpeechBrainModel, device: DeviceType = DeviceType.CPU) -> int:
    """Read the training sample rate from a SpeechBrain model's hyperparameters."""
    enhancer = SpeechBrainEnhancer._get_speechbrain_model(model=model, device=device)
    return enhancer.hparams.sample_rate

# Usage example
model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")
sample_rate = get_model_sample_rate(model)
print(f"The model's sample rate is: {sample_rate} Hz")

Error output:

'types.SimpleNamespace' object has no attribute 'sample_rate'

However, the above code works when I run it with speechbrain/sepformer-wham16k-enhancement.

Collaborator:

you are so right. in the other use case, when we use ecapa-tdnn, we fixed expected_sampling_rate to 16kHz in the code (https://github.com/sensein/senselab/blob/main/src/senselab/audio/tasks/speaker_embeddings/speechbrain.py), since all speechbrain models for speaker embeddings work at 16kHz. it's not ideal, but a good workaround. I think you can hardcode 16kHz in the code too, and remove it from the params. we want to remove any chance for the user to make silly mistakes. thanks!! and nice catch!!
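
For concreteness, a minimal sketch of the suggested shape (an illustration, not the merged code: the import paths, constant name, and return type are assumptions; only the hardcoded 16 kHz and the removed parameter come from this thread):

from typing import List, Tuple

# Assumed senselab import paths, for illustration only:
from senselab.audio.data_structures.audio import Audio
from senselab.utils.data_structures.model import SpeechBrainModel

# Hardcoded as suggested: SpeechBrain speaker-embedding models are trained on 16 kHz audio.
TRAINING_SAMPLE_RATE = 16000

def verify_speaker(
    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(
        path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"
    ),
) -> List[Tuple[float, bool]]:  # return type is an assumption
    # Resample each pair to TRAINING_SAMPLE_RATE internally rather than
    # exposing the rate as a parameter the caller could set incorrectly.
    ...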

Collaborator Author:

Thanks! Done.

    audios: List[Tuple[Audio, Audio]],
    model: SpeechBrainModel = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main"),
    model_training_sample_rate: int = 16000,  # spkrec-ecapa-voxceleb trained on 16kHz audio
    device: DeviceType = DeviceType.CPU,
Collaborator:

can you please make device's default always None and use our _select_device_and_dtype method? this way, if the user doesn't have any preference and a GPU is available, it will be used. in your code, you force the user to use a CPU no matter what.

Collaborator Author:

With MPS, I was getting:

ValueError: The requested DeviceType is either not available or compatible with this functionality.

src/senselab/utils/data_structures/device.py:60: ValueError

I excluded it from compatible_devices.

Collaborator:

yup. speechbrain models don't support mps yet. excluding it from compatible_devices is the way to go. idk why i cannot see your last changes though and keep seeing device: DeviceType = DeviceType.CPU. am i missing something?
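
Sketched out, the requested pattern might look like the following (the helper's name and the MPS exclusion come from this thread; importing _select_device_and_dtype from the device module and its argument names are assumptions):

from typing import Optional

# The device module path matches the traceback quoted above; importing the
# private helper from it is an assumption.
from senselab.utils.data_structures.device import DeviceType, _select_device_and_dtype

def verify_speaker(audios, model, device: Optional[DeviceType] = None):
    # If the user has no preference, pick the best available device (e.g. CUDA).
    # MPS is excluded because SpeechBrain models do not support it yet.
    device, _dtype = _select_device_and_dtype(
        user_preference=device,  # argument names are assumptions
        compatible_devices=[DeviceType.CUDA, DeviceType.CPU],
    )
    ...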

Collaborator Author:

Can you see it now?


Takes a list of audios and resamples each into the new sampling rate. Notably does not assume any
specific structure of the audios (can vary in stereo vs. mono as well as their original sampling rate).
def resample_audios(
Collaborator:

thank you @ibevers. this is helpful. i did some further research and i think we should go with an alternative implementation that is not yours or mine, but the one from transforms.Resample. transforms.Resample is not that different from functional.resample, but it precomputes and reuses the resampling kernel, so using it will result in more efficient computation when resampling multiple waveforms with the same resampling parameters. they both internally do the butterworth filtering for anti-aliasing - which is why your method and my method are redundant in the same wrapping function - and then resample the signal. we can pass order and lowcut as params and compute roll-off by ourselves. I would appreciate it if you could refactor the code.
reference: https://pytorch.org/audio/main/generated/torchaudio.transforms.Resample.html
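
To illustrate the kernel-reuse point (a standalone torchaudio example, not senselab code): transforms.Resample precomputes its kernel at construction, so one instance can be reused across waveforms that share the same resampling parameters.

import torch
import torchaudio.transforms as T

# Kernel is precomputed here, once, for 48 kHz -> 16 kHz.
resampler = T.Resample(orig_freq=48_000, new_freq=16_000)

# Reusing the same instance avoids rebuilding the kernel per waveform.
waveforms = [torch.randn(1, 48_000) for _ in range(8)]  # eight 1 s mono clips
resampled = [resampler(w) for w in waveforms]
print(resampled[0].shape)  # torch.Size([1, 16000])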

Collaborator:

just an fyi that transforms.Resample does not use butterworth filtering. it uses sinc interpolation with a hamming or kaiser window. in at least my initial anti-aliasing tests with fixed sinusoids it was not great at creating a good filter. it still passed some amount of the signal through. i can't find the notebook right this minute, but the general idea of the test is:

create a sinusoid at 14 kHz sampled at 48 kHz, then filter down to a sampling rate of 16 kHz. you should not see on an FFT any signal peak at 2 kHz (the aliased signal). if you do, that means the anti-aliasing filter is not doing a good job. anything that far from Nyquist (8 kHz) should be completely filtered out.

hence it may make sense to have multiple resamplers still.
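
A sketch of that test, reconstructed from the description above (not the original notebook); a good anti-aliasing filter should leave essentially no energy near the 2 kHz alias:

import torch
import torchaudio.functional as F

sr_in, sr_out, tone_hz = 48_000, 16_000, 14_000

# 1 s of a pure 14 kHz sinusoid sampled at 48 kHz.
t = torch.arange(sr_in) / sr_in
tone = torch.sin(2 * torch.pi * tone_hz * t)

# Downsample to 16 kHz; 14 kHz is above the new Nyquist (8 kHz) and would
# alias to 16 kHz - 14 kHz = 2 kHz if the anti-aliasing filter leaks.
resampled = F.resample(tone, orig_freq=sr_in, new_freq=sr_out)

spectrum = torch.fft.rfft(resampled).abs()
freqs = torch.fft.rfftfreq(resampled.numel(), d=1 / sr_out)
alias_band = (freqs > 1_900) & (freqs < 2_100)
print("peak magnitude near 2 kHz:", spectrum[alias_band].max().item())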

Collaborator Author (@ibevers, Jul 11, 2024):

Can we hash this out in #90 and leave the resampling the way it is for this PR? @fabiocat93 @satra

Collaborator:

i see. i think that at this point we could just remove the torchaudio implementation and use yours from b2aiprep, @ibevers. this will solve 2 issues at once.

Collaborator Author:

Done.

@fabiocat93 (Collaborator) commented:

> @fabiocat93 I incorporated your feedback. Hopefully, this is mergeable now :)

Almost! but we are very close... i feel so sorry!!

@ibevers (Collaborator, Author) commented Jul 11, 2024

@fabiocat93 I made some changes and responded to some of your feedback.

@fabiocat93 (Collaborator) left a review:

99% done 👍

@ibevers (Collaborator, Author) commented Jul 11, 2024

@fabiocat93 I made another round of updates. Is it mergeable now?

@fabiocat93 changed the base branch from main to release_060 July 11, 2024 21:52
@fabiocat93 removed the release and minor labels Jul 11, 2024
@fabiocat93 merged commit b26be06 into release_060 Jul 11, 2024
6 checks passed
Labels: enhancement (New feature or request)
Development: successfully merging this pull request may close "Task: add b2ai speaker verification functions"
4 participants