Whisper Tune


Fine-tune OpenAI’s Whisper model for automatic speech recognition (ASR) on custom datasets. The training script supports flexible parameterization, model saving, and experiment tracking.


Requirements

To install the required dependencies, you can use the following command:

pip install -r requirements.txt

Environment Variables

Ensure you have a .env file in the project root that contains your Comet ML API key for logging:

COMET_API_KEY="your_comet_api_key"

Training logs will be pushed to Comet ML for experiment tracking.
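
To sanity-check that the key is picked up before starting a long run, here is a minimal sketch using python-dotenv and comet_ml (the project name below is hypothetical, not taken from the script):

import os

from comet_ml import Experiment
from dotenv import load_dotenv

# Load COMET_API_KEY from the .env file in the project root
load_dotenv()

# Create an experiment; training metrics (loss, WER) can be logged against it
experiment = Experiment(
    api_key=os.environ["COMET_API_KEY"],
    project_name="whisper-tune",  # hypothetical project name
)
experiment.log_parameter("whisper_model", "base")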

Usage

Collect your own dataset

You can use the Mimic Recording Studio to collect your own dataset.

1. Downsample

Downsample the audio files to a 16 kHz sample rate and convert them to FLAC format.

python downsample.py \
    --input_file <mimic-audio/backend/path/to/transcript.txt> \ 
    --output_dir <output/directory> \
    --percent 20
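
The exact contents of downsample.py are not reproduced here; the core resample-and-convert step can be sketched with torchaudio (an assumption about the audio backend):

import torchaudio
from torchaudio.transforms import Resample

def to_flac_16k(in_path: str, out_path: str) -> None:
    # Load the original recording at its native sample rate
    waveform, sample_rate = torchaudio.load(in_path)
    # Resample to 16 kHz, the rate Whisper expects
    if sample_rate != 16000:
        waveform = Resample(orig_freq=sample_rate, new_freq=16000)(waveform)
    # Write the result as FLAC
    torchaudio.save(out_path, waveform, 16000, format="flac")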

2. Merge

Merge multiple train (or, likewise, test) JSON files into a single file.

python merge.py \
    <path/to/train_1.json> <path/to/train_2.json> <path/to/train_3.json> \
    --output merged_train.json
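
merge.py amounts to concatenating manifest entries; a minimal sketch, assuming each input file holds a JSON array of dataset records (the exact record schema is an assumption):

import argparse
import json

parser = argparse.ArgumentParser(description="Merge JSON manifests into one file")
parser.add_argument("inputs", nargs="+", help="JSON files to merge")
parser.add_argument("--output", required=True, help="Path for the merged output file")
args = parser.parse_args()

merged = []
for path in args.inputs:
    with open(path) as f:
        merged.extend(json.load(f))  # each file: a list of dataset records

with open(args.output, "w") as f:
    json.dump(merged, f, indent=2)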

3. Train

Fine-tune the selected Whisper model with train.py. The script accepts the following arguments:

| Argument | Description | Default Value |
| --- | --- | --- |
| --train_json | Path to the training dataset in JSON format. | N/A |
| --test_json | Path to the test dataset in JSON format. | N/A |
| --whisper_model, -model | Choose from tiny, base, small, medium, large, large-v2, large-v3, large-v3-turbo, or provide a custom Whisper model name. | base |
| --batch_size | Batch size for training and evaluation. | 16 |
| --gradient_accumulation_steps, -grad_steps | Number of gradient accumulation steps. | 1 |
| --learning_rate, -lr | Learning rate for training. | 2e-5 |
| --warmup_steps | Number of warmup steps for the learning rate scheduler. | 500 |
| --epochs, -e | Number of epochs to train for. | 10 |
| --num_workers, -w | Number of CPU workers. | 2 |

For example:

python train.py \
    --train_json merged_train.json \
    --test_json merged_test.json \
    --whisper_model tiny \
    --batch_size 8 \
    --grad_steps 1 \
    --lr 1e-4 \
    --warmup_steps 75 \
    --epochs 10 \
    -w 2
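
train.py itself is not reproduced in this README; the following argparse sketch shows how the long/short flag pairs in the table above could be wired up (an assumption about the script's internals):

import argparse

parser = argparse.ArgumentParser(description="Fine-tune Whisper on a custom dataset")
parser.add_argument("--train_json", required=True)
parser.add_argument("--test_json", required=True)
parser.add_argument("--whisper_model", "-model", default="base")
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--gradient_accumulation_steps", "-grad_steps", type=int, default=1)
parser.add_argument("--learning_rate", "-lr", type=float, default=2e-5)
parser.add_argument("--warmup_steps", type=int, default=500)
parser.add_argument("--epochs", "-e", type=int, default=10)
parser.add_argument("--num_workers", "-w", type=int, default=2)
args = parser.parse_args()

# Effective batch size is batch_size * gradient_accumulation_steps,
# e.g. 8 * 1 = 8 for the example command above.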

Results & Tracking

Training logs, loss curves, and word error rate (WER) can be tracked on Comet ML and TensorBoard.
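
WER is the word-level edit distance divided by the number of reference words. A minimal sketch of computing it with the Hugging Face evaluate library (an assumed dependency; jiwer works equally well):

import evaluate

# Load the word-error-rate metric
wer_metric = evaluate.load("wer")

predictions = ["hello wrld"]   # model transcripts
references = ["hello world"]   # ground-truth transcripts

# WER = (substitutions + insertions + deletions) / reference word count
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # 0.5 here: one substitution over two words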

| Model Name | Parameters | Eval Loss | WER | Epochs | Batch Size | Learning Rate | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Whisper Tiny | 39 M | 0.3751 | 0.1311 | 10 | 4 | 1e-4 | 🤗 |
| Whisper Base | 74 M | 0.2331 | 0.0992 | 10 | 16 | 2e-5 | 🤗 |
| Whisper Small | 224 M | 0.1889 | 0.0811 | 10 | 16 | 2e-5 | 🤗 |
| Whisper Medium | 769 M | 0.1404 | 0.0645 | 5 | 8 | 2e-5 | 🤗 |


Pushing to Hugging Face Hub 🤗

The script automatically pushes the best trained model to the Hugging Face Hub, so make sure your Hugging Face credentials are configured.
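
Credentials can be supplied with huggingface-cli login or programmatically. A minimal sketch of authenticating and then loading a pushed checkpoint for inference; the HF_TOKEN variable name and the repo id below are hypothetical:

import os

from huggingface_hub import login
from transformers import pipeline

# Authenticate once per environment; the token could also live in .env
login(token=os.environ["HF_TOKEN"])

# Load a fine-tuned checkpoint for inference; this repo id is hypothetical
asr = pipeline("automatic-speech-recognition", model="LuluW8071/whisper-tiny-tuned")
print(asr("sample.flac")["text"])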

License

This project is licensed under the MIT License. See the LICENSE file for details.
