#8246: add bs>1 support for functional_whisper conditional generation

tenstorrent · May 13, 2024 · 4c8b951
1 parent d93744b
commit 4c8b951
Showing 32 changed files with 309 additions and 180 deletions.
diff --git a/models/demos/grayskull/functional_whisper/README.md b/models/demos/grayskull/functional_whisper/README.md
@@ -0,0 +1,75 @@
+# Functional Whisper Model Demos For Audio Classification and Text Generation
+
+## Introduction
+
+Whisper is a pre-trained model for Automatic Speech Recognition (ASR) and Speech Translation. The models are trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both Speech Recognition and Speech Translation tasks.
+
+The demos showcase the Functional Whisper model on audio classification and text generation tasks,
+using the `sanchit-gandhi/whisper-medium-fleurs-lang-id` and `openai/whisper-tiny.en` checkpoints from Hugging Face, respectively.
+
+### Details
+
+The entry point to the Functional Whisper model is the `whisper` function located in `ttnn_optimized_functional_whisper.py`.
+
+## Inputs
+
+By default, inputs are read from the `dataset/audio_classification` and `dataset/conditional_generation` folders. To use different inputs, pass a different path via the `--input-path` option in the command. Avoid modifying the `input_data.json` file directly.
+
+
+For the dataset demos, audio classification inputs are taken from the `google/fleurs` dataset and conditional generation inputs are taken from the `hf-internal-testing/librispeech_asr_dummy` dataset.
+
+## Batch size: 8
+
+Batch Size determines the number of input sequences processed simultaneously during training or inference, impacting computational efficiency and memory usage. It is recommended to set the `batch_size` to 8.
+
+## How to run demo for Audio Classification task
+
+To run the demo for audio classification using the Whisper model, follow these instructions:
+
+- Use the following command to run the Whisper audio classification demo with ttnn optimized functional whisper:
+  ```
+  pytest --disable-warnings --input-path="models/demos/grayskull/functional_whisper/demo/dataset/audio_classification" models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_audio_classification[8-models.demos.grayskull.functional_whisper.tt.ttnn_optimized_functional_whisper]
+  ```
+
+- Use the following command to run the Whisper audio classification demo with ttnn functional whisper:
+  ```
+  pytest --disable-warnings --input-path="models/demos/grayskull/functional_whisper/demo/dataset/audio_classification" models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_audio_classification[8-8-models.demos.grayskull.functional_whisper.tt.ttnn_functional_whisper]
+  ```
+
+- Another demo runs audio classification on the `google/fleurs` dataset. To run it, use the following command:
+  ```
+  pytest --disable-warnings models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_audio_classification_dataset
+  ```
+
+## How to run demo for Text Generation task
+To run the demo for text generation using the Whisper model, follow these instructions:
+
+- Use the following command to run the Whisper text generation demo with ttnn optimized functional whisper:
+  ```
+  pytest --disable-warnings --input-path="models/demos/grayskull/functional_whisper/demo/dataset/conditional_generation" models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_conditional_generation[8-32-models.demos.grayskull.functional_whisper.tt.ttnn_optimized_functional_whisper]
+  ```
+
+- Use the following command to run the Whisper text generation demo with ttnn functional whisper:
+  ```
+  pytest --disable-warnings --input-path="models/demos/grayskull/functional_whisper/demo/dataset/conditional_generation" models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_conditional_generation[8-32-models.demos.grayskull.functional_whisper.tt.ttnn_functional_whisper]
+  ```
+
+- The second demo runs text generation on the `hf-internal-testing/librispeech_asr_dummy` dataset.
+
+- To run the second demo using ttnn optimized functional whisper with dataset inputs for 1 iteration, with a batch size of 8 and decoding up to 32 tokens, use the following command:
+  ```
+  pytest --disable-warnings models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_conditional_generation_dataset[8-1-64-models.demos.grayskull.functional_whisper.tt.ttnn_optimized_functional_whisper]
+  ```
+- To run the second demo using ttnn functional whisper with dataset inputs for 1 iteration, with a batch size of 8 and decoding up to 32 tokens, use the following command:
+  ```
+  pytest --disable-warnings models/demos/grayskull/functional_whisper/demo/demo.py::test_demo_for_conditional_generation_dataset[8-1-32-models.demos.grayskull.functional_whisper.tt.ttnn_functional_whisper]
+  ```
+
+## Results
+
+The demos show the Whisper model performing audio classification and text generation.
+
+Audio classification predicts the language of the provided audio sample, and the demo using dataset inputs reports the accuracy of the model.
+For example, an accuracy of 0.75 is observed with `batch_size=8` and `n_iterations=3`.
+
+In Text generation, the model predicts transcriptions in the same language as the audio (English).
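
To make the accuracy figure in the Results section concrete, here is a minimal sketch of how such a number could be computed. It assumes accuracy is the fraction of correctly predicted language labels over all `batch_size * n_iterations` samples, which the README does not state explicitly. The `accuracy` helper and the label lists are hypothetical placeholders, not the demo's code or its output.

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that match their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Assumption: 3 iterations of 8 samples each gives 24 samples in total.
batch_size, n_iterations = 8, 3
n_samples = batch_size * n_iterations

# Illustrative labels only: 18 of the 24 predictions match the reference.
references = ["en"] * n_samples
predictions = ["en"] * 18 + ["fr"] * 6

score = accuracy(predictions, references)
print(f"accuracy over {n_samples} samples: {score}")
```

Under this assumption, an accuracy of 0.75 at `batch_size=8` and `n_iterations=3` corresponds to 18 correct predictions out of 24.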
diff --git a/...o_classification/10116516891483200485.wav b/...o_classification/10116516891483200485.wav
diff --git a/...dio_classification/140291826269534354.wav b/...dio_classification/140291826269534354.wav
diff --git a/...io_classification/1689242038473278354.wav b/...io_classification/1689242038473278354.wav
diff --git a/...o_classification/17340315164505628698.wav b/...o_classification/17340315164505628698.wav
diff --git a/...o_classification/17659141715436566244.wav b/...o_classification/17659141715436566244.wav
diff --git a/...o_classification/17928171511082320095.wav b/...o_classification/17928171511082320095.wav
diff --git a/...io_classification/2086639904747050008.wav b/...io_classification/2086639904747050008.wav
diff --git a/...dio_classification/622196158886216764.wav b/...dio_classification/622196158886216764.wav
diff --git a/...io_classification/7043619860143829064.wav b/...io_classification/7043619860143829064.wav
diff --git a/...io_classification/9522084197299278725.wav b/...io_classification/9522084197299278725.wav
diff --git a/...ional_generation/11150113890463037787.wav b/...ional_generation/11150113890463037787.wav
diff --git a/...tional_generation/1298409023920250606.wav b/...tional_generation/1298409023920250606.wav
diff --git a/...ional_generation/17566024285835266239.wav b/...ional_generation/17566024285835266239.wav
diff --git a/...ional_generation/17646385371758249908.wav b/...ional_generation/17646385371758249908.wav
diff --git a/...ional_generation/17659141715436566244.wav b/...ional_generation/17659141715436566244.wav
diff --git a/...ional_generation/17928171511082320095.wav b/...ional_generation/17928171511082320095.wav
diff --git a/...ional_generation/17938133003986293739.wav b/...ional_generation/17938133003986293739.wav
diff --git a/...tional_generation/2842775607363710885.wav b/...tional_generation/2842775607363710885.wav
diff --git a/...tional_generation/6757317816154782558.wav b/...tional_generation/6757317816154782558.wav
diff --git a/...tional_generation/6969469525741631060.wav b/...tional_generation/6969469525741631060.wav
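
The pytest commands in the README select a single parametrized case through a bracketed suffix such as `[8-32-<module>]`. Some shells (zsh, for example) treat unquoted square brackets as glob patterns, so quoting the node ID can help. The sketch below builds a shell-quoted command from its parts. The `pytest_command` helper is hypothetical and not part of the repository, and the meaning of each value in the bracket (batch size, token count, module path) is an assumption inferred from the README text.

```python
import shlex

# Path to the demo file, as it appears in the README commands.
DEMO = "models/demos/grayskull/functional_whisper/demo/demo.py"


def pytest_command(test_name, params, input_path=None):
    """Build a shell-safe pytest command for one parametrized demo case.

    `params` are joined with '-' to form the bracketed ID suffix; their
    order must match the ID that pytest actually generates for the test.
    """
    suffix = "-".join(str(p) for p in params)
    node_id = f"{DEMO}::{test_name}[{suffix}]"
    args = ["pytest", "--disable-warnings"]
    if input_path is not None:
        args.append(f"--input-path={input_path}")
    args.append(node_id)
    # shlex.quote wraps any argument containing shell-special characters
    # (here, the square brackets) in single quotes.
    return " ".join(shlex.quote(a) for a in args)


cmd = pytest_command(
    "test_demo_for_conditional_generation",
    [8, 32, "models.demos.grayskull.functional_whisper.tt.ttnn_functional_whisper"],
    input_path="models/demos/grayskull/functional_whisper/demo/dataset/conditional_generation",
)
print(cmd)
```

If the ID does not match, running `pytest --collect-only -q` on the demo file lists the exact IDs that pytest generated.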