
How to convert a custom Whisper model in OpenAI format or an HF Whisper model to the TensorRT-based backend? #58

Open
StephennFernandes opened this issue Apr 8, 2024 · 21 comments
Labels: enhancement (New feature or request)

@StephennFernandes

Hey @shashikg, great repo, and cheers for the insane effort you've put into building it.

I have a fine-tuned Whisper model (in both the original OpenAI and HF formats) which I want to run on the TensorRT backend using WhisperS2T. I figured out how to load the official Whisper models, but I'm wondering how to convert a custom Whisper model to TensorRT and load it with WhisperS2T.

@StephennFernandes changed the title from "how to convert the official whisper or HF whisper model to TensorRT based backend ?" to "how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ?" Apr 8, 2024
@StephennFernandes
Author

For some reference, I followed the TensorRT-LLM repo's Whisper example, but upon loading the model from the path like this:

model = whisper_s2t.load_model(model_identifier="/app/TRT_whisper/whisper_large_v3", backend='TensorRT-LLM')

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/__init__.py", line 44, in load_model
    return WhisperModel(model_identifier, **model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 81, in __init__
    trt_build_args = load_trt_build_config(self.model_path)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/engine_builder/__init__.py", line 77, in load_trt_build_config
    with open(f'{output_dir}/trt_build_args.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/app/TRT_whisper/whisper_large_v3/trt_build_args.json'

@aleksandr-smechov

aleksandr-smechov commented Apr 9, 2024

The WhisperS2T code generates a trt_build_args JSON here, so you'll need to generate that if you're using the official example. Also note that while you can use TensorRT-LLM's example, you need to change the tensorrt_llm version in the requirements.txt file here to whatever version WhisperS2T is using.
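For reference, a minimal sketch of locating the cached file that a normal WhisperS2T run produces (the cache root is assumed from the paths that appear later in this thread):

# Minimal sketch: find trt_build_args.json files cached by a normal WhisperS2T run.
# The cache root (~/.cache/whisper_s2t) is assumed from paths seen later in this thread.
from pathlib import Path

cache_root = Path.home() / ".cache" / "whisper_s2t" / "models" / "trt"
for path in cache_root.rglob("trt_build_args.json"):
    print(path)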

@StephennFernandes
Author

@aleksandr-smechov hey, thanks for replying. Could you please let me know how to generate the trt_build_args JSON? I have a fine-tuned Whisper model from HF, converted back to OpenAI format, which I then build with the TensorRT-LLM build script:

python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin  --use_bert_attention_plugin --enable_context_fmha

which just gives me the Whisper model in TensorRT-LLM format.

Is there a way in the WhisperS2T code to generate the trt_build_args JSON file?

@aleksandr-smechov

aleksandr-smechov commented Apr 9, 2024

I completely removed that requirement from WhisperS2T code personally, but you can "fake" it by running WhisperS2T normally, finding the cached directory where these files are stored, and adjusting the JSON to your needs. Also remember to rename the encoder and decoder engines from the official example to encoder.engine and decoder.engine.
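A minimal sketch of that workflow, assuming the official example's engines live in /app/TRT_whisper/whisper_large_v3 (path taken from earlier in this thread); the glob patterns are assumptions, since the actual engine file names depend on the build flags used:

# Minimal sketch: rename the engines the official example built to the names
# WhisperS2T expects. The glob patterns below are assumptions.
from pathlib import Path

trt_dir = Path("/app/TRT_whisper/whisper_large_v3")
next(trt_dir.glob("*encoder*.engine")).rename(trt_dir / "encoder.engine")
next(trt_dir.glob("*decoder*.engine")).rename(trt_dir / "decoder.engine")
# Then copy the cached trt_build_args.json (see the sketch above) into trt_dir
# and adjust its fields to match this directory.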

@StephennFernandes
Author

@aleksandr-smechov, thanks a ton for your help. I'll try out what you said.
Just one clarification: when you say "I completely removed that requirement from WhisperS2T code personally", do you mean you have a personal repo/fork of WhisperS2T that doesn't have these requirements? If yes, could you please link it?

@aleksandr-smechov
Copy link

Sure, you can compare the WhisperModelTRT implementation in WhisperS2T here to the implementation here.

@StephennFernandes
Author

I completely removed that requirement from WhisperS2T code personally, but you can "fake" it by running WhisperS2T normally, finding the cached directory where these files are stored, and adjusting the JSON to your needs. Also remember to rename the encoder and decoder engines from the official example to encoder.engine and decoder.engine.

Hey, I did as you said: I placed the trt_build_args.json file into my TRT model dir, and renamed the .engine files to encoder.engine and decoder.engine.

The problem is I get the following error:

TypeError: ModelConfig.__init__() missing 2 required positional arguments: 'max_batch_size' and 'max_beam_width'

despite max_batch_size and max_beam_width existing in trt_build_args.json. Even with me explicitly setting these args, the issue still persists.

The following is the code explicitly setting the args:

model = whisper_s2t.load_model(model_identifier="/app/TRT_whisper/whisper_large_v3", backend='TensorRT-LLM', max_batch_size=24, max_beam_width=1)

The following is the trt_build_args.json file:

{"max_batch_size": 24, "max_beam_width": 1, "max_input_len": 4, "max_output_len": 448, "world_size": 1, "dtype": "float16", "quantize_dir": "quantize/1-gpu", "use_gpt_attention_plugin": "float16", "use_bert_attention_plugin": null, "use_context_fmha_enc": false, "use_context_fmha_dec": false, "use_gemm_plugin": "float16", "use_layernorm_plugin": false, "remove_input_padding": false, "use_weight_only_enc": false, "use_weight_only_dec": false, "weight_only_precision": "int8", "int8_kv_cache": false, "debug_mode": false, "cuda_compute_capability": [8, 6], "output_dir": "/root/.cache/whisper_s2t/models/trt/large-v3/c55664fdf5b447062c4cd7a0b64b72fc", "model_path": "/root/.cache/whisper_s2t/models/trt/large-v3/pt_ckpt.pt"}

@aleksandr-smechov what could possibly be wrong here, given that the error is triggered despite me explicitly passing the args and the args being present in the JSON file?

@aleksandr-smechov

aleksandr-smechov commented Apr 12, 2024

@StephennFernandes I believe I encountered the same issue before and overcame it by adding these args here.

@StephennFernandes
Author

@aleksandr-smechov thanks for the heads-up, it really means a lot. I was able to fix this issue by refactoring in two places.
(Let me know in case I could submit a PR for this, on how to port a custom OAI-format model.)

But the model now hangs with a new issue.

1. Editing the decoder_model_config in model.py and explicitly adding the two args max_batch_size and max_beam_width:

        decoder_model_config = ModelConfig(  # reconstructed opening line; the original paste began mid-call
            num_heads=self.decoder_config['num_heads'],
            num_kv_heads=self.decoder_config['num_heads'],
            hidden_size=self.decoder_config['hidden_size'],
            vocab_size=self.decoder_config['vocab_size'],
            num_layers=self.decoder_config['num_layers'],
            gpt_attention_plugin=self.decoder_config['gpt_attention_plugin'],
            remove_input_padding=self.decoder_config['remove_input_padding'],
            cross_attention=self.decoder_config['cross_attention'],
            has_position_embedding=self.decoder_config['has_position_embedding'],
            has_token_type_embedding=self.decoder_config['has_token_type_embedding'],
            max_batch_size=self.decoder_config["max_batch_size"],
            max_beam_width=self.decoder_config["max_beam_width"],
        )

2. Pulling the tokenizer.json file from HF into the dir where my TensorRT-LLM model files were saved.
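A minimal sketch of that tokenizer step, assuming the fine-tune is hosted on the HF Hub (the repo ID below is a placeholder):

# Minimal sketch: fetch tokenizer.json into the TRT model directory.
# "your-org/your-finetuned-whisper" is a placeholder repo ID.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="your-org/your-finetuned-whisper",
    filename="tokenizer.json",
    local_dir="/app/TRT_whisper/whisper_large_v3",
)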

After all these edits the model is stuck / hangs, and the following are the terminal logs:

Transcribing:   0%|                                                                                                              | 0/100 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/app/TRT_whisper/inference.py", line 8, in <module>
    out = model.transcribe_with_vad(files,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/__init__.py", line 171, in transcribe_with_vad
    res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 235, in generate_segment_batched
    result = self.model.generate(features,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 185, in generate
    output_ids = self.decoder.generate(decoder_input_ids,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 146, in generate
    output_ids = self.decoder_generation_session.decode(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 789, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2993, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2642, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2334, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!
double free or corruption (out)
[user-DSA7TGX-424R:46643] *** Process received signal ***
[user-DSA7TGX-424R:46643] Signal: Aborted (6)
[user-DSA7TGX-424R:46643] Signal code:  (-6)
[user-DSA7TGX-424R:46643] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x70d5e560d520]
[user-DSA7TGX-424R:46643] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x70d5e56619fc]
[user-DSA7TGX-424R:46643] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x70d5e560d476]
[user-DSA7TGX-424R:46643] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x70d5e55f37f3]
[user-DSA7TGX-424R:46643] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x89676)[0x70d5e5654676]
[user-DSA7TGX-424R:46643] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0xa0cfc)[0x70d5e566bcfc]
[user-DSA7TGX-424R:46643] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0xa2e70)[0x70d5e566de70]
[user-DSA7TGX-424R:46643] [ 7] /lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x70d5e5670453]
[user-DSA7TGX-424R:46643] [ 8] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x1a43d22)[0x70d5ccf96d22]
[user-DSA7TGX-424R:46643] [ 9] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x1a4dd54)[0x70d5ccfa0d54]
[user-DSA7TGX-424R:46643] [10] /lib/x86_64-linux-gnu/libc.so.6(+0x45495)[0x70d5e5610495]
[user-DSA7TGX-424R:46643] [11] /lib/x86_64-linux-gnu/libc.so.6(on_exit+0x0)[0x70d5e5610610]
[user-DSA7TGX-424R:46643] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x29d97)[0x70d5e55f4d97]
[user-DSA7TGX-424R:46643] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x70d5e55f4e40]
[user-DSA7TGX-424R:46643] [14] python(_start+0x25)[0x559e80281f25]
[user-DSA7TGX-424R:46643] *** End of error message ***

For some additional context, I am running all of this on an NVIDIA A6000, with TensorRT version 9.2.0.5.

@StephennFernandes
Author

@aleksandr-smechov @shashikg could this be a version mismatch?

I built the Whisper model for TensorRT using TensorRT version 9.2.0.5, while WhisperS2T expects its own TRT version.

I tried building my Whisper model on the official WhisperS2T Docker image, but I get the following error when building Whisper to TRT format:

python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha

[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301
Traceback (most recent call last):
  File "/app/TensorRT-LLM/examples/whisper/build.py", line 27, in <module>
    from tensorrt_llm.models.modeling_utils import QuantConfig
ImportError: cannot import name 'QuantConfig' from 'tensorrt_llm.models.modeling_utils' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py)

@aleksandr-smechov

@StephennFernandes that's correct, you'd need to build the TRT model using the same version of TensorRT-LLM that WhisperS2T uses.
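As a quick sanity check, a sketch you can run in both the build environment and the inference environment to confirm the versions match:

# Print the TensorRT / TensorRT-LLM versions; they must match between the
# environment that builds the engines and the one that runs WhisperS2T.
import tensorrt
import tensorrt_llm

print("tensorrt:", tensorrt.__version__)
print("tensorrt_llm:", tensorrt_llm.__version__)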

@StephennFernandes
Author

StephennFernandes commented Apr 14, 2024

@aleksandr-smechov

I tried, but I'm unable to build.

I am facing the following error:

python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha

[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301
Traceback (most recent call last):
  File "/app/TensorRT-LLM/examples/whisper/build.py", line 27, in <module>
    from tensorrt_llm.models.modeling_utils import QuantConfig
ImportError: cannot import name 'QuantConfig' from 'tensorrt_llm.models.modeling_utils' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py)

Does WhisperS2T have a different build structure/format?

I mean, a totally different build script? The official Whisper build script from the TensorRT-LLM repo doesn't work; the error above is from that build script.

@StephennFernandes
Author

StephennFernandes commented Apr 15, 2024

@aleksandr-smechov
UPDATE:
I found the following conversion script inside the engine_builder's __init__ function (link):
python3 -m whisper_s2t.backends.tensorrt.engine_builder.builder --output_dir=./model_export_path --log_level=error

So I made a dir, placed the .pt model file, the HF tokenizer, and the trt_build_args.json file into it (editing the output_dir and model_path paths in trt_build_args.json to point to the current output dir), and launched the script as above.
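A minimal sketch of that JSON edit, with ./model_export_path standing in for the dir described above and pt_ckpt.pt following the checkpoint name seen earlier in this thread:

# Minimal sketch: point trt_build_args.json at the new export dir.
import json
from pathlib import Path

model_dir = Path("./model_export_path")  # illustrative path
cfg_path = model_dir / "trt_build_args.json"

cfg = json.loads(cfg_path.read_text())
cfg["output_dir"] = str(model_dir.resolve())
cfg["model_path"] = str((model_dir / "pt_ckpt.pt").resolve())
cfg_path.write_text(json.dumps(cfg))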

But the inference code still crashes, with a new error:

  File "/workspace/whispers2t_inference.py", line 10, in <module>
    out = model.transcribe_with_vad(files,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/__init__.py", line 171, in transcribe_with_vad
    res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 235, in generate_segment_batched
    result = self.model.generate(features,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 185, in generate
    output_ids = self.decoder.generate(decoder_input_ids,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 146, in generate
    output_ids = self.decoder_generation_session.decode(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 755, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2891, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2548, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2231, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

This time I even built the model inside the same official WhisperS2T Docker image, so it doesn't seem like a TRT versioning issue.

The following is the entire stack trace of the error:

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Transcribing:   0%|                                                                                                                                | 0/100 [00:00<?, ?it/s]
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::setInputShape::2278] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2278, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[... the resolveSlots error above repeats ~65 more times ...]
Transcribing:   0%|                                                                                                                                | 0/100 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/whispers2t_inference.py", line 12, in <module>
    out = model.transcribe_with_vad(files,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/__init__.py", line 171, in transcribe_with_vad
    res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 235, in generate_segment_batched
    result = self.model.generate(features,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 185, in generate
    output_ids = self.decoder.generate(decoder_input_ids,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 146, in generate
    output_ids = self.decoder_generation_session.decode(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 755, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2891, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2548, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2231, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

I tried moving the TRT model files into the cache dir where "large-v3" is saved internally. After replacing those files and running the code as if I were running the regular model, inference works, but loading from an explicit path doesn't.
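A minimal sketch of that workaround, with the cache path taken from the trt_build_args.json above (the trailing hash dir is machine-specific):

# Minimal sketch: drop the custom engines into WhisperS2T's own cache dir so
# the model can be loaded as the built-in "large-v3". The hash dir below is
# the one from this thread; it will differ on other machines.
import shutil
from pathlib import Path

src = Path("/app/TRT_whisper/whisper_large_v3")
dst = Path("/root/.cache/whisper_s2t/models/trt/large-v3/c55664fdf5b447062c4cd7a0b64b72fc")

for name in ("encoder.engine", "decoder.engine", "tokenizer.json", "trt_build_args.json"):
    shutil.copy2(src / name, dst / name)

After that, loading with model_identifier="large-v3" should pick up the replaced engines.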

@nicolas-docto

Hi, I'm also very interested in how to integrate a custom fine-tuned Whisper into whisper_s2t with TensorRT-LLM. Thanks a lot.
I have been updating a fork of this amazing repo, but am still struggling to integrate a fine-tuned Whisper (in HF format).

shashikg self-assigned this Apr 16, 2024
shashikg added the enhancement (New feature or request) label Apr 16, 2024
@StephennFernandes
Author

@aleksandr-smechov @shashikg

I tried moving the TRT model files into the cache dir where "large-v3" is saved internally.

After replacing those files and running the code as if I were running the regular model, inference works, but loading from an explicit path doesn't.

I can't quite figure out what the issue could be... it seems to be triggered only when the model is loaded from a path.

@aleksandr-smechov

Hi @StephennFernandes, awesome to hear that it's working for you. As you mentioned, it might be a path issue. I did some major refactoring in my library, so it didn't come up as an issue there.

@eschmidbauer

Is it possible to update the TensorRT version to support newer models?
Example: https://huggingface.co/yuekai/whisper_large_v3_trtllm_triton

@eschmidbauer

Running into this same issue trying to convert whisper-v3-turbo.

@StephennFernandes
Author

@eschmidbauer any luck trying to convert v3-turbo?

@eschmidbauer

No. The TensorRT-LLM support in WhisperS2T is a bit out of date; last I checked, the latest version was 0.14.
This is a pretty useful example of using the latest TensorRT-LLM.
