Add implementation of WebGPU EP #22591

fs-eire · 2024-10-24T20:20:09Z

Description

This PR adds the actual implementation of the WebGPU EP based on #22318.

This change includes the following:

core framework of WebGPU EP

- WebGPU EP factory classes for:
  - handling WebGPU options
  - creating WebGPU EP instance
  - creating WebGPU context
- WebGPU Execution Provider classes
  - GPU Buffer allocator
  - data transfer
- Buffer management classes
  - Buffer Manager
  - BufferCacheManager
    - DisabledCacheManager
    - SimpleCacheManager
    - LazyReleaseCacheManager
    - BucketCacheManager
- Program classes
  - Program (base)
  - Program Cache Key
  - Program Manager
- Shader helper classes
  - Shader Helper
  - ShaderIndicesHelper
  - ShaderVariableHelper
- Utils
  - GPU Query based profiler
  - compute context
  - string utils
- Misc
  - Python binding WebGPU support (basic)
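
For readers unfamiliar with the buffer cache strategies named above, here is a minimal sketch of what a bucket-style cache could look like. It is an illustration under assumptions, not the code in this PR: `BucketCacheSketch`, `BufferId`, and the bucket sizes are hypothetical, and integer ids stand in for real GPU buffers. The idea shown is that a requested size is rounded up to a fixed bucket size so that released buffers can be reused by later requests that fall in the same bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU buffer handle (not a WebGPU type).
using BufferId = uint32_t;

class BucketCacheSketch {
 public:
  explicit BucketCacheSketch(std::vector<size_t> bucket_sizes)
      : bucket_sizes_(std::move(bucket_sizes)) {
    std::sort(bucket_sizes_.begin(), bucket_sizes_.end());
  }

  // Round a request up to the smallest bucket that fits.
  // Requests larger than every bucket keep their own size.
  size_t BucketSizeFor(size_t request) const {
    auto it = std::lower_bound(bucket_sizes_.begin(), bucket_sizes_.end(), request);
    return it != bucket_sizes_.end() ? *it : request;
  }

  // Reuse a released buffer of the same bucket if available,
  // otherwise hand out a fresh id (a real cache would allocate here).
  BufferId Acquire(size_t request) {
    const size_t size = BucketSizeFor(request);
    std::vector<BufferId>& free_list = free_[size];
    if (!free_list.empty()) {
      BufferId id = free_list.back();
      free_list.pop_back();
      return id;
    }
    BufferId id = next_id_++;
    size_of_[id] = size;
    return id;
  }

  // Return a buffer to the free list of its bucket.
  void Release(BufferId id) {
    auto it = size_of_.find(id);
    assert(it != size_of_.end() && "unknown buffer id");
    free_[it->second].push_back(id);
  }

 private:
  std::vector<size_t> bucket_sizes_;
  std::map<size_t, std::vector<BufferId>> free_;  // bucket size -> released buffers
  std::map<BufferId, size_t> size_of_;            // buffer -> its bucket size
  BufferId next_id_ = 0;
};
```

With buckets of 64, 256 and 1024 bytes, a 100-byte request and a later 200-byte request both map to the 256-byte bucket, so the second request can reuse the buffer released by the first. The trade-off is some wasted memory per buffer in exchange for fewer allocations.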

Kernel implementation

- onnx.ai (default opset):
  - Elementwise (math): Abs, Neg, Floor, Ceil, Reciprocal, Sqrt, Exp, Erf, Log, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh, Tanh, Not, Cast
  - Elementwise (activation): Sigmoid, HardSigmoid, Clip, Elu, Relu, LeakyRelu, ThresholdedRelu, Gelu
  - Binary (math): Add, Sub, Mul, Div, Pow, Equal, Greater, GreaterOrEqual, Less, LessOrEqual
  - (Tensors): Shape, Reshape, Squeeze, Unsqueeze
  - Where
  - Transpose
  - Concat
  - Expand
  - Gather
  - Tile
  - Range
  - LayerNormalization
- com.microsoft:
  - FastGelu
  - MatMulNBits
  - MultiHeadAttention
  - RotaryEmbedding
  - SkipLayerNormalization
  - LayerNormalization
  - SimplifiedLayerNormalization
  - SkipSimplifiedLayerNormalization

Build, test and CI pipeline integration

- build works for Windows, macOS and iOS
- supports onnxruntime_test_all and the Python node tests
- added a new unit test for the --use_external_dawn build flag
- updated the macOS pipeline to build with WebGPU support
- added a new pipeline for WebGPU on Windows

This change does not include:

- Node.js binding support for WebGPU (will be a separate PR)

guschmue

Lgtm

cmake/CMakeLists.txt

cmake/external/onnxruntime_external_deps.cmake

onnxruntime/contrib_ops/webgpu/bert/multihead_attention.cc

onnxruntime/core/providers/webgpu/webgpu_context.cc

onnxruntime/core/providers/webgpu/webgpu_context.h

onnxruntime/core/providers/webgpu/shader_variable.cc

onnxruntime/core/providers/webgpu/tensor/where.cc

onnxruntime/core/providers/webgpu/shader_variable.h

onnxruntime/core/providers/webgpu/buffer_manager.cc

onnxruntime/core/providers/webgpu/tensor/cast.cc

onnxruntime/core/providers/webgpu/tensor/concat.cc

onnxruntime/core/providers/webgpu/tensor/gather.cc

onnxruntime/contrib_ops/webgpu/quantization/matmul_nbits.cc

onnxruntime/core/providers/webgpu/generator/range.cc

onnxruntime/core/providers/webgpu/webgpu_context.cc

+                    ", Actual: ", shape.NumDimensions());
+
+      std::vector<uint32_t> dims(expected_rank);
+      std::vector<uint32_t> stride(expected_rank - 1);


onnxruntime/core/providers/webgpu/webgpu_context.cc

+      std::vector<uint32_t> stride(expected_rank - 1);
+      for (size_t j = 0; j < expected_rank; ++j) {
+        dims[j] = gsl::narrow<uint32_t>(shape[j]);
+        if (j < expected_rank - 1) {
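
The hunk above is cut off before the stride assignment, so the following is only a sketch of one plausible reading: each stride entry is the row-major product of the dimensions to its right, which yields `rank - 1` strides for a shape of rank `rank`. `ShapeUniform` and `MakeShapeUniform` are hypothetical names, not identifiers from this PR, and a plain range check via `assert` stands in for `gsl::narrow`. The sketch also guards the rank-0 case, where `expected_rank - 1` on an unsigned type would wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical container for the two arrays built in the hunk above.
struct ShapeUniform {
  std::vector<uint32_t> dims;    // one entry per dimension
  std::vector<uint32_t> stride;  // rank - 1 entries (empty for rank 0 or 1)
};

// Assumption: stride[j] is the product of dims[j + 1 .. rank - 1] (row-major).
inline ShapeUniform MakeShapeUniform(const std::vector<int64_t>& shape) {
  constexpr uint64_t kMax = std::numeric_limits<uint32_t>::max();
  const size_t rank = shape.size();

  ShapeUniform out;
  out.dims.resize(rank);
  // Avoid computing rank - 1 when rank == 0 (size_t would wrap around).
  out.stride.resize(rank > 0 ? rank - 1 : 0);

  uint64_t running = 1;  // product of all dimensions to the right of j
  for (size_t j = rank; j-- > 0;) {
    assert(shape[j] >= 0 && static_cast<uint64_t>(shape[j]) <= kMax);
    out.dims[j] = static_cast<uint32_t>(shape[j]);
    if (j + 1 < rank) {
      assert(running <= kMax);
      out.stride[j] = static_cast<uint32_t>(running);
    }
    running *= static_cast<uint64_t>(shape[j]);
  }
  return out;
}
```

For a shape of `{2, 3, 4}` this produces dims `{2, 3, 4}` and strides `{12, 4}`; the last dimension has an implicit stride of 1 and is therefore not stored.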


Introducing WebGPU EP

cbb050d

fs-eire requested a review from a team as a code owner October 24, 2024 20:20

snnn closed this Oct 25, 2024

snnn reopened this Oct 25, 2024

guschmue previously approved these changes Oct 25, 2024
