
Improvements to Normalization Functions #95

Open
2 of 5 tasks
avik-pal opened this issue Jul 20, 2024 · 0 comments
avik-pal commented Jul 20, 2024

Current Status

Native Julia Implementations

  • groupnorm has custom kernels for the forward and backward pass, implemented by restructuring all arrays into 4D and writing either plain loops or KA kernels (a restructuring sketch follows this list). This gives roughly a 2-3x performance boost in standard cases (verified against Flux as well).
  • instancenorm -- implemented very similarly to groupnorm.
  • layernorm
    • We should change the default to match what PyTorch does. That case is simple to optimize using LoopVectorization on the CPU and KernelAbstractions on the GPU (a layernorm sketch also follows this list).
    • The general broadcasting case is very hard to optimize; the best we could do is fuse it into a single GPU kernel, which is not worth much.
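
A minimal sketch of the 4D restructuring idea behind the groupnorm statistics; the function name and the exact reshape convention here are illustrative assumptions, not the actual LuxLib kernels:

```julia
using Statistics

# Hypothetical illustration: collapse a (W, H, C, N) input into 4D groups so the
# per-group statistics reduce to a mean/var over the leading two dimensions.
function groupnorm_stats(x::AbstractArray{T,4}, groups::Int) where {T}
    W, H, C, N = size(x)
    @assert C % groups == 0
    # Restructure into (spatial, channels-per-group, groups, batch)
    x4 = reshape(x, W * H, C ÷ groups, groups, N)
    μ = mean(x4; dims=(1, 2))
    σ² = var(x4; mean=μ, dims=(1, 2), corrected=false)
    return x4, μ, σ²
end
```

Once the arrays are in this layout, the normalization and the backward pass become simple reductions/broadcasts over the first two dimensions, which is what makes the loop/KA-kernel versions straightforward to write.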
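And a sketch of what a PyTorch-style layernorm default would mean in column-major Julia terms, i.e. normalizing over the leading feature dimensions for each sample; the dimension convention and names are assumptions for illustration only:

```julia
using Statistics

# Hypothetical sketch: normalize over all dimensions except the last (batch) one,
# analogous to PyTorch's nn.LayerNorm(normalized_shape) default up to memory layout.
function layernorm_pytorch_style(x::AbstractArray{T,N}, γ, β; ϵ=1f-5) where {T,N}
    dims = ntuple(identity, N - 1)   # reduce over feature dims, keep the batch dim
    μ = mean(x; dims=dims)
    σ² = var(x; mean=μ, dims=dims, corrected=false)
    return γ .* (x .- μ) ./ sqrt.(σ² .+ ϵ) .+ β
end
```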

Integration into vendor libraries

  • MIOpen for AMDGPU batchnorm. With the new batchnorm kernels, we should test whether this is still worth it, though I would need someone with access to ROCm-capable GPUs to run the comparison.
    • The kernels seem quite performant, and we are close to cuDNN in performance even with the naive way the kernels are written (a minimal sketch of that kernel style follows below).
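
For context on the "naive" kernel style referenced above, here is a minimal KernelAbstractions sketch of an elementwise affine-normalization kernel that would run on AMDGPU/ROCm as well as CUDA backends; the names, signature, and channel convention are assumptions for illustration, not the actual LuxLib batchnorm kernels:

```julia
using KernelAbstractions

# Hypothetical sketch: apply precomputed per-channel statistics (μ, σ²) plus an
# affine transform, with the channel as the second-to-last dimension (NN layout).
@kernel function affine_normalize_kernel!(y, @Const(x), @Const(μ), @Const(σ²),
                                          @Const(γ), @Const(β), ϵ)
    I = @index(Global, Cartesian)
    c = I[ndims(x) - 1]   # channel index
    @inbounds y[I] = γ[c] * (x[I] - μ[c]) / sqrt(σ²[c] + ϵ) + β[c]
end

function affine_normalize!(y, x, μ, σ², γ, β; ϵ=1f-5)
    backend = KernelAbstractions.get_backend(x)
    kernel! = affine_normalize_kernel!(backend)
    kernel!(y, x, μ, σ², γ, β, ϵ; ndrange=size(y))
    KernelAbstractions.synchronize(backend)
    return y
end
```

The same kernel body works unchanged on a ROCArray or a CuArray, which is what would make a head-to-head benchmark against the MIOpen path straightforward for anyone with ROCm hardware.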