Support reading and writing bitpacked activations in C++ kernels. #305
Conversation
Thank you very much for the review and your suggestions, @Tombana. I've merged in the changes from master, and the PR is now extended to work with the reference kernel as well (which took far, far, far longer than I expected it would). A few remarks:
Force-pushed from 8525864 to 0f93a10.
It should indeed not be a problem. In the long term, however, I think we have to try to get the 32-bit (or even 64-bit) RUY version working multithreaded (it already works, just not multithreaded), because the better memory alignment will make it faster. I think we can do that in a separate PR, later.
I've run into the same problems; it's very annoying that you can't put …
Thanks!
Co-Authored-By: Adam Hillier <[email protected]>
GitHub won't let me approve my own PR, even though a substantial part of it was written by someone else, but thank you very much @Tombana for all of this 🎉
This 'test' is now split into two parts: 1) an actual test that takes an example Keras model and runs it through the 'generate_ikva_network' function; 2) a debug utility that can be run as a standalone tool, now including a proper argument parser. And also: * Move Ikva end2end tests to their own target * Add a new Ikva cocotb smoke test to CI
What do these changes do?
This feature is disabled by default and so does not change any behaviour for new or existing converted models; enabling this feature will require changes to the converter to set the correct op flags.
Currently, binary convolutions expect full-precision input, bitpack that input, perform the computation, and then write full-precision output. This feature allows a binary convolution to instead perform bitpacking inside the kernel and write (8-bit or 32-bit) bitpacked output, and the subsequent binary convolution to skip the usual initial bitpacking step and read the already-bitpacked input directly. Doing this significantly reduces the number of reads and writes to memory and the overall memory footprint of the model inference.
Support is added only for the C++ kernels (x86 and Arm32), including the reference kernel. Support for the optimised assembly kernels (Arm64) requires additional future work.
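For readers unfamiliar with the technique, here is a minimal, self-contained sketch of what bitpacking activations means. It is illustrative only: it assumes the common convention that a negative activation maps to a set bit, and the real kernels' packing routines and bit ordering may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch: pack the signs of float activations into bitpacked
// words. TBitpacked would be std::uint8_t for the 8-bit packing used by the
// Ruy C++ kernel and std::uint32_t for the reference kernel's 32-bit packing.
// Assumed convention: a negative value maps to a set bit.
template <typename TBitpacked>
std::vector<TBitpacked> PackSigns(const float* input, std::size_t count) {
  constexpr std::size_t kBits = 8 * sizeof(TBitpacked);
  std::vector<TBitpacked> packed((count + kBits - 1) / kBits, 0);
  for (std::size_t i = 0; i < count; ++i) {
    if (input[i] < 0.0f) {
      packed[i / kBits] |= TBitpacked(TBitpacked(1) << (i % kBits));
    }
  }
  return packed;
}
```

With bitpacked activations enabled, the producing convolution writes words like these directly to its output tensor and the consuming convolution reads them as-is, which is where the savings in memory traffic and memory footprint come from.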
How have these changes been tested?
The existing bconv2d_test.cc tests have been extended to test bitpacked input/output tensors with the Ruy C++ kernel (8-bit bitpacking) and the reference kernel (32-bit bitpacking).
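As a rough illustration of the verification idea (not the actual bconv2d_test.cc code): run a float-output path as the reference, pack the signs of its output, and require the bitpacked-output path to produce identical words. RunReference and RunBitpacked below are hypothetical stand-ins for the real kernel invocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the float-output reference kernel.
std::vector<float> RunReference() { return {0.5f, -1.0f, 2.0f, -3.0f}; }

// Hypothetical stand-in for the kernel writing 8-bit bitpacked output.
std::vector<std::uint8_t> RunBitpacked() {
  return {0b00001010};  // bits 1 and 3 set: the negative entries above
}

// Pack the signs of float values into 8-bit words (negative -> set bit).
// Assumed convention; the real kernel's bit order may differ.
std::vector<std::uint8_t> PackSigns8(const std::vector<float>& v) {
  std::vector<std::uint8_t> out((v.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] < 0.0f) out[i / 8] |= std::uint8_t(1u << (i % 8));
  return out;
}

int main() {
  // The bitpacked output must match the packed reference word-for-word.
  assert(PackSigns8(RunReference()) == RunBitpacked());
  return 0;
}
```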