
Support reading and writing bitpacked activations in C++ kernels. #305

Merged
merged 19 commits into master from read-write-bitpacked-x86
Apr 7, 2020

Conversation

AdamHillier
Contributor

@AdamHillier AdamHillier commented Mar 25, 2020

What do these changes do?

This feature is disabled by default and so does not change any behaviour for new or existing converted models; enabling this feature will require changes to the converter to set the correct op flags.

Currently, binary convolutions expect full-precision input, bitpack that input, perform the computation, and then write full-precision output. This feature allows a binary convolution to instead perform bitpacking inside the kernel and write (8-bit or 32-bit) bitpacked output, and allows the subsequent binary convolution to skip the usual initial bitpacking step and read the already-bitpacked input directly. Doing this significantly reduces the number of reads and writes to memory, as well as the overall memory footprint of model inference.
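To illustrate the idea (a minimal sketch only, not the actual LCE kernel code; the bit convention and the 32-bit word size here are assumptions for illustration):

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch: pack a buffer of float activations into 32-bit
// words, one bit per value. Here a bit is set when the activation is
// negative; the real kernels may use a different convention and word
// size (e.g. 8-bit words in the Ruy C++ kernel).
std::vector<std::uint32_t> PackBits(const std::vector<float>& activations) {
  constexpr int kBitsPerWord = 32;
  const int num_words =
      (static_cast<int>(activations.size()) + kBitsPerWord - 1) / kBitsPerWord;
  std::vector<std::uint32_t> packed(num_words, 0);
  for (std::size_t i = 0; i < activations.size(); ++i) {
    if (activations[i] < 0.0f) {
      packed[i / kBitsPerWord] |= std::uint32_t{1} << (i % kBitsPerWord);
    }
  }
  return packed;
}
```

A subsequent bitpacked-input convolution can then consume `packed` directly instead of re-packing full-precision activations.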

Support is added only for the C++ kernels (x86 and Arm32), including the reference kernel. Support for the optimised assembly kernels (Arm64) requires additional future work.

How have these changes been tested?

The existing bconv2d_test.cc tests have been extended to support testing bitpacked input/output tensors with the Ruy C++ kernel (8-bit bitpacking) and the reference kernel (32-bit bitpacking).

@AdamHillier AdamHillier added the feature New feature or request label Mar 25, 2020
@AdamHillier AdamHillier requested a review from a team March 25, 2020 12:29
@arashb arashb requested review from Tombana and removed request for a team March 25, 2020 12:37
@AdamHillier
Contributor Author

AdamHillier commented Apr 1, 2020

Thank you very much for reviewing, @Tombana, and for your suggestions. I've now merged in changes from master, and the PR is extended to work with the reference kernel as well (which took far, far, far longer than I expected it would).

A few remarks:

  • In the reference kernel, 32-bit bitpacking is used; in the C++ Ruy kernel, 8-bit bitpacking is used. This means that the reference and optimised ops can no longer be used simultaneously in the same interpreter, because they won't interact correctly when using bitpacked activations. This shouldn't ever actually be a problem, because mixing the two wouldn't make sense, but it's not a restriction that we enforce anywhere, and it did technically work previously.

  • One problem I encountered with 32-bit bitpacking in the reference kernel was being forced to use the kTfLiteInt32 type for bitpacked tensors, as TFLite doesn't have an unsigned 32-bit integer type. Unfortunately, this conflicted with using std::uint8_t and std::uint32_t as destination scalars throughout my implementation to template the bitpacked versions of the kernels. TFLite didn't like me populating a kTfLiteInt32 tensor with std::uint32_t data, or extracting std::uint32_t data from the tensor. Rather than searching for a cumbersome workaround, I elected to change all use of std::uint8_t and std::uint32_t as destination scalars to the signed equivalents std::int8_t and std::int32_t.

    This means there is now a bit of a mismatch between TBitpacked and DstScalar, because for the former we still use the unsigned types (although I don't think there's a technical restriction, given that the bitpacking functions use std::make_unsigned and so would also work with the signed scalars; see the sketch after this list).

    How do you feel about switching to the signed bitpacking types everywhere?

  • It will probably be easiest to review the new changes commit-by-commit.
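To illustrate the std::make_unsigned point above (a hedged sketch, not the actual bitpacking code in this PR; the function name and signature are made up for illustration):

```cpp
#include <cstdint>
#include <type_traits>

// Illustrative sketch: the bit manipulation is done on the unsigned
// counterpart of TBitpacked, so instantiating with a signed type such
// as std::int32_t behaves the same as with std::uint32_t.
template <typename TBitpacked>
TBitpacked PackWord(const float* values, int count) {
  using TUnsigned = typename std::make_unsigned<TBitpacked>::type;
  TUnsigned word = 0;
  const int bits = static_cast<int>(8 * sizeof(TBitpacked));
  for (int i = 0; i < count && i < bits; ++i) {
    if (values[i] < 0.0f) {
      word |= TUnsigned{1} << i;
    }
  }
  return static_cast<TBitpacked>(word);
}
```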

@Tombana
Collaborator

Tombana commented Apr 3, 2020

  • In the reference kernel, 32-bit bitpacking is used; in the C++ Ruy kernel, 8-bit bitpacking is used. This means that the reference and optimised ops can no longer be used simultaneously in the same interpreter, because they won't interact correctly when using bitpacked activations. This shouldn't ever actually be a problem, because mixing the two wouldn't make sense, but it's not a restriction that we enforce anywhere, and it did technically work previously.

It should indeed not be a problem. In the long term, however, I think we have to try to get the 32-bit (or even 64-bit) Ruy version working (it already does work, just not multithreaded), because it will be faster due to memory alignment. I think we can do that in a separate PR, later.

  • One problem I encountered with 32-bit bitpacking in the reference kernel was being forced to use the kTfLiteInt32 type for bitpacked tensors, as TFLite doesn't have an unsigned 32-bit integer type. Unfortunately, this conflicted with using std::uint8_t and std::uint32_t as destination scalars throughout my implementation to template the bitpacked versions of the kernels. TFLite didn't like me populating a kTfLiteInt32 tensor with std::uint32_t data, or extracting std::uint32_t data from the tensor. Rather than searching for a cumbersome workaround, I elected to change all use of std::uint8_t and std::uint32_t as destination scalars to the signed equivalents std::int8_t and std::int32_t.
    This means there is now a bit of a mismatch between TBitpacked and DstScalar, because for the former we still use the unsigned types (although I don't think there's a technical restriction, given that the bitpacking functions use std::make_unsigned and so would also work with the signed scalars).

    How do you feel about switching to the signed bitpacking types everywhere?

I've run into the same problems; it's very annoying that you can't put uint32 data into a TFLite tensor without reinterpret-casting it to int32.
Given that one could forget about std::make_unsigned (which can cause hard-to-debug problems), I vote for an entirely new datatype, now that we are getting int8 quantized types as well. The RFC is here: #325
This does not solve the issue of the TFLite tensor type enum, but at least our code will be much clearer and more readable.
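For concreteness, the kind of reinterpret-cast workaround being referred to looks roughly like this (a sketch only, assuming the TfLiteTensor C struct from tensorflow/lite/c/common.h; not code from this PR):

```cpp
#include <cstdint>
#include <cstring>

#include "tensorflow/lite/c/common.h"

// Illustrative sketch: a kTfLiteInt32 tensor exposes its buffer as
// int32_t, so storing bitpacked std::uint32_t words means
// reinterpreting the pointer.
void WriteBitpackedWords(TfLiteTensor* tensor, const std::uint32_t* words,
                         int num_words) {
  auto* dst = reinterpret_cast<std::uint32_t*>(tensor->data.i32);
  std::memcpy(dst, words, num_words * sizeof(std::uint32_t));
}
```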

  • It will probably be easiest to review the new changes commit-by-commit.

Thanks!

Tombana
Tombana previously approved these changes Apr 7, 2020
@AdamHillier
Contributor Author

GitHub won't let me approve my own PR, even though a substantial part of it was written by someone else, but thank you very much @Tombana for all of this 🎉

@Tombana Tombana merged commit 24df15a into master Apr 7, 2020
@Tombana Tombana deleted the read-write-bitpacked-x86 branch April 7, 2020 11:11
Tombana pushed a commit that referenced this pull request Apr 6, 2021
This 'test' is now split into two parts:
1) an actual test that takes an example Keras model and runs it
through the 'generate_ikva_network' function.
2) a debug utility that can be run as a stand-alone tool, now
including a proper argument parser.

And also:
* Move Ikva end2end tests to its own target
* Add new Ikva cocotb smoke test to CI