imlib/filter: Vectorize morph() kernel. #2415

kwagyeman · 2024-09-09T06:09:33Z

Depends on #2417.

Benchmark results here: https://docs.google.com/spreadsheets/d/1-FNVKCEr8-6UYs8MUm6wgsOt2c8ihJ2mg9QXKkG91os/edit?gid=452211341#gid=452211341

With Helium, morph() performance on the AE3 is 4.2x faster than on the RT1062.

Note, however, that making the morph kernel generic and vectorizable reduces its performance by 50% for grayscale 3x3 kernels. The previous code provided the best possible speed on M4/M7 architectures, but it could not be vectorized and only handled 3x3 kernels. The new code offers vectorized processing for any kernel size.
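To illustrate why a "taps outer, pixels inner" loop generalizes to any kernel size, here is a minimal scalar sketch. It is not the code from this PR: the function `morph_gray_sketch`, the `LANES` width of 8 (chosen to mirror eight 16-bit lanes of a 128-bit Helium register), edge replication at the borders, and clamping the raw sum to 0..255 with no scaling are all assumptions made for illustration. It also assumes kernel weights are small enough that the sum stays within int16 range.

```c
#include <assert.h>
#include <stdint.h>

// Pixels processed together. 8 mirrors eight 16-bit lanes of a 128-bit
// vector register (an assumption for illustration, not taken from the PR).
#define LANES 8

// Clamp a coordinate into [0, max_inclusive] (edge replication at borders).
static int clamp_int(int v, int max_inclusive) {
    if (v < 0) return 0;
    if (v > max_inclusive) return max_inclusive;
    return v;
}

// Hypothetical sketch of a generic ksize x ksize morph on an 8-bit grayscale
// image. Loop order: chunk of LANES output pixels (outer), kernel taps
// (middle), lanes (inner). The inner loop is the same multiply-accumulate
// for every kernel size, which is what makes it a candidate for SIMD.
// Assumes kernel weights keep every sum within int16 range; the raw sum
// is clamped to 0..255 with no mul/add scaling.
static void morph_gray_sketch(const uint8_t *src, uint8_t *dst, int w, int h,
                              const int8_t *kernel, int ksize) {
    assert(ksize > 0 && ksize % 2 == 1);
    const int r = ksize / 2;
    for (int y = 0; y < h; y++) {
        for (int x0 = 0; x0 < w; x0 += LANES) {
            int16_t acc[LANES] = {0};  // one 16-bit accumulator per lane
            for (int ky = 0; ky < ksize; ky++) {
                const uint8_t *row = src + clamp_int(y + ky - r, h - 1) * w;
                for (int kx = 0; kx < ksize; kx++) {
                    const int16_t k = kernel[ky * ksize + kx];
                    for (int lane = 0; lane < LANES; lane++) {
                        int sx = clamp_int(x0 + lane + kx - r, w - 1);
                        acc[lane] += (int16_t)(k * row[sx]);
                    }
                }
            }
            // Saturate each lane to 0..255 and store only in-bounds lanes.
            for (int lane = 0; lane < LANES && x0 + lane < w; lane++) {
                int v = acc[lane];
                dst[y * w + x0 + lane] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
            }
        }
    }
}
```

Because the innermost loop does the same work on `LANES` adjacent pixels regardless of `ksize`, it maps naturally onto vector instructions. A hard-coded 3x3 version can instead fully unroll its nine taps, which suits scalar M4/M7 cores but does not extend to other kernel sizes.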

Given the massive performance gain Helium has over the scalar code, this tradeoff makes sense.

The mul/add arguments were dropped because they cannot be supported without complicating the default loop case. They can also easily overflow the 16-bit accumulators being used.
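A worst-case bound shows how little headroom a 16-bit accumulator leaves once mul/add are applied. The helpers `acc_range` and `fits_int16` below are hypothetical and not part of imlib. They assume 8-bit input pixels (0..255) and, for simplicity, an integer `mul`, although the real argument may be fractional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Inclusive range of possible accumulator values.
typedef struct { int32_t min, max; } range32;

// Hypothetical helper: worst-case accumulator range for a kernel applied to
// 8-bit pixels, followed by an (assumed integer) mul and an add.
static range32 acc_range(const int8_t *kernel, int taps, int32_t mul, int32_t add) {
    assert(taps > 0);
    int32_t pos = 0, neg = 0;
    for (int i = 0; i < taps; i++) {
        if (kernel[i] > 0) pos += kernel[i];
        else neg += kernel[i];
    }
    // Largest sum: positive weights see 255 and negative weights see 0.
    // Smallest sum: the reverse.
    int32_t lo = neg * 255, hi = pos * 255;
    // Apply mul, then add; a negative mul swaps the bounds.
    int32_t a = lo * mul + add, b = hi * mul + add;
    range32 r = { a < b ? a : b, a < b ? b : a };
    return r;
}

// True if every value in the range fits a signed 16-bit accumulator.
static bool fits_int16(range32 r) {
    return r.min >= INT16_MIN && r.max <= INT16_MAX;
}
```

For example, a 3x3 kernel of ones peaks at 9 × 255 = 2295, which fits easily, but a multiplier of 16 pushes the peak to 36720, beyond `INT16_MAX` (32767).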

github-actions · 2024-09-09T06:14:00Z

Code Size Report:

| Firmware | Text Diff | Data Diff | BSS Diff |
|---|---|---|---|
| ARDUINO_GIGA/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| ARDUINO_NANO_33/firmware.elf | 🔺0.00% (+8) | ➖0.00% (+0) | ➖0.00% (+0) |
| ARDUINO_NICLA_VISION/firmware.elf | 🔺0.02% (+272) | ➖0.00% (+0) | ➖0.00% (+0) |
| ARDUINO_PORTENTA_H7/firmware.elf | 🔺0.02% (+296) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV2/firmware.elf | 🔺0.05% (+400) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV3/firmware.elf | 🔺0.02% (+288) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV4/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV4P/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMVPT/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV_RT1060/firmware.elf | 🔺0.01% (+344) | ➖0.00% (+0) | ➖0.00% (+0) |

kwagyeman force-pushed the kwabena/optimize_morph branch from 9d6c4ec to 3acaa57 September 9, 2024 06:11

kwagyeman changed the title ~~Kwabena/optimize morph~~ imlib/filter: Vectorize morph() kernel. Sep 9, 2024

kwagyeman force-pushed the kwabena/optimize_morph branch 6 times, most recently from 35a520c to 3bc546d September 14, 2024 00:03

kwagyeman added enhancement simd labels Sep 14, 2024

kwagyeman force-pushed the kwabena/optimize_morph branch 2 times, most recently from 44ccd95 to 56a348f September 14, 2024 22:05

kwagyeman marked this pull request as ready for review September 14, 2024 22:08

kwagyeman requested a review from iabdalkader September 14, 2024 22:08

kwagyeman force-pushed the kwabena/optimize_morph branch 2 times, most recently from ae11f46 to 71f7a91 September 15, 2024 01:23

kwagyeman added 2 commits September 16, 2024 22:29

- imlib: Add macros for vectorizing filter kernels. (1a6ea45)
- imlib/filter: Vectorize morph() kernel. (1f5aaad)

kwagyeman force-pushed the kwabena/optimize_morph branch from 71f7a91 to 1f5aaad September 17, 2024 05:29
