This project provides a speech codec library for the RP2040 microcontroller chip using the open-source Codec2 algorithm. It enables storing long segments of speech in the limited memory of an embedded system or transferring digital voice across narrow band channels such as HAM radio.
The RP2040 microcontroller chip is a powerful and affordable option for embedded systems, based on the ARM Cortex-M0+ architecture. However, it has no floating-point instructions, which makes the existing implementation, well-written as it is, run about 3.5x slower than real time.
This port speeds up execution on constrained platforms like this one by implementing the math in fixed-point arithmetic. Having a super-cheap, readily available IC opens up various possibilities for shortwave HAM radio, voice-over-IP (VoIP) systems, and voice storage. Imagine a FreeDV implementation on a $4 board the size of a postage stamp that could be retrofitted to any radio!
- Lightweight implementation of the Codec2 algorithm optimized for the RP2040 microcontroller chip
- Efficient speech compression and decompression suitable for embedded systems and narrow-band communication channels
- Supports 1300 bps mode: 24 hours of speech fit into ~13.4 megabytes
- Uses DMA and PWM to generate audio directly from the Pico: no I2S needed, just a simple RC filter
- Uses ~25% total CPU, plenty of room for other code
To use the speech codec library in your project, simply include the header file and link to the library file. The library interface is super simple and provides essential functions for encoding and decoding speech (well, just decoding for now). Here is the tl;dr version of how to use the library:
- Include codec2 header in your program and link against the codec2 library.
- Call codec2_init() at the start of your program once with no arguments
- Call codec2_decode(output, input) on a packet provided as input, get decoded raw signed audio back in output.
To quickly test it:
- Wire the audio as shown in the diagram
- Download pico-hamradio.uf2 from the demos folder
- Power on the Pico while holding the BOOTSEL button
- The Pico will show up as a USB drive
- Copy the file to it
It is a good idea to add a small resistor (22 or 47 Ω) to GP1, which is used as the PWM channel B output. GP1 is permanently fixed at the mid-point of the output voltage range and uses inverted PWM polarity. These two hacks help reduce the noise level and produce better (slightly less annoying) audio.
mkdir build
cd build
PICO_SDK_PATH=/your/sdk/path cmake ..
make
mkdir build
cd build
cmake ..
make
Currently, data is a simple byte array in the header file data.h, read in 7-byte chunks. This wastes 4 bits since packets are 52 bits long. The solution would be to combine two packets in 13 bytes (initial support is provided in the unpack function).
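The two-packets-in-13-bytes layout could look roughly like the sketch below. It assumes each packet arrives as 7 bytes with the bits packed MSB-first and the 4 padding bits sitting in the low nibble of the last byte. `pack_pair` and `unpack_pair` are hypothetical names used for illustration only; they are not the unpack function shipped with this project.

```c
#include <stdint.h>
#include <string.h>

#define PAIR_BYTES 13 /* two 52-bit packets = 104 bits = 13 bytes */

/* Pack two 52-bit packets (7 bytes each, low nibble of byte 6 unused,
 * which is an assumption about the input layout) into 13 bytes. */
void pack_pair(const uint8_t a[7], const uint8_t b[7], uint8_t out[PAIR_BYTES])
{
    /* First packet: 6 full bytes plus the high nibble of its 7th byte. */
    memcpy(out, a, 6);
    out[6] = (uint8_t)((a[6] & 0xF0) | (b[0] >> 4));
    /* Second packet starts on a nibble boundary, so shift it by 4 bits. */
    for (int i = 0; i < 6; i++)
        out[7 + i] = (uint8_t)((b[i] << 4) | (b[i + 1] >> 4));
}

/* Reverse of pack_pair: recover two 7-byte packets, padding bits zeroed. */
void unpack_pair(const uint8_t in[PAIR_BYTES], uint8_t a[7], uint8_t b[7])
{
    memcpy(a, in, 6);
    a[6] = (uint8_t)(in[6] & 0xF0);
    for (int i = 0; i < 6; i++)
        b[i] = (uint8_t)((in[6 + i] << 4) | (in[7 + i] >> 4));
    b[6] = (uint8_t)(in[12] << 4);
}
```

With this layout a pair of frames costs 13 bytes instead of 14, which saves about 7% of the data array.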
If you want to replace the provided audio with your own, it will need to be encoded with codec2.
Many Linux distros have Codec2 packaged, so you can easily install it (apt install codec2 on Debian-based distros) and then use c2enc to encode speech. Here is an example of how you might approach encoding:
1. sox infile.wav -e signed -b 16 -c 1 -r 8000 raw_voice_data.raw
2. c2enc 1300 raw_voice_data.raw coded_data.bin
3. xxd -i coded_data.bin > /project/rootdir/header/data.h
This requires sox, codec2 and xxd to be installed.
Encoding time in seconds for 1 second of audio, on 1 core running at a 125 MHz clock
Floating point ▏ 3.542 ██████████████████████████████████████████████████████████
Fixed point ▏ 0.473 ████████
Initially, the build uses ~4% memory and 18% flash.
Memory region Used Size Region Size %age Used
FLASH: 380284 B 2 MB 18.13%
RAM: 11396 B 256 KB 4.35%
SCRATCH_X: 0 GB 4 KB 0.00%
SCRATCH_Y: 0 GB 4 KB 0.00%
If you move FFT LUTs to RAM (not declaring them constant should do it), you lose ~30% of RAM but gain ~9% speed and 12% flash.
Memory region Used Size Region Size %age Used
FLASH: 128684 B 2 MB 6.14%
RAM: 88676 B 256 KB 33.83%
SCRATCH_X: 0 GB 4 KB 0.00%
SCRATCH_Y: 0 GB 4 KB 0.00%
- The phase synth function was initially designed to run a loop L times in order to generate the necessary excitation samples. This required repeated use of trigonometric functions. Replacing this approach with the Chebyshev method, which uses a recurrence relation, turned out to be quite performant. The new approach involves only simple operations such as multiplication, shifting, and subtraction, which are not computationally expensive on the Cortex-M0+ processor.
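- The magnitude of a complex number can be estimated with a largest error of 1.22% using the extended α max + β min algorithm, which takes the larger of two linear estimates for greater precision:
|z| ≈ max(α₀·Max + β₀·Min, α₁·Max + β₁·Min)
where Max and Min are the larger and the smaller of |Re(z)| and |Im(z)|, and α₀=1, β₀=5/32, α₁=27/32, β₁=71/128.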
- Instead of calculating the geometric mean using square roots for energy interpolation, we use a simple linear interpolation. Although this is not mathematically exact, it provides a fast and acceptable approximation.
- Instead of calculating the phase angle using atan2, and then obtaining the real and imaginary components in the frequency domain by computing cos(phi) and sin(phi), we use an alternative method. Relying on the expressions cos(phi) = re / sqrt(re² + im²) and sin(phi) = im / sqrt(re² + im²), we estimate the magnitude using the α max + β min algorithm to avoid the computationally expensive square root in the denominator. This approach only requires a division, which the RP2040 hardware division coprocessor can handle without any issues. By using this method, we can avoid square roots and limit the number of calls to trigonometric functions. In fact, we only need a single sin/cos for the initial recursion step and 10 calls to cos to convert line spectral frequencies to line spectral pairs. To implement trigonometry using only additions and shifts, we use the CORDIC algorithm.
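Three of the tricks above (the recurrence for harmonic cosines, the α max + β min magnitude estimate, and the division-based replacement for atan2 followed by sin/cos) can be sketched in portable C as shown here. This is an illustration under assumptions, not the code used in this library: the function names, the Q15 scaling (1.0 = 32768) and the |x| < 32768 input limit are all hypothetical choices made to keep every product inside 32 bits.

```c
#include <stdint.h>

#define Q15_ONE 32768 /* assumed fixed-point scale: 1.0 == 32768 */

/* Fill out[k] with cos((k+1)*theta) in Q15 using the recurrence
 * cos(n*t) = 2*cos(t)*cos((n-1)*t) - cos((n-2)*t).
 * The sine terms follow the same recurrence with different seeds. */
void cos_harmonics_q15(int32_t cos_theta, int32_t *out, int n)
{
    int32_t prev2 = Q15_ONE;   /* cos(0)     */
    int32_t prev1 = cos_theta; /* cos(theta) */
    for (int k = 0; k < n; k++) {
        out[k] = prev1;
        /* (2*c*p) >> 15 equals (c*p) >> 14. Assumes an arithmetic right
         * shift for negative values, as GCC and Clang provide. */
        int32_t next = ((cos_theta * prev1) >> 14) - prev2;
        prev2 = prev1;
        prev1 = next;
    }
}

/* Extended alpha-max-plus-beta-min estimate of sqrt(re^2 + im^2) with
 * a0 = 1, b0 = 5/32, a1 = 27/32, b1 = 71/128. Inputs: |x| < 32768. */
uint32_t mag_estimate(int32_t re, int32_t im)
{
    uint32_t a = (uint32_t)(re < 0 ? -re : re);
    uint32_t b = (uint32_t)(im < 0 ? -im : im);
    uint32_t mx = a > b ? a : b;
    uint32_t mn = a > b ? b : a;
    uint32_t z0 = mx + ((5 * mn) >> 5);
    uint32_t z1 = ((27 * mx) >> 5) + ((71 * mn) >> 7);
    return z0 > z1 ? z0 : z1;
}

/* Approximate cos(phi) and sin(phi) of re + j*im in Q15 without atan2 or
 * sqrt: divide each component by the estimated magnitude. */
void unit_phasor_q15(int32_t re, int32_t im, int32_t *c, int32_t *s)
{
    int32_t m = (int32_t)mag_estimate(re, im);
    if (m == 0) { /* zero input: define the phase as 0 */
        *c = Q15_ONE;
        *s = 0;
        return;
    }
    /* The estimate is never below max(|re|, |im|), so both results stay
     * within [-Q15_ONE, Q15_ONE]. */
    *c = (re * Q15_ONE) / m;
    *s = (im * Q15_ONE) / m;
}
```

For example, `unit_phasor_q15(3000, -4000, &c, &s)` yields values close to 0.6 and -0.8 in Q15, off by under 1% because the magnitude estimate for this input (5039) is slightly above the true value of 5000. On the RP2040 the two divisions would be the natural place to use the hardware divider.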
Q: Where do I connect GND?
A: You don't, it's just GP0 and GP1.
Q: This sounds terrible
A: Well, definitely not Hi-Fi ¯\_(ツ)_/¯ - it's compressed to the point where you only send a few lookup table indexes that describe a speech frame to rebuild that segment from scratch.
Q: With Cortex-M4 and Risc-V available, why bother?
A: It's very cheap, efficient, available to order (a big factor nowadays) and extremely popular. Also I wanted to learn DSP a bit. :-)
Q: Can this work on RPi?
A: Yes, I tried it on an RPi 4 and it runs just fine. However, having an FPU and things like the NEON SIMD instruction set makes floating point a better choice. If you're building for the RPi and expect performance, consider linking against the full CMSIS library, which has different FFT implementations for CPUs with more advanced instruction sets.
Q: I'm getting a linker error, there is not enough RAM
A: xxd generates the data array as a non-const unsigned char array, so it is placed in RAM instead of being kept in flash. Add "const" in front of it or declare it like this:
#include <pico/platform.h>
const unsigned char __in_flash() coded_data [] = {
There are 🪲, dynamics are not quite right (frequency domain bin power calculation needs tweaking), postfilter is greatly simplified to avoid additional FFT. It would be convenient not having to perform that FFT but do the transform on the frequency domain directly or operate on the time domain. My limited n00b DSP skills are not there yet.
- Finish the encoder.
- Figure out how to further improve FFT by moving to assembly and using the RP2040 interpolator peripheral.
- Investigate the option of using lower precision FFT (q15).
- Division takes 9 cycles on the coprocessor, so interleaving the divisions with other operations and using async division routines might speed things up.
This library is free software. You can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation - either version 2.1 of the License, or (at your option) any later version. See the file LICENSE included with this distribution for more information.
Codec2 was skillfully architected and designed by Mr. David Rowe.