You are running into multiple issues here.
- UART bandwidth is too low
Your audio stream is:
16 kHz * 16 bit = 256 kbit/s raw data
But you are not sending raw samples. You are sending ASCII numbers like
-12345\n
That is typically ~6–7 bytes per sample. UART also sends start/stop bits, so the real bandwidth becomes roughly
16000 samples/s * 7 bytes * 10 bits ≈ 1.12 Mbit/s
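If your UART runs at e.g. 115200 baud, that is roughly an order of magnitude too slow, so samples must be dropped somewhere.
So use a higher baud rate. On STM32 something like 921600 baud is usually fine and commonly supported by USB-UART adapters. Note that even 921600 baud is still below the ~1.12 Mbit/s the ASCII stream needs, so raising the baud rate alone is not enough; the encoding has to change too.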
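The arithmetic above can be checked with a few lines of Python. This is only a rough estimate: it assumes 8N1 framing (10 bits on the wire per byte) and the worst-case 7 bytes per ASCII sample, and the function names are made up for illustration.

```python
# Rough UART load estimate for the ASCII stream.
# Assumption: 8N1 framing, so each byte costs 10 bits on the wire.
SAMPLE_RATE_HZ = 16_000
BITS_PER_UART_BYTE = 10  # 1 start + 8 data + 1 stop


def required_bitrate(bytes_per_sample: int) -> int:
    """Line rate in bit/s needed for one direction of the stream."""
    return SAMPLE_RATE_HZ * bytes_per_sample * BITS_PER_UART_BYTE


def fits(baud: int, bytes_per_sample: int) -> bool:
    """True if the stream fits within the given baud rate."""
    return required_bitrate(bytes_per_sample) <= baud


if __name__ == "__main__":
    ascii_bytes = 7  # e.g. "-12345\n"
    for baud in (115_200, 921_600):
        need = required_bitrate(ascii_bytes)
        print(f"{baud} baud: need {need} bit/s, fits={fits(baud, ascii_bytes)}")
```

With 7 bytes per sample the stream fits at neither rate, which is why the encoding matters as much as the baud rate.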
- ASCII encoding is extremely inefficient
This pipeline is very expensive.
Python
int → string → UART TX
STM32
UART RX → atoi() → average filter → sprintf() → UART TX
Python
UART RX → string → int
Both atoi() and especially sprintf() are slow and unnecessary here. Avoid any printf-style formatting inside a sample processing loop.
Instead send samples as binary int16_t directly over the wire.
Python:
ser.write(np.int16(s).tobytes())
STM32:
int16_t sample;
HAL_UART_Receive(&huart2, (uint8_t*)&sample, sizeof(sample), HAL_MAX_DELAY);
Return the result the same way:
HAL_UART_Transmit(&huart2, (uint8_t*)&result, sizeof(result), HAL_MAX_DELAY);
That reduces the required bandwidth to roughly
16000 * 2 bytes * 10 bits ≈ 320 kbit/s
which is easily doable at 921600 baud.
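On the Python side, the binary framing could look roughly like the sketch below. It assumes the STM32 stores int16_t in little-endian order (the usual case on Cortex-M), so the byte order is spelled out explicitly instead of relying on the host's native order. The helper names encode_samples and decode_samples are hypothetical, not part of your code.

```python
import struct


def encode_samples(samples):
    """Pack integers into little-endian signed 16-bit binary (2 bytes each)."""
    return struct.pack(f"<{len(samples)}h", *samples)


def decode_samples(data):
    """Unpack little-endian signed 16-bit binary into a list of ints."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack(f"<{count}h", data[:count * 2]))


if __name__ == "__main__":
    block = [-12345, 0, 32767]
    payload = encode_samples(block)
    print(len(payload), "bytes on the wire for", len(block), "samples")
    print(decode_samples(payload))
    # With pyserial (assumed), this would be used roughly as:
    #   ser.write(encode_samples(block))
    #   reply = decode_samples(ser.read(2 * len(block)))
```

Sending a block of samples per write, rather than one sample at a time, also cuts down the per-call overhead on the PC side.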
- HAL_UART_Receive() in polling mode is not great for streaming
You are using blocking polling:
HAL_UART_Receive(..., HAL_MAX_DELAY)
For higher throughput you should use at least
HAL_UART_Receive_IT()
or ideally
HAL_UART_Receive_DMA()
In polling mode, data is lost whenever the UART's internal receive buffer (probably a single character, or a 16-character FIFO at most) is full while the CPU is not actively inside the HAL_UART_Receive() call. That is a UART "overrun": a character was received but could not be stored because the buffer had not been emptied.
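To get a feel for how tight this is, the time budget can be estimated as below. This is a back-of-the-envelope sketch assuming 10 bits per byte on the wire. The buffer depth is a parameter because it depends on the specific STM32 part, and the function names are made up for illustration.

```python
# Assumption: 8N1 framing, so one byte takes 10 bit times on the wire.
BITS_PER_UART_BYTE = 10


def byte_time_us(baud: int) -> float:
    """Time in microseconds for one byte to arrive at the given baud rate."""
    return BITS_PER_UART_BYTE / baud * 1e6


def overrun_deadline_us(baud: int, rx_buffer_bytes: int = 1) -> float:
    """Approximate time the CPU may stay away before the buffer fills up."""
    return rx_buffer_bytes * byte_time_us(baud)


if __name__ == "__main__":
    for baud in (115_200, 921_600):
        print(f"{baud} baud: one byte every {byte_time_us(baud):.2f} us")
```

At 921600 baud a new byte arrives about every 10.85 µs, so with a single-character buffer any processing between two HAL_UART_Receive() calls that takes longer than that risks an overrun.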
UART DMA is commonly used for streaming data like audio. Both the interrupt (IT) and direct memory access (DMA) modes greatly reduce the CPU load: in interrupt mode the CPU is interrupted only briefly to store each new character in a buffer, and in DMA mode the CPU does not do the store at all because the DMA controller does it.
Note that to set up the UART in interrupt or DMA mode, you should go back to the STM32CubeMX configurator. Both modes require a different initialization and additional interrupt handlers for the UART peripheral.