$\begingroup$

In an audio-processing scenario (or likely any data-transfer context), it is sometimes necessary for an internal process that takes a fixed amount of data N to be wrapped by an external process that takes a fixed but different amount of data M. There are no restrictions on either: N can be equal to, greater than, or less than M, and neither needs to be a multiple of the other, a power of 2, or anything like that. The output of the internal process is captured by a ring buffer of (let's say) infinite length.

The wrapper process must deliver contiguous data, so the amount of data available to it must always be at least M. To ensure that delivery never breaks once it has started, a pause is enforced at the very beginning of its use, until the ring buffer has accumulated enough data that no break in contiguous data is possible thereafter. This pause is calculated as ceil(N/M) buffers. However, for some combinations of N and M, the optimal number of buffers would have been ceil(N/M) - 1. I can't seem to find a mathematical rule for this, ideally one I can incorporate into the formula!

$\endgroup$
  • $\begingroup$ this is impossible to tell, because you don't tell us at all what your "wrapper" and your "inner" (they are just concatenated) do. $\endgroup$ Commented Nov 7 at 16:00
  • $\begingroup$ the job of the wrapper layer is simply to receive data of length M, queue it, call the inner process whenever it has enough data to do so, and queue the output from the inner process until it has enough to output (M). and let's say that, for example, the internal process is a 512 point FFT, whilst the wrapper layer could be buffering with any specified data length, thus the need for a general solution $\endgroup$ Commented Nov 7 at 16:31
  • $\begingroup$ but what does that actually mean? Since these lengths aren't multiples of each other, in the limit, you either always have a near-empty input or a near-full output queue, depending on whether your data source is slower or faster than the inner processing. You need to be actually very precise about what happens with overhang data, what the processing delays are, what, in actual integer numbers, the queue lengths are and what the block sizes are, and how, if variable, block lengths relate to processing delay. $\endgroup$ Commented Nov 7 at 16:49

1 Answer

$\begingroup$

This will depend a bit on the details of how you architect the system.

Let's call $M$ the frame size and $N$ the process size. We also assume that the system runs at a fixed frame rate, i.e. the frame function is called every $M$ samples, and that the framing uses the typical ping-pong buffer scheme (you stream from/to ping using, say, DMA, process from/to pong, and then swap).

You will need a ring buffer at both the input and the output of the process, so I think what you are asking is "what's the minimum latency?", which is related to the buffers.

In the simplest case we have $M=N$, so all processing happens in a single frame, and the latency $L_{M=N} = 2M$ is simply twice the frame size: one frame to buffer up enough samples, and one more frame to do all the processing.
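To make the $2M$ figure concrete, here is a minimal sketch of the ping-pong scheme, assuming an idealized model in which the capture, the processing and the playback of a frame each take exactly one frame period. The function name `ping_pong_stream` and the identity `process` default are made up for illustration and are not taken from any particular framework.

```python
def ping_pong_stream(samples, M, process=lambda frame: list(frame)):
    """Push `samples` through a double-buffered pipeline with frame size M.

    Assumed model: during every frame period the input DMA fills one input
    buffer, the CPU processes the input buffer filled in the previous period,
    and the output DMA plays the buffer processed in the previous period.
    """
    if len(samples) % M != 0:
        raise ValueError("sample count must be a whole number of frames")

    silence = [0] * M
    captured = silence    # input frame captured during the previous period
    processed = silence   # output frame produced during the previous period
    out = []

    for start in range(0, len(samples), M):
        out.extend(processed)                  # play what was processed last period
        processed = process(captured)          # process what was captured last period
        captured = samples[start:start + M]    # capture the current frame
    return out


if __name__ == "__main__":
    M = 4
    impulse = [1] + [0] * (4 * M - 1)
    delayed = ping_pong_stream(impulse, M)
    print("impulse delayed by", delayed.index(1), "samples")  # 2 * M = 8
```

In this model a sample entering at index $i$ leaves at index $i + 2M$, regardless of what `process` does to the frame.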

Things are also straightforward for $N<M$. In this case you will always be able to run the process at least once during a frame. The worst case is having up to $N-1$ samples stuck in your input buffer, which makes the total latency

$$L_{N<M} = 2M + N - \gcd(M, N)$$

where $\gcd(M, N)$ is the greatest common divisor of $M$ and $N$. If $M$ and $N$ are coprime, this becomes $L_{N<M} = 2M + N - 1$, and if $M$ is an integer multiple of $N$, it becomes $L_{N<M} = 2M + N - N = 2M$, so no extra latency is incurred.
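As a sanity check on the greatest-common-divisor term, the following sketch simulates only the input FIFO: each frame adds $M$ samples, the process removes $N$ samples as often as it can, and the largest leftover is recorded. It assumes the FIFO starts empty and that the process runs instantly; `worst_case_leftover` and `frame_latency` are illustrative names, not part of any library.

```python
from math import gcd


def worst_case_leftover(M, N):
    """Largest number of samples left waiting in the input FIFO.

    Assumes the FIFO starts empty, receives M samples per frame, and the
    process consumes N samples at a time whenever enough are available.
    """
    frames = N // gcd(M, N)   # the leftover pattern repeats after this many frames
    fill = 0
    worst = 0
    for _ in range(frames):
        fill += M             # one frame of input arrives
        fill %= N             # run the process as many times as possible
        worst = max(worst, fill)
    return worst


def frame_latency(M, N):
    """Latency in samples from the formula above: 2M + N - gcd(M, N)."""
    return 2 * M + N - gcd(M, N)


if __name__ == "__main__":
    for M, N in [(512, 512), (512, 128), (7, 5), (480, 512)]:
        leftover = worst_case_leftover(M, N)
        # The simulated worst case matches the N - gcd(M, N) term.
        assert leftover == N - gcd(M, N)
        print(M, N, leftover, frame_latency(M, N))
```

The leftover after each frame is always a multiple of the greatest common divisor, which is why the worst case is $N$ minus that divisor rather than $N-1$ in general.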

$N>M$ is trickier, especially if $N$ is substantially larger. In this case, chances are you don't have enough CPU cycles to run the process in a single frame, so you need to put the process in a separate thread that runs at a lower priority than your frame thread. Total latency will depend on a lot of different factors and on how the threading, task switching and data transfers are handled.

If you have enough CPU horsepower to run the process over $N$ samples within a single frame of size $M$, then the same equation as above applies. For example, for $N = 2M$ you would run the process every second frame, and the latency would be

$$L_{N=2M} = 2M + N - M = 3M$$
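The "every second frame" pattern can be reproduced with a small sketch that counts how many times the process would run in each frame. It assumes the input FIFO starts empty and that the process runs as soon as $N$ samples are available; `process_runs_per_frame` is a hypothetical helper name.

```python
def process_runs_per_frame(M, N, frames):
    """Return, for each frame, how many times the N-sample process runs.

    Assumes M new samples arrive per frame into an initially empty FIFO and
    the process runs as soon as at least N samples are queued.
    """
    fill = 0
    runs = []
    for _ in range(frames):
        fill += M                       # one frame of input arrives
        count, fill = divmod(fill, N)   # full blocks processed, samples left over
        runs.append(count)
    return runs


if __name__ == "__main__":
    # N = 2M: the process runs in every second frame.
    print(process_runs_per_frame(256, 512, 6))  # [0, 1, 0, 1, 0, 1]
```

For sizes that are not multiples of each other the pattern becomes irregular, for example `process_runs_per_frame(256, 384, 4)` gives `[0, 1, 1, 0]`, which is where the extra buffering comes from.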

$\endgroup$
