
What is the standard for heap allocation systems in bare metal programs? If there is no standard, what is the most popular one? Is it a free list? If so, are there heuristics that use a hash table of pointers to skip to blocks of sizes near the requested block size?

Or is the standard heap allocator an RB tree of heap blocks?

I just don't understand how it would be a mere free list, because then best-fit searches would be O(n). An RB tree seems to be better; best-fit searches would be O(log2(n)).
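To make the concern concrete, here is a minimal sketch of the O(n) best-fit scan over a singly linked free list (the struct layout and names are illustrative, not taken from any real allocator):

```c
#include <stddef.h>

/* Hypothetical free-list node: each free block records its own size and
 * a pointer to the next free block. Illustrative only. */
typedef struct block {
    size_t size;
    struct block *next;
} block_t;

/* Best-fit: walk the entire list and remember the smallest block that
 * still satisfies the request. This is the O(n) scan in question. */
block_t *best_fit(block_t *head, size_t want) {
    block_t *best = NULL;
    for (block_t *b = head; b != NULL; b = b->next)
        if (b->size >= want && (best == NULL || b->size < best->size))
            best = b;
    return best;
}
```

A balanced tree keyed on block size would cut this lookup to O(log n), at the cost of heavier per-block metadata and more complex insert/delete on every free.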

  • There is a lot of prior art on allocators. There's no perfect approach, because this depends on common usage patterns, what range of sizes you need, on your multithreading needs, whether the ABA problem or fragmentation matters, and on your security goals. E.g. freelist allocators can turn use-after-free vulns into powerful exploits. Freelist doesn't mean "literally one linked list". The glibc/ptmalloc2/dlmalloc family uses multiple lists ("bins") in multiple caching levels that contain chunks of different size ranges. Worst-case algorithmic complexity matters less than typical absolute speed. Commented May 30, 2024 at 20:09
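The binning idea mentioned in the comment above can be sketched as follows. This is loosely in the spirit of dlmalloc's small bins, but the constants and names here are made up for illustration: bin i holds free chunks whose size falls in [16*i, 16*(i+1)), so a lookup only ever touches one short list instead of scanning everything.

```c
#include <stddef.h>

#define NBINS    32   /* illustrative: 32 size classes */
#define BIN_STEP 16   /* illustrative: 16-byte granularity */

/* Map a request size to a bin index in O(1); a fit search then starts
 * in that bin rather than walking one global free list. */
static size_t bin_index(size_t size) {
    size_t i = size / BIN_STEP;
    return i < NBINS ? i : NBINS - 1;  /* last bin catches oversized chunks */
}
```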

3 Answers


I can't say what the currently normal or "standard" practice is, but allocators were a big part of my master's thesis. That was over 20 years ago, but I think most of the basic facts are still valid.

Algorithmic big-O performance is not necessarily the best criterion for the "best" allocator, considering things like:

  • The constant factor could be very significant when N is not very large (which it probably isn't for most "bare metal" programs).
  • A low memory overhead for the allocator's internal data structures is probably very desirable on bare metal systems (which are often memory constrained).
  • The real costs of an implementation may not be where you expect them to be; for example caches and branch prediction can become a huge factor on this low level and penalize complex data structures like hashtables and trees.
  • You seem to implicitly assume that an allocator must do a best-fit search. This may not in fact be the best strategy even for avoiding memory fragmentation, considering that memory which is allocated close in time is very often also freed close in time.
  • More generally: typical memory access patterns for different applications have a big impact on how allocators perform in the real world.

There is a research paper from 1998 by Mark S. Johnstone and Paul R. Wilson, The Memory Fragmentation Problem: Solved?, which benchmarked a number of simple allocators against memory traces from real-world programs. They found that simple allocators (like free lists) showed good performance and low fragmentation, much better than with randomly generated memory traces. Their explanation was that real-world programs show "phase behaviour", where a lot of similar allocations often happen together, and even simple allocators can handle these patterns very well.

This implies that for any given program, a simple allocator will probably do well enough, and if it doesn't, you need to look at your program's allocation patterns and see what kind of allocator would be good for them.

  • Thank you for your response! Do you think that free list heap allocators are more popular than tree heap allocators? Commented May 30, 2024 at 21:09

A typical backend for malloc() will be a Buddy or maybe Slab allocator.

There are several competing goals, and some memory is likely to be "wasted", in the sense that requests will be denied even though free fragments could collectively satisfy the request. As long as the wasted fraction remains "small", and {allocations, frees} happen "fast", we have a good allocator. Once having handed out an address to an application, usually we cannot move that memory prior to the app free()ing it. This is in distinction to a compacting GC.
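The core trick that makes a binary buddy allocator fast is worth showing. A free block of size 2^k at offset `off` from the heap base has its "buddy" at `off XOR 2^k`; splitting and coalescing just toggle that one bit, so finding the coalescing partner is O(1). A minimal sketch (names are illustrative):

```c
#include <stddef.h>

/* Given a block's offset from the heap base and its order
 * (block size = 1 << order), compute the offset of its buddy. */
static size_t buddy_of(size_t off, unsigned order) {
    return off ^ ((size_t)1 << order);
}
```

When a block is freed, the allocator checks whether the block at `buddy_of(off, order)` is also free; if so, the two merge into one block of the next order up, and the check repeats.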

There may be additional debug / verification flags to enable, which can help track down use-after-free, double-free, and similar Valgrind-type issues.

Once having allocated a chunk of memory, you might choose to have an Arena (or Bump) suballocator hand out small pieces of it. This works well for a Unit of Work, such as processing a web GET request. We might allocate for the header, body part 1, part 2, send the resulting HTML document to the client, and then deallocate all of it at once since we're finished with the page.
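A bump suballocator of that kind can be sketched in a few lines over a fixed buffer (illustrative names; the 8-byte alignment is an assumption). Note that individual frees are not supported: the whole point is that `arena_reset()` releases everything from the unit of work at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal bump ("arena") suballocator over a caller-supplied buffer. */
typedef struct {
    uint8_t *base;   /* start of the backing buffer */
    size_t   cap;    /* buffer capacity in bytes */
    size_t   used;   /* current bump offset */
} arena_t;

void *arena_alloc(arena_t *a, size_t n) {
    size_t aligned = (a->used + 7u) & ~(size_t)7u;  /* round up to 8 bytes */
    if (aligned + n > a->cap)
        return NULL;                                /* arena exhausted */
    a->used = aligned + n;
    return a->base + aligned;
}

void arena_reset(arena_t *a) { a->used = 0; }       /* free everything at once */
```

Allocation is a pointer bump plus a bounds check, which is about as cheap as allocation gets; the trade-off is that no individual piece can be returned early.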


What is the standard for heap allocation systems in bare metal programs?

For quite a lot of embedded scenarios, the standard is "don't". That is, since you are running only one program with a well-defined set of use cases, and have very limited memory, and little ability to handle OOM, you pre-allocate all of it up front at design time. This is encouraged by MISRA, for example.
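In practice that often means fixed-size object pools declared statically at design time. A minimal sketch of one such pool, with made-up names and sizes, in the MISRA spirit of no dynamic allocation after initialization:

```c
#include <stddef.h>

#define POOL_SIZE 8   /* illustrative: sized at design time */

typedef struct { int id; char payload[32]; } msg_t;

static msg_t pool[POOL_SIZE];      /* all storage reserved up front */
static int   in_use[POOL_SIZE];

/* Hand out a free slot, or NULL if the pool is exhausted. Exhaustion
 * here signals a design-time sizing error, not a runtime OOM to handle. */
msg_t *msg_acquire(void) {
    for (int i = 0; i < POOL_SIZE; i++)
        if (!in_use[i]) { in_use[i] = 1; return &pool[i]; }
    return NULL;
}

void msg_release(msg_t *m) { in_use[m - pool] = 0; }
```

Because every object is the same size, there is no fragmentation, and both acquire and release are trivially bounded in time, which is exactly what hard-real-time guidelines want.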

Beyond that, "use the malloc the platform library gives you" is probably the most common approach, and that malloc will quite often be a freelist using first-fit.
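First-fit is even simpler than the best-fit scan the question assumes: take the first block that is big enough and stop, rather than scanning the whole list for the tightest fit. A sketch (illustrative names, not from any real libc):

```c
#include <stddef.h>

typedef struct freeblk {
    size_t size;
    struct freeblk *next;
} freeblk;

/* First-fit: return the first block big enough for the request. */
freeblk *first_fit(freeblk *head, size_t want) {
    for (freeblk *b = head; b != NULL; b = b->next)
        if (b->size >= want)
            return b;          /* good enough; stop scanning */
    return NULL;
}
```

The scan is still O(n) in the worst case, but it usually terminates early, which is part of why typical absolute speed can beat worst-case complexity in practice.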
