Understanding the most recent heap implementation under Linux

Question

A few days ago, I was wondering how one could teach himself heap-based overflow exploitation.

So I searched through documentation, subsequently practicing what I read in order to have a better insight of how the heap works under Linux.

We are told that the malloc() / free() function works around Doug Lea's memory allocator but, in spite of the great explanation given by the link, I cannot figure things out as I debug my program.

Given this example:

#include <stdio.h> #include <stdlib.h> #include <string.h> int n = 5; int main(int argc, char** argv) { char* p; char* q; p = malloc(1024); q = malloc(1024); printf("real size = %d\n",*(((int*)p)-1) & 0xFFFFFFF8); if(argc >= 2) { strcpy(p, argv[1]); } free(q); printf("n = 0x%08X\n", n); free(p); return EXIT_SUCCESS; }

I would like to dump this structure in memory:

struct chunk { int prev_size; int size; struct chunk *fd; struct chunk *bk; };

Here is my workflow:

geo@lilith:~/c/vuln_malloc$ gcc -o vuln vuln.c -m32 -ggdb geo@lilith:~/c/vuln_malloc$ gdb ./vuln GNU gdb (GDB) 7.4.1-debian Copyright (C) 2012 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /home/geo/c/vuln_malloc/vuln...done. (gdb) b 21 Breakpoint 1 at 0x804850f: file vuln.c, line 21. (gdb) r `perl -e 'print "A" x 1024'` Starting program: /home/geo/c/vuln_malloc/vuln `perl -e 'print "A" x 1024'` real size = 1032 Breakpoint 1, main (argc=2, argv=0xffffd414) at vuln.c:21 21 free(q); (gdb) x/10x q-4 0x804a40c: 0x00000409 0x00000000 0x00000000 0x00000000 0x804a41c: 0x00000000 0x00000000 0x00000000 0x00000000 0x804a42c: 0x00000000 0x00000000 (gdb)

Here I can see the size-field's value, which is 0x409. I can easily guess that the real size of my chunk is 0x409 & 0xFFFFFF8 = 0x408 = 1032, as explained by the documentation (the three least significant actually define some flags). Then I run until the free() function is processed.

(gdb) b 22 Breakpoint 2 at 0x804851b: file vuln.c, line 22. (gdb) c Continuing. Breakpoint 2, main (argc=2, argv=0xffffd414) at vuln.c:22 22 printf("n = 0x%08X\n", n); (gdb) x/10x q-4 0x804a40c: 0x00020bf9 0x00000000 0x00000000 0x00000000 0x804a41c: 0x00000000 0x00000000 0x00000000 0x00000000 0x804a42c: 0x00000000 0x00000000

Firstly I don't understand the new value - 0x20bf9 - at all, secondly I don't understand why there isn't any relevant values as regard the fd and bk pointers either.

All of that stuff does not make much sense for me, that's why I was wondering wether you could give me some clues about all of this or not. Does the Doug Lea's implementation still exist in recent glibc versions, or...?

Related meta discussion: meta.reverseengineering.stackexchange.com/questions/268/… — asheeshr
– asheeshr, Commented Dec 5, 2013 at 14:55

perror · Accepted Answer · 2016-11-06 17:14:59Z

First of all, I have bad news for you ! Doug Lea's malloc is almost no more used in any C library implementation (even if understanding dlmalloc can help a lot to understand new ones).

The new implementation that is most widely used is ptmalloc2 and the best way to learn about it is... to read the code... So, if you are using a Debian(-like) distribution, just like me, you just need to get the source code of the libc like this:

$> apt-get source libc6

Note that the glibc is no more and has been subsumed by the eglibc project. The Debian distribution switched to eglibc some time ago (see here and there and even on the glibc source package page). So, the implementation of malloc has changed considerably (and some security safeties have been added since dlmalloc).

But, let see what does look like the malloc implementation:

$> cd eglibc-2.17/malloc/ $> less malloc.c ... /* This is a version (aka ptmalloc2) of malloc/free/realloc written by Doug Lea and adapted to multiple threads/arenas by Wolfram Gloger. There have been substantial changesmade after the integration into glibc in all parts of the code. Do not look for much commonality with the ptmalloc2 version. ...

As I said, the algorithm used here is ptmalloc2 (POSIX Threads Malloc), but with significant modifications. So, you'd better read the code to know better about it.

But, to sum up a bit, the heap memory is managed through memory chunks which are prefixed by meta-data gathering these information (I am just quoting comments that are in the malloc.c source file, refer to the whole file for more):

/* malloc_chunk details: (The following includes lightly edited explanations by Colin Plumb.) Chunks of memory are maintained using a `boundary tag' method as described in e.g., Knuth or Standish. (See the paper by Paul Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a survey of such techniques.) Sizes of free chunks are stored both in the front of each chunk and at the end. This makes consolidating fragmented chunks into bigger chunks very fast. The size fields also hold bits representing whether chunks are free or in use. An allocated chunk looks like this: chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of previous chunk, if allocated | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of chunk, in bytes |M|P| mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | User data starts here... . . . . (malloc_usable_size() bytes) . . | nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of chunk | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Where "chunk" is the front of the chunk for the purpose of most of the malloc code, but "mem" is the pointer that is returned to the user. "Nextchunk" is the beginning of the next contiguous chunk. Chunks always begin on even word boundries, so the mem portion (which is returned to the user) is also on an even word boundary, and thus at least double-word aligned. Free chunks are stored in circular doubly-linked lists, and look like this: chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Size of previous chunk | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `head:' | Size of chunk, in bytes |P| mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Forward pointer to next chunk in list | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Back pointer to previous chunk in list | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Unused space (may be 0 bytes long) . . . . | nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `foot:' | Size of chunk, in bytes | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The P (PREV_INUSE) bit, stored in the unused low-order bit of the chunk size (which is always a multiple of two words), is an in-use bit for the *previous* chunk. If that bit is *clear*, then the word before the current chunk size contains the previous chunk size, and can be used to find the front of the previous chunk. The very first chunk allocated always has this bit set, preventing access to non-existent (or non-owned) memory. If prev_inuse is set for any given chunk, then you CANNOT determine the size of the previous chunk, and might even get a memory addressing fault when trying to do so. Note that the `foot' of the current chunk is actually represented as the prev_size of the NEXT chunk. This makes it easier to deal with alignments etc but can be very confusing when trying to extend or adapt this code. The two exceptions to all this are 1. The special chunk `top' doesn't bother using the trailing size field since there is no next contiguous chunk that would have to index off it. After initialization, `top' is forced to always exist. If it would become less than MINSIZE bytes long, it is replenished. 2. Chunks allocated via mmap, which have the second-lowest-order bit M (IS_MMAPPED) set in their size fields. Because they are allocated one-by-one, each must contain its own trailing size field. */

You can also refer to 'Heap-Based Exploitation' which is a talk from Scott Hand about heap management and overflow exploitation.

Still, you have a lot of work to do to understand everything, I would have like to have more time to explain. But, I hope it helped you a bit to find ways to go deeper (downloading the source is really the key here).

web.archive.org/web/20131206064115/http://www.utdallas.edu/… — Bill B
– Bill B, Commented May 9, 2014 at 9:41
Update. As of Oct 2015 the eglibc project is dead. Most features were merged into glibc and Debian switched back to glibc — golem
– golem, Commented Oct 21, 2015 at 0:10

Stack Exchange Network

Understanding the most recent heap implementation under Linux

1 Answer 1

Hot Network Questions

Understanding the most recent heap implementation under Linux

1 Answer 1

Related

Hot Network Questions