Lecture from: 07.10.2025 | Video: Videos ETHZ

The previous lecture demonstrated how to use the dynamic memory API (malloc, calloc, realloc, and free). This lecture opens the “black box” to examine how a memory allocator works, addressing the fundamental challenges and trade-offs in its design. This classic systems programming problem sits at the intersection of data structures, algorithms, and hardware reality.

The Role of a Dynamic Memory Allocator

A dynamic memory allocator serves as a software layer between the application and the operating system.

  • OS and Hardware Level: The kernel manages physical memory in large, fixed-size pages (typically 4 KB).
  • Application Level: Apps require memory for objects much smaller than a page.
  • Allocator’s Role: The allocator acts as a broker, subdividing large chunks from the OS into smaller blocks.

Recall: The malloc Package Contract

The API contract includes:

#include <stdlib.h>
 
void *malloc(size_t size);
void free(void *p);
void *realloc(void *p, size_t size);
  • malloc returns a pointer to a block of at least size bytes.
  • The returned pointer is aligned (address is a multiple of 8 or 16) for efficient CPU access.
  • free returns a block to the resource pool.
  • realloc resizes a previously allocated block.

Where Does Everything Go? A Memory Map Example

A concrete example illustrates the process address space.

char big_array[1<<24];   /* 16 MB, global initialized (implicitly to 0) */
char *p1, *p2, *p3, *p4;  /* Global pointers */
 
int main() {
    p1 = malloc(1 << 28); /* 256 MB */
    p2 = malloc(1 << 8);  /* 256 B  */
    // ...
}

  • big_array: Resides in the Data segment (.bss).
  • p1, p2, p3, p4: Pointer variables (8 bytes each) in the Data segment.
  • The malloc’d memory: Large blocks pointed to by p1 and p2 reside in the Heap, which grows upwards.
  • main(): Machine code resides in the Text segment.
  • $rsp (Stack Pointer): Points to the top of the Stack, which grows downwards.

Explicit vs. Implicit Allocators

Memory management strategies fall into two categories:

  1. Explicit Allocators (C, C++): The programmer is responsible for both allocation (malloc) and deallocation (free). This offers control but risks bugs.

  2. Implicit Allocators (Java, Python): The application allocates memory, but a Garbage Collector (GC) automatically reclaims unreachable memory.

This discussion focuses on building an explicit allocator.

The Core Problem: Managing the Heap

Writing malloc and free requires managing the heap as a large, contiguous block of memory to satisfy a sequence of requests.

Constraints on the Allocator

  • Cannot control requests: Must handle arbitrary sequences of malloc/free.
  • Must respond immediately: No reordering or buffering requests.
  • Must use the heap: Allocated blocks must come from existing free memory.
  • Must respect alignment: Addresses must be multiples of the alignment boundary.
  • Cannot move allocated blocks: Once pointers are returned, data cannot be moved (no compaction).

Performance Goals

Two conflicting goals exist:

  1. Maximize Throughput: Operations per second should be maximized (ideally ).
  2. Maximize Memory Utilization: Minimize waste, specifically fragmentation.

Challenge: Fragmentation

Fragmentation prevents satisfying requests despite having enough total free memory.

Internal Fragmentation

Occurs when an allocated block exceeds the payload size.

Causes:

  • Overhead: Metadata storage (e.g., block size).
  • Alignment: Padding to meet alignment requirements (e.g., 1 byte payload in a 16-byte block).
  • Policy Decisions: Minimum block sizes.

External Fragmentation

Occurs when sufficient total free memory exists, but no single contiguous block is large enough.

This depends on future request patterns and is the central challenge of allocator design.

Implementing a Simple Allocator: The Implicit Free List

The simplest allocator design solves two problems: knowing how much to free and tracking free blocks.

Knowing How Much to Free: The Header Field

The standard solution stores the block size in a header word immediately preceding the payload.

free(p0) reads the word at p0 - sizeof(word) to determine the block size.

Tracking Free Blocks: The Implicit List

An implicit list links all blocks (allocated and free) by size. Since the heap is contiguous, knowing a block’s size locates the next block.

The Header Trick: Size and Allocated Bit in One Word

Block sizes are multiples of 8 or 16, leaving the least significant bits unused (always 0). One bit can store the allocated/free flag.

Standard Optimization

  • ...000: Size is a multiple of 8.
  • Use the last bit: 1 = allocated, 0 = free.

  • True Size: header & ~0x7
  • Status: header & 0x1

Finding a Free Block: Placement Policies

malloc must traverse the list to find a suitable free block.

First-Fit Policy

Search from the beginning and choose the first sufficient block.

p = start_of_heap;
while (p < end_of_heap) {
    if ( (*p & 1) || ( (*p & -2) < requested_len) ) {
        p = p + (*p & -2); // Go to next block
    } else {
        break; // Found
    }
}
  • Pros: Simple.
  • Cons: Linear time scan. Creates small fragments at the start.

Other Policies:

  • Next-Fit: Starts search where the last one finished.
  • Best-Fit: Searches the entire list for the tightest fit. Slower but minimizes fragmentation.

Allocating in a Free Block: Splitting

If a free block is larger than requested, it is split. The beginning is allocated, and the remainder becomes a new free block.

Freeing a Block and the Need for Coalescing

Simply clearing the allocated flag creates “false fragmentation” where adjacent free blocks remain separate.

Coalescing merges newly freed blocks with adjacent free neighbors.

Bidirectional Coalescing with Boundary Tags

Coalescing with the next block is straightforward. Coalescing with the previous block is difficult in a one-way list.

The Solution: Boundary Tags (Footers)

Replicating the header as a boundary tag (footer) at the end of the block allows bidirectional traversal.

Checking the previous block involves reading the word before the current header (the previous footer).

This enables constant-time coalescing by checking both neighbors.

Implicit List Summary

  • Allocating: (linear search).
  • Freeing: (constant time with boundary tags).
  • Memory Overhead: Headers and footers cause internal fragmentation.

The linear allocation cost makes implicit lists impractical for high-performance allocators.

Explicit Free Lists

To optimize, an allocator can track only free blocks using an explicit free list (a linked list of free blocks).

The Clever Part: Where to Store the Pointers?

Pointers (next and prev) are stored in the payload area of free blocks, which is otherwise unused.

Overlaying Data

  • Allocated: Payload holds application data.
  • Free: Payload holds list pointers.

This imposes a minimum block size (Header + Next + Prev + Footer).

Logical vs. Physical Order

The logical list order need not match the physical memory order. Pointers can jump across allocated blocks.

Allocating from an Explicit Free List

  1. Search the free list.
  2. Splice the block out of the list.
  3. Split if necessary, adding the remainder back to the list.

Freeing with an Explicit Free List

Freeing involves inserting the block back into the list.

Insertion Policies:

  • LIFO (Last-In, First-Out): Insert at the root. but may fragment worse.
  • Address-Ordered: Maintain sorted order. search but lower fragmentation.

Freeing with LIFO: The Four Cases

Coalescing is managed alongside list insertion.

Case 1: No Coalescing Insert new block at root.

Case 2: Coalesce with Next Block Splice out next block, merge, and insert new larger block at root.

Case 3: Coalesce with Previous Block Symmetric to Case 2.

Case 4: Coalesce with Both Blocks Splice out both neighbors, merge all three, and insert at root.

Explicit List Summary

  • Allocating: (faster than ).
  • Freeing: (LIFO).
  • Overhead: Larger minimum block size.

Segregated Free Lists: The Best of Both Worlds

Segregated free lists improve performance by maintaining multiple free lists, each for a specific size class of blocks (e.g., 16, 32, 64-128 bytes).

Seglist Allocator Algorithm

  • Allocate: Determine size class, search suitable list. If empty, try larger classes. If all fail, request more heap memory.

  • Free: Coalesce, determine new size, insert into appropriate list.

  • Higher Throughput: Logarithmic time search for power-of-two classes.

  • Better Memory Utilization: Segregated lists approximate a best-fit search.

Practice: Allocator Mechanics

Designing an allocator requires careful bit manipulation and pointer arithmetic.

Exercise: Header Decoding

A block header in a 64-bit allocator has the value 0x0000000000000031. Assuming 8-byte alignment:

  1. What is the total block size?
    • Answer: 48 bytes. The header stores the size and the allocated bit. The size is 0x30 ( in decimal).
  2. Is the block allocated or free?
    • Answer: Allocated. The last bit is 1.

Exercise: Fragmentation Identification

Consider a heap with three free blocks of size 16, 32, and 16 bytes. The user calls malloc(48).

  1. Can the allocator satisfy this request if it cannot move blocks?
    • Answer: No. Even though total free space is 64 bytes, no single block is bytes. This is external fragmentation.

Continue here: 08 Garbage Collection and Software Engineering Principles