Lecture from: 07.10.2025 | Video: Videos ETHZ
In the previous lecture, we learned how to use the dynamic memory API: malloc, calloc, realloc, and free. We treated these functions as a black box that gave us memory from the heap. Now, we’re going to open that black box. How does a memory allocator work? What are the fundamental challenges and trade-offs in its design? This is a classic systems programming problem that sits at the intersection of data structures, algorithms, and hardware reality.
The Role of a Dynamic Memory Allocator
A dynamic memory allocator is a layer of software that lives between your application and the operating system.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014103522.png)
- The OS and Hardware Level: The Virtual Memory (VM) hardware and the OS kernel work together to manage physical memory. They allocate memory to processes in large, fixed-size chunks called pages (typically 4 KB).
- The Application Level: Applications usually need memory for objects that are much smaller than a page (e.g., a few integers, a string, a struct). Requesting a full page from the OS for every small object would be incredibly wasteful.
- The Allocator's Role: The dynamic memory allocator acts as a broker. It requests large chunks of memory (one or more pages) from the OS to manage the heap. It then subdivides this heap memory into smaller blocks of variable sizes to satisfy the application's `malloc` requests. It manages the lifecycle of these blocks, keeping track of which are in use and which are free.
Recall: The malloc Package Contract
Let’s quickly review the API we are trying to implement.
```c
#include <stdlib.h>

void *malloc(size_t size);
void free(void *p);
void *realloc(void *p, size_t size);
```

The key promises (the "contract") are:

- `malloc` returns a pointer to a block of at least `size` bytes.
- The returned pointer is aligned. This means the address is a multiple of some number, typically 8 or 16. This is required for certain data types to be accessed efficiently by the CPU.
- `free` takes a pointer returned by `malloc`/`realloc` and returns its block to the pool of available memory.
- `realloc` changes the size of a previously allocated block.
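As a quick illustration of this contract in use (the sizes and the 16-byte alignment check are just example values, not from the lecture):

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    int *a = malloc(10 * sizeof(int));      /* at least 40 bytes */
    if (a == NULL) return 1;                /* malloc can fail   */

    /* The returned address satisfies the alignment promise. */
    printf("16-byte aligned? %s\n", ((uintptr_t)a % 16 == 0) ? "yes" : "no");

    int *bigger = realloc(a, 20 * sizeof(int)); /* grow the block */
    if (bigger != NULL)
        a = bigger;

    free(a);                                /* return the block to the allocator */
    return 0;
}
```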
Where Does Everything Go? A Memory Map Example
Let’s solidify our mental model of the process address space with a concrete example.
```c
char big_array[1<<24];   /* 16 MB, global, initialized (implicitly to 0) */
char *p1, *p2, *p3, *p4; /* Global pointers */

int main() {
    p1 = malloc(1 << 28); /* 256 MB */
    p2 = malloc(1 << 8);  /* 256 B  */
    // ...
}
```

![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014103707.png)

- `big_array`: This is a large, global array. It resides in the Data segment (the `.bss` section, since it's zero-initialized).
- `p1`, `p2`, `p3`, `p4`: These are global pointer variables. The variables themselves (8 bytes each on a 64-bit machine) live in the Data segment.
- The `malloc`'d memory: The 256 MB and 256 B blocks that `p1` and `p2` will point to are allocated from the Heap. The heap grows upwards to accommodate these requests.
- `main()`: The machine code for the `main` function lives in the Text segment.
- `%rsp` (Stack Pointer): This CPU register points to the top of the Stack, which grows downwards from high memory.
Explicit vs. Implicit Allocators
Memory management strategies fall into two broad categories:
- Explicit Allocators (e.g., C, C++): The application programmer is responsible for both allocating (`malloc`, `new`) and explicitly freeing (`free`, `delete`) memory. This offers maximum control but is a common source of bugs.

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014103821.png)

- Implicit Allocators (e.g., Java, Python, Lisp): The application only allocates memory (`new`). It never explicitly frees it. A background process, the Garbage Collector (GC), automatically detects when memory is no longer reachable by the program and reclaims it.

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014103842.png)
For the first part of this lecture, we will focus on building an explicit allocator.
The Core Problem: Managing the Heap
Our goal is to write malloc and free. To do this, we must manage the heap, which is just a large, contiguous block of memory. We will be given a sequence of malloc and free requests and must satisfy them under a set of constraints.
Constraints on the Allocator
- Cannot control requests: Must handle any sequence of `malloc` and `free` calls.
- Must respond immediately: Cannot reorder or buffer requests for better efficiency.
- Must use the heap: Can only place allocated blocks in memory marked as free.
- Must respect alignment: Blocks must be allocated at addresses that are multiples of an alignment boundary (e.g., 16 bytes on x86-64).
- Cannot move allocated blocks: Once `malloc` returns a pointer, the data at that address cannot be moved by the allocator (this would invalidate the pointer). This rules out compaction.
Performance Goals
We want to build an allocator that is both fast and efficient. These two goals are often in conflict.
- Maximize Throughput: The number of `malloc` and `free` operations completed per second. Ideally, we want these operations to be very fast, perhaps constant time ($O(1)$).
- Maximize Memory Utilization: We want to use the memory we get from the OS as efficiently as possible, minimizing waste.
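One standard way to make "utilization" precise (a textbook definition, not something spelled out in the lecture): let $P_i$ be the total payload of all allocated blocks after the $i$-th request, and $H_k$ the heap size after the $k$-th request. The peak memory utilization is then

$$
U_k = \frac{\max_{i \le k} P_i}{H_k}
$$

and the allocator's goal is to keep $U_k$ as close to 1 as possible.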
The primary enemy of good memory utilization is fragmentation.
Challenge: Fragmentation
Fragmentation is wasted space in the heap that prevents us from satisfying allocation requests. There are two types:
Internal Fragmentation
This occurs when an allocated block is larger than the requested payload.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014104000.png)
Causes:
- Overhead: Storing metadata (like the block’s size) within the block itself.
- Alignment: To satisfy alignment requirements, a block may need to be larger than the payload. If a user requests 1 byte, we might have to give them a 16-byte block. The extra 15 bytes are internal fragmentation.
- Policy Decisions: An allocator might decide to give a small request a larger block to avoid creating tiny, unusable free fragments.
External Fragmentation
This occurs when there is enough total free memory in the heap to satisfy a request, but no single contiguous block is large enough.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014104015.png)
This is the harder problem. It depends on the unpredictable future pattern of malloc and free requests. Minimizing external fragmentation is the central challenge of allocator design.
Implementing a Simple Allocator: The Implicit Free List
Let’s start with the simplest possible design for an allocator. We need to solve two fundamental problems:
- How does `free(p)` know how much memory to deallocate, given only a pointer?
- How do we keep track of all the free blocks so `malloc` can find one?
Knowing How Much to Free: The Header Field
The standard solution is to store the block’s size in a header word located in memory immediately before the payload.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014104150.png)
When the application calls free(p0), the allocator can simply look at the word at address p0 - sizeof(word) to retrieve the size of the block it needs to free. This header is a form of internal fragmentation.
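A minimal sketch of that lookup, assuming an 8-byte header word that holds the block size (the layout described above, not the lecture's exact code):

```c
#include <stddef.h>

/* p is the payload pointer that malloc returned; the block's size
   was stored in the word immediately before it at allocation time. */
size_t block_size(void *p) {
    return *((size_t *)p - 1);   /* read the header one word back */
}
```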
Tracking Free Blocks: The Implicit List
The simplest way to track all blocks in the heap is to link them together implicitly. Since the heap is one contiguous region of memory, if we know where one block starts and how big it is, we know where the next one starts.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014105229.png)
The entire heap becomes a sequence of blocks, each with a header telling us its size and whether it’s allocated or free.
The Header Trick: Size and Allocated Bit in One Word
We need to store two pieces of information in the header: the block’s size and an allocated/free flag. We could use two words, but that’s wasteful.
A Standard Trick for Efficient Headers
Because of alignment requirements, all block sizes (and addresses) will be multiples of 8 or 16. This means the 3 least significant bits of the size will always be `0`. We can "steal" one of these unused bits to store our allocated/free flag.

- `...000`: A size that's a multiple of 8.
- We use the last bit: `1` = allocated, `0` = free.

![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014105303.png)
To work with this combined header:
- To get the true size, we must mask out the flag bits: `size = header & ~0x7` (for 8-byte alignment).
- To check if a block is allocated, we check the flag bit: `is_allocated = header & 0x1`.
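A minimal sketch of the helper macros for this header encoding, in the style commonly used by textbook allocators (the names `PACK`, `GET_SIZE`, etc. are assumptions here, not code given in the lecture):

```c
#include <stddef.h>

/* Combine a size (a multiple of 8) and an allocated bit into one header word. */
#define PACK(size, alloc)   ((size) | (alloc))

/* Read or write a header/footer word at address p. */
#define GET(p)              (*(size_t *)(p))
#define PUT(p, val)         (*(size_t *)(p) = (val))

/* Extract the two fields again. */
#define GET_SIZE(p)         (GET(p) & ~(size_t)0x7)  /* mask out the low 3 bits */
#define GET_ALLOC(p)        (GET(p) & 0x1)           /* allocated flag          */
```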
Finding a Free Block: Placement Policies
Now, when malloc is called, it must search for a free block that is large enough. With an implicit list, it must traverse the list from the beginning.
First-Fit Policy
Search the list from the beginning and choose the first free block that is large enough.
```
// Pseudo-code for first-fit search.
// p points to a block header word: bit 0 is the allocated flag,
// the remaining bits are the block size (pointer arithmetic is in bytes here).
p = start_of_heap;
while (p < end_of_heap) {
    // Is the block at p allocated OR too small?
    if ( (*p & 1) || ( (*p & -2) < requested_len) ) {
        // Yes, so go to the next block
        p = p + (*p & -2); // Add the true size (flag masked out) to the current pointer
    } else {
        // Found a suitable free block!
        break;
    }
}
```

- Pros: Simple to implement.
- Cons: Can be slow (linear time in the number of all blocks). Tends to leave small “splinter” fragments at the beginning of the heap.
Other Policies:
- Next-Fit: Like first-fit, but starts the search from where the last search finished. Often faster, but can lead to worse fragmentation.
- Best-Fit: Searches the entire list to find the free block that fits with the least amount of leftover space. Minimizes fragmentation but is much slower.
Allocating in a Free Block: Splitting
If we find a free block that is larger than the requested size, we should split it. We use the beginning of the block for the new allocation and turn the remainder into a new, smaller free block.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014105642.png)
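A minimal header-only sketch of the split step (footers come later), reusing the hypothetical `PUT`/`PACK`/`GET_SIZE` helpers from above; `bp` and `asize` are assumed to be the free block's header address and the alignment-rounded request size:

```c
/* Split a free block whose header is at bp into an allocated block of
   asize bytes followed by a smaller free block holding the remainder.
   Assumes asize is alignment-rounded and the leftover is large enough
   to form a block of its own. */
void split(char *bp, size_t asize) {
    size_t oldsize = GET_SIZE(bp);              /* total size of the free block */

    PUT(bp, PACK(asize, 1));                    /* front part: allocated        */
    PUT(bp + asize, PACK(oldsize - asize, 0));  /* remainder: new free block    */
}
```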
Freeing a Block and the Need for Coalescing
The simplest way to implement free(p) would be to just find the block’s header and clear the allocated flag.
```
// Naive free: just clear the allocated bit in the header
void free_block(ptr p) {
    *p = *p & -2; // Clear the allocated bit
}
```

The Problem: This creates "false fragmentation." We might end up with two free blocks right next to each other that the allocator still treats as separate.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014105801.png)
To solve this, we must coalesce: whenever a block is freed, we must check its neighbors. If a neighbor is also free, we merge them into a single, larger free block.
Bidirectional Coalescing with Boundary Tags
- Coalescing with the next block is easy: We can use the current block’s size to find the next block’s header and check its allocated flag.
- Coalescing with the previous block is hard: Our implicit list only lets us move forward.
The Solution: Boundary Tags (Footers)
We replicate the header at the end of each block. This footer is called a boundary tag.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014105854.png)
Now, to check the previous block:
- Look at the word just before the current block’s header. This is the previous block’s footer.
- Read the size from the footer.
- Use the size to jump backward to the previous block’s header and check if it’s free.
With headers and footers, we can perform constant-time coalescing. When freeing a block, we can immediately check both its neighbors and perform one of four possible merges, depending on their status.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014105926.png)
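A sketch of the constant-time coalescing logic with headers and footers, again using the hypothetical `GET_SIZE`/`GET_ALLOC`/`PUT`/`PACK` helpers (an illustrative sketch under simplifying assumptions, not the lecture's reference implementation):

```c
/* bp points to the header of the block being freed.
   Every block carries an identical header and footer word: size | alloc-bit.
   Assumes prologue/epilogue sentinel blocks, so bp always has valid neighbors. */
void coalesce(char *bp) {
    size_t size = GET_SIZE(bp);

    char *next        = bp + size;                  /* next block's header     */
    char *prev_footer = bp - sizeof(size_t);        /* previous block's footer */
    char *prev        = bp - GET_SIZE(prev_footer); /* previous block's header */

    if (!GET_ALLOC(next))                       /* merge with the next block   */
        size += GET_SIZE(next);
    if (!GET_ALLOC(prev_footer)) {              /* merge with the previous one */
        size += GET_SIZE(prev_footer);
        bp = prev;
    }

    PUT(bp, PACK(size, 0));                         /* write the merged header */
    PUT(bp + size - sizeof(size_t), PACK(size, 0)); /* ...and matching footer  */
}
```

Whichever of the four cases applies, the same few header/footer reads and writes suffice, which is why coalescing stays constant time.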
Implicit List Summary
- Implementation: Very simple. The entire heap is treated as a contiguous array of blocks.
- Allocate Cost: Linear time in the worst case, $O(N)$, where $N$ is the total number of blocks (both allocated and free). This is because `malloc` may have to scan the entire heap to find a suitable free block. This is the primary drawback.
- Free Cost: Constant time, $O(1)$, with boundary tags. `free` is very fast because it only needs to check its immediate neighbors to perform coalescing.
- Memory Usage: Depends on the placement policy (first-fit, next-fit, etc.). Internal fragmentation is present due to headers and footers.
While the concepts of splitting and coalescing are fundamental, the linear-time allocation cost makes the implicit list impractical for general-purpose malloc implementations. We need a faster way to find free blocks.
Explicit Free Lists
The bottleneck of the implicit list is that the search for a free block must step over all the allocated blocks. The obvious optimization is: what if we only kept track of the free blocks?
This is the core idea behind an explicit free list. We create a data structure, a linked list, whose nodes are the free blocks themselves.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014111049.png)
The Clever Part: Where to Store the Pointers?
To create a linked list, each free block needs to store next and prev pointers. This seems to require more overhead (more internal fragmentation). However, we can use a clever trick:
Using the Payload for Pointers
When a block is free, its payload area is unused by the application. We can repurpose this "garbage" space to store the `next` and `prev` pointers of our free list. When the block is allocated, this area is reclaimed for the application's data.
This means the format of a block depends on its status:
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014111234.png)
- Allocated Block: The application uses the payload.
- Free Block: The allocator uses the payload to store list pointers.
This requires a minimum block size. A block must be large enough to hold the header, footer, and both a `next` and a `prev` pointer. On a 64-bit system, this is typically 8 (header) + 8 (next) + 8 (prev) + 8 (footer) = 32 bytes. This is why 16-byte alignment is common.
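One way to picture a free block's layout in code (an illustrative sketch; real allocators usually work with raw pointer arithmetic rather than a literal struct like this):

```c
#include <stddef.h>

/* Hypothetical view of a free block on a 64-bit system. */
struct free_block {
    size_t header;            /* size | allocated bit (0 here, since it's free) */
    struct free_block *next;  /* next free block in the explicit list           */
    struct free_block *prev;  /* previous free block in the explicit list       */
    /* ...any remaining payload bytes are unused while the block is free...     */
    /* the footer (a copy of the header) sits at the very end of the block      */
};
```

When the block is handed to the application, the bytes occupied by `next` and `prev` simply become part of the payload again.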
Logical vs. Physical Order
With an explicit list, the order of blocks in the list does not have to match their physical order in memory. The next pointer of a free block can point to any other free block, anywhere in the heap.
- Logically, the free list is a clean, ordered sequence.
- Physically, the pointers can jump all over the heap, skipping over allocated blocks.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014111751.png)
Allocating from an Explicit Free List
The process is now:
- Search the free list (not the whole heap) for a suitable block.
- Once found, "splice" it out of the linked list by updating the `next` and `prev` pointers of its neighbors.
- If the found block is larger than needed, split it. Mark the beginning part as allocated for the user.
- Add the remaining part back into the free list as a new, smaller free block.

![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112302.png)
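A minimal sketch of the splice step, using the hypothetical `struct free_block` from above and an assumed global `free_list_root`:

```c
struct free_block *free_list_root;   /* head of the explicit free list (assumed global) */

/* Unlink bp from the doubly linked free list in constant time. */
void splice_out(struct free_block *bp) {
    if (bp->prev != NULL)
        bp->prev->next = bp->next;
    else
        free_list_root = bp->next;   /* bp was the head of the list */

    if (bp->next != NULL)
        bp->next->prev = bp->prev;
}
```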
Freeing with an Explicit Free List
Freeing a block now means adding it back into the free list. This introduces a new design choice: the insertion policy.
- LIFO (Last-In, First-Out) Policy: Insert the newly freed block at the beginning (the root) of the free list.
  - Pro: Simple and very fast, a constant-time operation.
  - Con: Studies suggest it can lead to worse fragmentation than other policies.
- Address-Ordered Policy: Traverse the list to insert the freed block so that the list remains sorted by memory address.
  - Pro: Studies suggest this leads to lower fragmentation.
  - Con: Insertion requires a search, making `free` an $O(F)$ operation, where $F$ is the number of free blocks.
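For the LIFO policy, insertion at the root is just a couple of pointer updates (same hypothetical `struct free_block` and `free_list_root` as above):

```c
/* Insert a newly freed block at the head of the free list (LIFO). */
void insert_lifo(struct free_block *bp) {
    bp->prev = NULL;
    bp->next = free_list_root;
    if (free_list_root != NULL)
        free_list_root->prev = bp;
    free_list_root = bp;
}
```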
Let’s visualize the LIFO free operation, which also involves coalescing. There are four cases, identical to the implicit list, but now we must also manage the list pointers.
Freeing with LIFO: The Four Cases
Case 1: No Coalescing (Neighbors are allocated)

The block being freed is surrounded by allocated blocks.
- Before: The free list exists, and the block to be freed is not in it.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112325.png)
- Action: The newly freed block is simply inserted at the root of the list. Its `next` pointer is set to the old root, and the root is updated to point to the new block.

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112335.png)
Case 2: Coalesce with Next Block

The block to the right is free.

- Before: The block to be freed is `n`. The next physical block, `m2`, is already in the free list.

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112404.png)

- Action:
  - Splice the successor block (`m2`) out of the free list.
  - Coalesce `n` and `m2` into a new, larger free block of size `n+m2`.
  - Insert this new `n+m2` block at the root of the list (LIFO policy).

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112423.png)
Case 3: Coalesce with Previous Block
The block to the left is free. This is symmetric to Case 2.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112740.png)
Case 4: Coalesce with Both Blocks
Both the previous (`m1`) and next (`m2`) blocks are free.

- Before: Both neighbors are in the free list.

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112752.png)

- Action:
  - Splice both the predecessor (`m1`) and successor (`m2`) out of the free list.
  - Coalesce all three blocks into one large block of size `m1+n+m2`.
  - Insert this new, very large block at the root of the list.

  ![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014112800.png)
Explicit List Summary
- Comparison to Implicit List:
  - Allocate Cost: $O(F)$ (linear in the number of free blocks) instead of $O(N)$ (linear in the total number of blocks). This is much faster when the heap is mostly full.
  - Free Cost: Slightly more complex due to splicing list pointers, but still constant time ($O(1)$) for the LIFO policy.
  - Memory Overhead: Requires extra space for `next`/`prev` pointers within free blocks, which can increase internal fragmentation by enforcing a larger minimum block size.
Segregated Free Lists: The Best of Both Worlds
We can improve performance even further. Instead of one giant free list, what if we had several? A segregated free list allocator maintains an array of free lists, where each list is responsible for a different size class of blocks.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251014113021.png)
Common Strategy:
- For small sizes, have a separate list for each size (e.g., a list for 16-byte blocks, a list for 32-byte blocks, etc.).
- For larger sizes, group them into power-of-two ranges (e.g., a list for blocks 65-128 bytes, one for 129-256 bytes, etc.).
Seglist Allocator Algorithm
- To Allocate a block of size `n`:
  - Determine the appropriate size class for `n`.
  - Search the corresponding free list for a block that fits.
  - If a block is found, (optionally) split it, and return the allocated portion. Place the remainder in the appropriate (smaller) free list.
  - If no block is found in that list, try the next larger size class list. Repeat.
  - If no block is found in any list, request a new chunk of memory from the OS (using `sbrk()`), create a new large block, and place it in the largest size class list. Then repeat the allocation process.
- To Free a block:
  - Coalesce the block with its neighbors if possible.
  - Determine the size class of the resulting free block and place it in the corresponding free list.
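A sketch of how a request size might be mapped to a size class index, assuming hypothetical exact classes for small sizes and power-of-two ranges above 64 bytes (the exact boundaries are illustrative, not specified in the lecture):

```c
#include <stddef.h>

/* Map an alignment-rounded block size to a free-list index.
   Classes 0..3 hold exact sizes 16, 32, 48, 64; classes above that hold
   power-of-two ranges: 65-128, 129-256, 257-512, ... */
int size_class(size_t size) {
    if (size <= 64)
        return (int)(size / 16) - 1;      /* 16 -> 0, 32 -> 1, 48 -> 2, 64 -> 3 */

    int idx = 4;                          /* first power-of-two class: 65-128 */
    size_t upper = 128;
    while (size > upper) {                /* find the range that contains size */
        upper *= 2;
        idx++;
    }
    return idx;
}
```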
Advantages of Segregated Lists
- Higher Throughput: `malloc` searches are much faster. Instead of searching one long list, you go directly to a list of blocks that are likely to be the right size. For power-of-two size classes, the search can approach logarithmic time.
- Better Memory Utilization: A first-fit search on a segregated list naturally approximates a best-fit search on the entire heap. By going to the list for size `n`, you are inherently picking from a pool of blocks that are a "good fit" for your request, which tends to reduce fragmentation.
Continue here: 08 Implicit Memory Management and Garbage Collection, Debugging, Testing, Design