Lecture from: 01.10.2025 | Video: Videos ETHZ

The previous lecture introduced the fundamental concepts of memory addresses and pointers. This lecture solidifies that understanding by dissecting a classic, dense C idiom, exploring the art of deciphering complex pointer declarations, and transitioning from stack-based memory to the flexible world of the heap and dynamic memory allocation.

A Masterclass in C Idiom: A ‘Simple’ strcpy()

The following analysis focuses on a famously compact implementation of the strcpy function. This code serves as a crucible of C’s pointer syntax, operator precedence, and expression evaluation.

char *strcpy(char *dest, const char *src) {
    char *r = dest;
    while(*dest++ = *src++);
    return r;
}

The while loop has an empty body and uses an assignment as its condition. Understanding it requires working through operator precedence and side effects.

Operator Precedence

In C, postfix increment/decrement operators (++, --) and function calls () have higher precedence than the dereference operator *. The assignment operator = has one of the lowest precedences.

Therefore, *dest++ is parsed as *(dest++).

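Tracing the expression *dest++ = *src++ step by step reveals the logic. (C does not specify whether the left or the right operand of = is evaluated first. The order below is one valid way to picture it.)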

  1. *src++ (The Right-Hand Side):

    • Due to higher precedence, the expression is *(src++).
    • src++ is a post-increment. Its value is the current (old) address in src. The increment is a side effect that only has to be completed by the next sequence point.
    • The * dereferences this original address, fetching the character (e.g., ‘H’).
    • The side effect (incrementing src) is pending.
  2. *dest++ (The Left-Hand Side):

    • The logic mirrors the right-hand side: *(dest++).
    • dest++ likewise yields the old address in dest.
    • This original address is the target for assignment.
    • The side effect (incrementing dest) is pending.
  3. = (The Assignment):

    • The fetched character (‘H’) is assigned to the memory location pointed to by the original destination address.
  4. while(...) (The Loop Condition):

    • An assignment in C is an expression. Its value is the assigned value.
    • Here, the value is ‘H’ (72), which is non-zero (true), so the loop continues.
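  5. The End of the Condition and Side Effects:

    • The end of the while condition (a full expression) is a sequence point. By then both pending increments have taken effect.
    • src now points to the next source character.
    • dest now points to the next destination byte.
    • The ; after while(...) is an empty statement. The loop body does nothing, because all the work happens in the condition.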

Termination

The process continues until src points to the null terminator (\0).

  • *src++ fetches \0.
  • \0 (value 0) is assigned to *dest++, terminating the destination string.
  • The assignment expression evaluates to 0.
  • The while condition sees 0 (false) and terminates.

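This demonstrates C’s density and idiomatic style, though modern code often prefers an explicit loop for clarity. Note also that strcpy performs no bounds checking. The caller must ensure that dest has room for the whole string, including the terminating \0.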

Deciphering Complex C Pointer Declarations

C’s syntax for complex types can be daunting (e.g., int (*(*x[3])())[5]). To decipher these, use the “Right-Left Rule”:

  1. Start at the variable name.
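  2. Read to the right until you hit a closing parenthesis ) or the end of the declaration.
  3. Read to the left until you hit the matching opening parenthesis ( or the start of the declaration.
  4. If you stopped at a parenthesis, step outside that pair of parentheses and repeat from step 2.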

Applying this to examples:

  • int *p;

    1. Start at p: p is…
    2. Read left: …a pointer to…
    3. Read left: …an int.
    • Result: p is a pointer to an int.
  • int *p[13];

    1. Start at p: p is…
    2. Read right: …an array of 13…
    3. Read left: …pointers to…
    4. Read left: …int.
    • Result: p is an array of 13 pointers to int.
  • int (*p)[13];

    1. Start at p: p is…
    2. Parentheses force reading left: …a pointer to…
    3. Jump out, read right: …an array of 13…
    4. Read left: …ints.
    • Result: p is a pointer to an array of 13 ints.
  • int (*f)();

    1. Start at f: f is…
    2. Read left: …a pointer to…
    3. Jump out, read right: …a function taking unspecified arguments (before C23; since C23, () means no parameters, like (void)) and returning…
    4. Read left: …an int.
    • Result: f is a pointer to a function returning an int.

Function Pointers

Functions live in memory too (typically in the read-only code, or text, segment), so each function has an address, just as a variable does. A function pointer stores such an address. This enables callbacks and tables of interchangeable behavior.

Syntax:

// func is a pointer to a function that takes an (int *, char)
// and returns an int.
int (*func)(int *, char);

The parentheses around *func are crucial. Without them, int *func(int *, char); declares a function returning int *, not a pointer to a function.

A Real-World Example: The Linux Kernel VFS

The Linux kernel’s Virtual File System (VFS) utilizes function pointers to provide a uniform interface for various filesystems. It defines a struct of function pointers, known as a vtable:

struct file_operations {
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    int (*open) (struct inode *, struct file *);
    // ... and many more operations
};

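When a program calls read() on a file descriptor, the kernel finds the file_operations struct associated with that open file and calls the implementation through its function pointer. Each filesystem or driver supplies its own functions behind one interface, which is polymorphism (dynamic dispatch) in plain C.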

Dynamic Memory Allocation

Two allocation methods have been discussed so far:

  1. Static Allocation: global and static variables, allocated when the program is loaded and persisting for the program’s whole lifetime.
  2. Automatic Allocation: Local variables, allocated on the stack, deallocated on return.

Often, memory needs to persist across calls, be large, or have a size unknown at compile time. Dynamic memory allocation addresses this by requesting memory from the heap (or “free store”).

Manual vs. Automatic Memory Management

Unlike languages with garbage collectors (Java, Python), C requires manual memory management. The programmer must explicitly request and release memory, offering control but requiring care.

The C Memory API

The standard C library (<stdlib.h>) provides the API for dynamic memory.

malloc()

malloc (memory allocate) requests memory.

// declared in stdlib.h
void *malloc(size_t sz);
  • Argument: Number of bytes.
  • Returns: void * to the first byte, or NULL on failure.
  • Note: Allocated memory is uninitialized (contains garbage).

Canonical Usage

// Allocate space for an array of 10 longs
long *arr = (long *)malloc(10 * sizeof(long));
 
// ALWAYS check for failure!
if (arr == NULL) {
    // Handle error: print message, exit, etc.
    return ERRCODE;
}
 
// Now you can use the memory
arr[0] = 5L;

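Key patterns: use sizeof (an operator, not a function) so that the size is correct on every platform, and check the result against NULL. The cast of the returned void * is optional in C, where void * converts implicitly to any object pointer type. It is required only in C++.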

calloc()

calloc (contiguous allocate) differs from malloc:

// declared in stdlib.h
void *calloc(size_t nm, size_t sz);
  • Arguments: Number of elements (nm) and size of each (sz). Allocates nm * sz bytes.
  • Zeroes memory: unlike malloc, calloc guarantees that the block is zero-filled (all bits zero).

free(): Releasing Memory

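Memory that is no longer needed must be returned to the allocator with free(). The allocator may reuse it for later requests; it is not necessarily handed back to the operating system.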

// declared in stdlib.h
void free(void *ptr);
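  • Argument: a pointer returned by malloc, calloc, or realloc (or NULL, in which case free does nothing).
  • Constraint: pass exactly the pointer the allocator returned, not a pointer into the middle of the block.
  • After freeing, the pointer becomes dangling. Accessing memory through it is undefined behavior.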

Good Practice After free()

Set the pointer to NULL immediately after freeing to prevent accidental use.

free(arr);
arr = NULL;

realloc(): Resizing a Block

realloc changes the size of an allocated block.

// declared in stdlib.h
void *realloc(void *ptr, size_t size);
  • Arguments: Original pointer and new size.
  • Behavior: May extend/shrink in place or allocate a new block, copy contents, and free the old block.
  • Usage: always use the returned pointer. After a successful call, the old pointer may no longer be valid.

Canonical Usage

long *new_arr = (long *)realloc(arr, 20 * sizeof(long));
if (new_arr == NULL) {
    // Handle realloc failure. Note that the original 'arr' is still valid!
    return ERRCODE;
}
arr = new_arr; // Update the pointer to the new location.

A Complete Example: A Dynamic Array

This program reads an unknown number of integers, resizing the array dynamically.

Logic Breakdown:

  1. Initialization: Allocate initial array with calloc.
  2. Loop: Read numbers.
  3. Check Capacity: Check if full.
  4. Grow: Double size (sz *= 2) and realloc if full.
  5. Store: Add new number.
  6. Cleanup: free the final block.

Perils of the Heap: Memory Corruption and Leaks

Manual management introduces risks.

Memory Corruption

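Writing through an invalid pointer, or outside the bounds of an allocation, corrupts memory. This is undefined behavior, and the symptoms often appear far from the code that caused them.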

Common bugs:

  • Buffer Overflow: Writing past the end.
  • Pointer Arithmetic Errors: Calculating invalid addresses.
  • Freeing Invalid Pointers: Freeing stack addresses or non-malloc pointers.
  • Double Free: Freeing the same pointer twice.
  • Use-After-Free: Accessing freed memory.

Memory Leaks

A memory leak occurs when allocated memory is never freed. Typically the program has lost its last pointer to the block, so it can no longer free it.

Leaks accumulate, potentially crashing long-running programs or the system.

User-Defined Types: struct, union, and typedef

C lets programmers define their own composite types.

struct: Composing Data

A struct groups variables under a name.

// Defines a new type called "struct Point"
struct Point {
    int x;
    int y;
};
 
// Declares and initializes a variable of this type on the stack
struct Point p1 = {10, 20};

Accessing Members:

  • Dot operator (.) for direct access: p1.x = 15;
  • Arrow operator (->) for access through a pointer: p_ptr->y = 30; (where p_ptr is a struct Point *, e.g. &p1)

Arrow Operator ->

p_ptr->y is syntactic sugar for (*p_ptr).y.

Structs and Functions: Structs are passed by value, so the callee receives a copy. Passing a pointer instead (C’s way of emulating pass-by-reference) avoids copying large structs and lets the function modify the caller’s struct.

union: Sharing Memory

A union overlays all of its members on the same memory location. Its size is at least that of its largest member (padding may round it up for alignment).

union u {
    int   ival;
    float fval;
    char* sval;
};

The programmer must track which member currently holds a valid value. Reading fval after writing ival reinterprets the int’s bytes as a float, which is almost never a meaningful value.

typedef: Creating Type Aliases

typedef creates aliases for types, improving readability.

// Illustration: u32 becomes an alias for "unsigned int".
// (For a guaranteed 32-bit type, use uint32_t from <stdint.h>; don't redefine it.)
typedef unsigned int u32;
 
// This is much cleaner than writing "struct Point" everywhere
typedef struct Point Point;
Point p1;

Taming a Monster Declaration with typedef

To declare x as int (*(*x[3])())[5] (array of 3 pointers to functions returning pointer to array of 5 ints):

  1. Innermost: typedef int fiveints[5];
  2. Next: typedef fiveints* p5i;
  3. Next: typedef p5i (*f_of_p5is)();
  4. Final: f_of_p5is x[3];

This step-by-step process, giving meaningful names to intermediate types, transforms an unreadable declaration into a series of simple, understandable steps.

C Namespaces

C uses separate namespaces to avoid conflicts.

Namespaces exist for:

  1. Labels
  2. Tags (struct, union, enum)
  3. Member names (per struct/union)
  4. Ordinary identifiers (variables, functions, typedefs)

This allows struct id { int id; }; without conflict, though it is poor style.

Practice: Pointers and the Heap

Mastering the heap is about managing lifecycles and understanding memory layout.

Exercise: Pointer Declaration

Decipher the following declaration: char *(*f[5])(int);

  • Answer: f is an array of 5 pointers to functions that take an int and return a char * (pointer to char).

Exercise: The malloc Trap

What is wrong with this code?

int *p = malloc(sizeof(int));
*p = 10;
p = malloc(sizeof(int));
free(p);
  • Answer: Memory Leak. The first block allocated for p is overwritten by the second malloc call before it is freed. The address of the first block is lost forever.

Exercise: Struct Sizing

How many bytes would struct { char c; int i; } likely take on a typical 64-bit system where int is 4 bytes and 4-byte aligned?

  • Answer: 8 bytes: char (1) + padding (3) + int (4). (This depends on the platform’s alignment rules, but primitive types are usually aligned to their own size.)
