In the previous lecture, we introduced the fundamental concepts of memory addresses and pointers. Today, we will solidify that understanding by dissecting a classic, dense C idiom, exploring the art of deciphering complex pointer declarations, and then making the crucial leap from stack-based memory to the flexible world of the heap and dynamic memory allocation.
A Masterclass in C Idiom: A ‘Simple’ strcpy()
Let’s begin by analyzing a famously compact implementation of the strcpy function. This tiny piece of code is a crucible of C’s pointer syntax, operator precedence, and expression evaluation.
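The listing itself is not reproduced in these notes. A minimal sketch of the kind of implementation the following analysis describes, assuming the hypothetical name my_strcpy (so it does not clash with the library function) and a void return type:

```c
/* Hypothetical name: my_strcpy, chosen so it does not collide with the
   standard strcpy declared in <string.h>. */
void my_strcpy(char *dest, const char *src)
{
    /* Copy one character per iteration; the loop stops after the
       terminating '\0' has been copied. The extra parentheses tell the
       compiler that the assignment in the condition is intentional. */
    while ((*dest++ = *src++))
        ;
}
```

As with the real strcpy, the caller is assumed to provide a destination buffer large enough for the source string and its terminator.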
At first glance, the while loop is baffling. It has no body, and its condition is an assignment! To understand it, we must break it down piece by piece, paying close attention to operator precedence.
Operator Precedence
In C, the postfix increment/decrement operators (++, --) and function calls () have a higher precedence than the dereference operator *. The assignment operator = has one of the lowest precedences.
Therefore, *dest++ is parsed as *(dest++).
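To see why the parsing matters, the sketch below contrasts *p++ with (*p)++. The function name and the sample string are assumptions made for illustration only:

```c
/* Returns 1 if *p++ moved the pointer while (*q)++ changed the pointee. */
int precedence_demo(void)
{
    char text[] = "12";
    char *p = text;
    char *q = text;

    char first = *p++;   /* parsed as *(p++): reads '1', and p advances */
    (*q)++;              /* increments the character itself: '1' -> '2' */

    return first == '1' && p == text + 1 && q == text && text[0] == '2';
}
```

Digits are used because C guarantees that '0' through '9' have consecutive values, so the increment from '1' to '2' holds on any character set.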
Let’s trace the expression *dest++ = *src++ step by step:
*src++ (The Right-Hand Side):
Because ++ has higher precedence, the expression is *(src++).
The src++ part is a post-increment. This means “use the current value of src, and increment src as a side effect at some point before the next sequence point.”
So, the program first takes the current address stored in src.
The * then dereferences this original address, fetching the character at that location. Let’s say it’s the character ‘H’.
The side effect, incrementing src to point to the next character, is queued up: the compiler is free to perform it at any time before the next sequence point.
*dest++ (The Left-Hand Side):
The exact same logic applies. The expression is *(dest++).
The program takes the current address stored in dest.
This original address is marked as the target for the assignment.
The side effect, incrementing dest to point to the next memory location, is also queued up.
= (The Assignment):
The character fetched from the source (‘H’) is assigned to the memory location pointed to by the original destination address.
while(...) (The Loop Condition):
In C, an assignment is an expression, not just a statement. The value of an assignment expression is the value that was assigned.
In our example, the value of the expression is the character ‘H’.
The while loop evaluates this value as its condition. Any non-zero value is considered true. The ASCII value of ‘H’ is 72, which is non-zero, so the loop continues.
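The Sequence Point and Side Effects:
The controlling expression of a while loop is a full expression, and the end of a full expression is a “sequence point.” By the time the condition has been evaluated, and before the empty body (the lone semicolon ;) runs, all the queued-up side effects are guaranteed to have taken effect.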
src is incremented to point to the next character (‘e’).
dest is incremented to point to the next available byte.
Termination
This copy-and-increment process continues character by character. How does it stop?
Eventually, src will point to the null terminator (\0) at the end of the source string.
The loop executes one last time:
*src++ fetches the \0 character.
This \0 (which has a numeric value of 0) is assigned to *dest++. The destination string is now correctly null-terminated.
The value of the assignment expression is 0.
The while condition evaluates 0 as false, and the loop terminates.
This is C at its most dense and idiomatic. It’s clever, but for modern code, a more explicit for loop is often preferred for clarity.
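For comparison, here is a sketch of the more explicit style. The name copy_string is hypothetical; the behavior is intended to match the dense idiom, with the test, the copy, and the termination written out as separate steps:

```c
#include <stddef.h>

/* Hypothetical name: copy_string. Copies src into dest, including the
   terminating '\0'. dest must be large enough to hold the result. */
void copy_string(char *dest, const char *src)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        dest[i] = src[i];      /* copy one ordinary character */
    }
    dest[i] = '\0';            /* terminate the destination explicitly */
}
```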
Deciphering Complex C Pointer Declarations
C’s syntax for declaring complex types can be intimidating. How do you read something like int (*(*x[3])())[5]? The key is a systematic approach called the “Right-Left Rule.”
The Rule: Start at the variable name. Read to the right until you hit a closing parenthesis ) or the end of the line. Then, read to the left. When you hit an opening parenthesis (, jump out and repeat the process for the outer context.
Let’s apply this to the examples from the slides.
int *p;
Start at p: p is…
Read left: …a pointer to…
Read left: …an int.
Result: p is a pointer to an int.
int *p[13];
Start at p: p is…
Read right: …an array of 13…
Read left: …pointers to…
Read left: …int.
Result: p is an array of 13 pointers to int.
int (*p)[13];
Start at p: p is…
The parentheses force us to read left first: …a pointer to…
Jump out of the parentheses and read right: …an array of 13…
Read left: …ints.
Result: p is a pointer to an array of 13 ints.
int (*f)();
Start at f: f is…
Read left (due to ()): …a pointer to…
Jump out and read right: …a function that takes unspecified arguments and returns…
Read left: …an int.
Result: f is a pointer to a function returning an int.
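The four readings can be checked against the compiler with sizeof. In this sketch the variable names are hypothetical (each declaration gets its own name so that all four can coexist), and the function pointer is written with (void) rather than () so that it means the same thing under every C standard:

```c
#include <stddef.h>

int  *ptr_to_int;              /* pointer to an int                      */
int  *arr_of_ptrs[13];         /* array of 13 pointers to int            */
int (*ptr_to_arr)[13];         /* pointer to an array of 13 ints         */
int (*ptr_to_func)(void);      /* pointer to a function returning an int */

/* Returns 1 if the sizes agree with the readings in the comments.
   sizeof does not evaluate its operand here, so no pointer is
   actually dereferenced. */
int declarations_match(void)
{
    return sizeof arr_of_ptrs == 13 * sizeof(int *)
        && sizeof *ptr_to_arr == 13 * sizeof(int)
        && sizeof *ptr_to_int == sizeof(int);
}
```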
Function Pointers
Just as a variable has an address in the data or stack segment, a function has an address in the read-only code segment. A function pointer is a variable that stores the address of a function.
This allows for powerful techniques, such as passing functions as arguments to other functions (callbacks) or creating tables of functions to implement dynamic behavior.
Syntax:
// func is a pointer to a function that takes an (int *, char)
// and returns an int.
int (*func)(int *, char);
The parentheses around *func are crucial. Without them, int *func(...) would declare a function that returns a pointer to an int.
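As an illustration of the callback technique, the sketch below passes a function pointer to another function. The names square, negate, and apply_to_all are hypothetical:

```c
#include <stddef.h>

/* Hypothetical helpers, used only for illustration. */
int square(int x) { return x * x; }
int negate(int x) { return -x; }

/* Applies the function that `op` points to, to every element in place. */
void apply_to_all(int *values, size_t count, int (*op)(int))
{
    for (size_t i = 0; i < count; i++) {
        values[i] = op(values[i]);   /* call through the pointer */
    }
}
```

A call such as apply_to_all(v, 3, square) passes the function by name; the name of a function converts to a pointer to it, so no & is needed.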
A Real-World Example: The Linux Kernel VFS
Function pointers are the backbone of many large C systems. The Linux kernel’s Virtual File System (VFS) uses them to provide a uniform interface for many different types of filesystems (ext4, NFS, etc.) and devices.
It defines a struct full of function pointers, sometimes called a vtable (virtual table):
struct file_operations {
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    int (*open) (struct inode *, struct file *);
    // ... and many more operations
};
When you call read() on a file descriptor, the kernel looks up the corresponding file_operations struct for that file and calls the read function pointer within it. This provides a form of polymorphism: the same read() system call can invoke different underlying code depending on whether it’s operating on a regular file, a network socket, or a device.
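The same dispatch pattern can be sketched in user-space C. Everything here (toy_ops, zero_device, empty_device, toy_read) is a hypothetical toy, not kernel code; it only shows how one call site can reach different implementations through a struct of function pointers:

```c
#include <stddef.h>
#include <string.h>

/* A toy "vtable": one function pointer per supported operation. */
struct toy_ops {
    size_t (*read)(char *buf, size_t len);
};

/* One implementation fills the buffer with zero bytes... */
static size_t zero_read(char *buf, size_t len)
{
    memset(buf, 0, len);
    return len;
}

/* ...another always reports that nothing was read. */
static size_t empty_read(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;
}

const struct toy_ops zero_device  = { zero_read };
const struct toy_ops empty_device = { empty_read };

/* The caller does not know which implementation it is invoking. */
size_t toy_read(const struct toy_ops *ops, char *buf, size_t len)
{
    return ops->read(buf, len);
}
```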
Chapter 5: Dynamic Memory Allocation
So far, we’ve seen two ways to allocate memory:
Static Allocation: For global variables and variables declared static. Memory is allocated once when the program loads and persists for the entire lifetime of the program.
Automatic Allocation: For local variables. Memory is allocated on the stack when a function is called and is automatically deallocated when the function returns.
This is often insufficient. We frequently need memory that:
Persists across function calls, but not for the whole program lifetime.
Is too large to safely fit on the stack.
Has a size that is unknown at compile time.
This is where dynamic memory allocation comes in. We explicitly request blocks of memory from a large pool called the heap (or “free store”).
Manual vs. Automatic Memory Management
In languages like Java or Python, a garbage collector automatically finds and deallocates memory that is no longer in use. This is automatic memory management.
C requires manual memory management. You, the programmer, are in complete control. You must explicitly request memory, and you are responsible for explicitly releasing it when you are done. This offers maximum control and performance but requires much more care.
The C Memory API
The standard C library (<stdlib.h>) provides a simple but powerful API for managing dynamic memory.
malloc()
malloc (memory allocate) is the primary function for requesting memory.
// declared in stdlib.h
void *malloc(size_t sz);
It takes one argument: the number of bytes you want to allocate.
It returns a void * (a generic pointer) to the first byte of the newly allocated block.
If the allocation fails (e.g., the system is out of memory), it returns NULL.
Crucially, the allocated memory is uninitialized. It contains garbage values.
Canonical Usage:
// Allocate space for an array of 10 longs
long *arr = (long *)malloc(10 * sizeof(long));
// ALWAYS check for failure!
if (arr == NULL) {
    // Handle error: print message, exit, etc.
    return ERRCODE;
}
// Now you can use the memory
arr[0] = 5L;
Notice the two key patterns:
We use sizeof() to calculate the size in a portable way.
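We cast the returned void * to the specific pointer type we need (long *). In C the cast is optional, because void * converts implicitly to any object pointer type; it is required only if the code must also compile as C++.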
calloc()
calloc (contiguous allocate) is similar to malloc but with two key differences.
// declared in stdlib.h
void *calloc(size_t nm, size_t sz);
It takes two arguments: the number of elements (nm) and the size of each element (sz). It allocates nm * sz bytes.
It zeroes the memory. Unlike malloc, the allocated block is guaranteed to be filled with zeros. This is slightly slower but can prevent bugs from using uninitialized data.
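A small sketch of the zeroing guarantee in use, assuming a hypothetical helper named make_counters that hands back an array of counters ready to be incremented:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a zero-filled array of `n` counters,
   or NULL if the allocation fails. The caller must free() the result. */
unsigned *make_counters(size_t n)
{
    /* calloc(n, size) allocates room for n elements and zeroes them,
       so no separate initialization loop is needed. */
    return calloc(n, sizeof(unsigned));
}
```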
free(): Releasing Memory
When you are finished with a block of dynamically allocated memory, you must release it back to the system using free().
// declared in stdlib.h
void free(void *ptr);
It takes a single argument: the pointer that was returned by malloc, calloc, or realloc.
Important: You must pass the exact pointer you received. Freeing a pointer to the middle of a block is undefined behavior.
After free(ptr), the pointer ptr becomes a dangling pointer. It still holds the address, but the memory at that address is no longer valid and could be re-allocated at any moment. Accessing it is a serious bug (use-after-free).
Good Practice After free()
To prevent accidental use of a dangling pointer, it’s good practice to set the pointer to NULL immediately after freeing it.
free(arr);
arr = NULL;
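One way to make the habit harder to forget is to wrap both steps in a helper. The function below is a sketch under that assumption; release_longs is a hypothetical name, and it handles only long * pointers:

```c
#include <stdlib.h>

/* Hypothetical helper: frees *slot and resets it to NULL in one step. */
void release_longs(long **slot)
{
    free(*slot);     /* free(NULL) is defined to do nothing */
    *slot = NULL;    /* a repeated call is now a harmless free(NULL) */
}
```

Because the pointer is reset, calling release_longs(&arr) twice does not produce a double free.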
realloc(): Resizing a Block
realloc allows you to change the size of a previously allocated block of memory.
// declared in stdlib.h
void *realloc(void *ptr, size_t size);
It takes the original pointer and the new desired size in bytes.
It might be able to extend or shrink the block in place. If it cannot, it will:
Allocate a new block of the requested size somewhere else.
Copy the contents from the old block to the new block (up to the smaller of the old and new sizes).
Free the old block.
You must always use the new address returned by realloc! The old pointer may now be invalid.
Canonical Usage:
long *new_arr = (long *)realloc(arr, 20 * sizeof(long));
if (new_arr == NULL) {
    // Handle realloc failure. Note that the original 'arr' is still valid!
    return ERRCODE;
}
arr = new_arr; // Update the pointer to the new location.
A Complete Example: A Dynamic Array
This program reads an unknown number of integers from the user, storing them in an array that grows dynamically as needed.
Logic Breakdown:
Initialization: Start by allocating a small initial array using calloc.
Loop: Read numbers one by one.
Check Capacity: Before storing a new number, check if the array is full (num >= sz).
Grow: If full, double the size (sz *= 2) and use realloc to get a larger block of memory. Always check for realloc failure.
Store: Place the new number in the array and increment the count.
Cleanup: After the loop, free the final allocated block.
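The program itself is not reproduced in these notes. The steps above can be sketched as follows, keeping the variable names num and sz from the breakdown; the function names push_int and read_ints, the initial capacity of 4, and the choice to read from a FILE * are assumptions made for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Appends `value` to the array at *arr, doubling the capacity *sz with
   realloc when the array is full. Assumes *sz > 0. Returns 0 on success,
   or -1 if realloc fails (*arr is then unchanged and still valid). */
int push_int(int **arr, size_t *num, size_t *sz, int value)
{
    if (*num >= *sz) {                       /* check capacity */
        size_t new_sz = *sz * 2;             /* grow: double the size */
        int *bigger = realloc(*arr, new_sz * sizeof **arr);
        if (bigger == NULL)
            return -1;
        *arr = bigger;
        *sz = new_sz;
    }
    (*arr)[(*num)++] = value;                /* store and count */
    return 0;
}

/* Reads integers from `in` until EOF or the first token that is not a
   number. Returns the heap array and stores the count in *count, or
   returns NULL on allocation failure. The caller must free() the result. */
int *read_ints(FILE *in, size_t *count)
{
    size_t sz = 4, num = 0;
    int *arr = calloc(sz, sizeof *arr);      /* initialization */
    int value;

    if (arr == NULL)
        return NULL;

    while (fscanf(in, "%d", &value) == 1) {  /* loop */
        if (push_int(&arr, &num, &sz, value) != 0) {
            free(arr);
            return NULL;
        }
    }
    *count = num;
    return arr;
}
```

A main function would call read_ints(stdin, &n), use the n values, and then perform the cleanup step by passing the returned pointer to free().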
Perils of the Heap: Memory Corruption and Leaks
Manual memory management is powerful but fraught with danger.
Memory Corruption
This happens when you write to memory you shouldn’t. C provides no safety net.
Common corruption bugs include:
Buffer Overflow: Writing past the end of an allocated block (a[2] = 5; on a 2-element array).
Pointer Arithmetic Errors: Calculating a pointer that points outside a valid block (c = b+3;).
Freeing Invalid Pointers: Calling free() on a stack address (free(&(a[0]));) or on any other pointer that wasn’t returned by malloc, calloc, or realloc.
Double Free: Calling free() on the same pointer twice.
Use-After-Free: Dereferencing a pointer after it has been freed (b[0] = 5;).
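The fragments quoted in the list appear to come from a listing that is not reproduced here. The following is a hedged reconstruction, assuming a is a 2-element array on the stack and b is a 2-element block on the heap. Each buggy line is commented out so that the sketch itself stays well-defined:

```c
#include <stdlib.h>

/* Hypothetical function wrapping the fragments. Returns a[0] + a[1],
   computed via the heap block, or -1 if malloc fails. */
int corruption_examples(void)
{
    int a[2] = {1, 2};                 /* 2 ints on the stack */
    int *b = malloc(2 * sizeof *b);    /* 2 ints on the heap  */
    int result;

    if (b == NULL)
        return -1;

    /* a[2] = 5;         buffer overflow: valid indices are 0 and 1 */
    /* int *c = b + 3;   pointer beyond the end of b's block        */
    /* free(&(a[0]));    invalid free: a was not returned by malloc */

    b[0] = a[0] + a[1];                /* valid use of both arrays */
    result = b[0];

    free(b);                           /* correct: freed exactly once */
    /* free(b);          double free                                 */
    /* b[0] = 5;         use-after-free                              */

    return result;
}
```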
Memory Leaks
A memory leak occurs when you allocate memory but fail to free it when it’s no longer needed. The program “loses” the pointer to the memory, making it impossible to deallocate, but the memory remains allocated from the system’s perspective.
For long-running programs like servers, even small leaks can accumulate over time, consuming all available memory and causing the system to slow down (due to virtual memory thrashing) or crash.
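A common way to lose the pointer is to overwrite it before freeing the block it refers to. The sketch below shows the leak-free ordering, with the leaky alternative described in a comment; copy_of and replace_without_leak are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `s` (caller frees), or NULL. */
char *copy_of(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Replaces one heap string with another and returns the new length. */
size_t replace_without_leak(const char *first, const char *second)
{
    char *text = copy_of(first);
    size_t len = 0;

    /* Leaky version: writing `text = copy_of(second);` at this point would
       overwrite the only pointer to the first block, which could then
       never be freed. */
    free(text);                 /* release the old block first */
    text = copy_of(second);

    if (text != NULL)
        len = strlen(text);
    free(text);                 /* release the new block when done */
    return len;
}
```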
User-Defined Types: struct, union, and typedef
C allows you to create your own complex data types.
struct: Composing Data
A struct is a collection of one or more variables, possibly of different types, grouped together under a single name. It’s C’s primary tool for creating structured data.
// Defines a new type called "struct Point"
struct Point {
    int x;
    int y;
};
// Declares and initializes a variable of this type on the stack
struct Point p1 = {10, 20};
Accessing Members:
Use the dot operator (.) to access members of a struct variable directly.
p1.x = 15;
Use the arrow operator (->) to access members through a pointer to a struct.
struct Point *p_ptr = &p1;
p_ptr->y = 30; // Set the y field of the struct p_ptr points to
Arrow Operator ->
The expression p_ptr->y is simply convenient syntactic sugar for (*p_ptr).y. It dereferences the pointer and then accesses the member.
Structs and Functions:
Like other variables, structs are passed by value to functions. This means the entire struct is copied. For large structs, this is inefficient. It’s much more common to pass a pointer to the struct (often called pass-by-reference, although strictly the pointer itself is passed by value) to avoid the copy and allow the function to modify the original.
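The difference is easy to observe. In this sketch, shift_copy and shift_in_place are hypothetical names; the first receives a copy of the struct and the second receives its address:

```c
struct Point {
    int x;
    int y;
};

/* Receives a copy: the caller's struct is not modified. */
void shift_copy(struct Point p, int dx)
{
    p.x += dx;
}

/* Receives the address: the caller's struct is modified through it. */
void shift_in_place(struct Point *p, int dx)
{
    p->x += dx;
}
```

When a function only needs to read a large struct, declaring the parameter as const struct Point * avoids the copy while still protecting the original.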
union: Sharing Memory
A union is syntactically like a struct, but all its members share the same memory location. The size of the union is at least the size of its largest member (the compiler may add padding for alignment). It can only hold one of its member values at any given time.
union u {
    int ival;
    float fval;
    char* sval;
};
It is entirely the programmer’s responsibility to keep track of which type is currently stored in the union. Reading my_union.fval after you stored an int in my_union.ival results in reinterpreting the integer’s bits as a float, which is usually garbage.
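A common way to do that bookkeeping is a “tagged union”: a struct that pairs the union with an enum recording which member is valid. The type and function names below are hypothetical:

```c
/* Records which member of the union is currently valid. */
enum value_kind { KIND_INT, KIND_FLOAT, KIND_STRING };

struct tagged_value {
    enum value_kind kind;   /* the "tag" */
    union {
        int   ival;
        float fval;
        char *sval;
    } u;
};

/* Returns the value as a double, or 0.0 if it does not hold a number.
   Only the member named by the tag is ever read. */
double as_number(const struct tagged_value *v)
{
    switch (v->kind) {
    case KIND_INT:   return v->u.ival;
    case KIND_FLOAT: return v->u.fval;
    default:         return 0.0;
    }
}
```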
typedef: Creating Type Aliases
typedef allows you to create a new name for an existing type. It’s an essential tool for improving code readability and managing complexity.
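// Now, uint32_t can be used as an alias for "unsigned int"
// (illustration only: real code should take uint32_t from <stdint.h>,
// because unsigned int is not guaranteed to be 32 bits wide)
typedef unsigned int uint32_t;
// This is much cleaner than writing "struct Point" everywhere
typedef struct Point Point;
Point p1;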
Taming a Monster Declaration with typedef
Let’s revisit the complex declaration from before and see how typedef can make it understandable.
The Goal: Declare x as int (*(*x[3])())[5].
Using the Right-Left Rule, this means: x is an array of 3 pointers to functions that return a pointer to an array of 5 ints.
The typedef Buildup: We build the type from the inside out.
Start with the innermost type: An array of 5 ints.
typedef int fiveints[5]; // fiveints is now a type representing "array of 5 ints"
Next layer: A pointer to that array type.
typedef fiveints* p5i; // p5i is a "pointer to an array of 5 ints"
Next layer: A pointer to a function that returns that pointer type.
typedef p5i (*f_of_p5is)(); // f_of_p5is is a "pointer to a function returning a p5i"
Final Declaration: Now, we can declare x simply as an array of 3 of these function pointers.
f_of_p5is x[3];
This step-by-step process, giving meaningful names to intermediate types, transforms an unreadable declaration into a series of simple, understandable steps.
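That the typedef version and the one-piece declaration name the same type can be checked by assigning one to the other without a cast. In this sketch, table, get_table, raw, and third_element are hypothetical names, and the parameter lists are written as (void) rather than () so the code means the same thing under every C standard:

```c
typedef int fiveints[5];        /* array of 5 ints                       */
typedef fiveints *p5i;          /* pointer to an array of 5 ints         */
typedef p5i (*f_of_p5is)(void); /* pointer to a function returning a p5i */

static int table[5] = {10, 20, 30, 40, 50};

/* Hypothetical function whose type matches what each element points to. */
static p5i get_table(void)
{
    return &table;
}

f_of_p5is x[3] = { get_table, get_table, get_table };

/* The same type written in one piece. */
int (*(*raw[3])(void))[5];

int third_element(void)
{
    raw[0] = x[0];              /* compiles without a cast: same type */
    return (*raw[0]())[2];      /* call, dereference, then index      */
}
```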
C Namespaces
Finally, it’s useful to know that C has several different “namespaces,” which is why you can sometimes see the same identifier used for different things without conflict.
C maintains separate namespaces for:
Labels (for goto).
Tags (struct, union, and enum names).
Member names (each struct or union has its own private namespace for its members).
Ordinary identifiers (everything else: variable names, function names, typedef names).
This is why struct id (a tag) can contain a member int id; (a member name) without conflict. While technically possible, reusing names this way is generally considered poor style.
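A compact sketch that uses the same identifier in all four namespaces at once. The function name namespaces_demo is hypothetical, and the code is meant to demonstrate the rule, not to recommend the style:

```c
struct id {          /* tag namespace */
    int id;          /* member namespace of struct id */
};

/* Every `id` below refers to a different kind of entity. */
int namespaces_demo(void)
{
    struct id id;    /* ordinary identifier `id`, of type struct id */

    id.id = 7;       /* variable `id`, member `id` */
    goto id;         /* label namespace */
id:
    return id.id;
}
```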