Lecture from: 29.10.2025 | Video: Videos ETHZ

In previous lectures, a solid foundation was built for understanding how C code translates to the language of the machine. This included basic x86 instructions, registers, and how the compiler translates high-level control flow constructs like loops and conditionals into assembly.

Focus now shifts from control to data. This lecture opens the black box on C’s data structures, arrays, multi-dimensional arrays, structs, and unions, to see how they are laid out in memory and how the compiler generates code to access them efficiently. This is a critical topic, as a deep understanding of data layout separates a C programmer from a true systems programmer.

Nested (Multi-dimensional) Arrays

C does not have true multi-dimensional arrays. Instead, it has arrays of arrays.

int pgh[4][5];

This declares an array pgh with 4 elements. Each of those elements is, in turn, an array of 5 integers.

Memory Layout: Row-Major Ordering

Because the rule of contiguity applies at every level, a nested array is laid out in a single, flat, uninterrupted block of memory. C guarantees row-major ordering: the first row (pgh[0]) is stored in its entirety, followed immediately by the second row (pgh[1]), and so on.

Accessing Nested Array Elements

To access an element A[i][j], the compiler calculates its offset from the beginning of the array A.

  • Starting address of row i: A + i * C * sizeof(T) (where C is the number of columns).
  • Address of element j within that row: (start of row i) + j * sizeof(T)

This can be simplified to a single formula:

Row Access Code

A fascinating consequence of C’s type system is that one can refer to just a row of a 2D array. The expression pgh[index] returns a pointer to the first element of the index-th row. Its type is int *.

Generated Assembly for return pgh[index];

The compiler knows each row has 5 integers (a zip_dig), so each row is 20 bytes long. It cleverly computes 20 * index using two fast leal instructions instead of a slow imul. Note that this code does not access memory, it only calculates and returns an address.

Element Access Code

When accessing a specific element like pgh[index][dig], the compiler generates code to compute the full address formula and perform a single memory read.

Generated Assembly for return pgh[index][dig];

leaq    (%rdi,%rdi,4), %rsi   # rsi = 5 * index
addq    %rax, %rsi            # rsi = dig + 5 * index
movl    (%rax,%rsi,4), %eax   # eax = Mem[pgh + 4*(dig + 5*index)]

This is a direct, efficient implementation of the address calculation formula, culminating in a single movl that fetches the value.

Strange Referencing Examples & The C Standard

C’s lack of bounds checking combined with its pointer arithmetic rules can lead to some surprising (and non-portable) behaviors.

The C99 Standard

One might think pgh[4][-1] (one-past-the-end of the array, then one back) should be the same as pgh[3][4]. The C99 standard, in a famously convoluted paragraph, states that pointer comparisons are only well-defined for elements within the same array object or one element past its end. Operations that create pointers far outside an object’s bounds (even temporarily) result in undefined behavior. An optimizing compiler is legally allowed to assume such code never happens and may generate unexpected results. The lesson: one should not rely on clever out-of-bounds pointer arithmetic.

Multi-level Arrays

There is another way to represent a 2D matrix in C that looks syntactically similar but is fundamentally different in memory: an array of pointers.

Nested Array (from before): int pgh[4][5]; This is a single, contiguous block of bytes.

Multi-level Array:

#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};

This declares univ as an array of 3 pointers to integers. Each element of univ is an 8-byte address. These pointers can point to other arrays that can be located anywhere in memory; they are not guaranteed to be contiguous.

Accessing Multi-level Array Elements

Although the C access syntax univ[index][dig] looks identical to the nested array case, the generated assembly is completely different. To access an element, the processor must perform two memory reads:

  1. First Read (Get Row Pointer): It must first calculate the address of univ[index] (univ + index * 8), and load the pointer stored at that location. This pointer is the base address of the desired row array.
  2. Second Read (Get Element): It then uses this newly loaded row address to calculate the final element address (row_address + dig * 4) and performs a second memory read to get the integer value.

This is fundamentally less efficient than accessing a true nested array. However, multi-level arrays offer more flexibility, such as allowing rows of different lengths (a “jagged array”).

Dynamic Nested Arrays

If the matrix size is unknown at compile time, it must be allocated on the heap and the indexing managed manually.

// Create a new n*n matrix on the heap
int *new_var_matrix(int n) {
    return (int *)calloc(sizeof(int), n * n);
}
 
// Access element (i, j)
int var_ele(int *a, int i, int j, int n) {
    return a[i*n + j];
}

Here, the programmer must explicitly perform the row-major index calculation i*n + j. This has performance implications.

Generated Assembly for var_ele:

var_ele:
    imull   %ecx, %esi      # n*i
    addl    %edx, %esi      # n*i + j
    movslq  %esi, %rsi
    movl    (%rdi,%rsi,4), %eax # Mem[a + 4*(n*i+j)]
    ret

Notice the imull instruction. Because n is a variable, the compiler cannot use fast shifts or leal tricks; it must generate a general-purpose (and slower) integer multiplication instruction inside the access code.

Optimizing Dynamic Array Multiplication

When this kind of manual indexing is used inside a loop, like in matrix multiplication, the performance cost can be significant.

Unoptimized Code:

for (j = 0; j < n; j++) {
    result += a[i*n+j] * b[j*n+k];
}

Inside this loop, there are two multiplications for the array subscripts (i*n and j*n) and one for the data, for a total of 3 multiplies per iteration.

An optimizing compiler (-O2) can dramatically improve this using two key techniques:

  1. Code Motion: The expression i*n is loop-invariant. It can be computed once before the loop starts.
  2. Strength Reduction: The expression j*n can be replaced by an accumulator that is simply incremented by n in each iteration, turning an expensive multiplication into a cheap addition.

The compiler can transform the code to use pointer-like access patterns even for dynamically sized arrays, reducing the inner loop to just one multiply and four adds, making it much faster.

Structures (struct)

A struct is a collection of one or more variables, possibly of different types, grouped together under a single name. It is C’s primary tool for creating complex, structured data.

  • It defines a contiguously-allocated region of memory.
  • Members are stored in the order they are declared.
  • Members are referenced by name.

Accessing Structure Members

The compiler knows the offset of each member from the beginning of the struct at compile time.

  • Direct Access (.): r.i
  • Pointer Access (->): r->i (syntactic sugar for (*r).i)

Generated Assembly:

  • r->i = val; (where r is in %rdi, val in %esi)
    movl %esi, (%rdi)  // Move val to memory at address r + 0
  • return &r->a[idx];
    leaq 4(%rdi,%rsi,4), %rax // return r + 4 + idx*4
    The compiler hardcodes the offsets (0 for i, 4 for a) directly into the instructions. Accessing struct members is extremely efficient.

Alignment

While memory is often thought of as a flat array of bytes, modern hardware accesses it in larger chunks (e.g., 8-byte words). For performance, the hardware requires or strongly prefers that a K-byte primitive data type be stored at an address that is a multiple of K. This is called alignment.

  • Required on some machines: Early ARM or Alpha processors would fault on an unaligned access.
  • Advised on x86: x86 will handle unaligned accesses, but they are much slower. An unaligned access might cross a cache line or virtual memory page boundary, requiring two memory operations instead of one.

To enforce this, the compiler will insert gaps (padding) into structures.

Alignment Rules in Structures

  1. Within a structure: Each member must be aligned according to its own type.
  2. Overall structure: The entire structure has an alignment requirement K, which is the largest alignment requirement of any of its members. The start address of the struct and its total size must be a multiple of K.

Example:

struct S1 {
    char c;     // 1 byte
    int i[2];   // 4-byte alignment
    double v;   // 8-byte alignment
} *p;
  • The largest alignment is for double v, so K = 8.
  • The entire struct S1 must be aligned on an 8-byte boundary.

Memory Layout with Padding:

  • c is at offset 0.
  • i[0] needs 4-byte alignment, so the compiler inserts 3 bytes of padding after c. i[0] starts at offset 4.
  • i takes 8 bytes, ending at offset 11.
  • v needs 8-byte alignment. The next multiple of 8 is 16. The compiler inserts 4 bytes of padding after i[1]. v starts at offset 16.
  • v is 8 bytes long. The total data size is bytes. The total consumed space is 24 bytes.
  • The final size of the struct must be a multiple of K=8, so sizeof(struct S1) is 24 bytes.

Practice: Struct Alignment and Padding

Calculating the size of a structure is a fundamental systems skill. It requires applying alignment rules to each field sequentially.

Exercise: Calculate Struct Size

Calculate sizeof(struct node) on a 64-bit machine.

struct node {
    char c;      // 1 byte
    short s;     // 2 bytes
    long l;      // 8 bytes
    int i;       // 4 bytes
};

Solution:

  1. c: Offset 0. (Needs 1-byte alignment).
  2. s: Needs 2-byte alignment. Offset 1 is invalid. Insert 1 byte padding. s is at Offset 2.
  3. l: Needs 8-byte alignment. Offset 4 () is invalid. Insert 4 bytes padding. l is at Offset 8.
  4. i: Needs 4-byte alignment. Multiples of 4 are fine after l (which ends at 16). i is at Offset 16.
  5. Final Step: The struct ends at offset 20. The largest alignment requirement is 8 bytes (from long l). The total size must be a multiple of 8.
  6. Result: Add 4 bytes trailing padding to reach 24. sizeof(struct node) = 24.

Exercise: Reordering for Space

Reorder the fields in struct node to minimize padding. Answer: Put the 8-byte long l first, then the 4-byte int i, then the 2-byte short s, then the 1-byte char c.

  • l (offset 0), i (offset 8), s (offset 12), c (offset 14).
  • Total size ends at 15. Round up to multiple of 8 16 bytes.
  • Saving: 8 bytes.

Unions

A union is syntactically like a struct, but all its members share the same memory location. It is allocated according to the size of its largest element. It can only hold a value for one field at a time.

union U1 {
    char c;
    int i[2];
    double v; // Largest element, 8 bytes
}; // sizeof(union U1) will be 8.

Unions are a way to circumvent C’s type system. It is the programmer’s responsibility to track which type is currently stored. A common use is to access the raw bit patterns of a value.

union bit_float {
    float f;
    unsigned int u;
};
 
// This function reinterprets the bits of a float as an int.
// This is NOT the same as a type cast!
unsigned int float2bit(float val) {
    union bit_float arg;
    arg.f = val;
    return arg.u;
}

Summary

  • Arrays: Contiguous, aligned, no bounds checking. The name decays to a pointer.
  • Structures: Members allocated in order declared, with padding added to satisfy alignment.
  • Unions: Members are overlaid in the same memory location, providing a way to circumvent the type system.

Continue here: 14 Non-Local Jumps and Coroutines