Lecture from: 28.10.2025 | Video: Videos ETHZ

Procedure Control Flow

Previous discussions covered how to implement loops and conditionals using jumps. The final, and most complex, piece of control flow is the procedure (or function) call and return. This mechanism is fundamental to structured programming and relies heavily on the stack.

The x86-64 Stack Revisited

The stack operation in x86-64 can be formalized as follows.

  • Region of Memory: The stack is a contiguous region of memory managed with a LIFO (Last-In, First-Out) discipline.
  • Growth Direction: It grows toward lower memory addresses.
  • Stack Pointer (%rsp): This special-purpose register always holds the address of the “top” element of the stack (the last item pushed), which is the lowest stack address currently in use.

push and pop Instructions

While a RISC processor manipulates the stack pointer with standard arithmetic instructions, x86 provides dedicated push and pop instructions.

  • pushq Src: Pushes an 8-byte value onto the stack.

    1. Decrements %rsp by 8.
    2. Writes the operand to memory at the new address in %rsp.
  • popq Dest: Pops an 8-byte value from the stack.

    1. Reads the 8-byte value at the address in %rsp and writes it to Dest.
    2. Increments %rsp by 8.
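The two-step semantics can be modeled in C. This is a minimal sketch, not the hardware: mem stands in for a tiny memory region, rsp for the stack pointer, and pushq/popq are hypothetical helpers named after the instructions. Note the order of operations: push decrements before writing, pop reads before incrementing.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the x86-64 stack (illustration only). Addresses are
 * offsets into mem[]; a real stack is far larger. */
#define MEM_SIZE 0x200

static uint8_t mem[MEM_SIZE];
static uint64_t rsp = MEM_SIZE;   /* empty stack: rsp just past the region */

/* pushq src: first decrement rsp by 8, then store src at the new rsp. */
static void pushq(uint64_t src) {
    rsp -= 8;
    memcpy(&mem[rsp], &src, sizeof src);
}

/* popq dest: first load the value at rsp, then increment rsp by 8. */
static uint64_t popq(void) {
    uint64_t dest;
    memcpy(&dest, &mem[rsp], sizeof dest);
    rsp += 8;
    return dest;
}
```

After two pushes, rsp has dropped by 16, and the two pops return the values in reverse order (LIFO), leaving rsp where it started.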

The call and ret Instructions

Procedure calls are so fundamental that x86 provides dedicated instructions for them, which implicitly use the stack.

  • call Label:
    1. Push the return address (address of the next instruction) onto the stack.
    2. Jump to Label.
  • ret:
    1. Pop the return address from the stack.
    2. Jump to that address.

CISC vs. RISC Procedure Calls

This call/ret mechanism is a hallmark of CISC architectures. On a RISC processor like MIPS or RISC-V, this is typically handled by a “jump-and-link” instruction, which saves the return address in a dedicated link register instead of the stack. It is then the software’s responsibility (the compiler’s) to save the link register’s value to the stack if the called function itself needs to make further calls.
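The difference can be sketched with a toy RISC-V-style machine in C. Everything here is hypothetical: struct cpu, jal, and ret_ra are illustrative names, and the model assumes fixed 4-byte instructions (no compressed extension). The point is that jal writes the return address into a register, so a nested jal overwrites it unless software saved it first.

```c
#include <stdint.h>

/* Toy RISC-V-style machine state (hypothetical, illustration only).
 * Assumes every instruction is 4 bytes long. */
struct cpu {
    uint64_t pc;  /* program counter */
    uint64_t ra;  /* link register (x1, "ra", in RISC-V) */
};

/* jal ra, target: save the return address in the link register
 * (not on the stack), then jump. */
static void jal(struct cpu *c, uint64_t target) {
    c->ra = c->pc + 4;
    c->pc = target;
}

/* ret (jalr x0, 0(ra)): jump to the address held in the link register. */
static void ret_ra(struct cpu *c) {
    c->pc = c->ra;
}
```

For example, starting at pc = 0x1000, a jal to 0x2000 sets ra = 0x1004. If the callee then executes its own jal without first saving ra, ra becomes 0x2004 and the way back to 0x1004 is lost. Saving ra (on the stack) before the nested call and reloading it afterwards is exactly the work the compiler adds to non-leaf functions.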

A Procedure Call Example in Detail

Tracing the execution of a call instruction illuminates the process.

State Before call:

  • The instruction pointer (%rip) is 0x804854e, the address of the call instruction.
  • The stack pointer (%rsp) is 0x108.

Executing call 8048b90:

  1. Push Return Address: The processor determines the address of the next instruction, which is 0x8048553. It pushes this 8-byte value onto the stack.
    • %rsp is decremented by 8, becoming 0x100.
    • The value 0x8048553 is written to memory at address 0x100.
  2. Jump to Target: The instruction pointer (%rip) is updated to the target of the call, 0x8048b90.

Execution now continues inside the called procedure (main in this example).

A Procedure Return Example in Detail

Tracing the ret instruction at the end of the called procedure clarifies the return process.

State Before ret:

  • %rip is 0x8048591, the address of the ret instruction.
  • %rsp is 0x100, pointing to the return address saved earlier.

Executing ret:

  1. Pop Return Address: The processor reads the 8-byte value stored at the address in %rsp (0x100), which is the return address 0x8048553, and loads it into the instruction pointer (%rip).
  2. Increment Stack Pointer: The processor increments %rsp by 8, moving it back to 0x108. The stack is now in the same state it was in before the call.

Execution has now seamlessly returned to the instruction following the original call.
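The call and return traces above can be replayed with a small C model. This is a sketch under stated assumptions: mem simulates only addresses 0x000-0x1ff, rip and rsp are ordinary variables, and the call is assumed to use the 5-byte call rel32 encoding (which is why the return address is 0x804854e + 5 = 0x8048553). The helper names call and ret are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the trace: simulated memory for addresses 0x000-0x1ff. */
static uint8_t mem[0x200];
static uint64_t rip, rsp;

/* call target: push the return address, then jump.
 * insn_len is the length of the call instruction itself. */
static void call(uint64_t target, uint64_t insn_len) {
    uint64_t ret_addr = rip + insn_len;   /* address of the next instruction */
    rsp -= 8;
    memcpy(&mem[rsp], &ret_addr, sizeof ret_addr);
    rip = target;
}

/* ret: pop the return address from the stack into rip. */
static void ret(void) {
    memcpy(&rip, &mem[rsp], sizeof rip);
    rsp += 8;
}
```

Setting rip = 0x804854e and rsp = 0x108 and then running call(0x8048b90, 5) leaves rsp at 0x100 with 0x8048553 stored there; a later ret() restores rip to 0x8048553 and rsp to 0x108, matching both traces.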

The x86-64 Stack Frame

This simple call/ret mechanism is the foundation for managing procedure calls. In practice, functions need more than just a return address; they need space for local variables, saved registers, and arguments for functions they are about to call. All of this information for a single function activation is stored in a stack frame.

A typical stack frame is organized as follows (from low to high addresses, starting at %rsp):

  • Argument Build Area: Space used by the current function to prepare arguments for functions it is about to call.
  • Local Variables & Saved Registers: Space for local variables that do not fit in registers, and for saving the values of any callee-saved registers that the function uses.
  • Old Frame Pointer (optional): The saved value of the caller’s frame pointer (%rbp), present only when the function uses %rbp as a frame pointer.
  • Return Address: Pushed automatically by the call instruction.
  • Arguments 7+: Arguments beyond the first six, which the caller stores in its own argument build area, just above the return address.
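The argument area at the top of the frame only comes into play when a function takes more than six integer arguments. A hypothetical example follows; the comments describe where the x86-64 Linux (System V) convention places each argument, which C code itself cannot observe directly.

```c
/* Hypothetical function with eight integer arguments.
 * a..f arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9.
 * g and h are passed on the stack: on entry to sum8, (%rsp) holds the
 * return address, 8(%rsp) holds g, and 16(%rsp) holds h. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h) {
    return a + b + c + d + e + f + g + h;
}

/* The caller: just before its call instruction, it stores 7 at (%rsp)
 * and 8 at 8(%rsp) in its argument build area; the call then pushes the
 * return address below them. */
long call_sum8(void) {
    return sum8(1, 2, 3, 4, 5, 6, 7, 8);
}
```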

Register Saving Conventions

When a function yoo() calls who(), which of the two is responsible for ensuring that register values yoo() still needs are not overwritten? This is governed by the calling convention, which divides registers into two categories:

  • Caller-Saved Registers: If the caller (yoo) wants to preserve the value in one of these registers across a function call, it is the caller’s responsibility to save it (usually by pushing it onto its own stack frame) before the call and restore it after. The callee (who) is free to use these registers for any purpose without saving them.
  • Callee-Saved Registers: If the callee (who) wants to use one of these registers, it must first save the register’s original value on its stack frame. Before returning, it must restore the register to its original value. This guarantees to the caller that these registers will be unchanged after the function call.

x86-64 Linux Register Conventions:

  • %rax: Return value (caller-saved).
  • %rdi, %rsi, %rdx, %rcx, %r8, %r9: Arguments 1-6 (caller-saved).
  • %r10, %r11: Temporaries (caller-saved).
  • %rbx, %rbp, %r12, %r13, %r14, %r15: Callee-saved.
  • %rsp: Stack pointer (special).
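The register roles above can be encoded as a small lookup table, for instance as a reference when reading compiler output. This is an illustrative sketch: is_callee_saved and arg_register are hypothetical helpers, and the table covers only the integer registers listed (%rsp is special and deliberately left out).

```c
#include <stdbool.h>
#include <string.h>

/* Callee-saved integer registers (x86-64 Linux convention). */
static const char *const callee_saved[] = {
    "%rbx", "%rbp", "%r12", "%r13", "%r14", "%r15"
};

/* Registers carrying integer arguments 1-6, in order. */
static const char *const arg_regs[] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* True if a callee that uses `reg` must save and restore it. */
static bool is_callee_saved(const char *reg) {
    for (size_t k = 0; k < sizeof callee_saved / sizeof callee_saved[0]; k++)
        if (strcmp(reg, callee_saved[k]) == 0)
            return true;
    return false;
}

/* Register holding argument n (1-based), or NULL if the argument is
 * passed on the stack instead. */
static const char *arg_register(int n) {
    return (n >= 1 && n <= 6) ? arg_regs[n - 1] : NULL;
}
```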

A Stack Frame Example

Consider the assembly for a function that calls another function, requiring it to save registers.

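long sum;                      /* global accumulator (declaration assumed) */
void swap(long *xp, long *yp); /* swaps *xp and *yp; defined elsewhere */

/* Swap a[i] & a[i+1], then add the new a[i] to sum */
void swap_ele_su(long a[], int i) {
    swap(&a[i], &a[i+1]);
    sum += a[i];
}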

This function is a “non-leaf” procedure because it calls swap. It also needs to use the values of a and i after the call to swap returns. The arguments a and i are passed in %rdi and %esi. However, the call to swap will overwrite these registers with its arguments. Therefore, swap_ele_su must save the original values of a and i before calling swap.

Generated Assembly:

swap_ele_su:
    movq    %rbx, -16(%rsp)     # Save callee-saved register %rbx
    movslq  %esi, %rbx          # Extend & save i in %rbx
    movq    %r12, -8(%rsp)      # Save callee-saved register %r12
    movq    %rdi, %r12          # Save a in %r12
    subq    $16, %rsp           # Allocate stack frame (late allocation)
    ...
    call    swap
    ...
    movq    (%rsp), %rbx        # Restore %rbx
    movq    8(%rsp), %r12       # Restore %r12
    addq    $16, %rsp           # Deallocate stack frame
    ret

Because swap is free to overwrite caller-saved registers such as %rdi and %rsi, the compiler keeps a and i in the callee-saved registers %r12 and %rbx, which swap must preserve. In turn, because swap_ele_su now uses these callee-saved registers itself, it must save their original values on the stack on entry and restore them before returning. This is the calling convention in action.

Stack Frame Optimizations

Modern compilers are extremely clever and will often omit parts of the full stack frame if they are not needed.

  • Red Zone: The 128 bytes of memory below the current stack pointer are a “red zone” that the x86-64 Linux (System V) ABI reserves for the current function: signal and interrupt handlers will not overwrite it. A leaf function (one that calls no other functions) can therefore use this space for temporary local variables without ever moving the stack pointer, since it makes no calls that would push onto it.
  • No Stack Frame: If a function can keep all of its local variables in registers and it is a leaf function, it may not need a stack frame at all (beyond the return address).
  • Tail Call Optimization: If the very last action of a function is to call another function (a “tail call”), the compiler can tear down the current function’s frame and replace the call with a jmp. The callee then reuses that stack space and, when it executes ret, returns directly to the original caller. This is essential for efficient recursion in functional languages; C compilers such as GCC and Clang also perform it at higher optimization levels, although the C standard does not guarantee it.
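A tail call can be seen by contrasting two recursive sums in C. This is a hedged sketch: sum_to and sum_to_naive are hypothetical examples, and whether the tail call actually becomes a jmp depends on the compiler and optimization level (inspecting the output of gcc -O2 -S is one way to check).

```c
/* Tail-recursive: the recursive call is the last action, so the compiler
 * may turn it into a jump that reuses the current frame. */
long sum_to(long n, long acc) {
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail call: nothing left to do after it */
}

/* Not a tail call: the addition happens after the recursive call returns,
 * so, as written, each activation must keep n around in its own frame. */
long sum_to_naive(long n) {
    if (n == 0)
        return 0;
    return n + sum_to_naive(n - 1);
}
```

Both compute 1 + 2 + ... + n; the accumulator parameter acc is what moves the pending addition into the arguments and makes the call a tail call.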

The discussion now turns from control flow to data. How are C’s data structures (arrays, structs, and unions) represented in memory, and how does the compiler generate code to access them?

Recap: Basic Data Types

At the machine level, there are only a few basic data types:

  • Integral Types: 1, 2, 4, or 8-byte integers. These are stored and operated on in the general-purpose integer registers.
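  • Floating-Point Types: 4, 8, or 10-byte floating-point numbers. These use a separate set of registers: on x86-64, float and double values live in the %xmm registers, while the 10-byte long double uses the legacy x87 register stack.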

All of C’s complex data structures are built from these simple primitives.

One-Dimensional Arrays

Basic Principle: Contiguous Allocation

The fundamental rule of C arrays is simple:

An array T A[L] is a contiguously allocated region of memory of size L * sizeof(T) bytes.

This means the elements are packed together in memory one after another, with no gaps.
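The L * sizeof(T) rule can be checked at compile time. A minimal sketch using C11 _Static_assert on a few arbitrarily chosen element types; the array names are hypothetical. On x86-64 Linux the four arrays occupy 12, 20, 24, and 32 bytes.

```c
#include <stddef.h>

/* Hypothetical arrays of different element types T. */
char   c_arr[12];
int    i_arr[5];
double d_arr[3];
char  *p_arr[4];

/* sizeof(T A[L]) == L * sizeof(T): no padding between elements. */
_Static_assert(sizeof c_arr == 12 * sizeof(char),  "char[12] is contiguous");
_Static_assert(sizeof i_arr == 5 * sizeof(int),    "int[5] is contiguous");
_Static_assert(sizeof d_arr == 3 * sizeof(double), "double[3] is contiguous");
_Static_assert(sizeof p_arr == 4 * sizeof(char *), "char *[4] is contiguous");

/* Byte distance between consecutive elements of i_arr: sizeof(int). */
ptrdiff_t int_stride(void) {
    return (char *)&i_arr[1] - (char *)&i_arr[0];
}
```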

Array Access: Pointer Arithmetic in Disguise

As previously noted, the name of an array A in most expressions “decays” to a pointer to its first element, &A[0]. The C compiler translates array subscripting A[i] into pointer arithmetic.

A[i] ≡ *(A + i)

The address of element i is the array’s starting address x_A plus i times the element size:

&A[i] = x_A + i * sizeof(T)
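The address of element i is the array’s start address x_A plus i * sizeof(T) bytes. The sketch below checks this for T = long (an arbitrary choice), assuming a flat address space such as x86-64, where converting a pointer to uintptr_t yields its byte address. Note that the C expression A + i already scales by sizeof(T), while the uintptr_t arithmetic must scale by hand; address_formula_holds is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checks that &A[i] == x_A + i * sizeof(long) (byte arithmetic) and that
 * &A[i] == A + i (pointer arithmetic, scaled automatically). */
static bool address_formula_holds(long *A, int len) {
    uintptr_t x_A = (uintptr_t)A;                   /* start address of A */
    for (int i = 0; i < len; i++) {
        uintptr_t expected = x_A + (uintptr_t)i * sizeof(long);
        if ((uintptr_t)&A[i] != expected)
            return false;
        if (&A[i] != A + i)                         /* A[i] is *(A + i) */
            return false;
    }
    return true;
}
```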

Array Access Example in Assembly

The compiler translates a simple array access as follows.

C Code:

typedef int zip_dig[5];   // a zip_dig is an array of 5 ints
int get_digit(zip_dig z, int dig) {
    return z[dig];
}

Generated x86-64 Assembly:

get_digit:
    movslq  %esi, %rsi        # Sign-extend index 'dig' from 32 to 64 bits
    movl    (%rdi,%rsi,4), %eax # The core access
    ret

The workhorse here is the movl instruction using a complex addressing mode:

  • (%rdi, %rsi, 4): This is the general (Base, Index, Scale) form.
  • %rdi: The base register, holding the starting address of the array z.
  • %rsi: The index register, holding the value dig.
  • 4: The scale factor, which is sizeof(int).

The processor’s hardware computes the address (%rdi + %rsi * 4) and fetches the 4-byte integer from that location, all in a single instruction. This is why the scaled addressing mode exists!
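To make the (Base, Index, Scale) computation explicit, get_digit can be rewritten with the address arithmetic done by hand. This hypothetical get_digit_by_hand assumes a flat address space (as on x86-64) in which integer addresses can be converted back to pointers; the comments map each step to the assembly above.

```c
#include <stdint.h>

typedef int zip_dig[5];

/* Hypothetical: get_digit with the addressing mode spelled out. */
int get_digit_by_hand(zip_dig z, int dig) {
    uintptr_t base  = (uintptr_t)z;             /* %rdi: start of z          */
    intptr_t  index = (intptr_t)dig;            /* movslq %esi, %rsi         */
    intptr_t  scale = (intptr_t)sizeof(int);    /* 4 on x86-64               */
    uintptr_t addr  = base + (uintptr_t)(index * scale);
    return *(const int *)addr;                  /* movl (%rdi,%rsi,4), %eax  */
}
```

The hardware performs the add and the multiply by 4 as part of the memory access, so the compiler does not need separate instructions for them.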

C Does Not Do Bounds Checking

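A critical “feature” of C is that it performs no bounds checking on array accesses. Writing a[5] for a 5-element array (valid indices 0-4) simply accesses the memory just past the end of the array; the C standard makes such an access undefined behavior. Suppose three zip_dig arrays, cmu, mit, and ucb, happen to be allocated one after another in memory: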

  • mit[5]: This accesses the memory location immediately following the mit array. In this specific example where the arrays are contiguous, it happens to read the first element of the ucb array. This is not guaranteed.
  • mit[-1]: This accesses the memory location before the mit array. In this example, it happens to read the last element of the cmu array. This is not guaranteed.

This lack of bounds checking is a major source of bugs and security vulnerabilities (buffer overflows) in C, but it also contributes to C’s performance, since no check is executed on each access. The language trusts the programmer to be correct.
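For contrast, a minimal sketch of what an explicit check looks like. get_digit_checked is a hypothetical helper that reports failure instead of reading outside the array; languages with built-in bounds checking perform a comparable comparison on every access, which is exactly the cost C avoids.

```c
#include <stdbool.h>

#define ZIP_LEN 5   /* number of elements in a zip_dig */

/* Hypothetical bounds-checked accessor for a 5-element digit array.
 * On success stores z[dig] in *out and returns true; for an out-of-range
 * index it returns false without touching memory outside the array. */
bool get_digit_checked(const int *z, int dig, int *out) {
    if (dig < 0 || dig >= ZIP_LEN)
        return false;   /* mit[5] and mit[-1] would be rejected here */
    *out = z[dig];
    return true;
}
```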

Array Loop Optimizations

Compilers are very good at optimizing loops that iterate over arrays.

Original C Code:

int zd2int(zip_dig z) {
    int i;
    int zi = 0;
    for (i = 0; i < 5; i++) {
        zi = 10 * zi + z[i];
    }
    return zi;
}

An optimizing compiler will transform this code before generating assembly:

  1. Eliminate Loop Variable i: The index i is redundant. The compiler can use a pointer that walks through the array directly.
  2. Convert to Pointer Code: It creates a pointer z that is incremented on each iteration and a pointer zend that marks the end of the array.
  3. Convert to do-while form: Because the loop is known to run at least once (exactly five times), the initial test can be dropped, leaving a single conditional jump at the bottom of the loop.

Transformed C Code (Compiler’s View):

int zd2int(zip_dig z) {
    int zi = 0;
    int *zend = z + 4; // Pointer to the last element
    do {
        zi = 10 * zi + *z;
        z++;
    } while (z <= zend);
    return zi;
}

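This transformed version avoids recomputing the address of z[i] from a base and an index on every iteration. It maps directly to compact assembly in which a pointer is simply advanced by 4 bytes per iteration, and 10 * zi can be computed with leal instructions (e.g., zi + 4 * zi, then doubled) instead of a multiply.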

Practice: Stack and Frame Mechanics

Visualizing the stack is critical for debugging assembly.

Exercise: Stack Trace

A 64-bit program is about to execute the instruction at address 0x400, and the stack pointer %rsp is 0x800. The instruction at 0x400 is call 0x500, and the first instruction of the function at 0x500 is pushq %rbx.

  1. What is the value of %rsp after the call?
    • Answer: 0x7f8 (0x800 - 8), because call pushes the 8-byte return address.
  2. What value is stored at memory address 0x7f8?
    • Answer: 0x405, the return address. Assuming the call uses the usual 5-byte encoding, the next instruction is at 0x400 + 5 = 0x405.
  3. What is the value of %rsp after the pushq %rbx?
    • Answer: 0x7f0 (0x7f8 - 8).

Exercise: Register Hygiene

Question: If function A uses %r12 and then calls function B, which function is responsible for saving %r12? Answer: Function B (the callee), and only if B actually uses %r12. Because %r12 is callee-saved, B must push it onto the stack before using it and pop it back before returning, so A can rely on %r12 being unchanged after the call.


Continue here: 13 Assembly Layout of C Structures and Unions