Lecture from: 22.10.2025 | Video: Videos ETHZ

Previous lectures introduced the core concepts of an Instruction Set Architecture (ISA), the distinction between CISC and RISC, and the basics of the x86 instruction set, including registers and the mov instruction. This lecture explores how the machine effectively executes code by examining arithmetic operations, condition codes, and the fundamental mechanism for procedure calls.

x86 Integer Arithmetic

Two-Operand Instructions

Unlike RISC architectures (like MIPS) which typically use three-operand instructions (dest = src1 op src2), x86 is a two-operand architecture. The destination register also serves as one of the source operands.

The general form is: op src, dest, which computes dest = dest op src.

Here are some of the most common arithmetic and logical instructions (shown with the l suffix for 32-bit “long word” operations):

  • addl src, dest: dest = dest + src
  • subl src, dest: dest = dest - src
  • imull src, dest: dest = dest * src (Signed Multiply)
  • sall src, dest: dest = dest << src (Shift Arithmetic Left)
  • sarl src, dest: dest = dest >> src (Shift Arithmetic Right)
  • shrl src, dest: dest = dest >> src (Shift Logical Right)
  • xorl src, dest: dest = dest ^ src (Bitwise XOR)
  • andl src, dest: dest = dest & src (Bitwise AND)
  • orl src, dest: dest = dest | src (Bitwise OR)

Why is there no distinction between signed and unsigned addl?

Because of the magic of two’s complement representation! The bit-level operation for adding two’s complement numbers is identical to adding unsigned binary numbers. The hardware is the same; only the interpretation of the result and the overflow flags differs.

One-Operand Instructions

Some operations only require a single operand, which acts as both source and destination.

  • incl dest: dest = dest + 1
  • decl dest: dest = dest - 1
  • negl dest: dest = -dest (Arithmetic negation)
  • notl dest: dest = ~dest (Bitwise NOT)

Using leal for Arithmetic Expressions

As seen previously, the leal (Load Effective Address) instruction is helpful for more than just calculating addresses. Since its address calculation hardware can perform a shift, an addition, and another addition all in one operation, compilers frequently use it as a fast way to do simple arithmetic.

Consider this C function and its compiled assembly:

The logic simplifies as follows:

  • int t4 = y * 48; The compiler knows that . It implements this multiplication not with a slow imull instruction, but with a fast leal and a sall (shift):

    leal (%rsi,%rsi,2), %edx  # edx = y + y*2  (i.e., edx = y * 3)
    sall $4, %edx             # edx = edx << 4 (i.e., edx = edx * 16)

    In just two fast instructions, it has calculated y * 48.

  • int t5 = t3 + t4; where t3 = x+4; The compiler combines these into a single leal:

    leal 4(%rdi,%rdx), %ecx   # ecx = 4 + x + edx (which holds t4)

This demonstrates how compilers can be incredibly creative in mapping C arithmetic onto the available machine instructions for maximum performance.

A Note on Compiling

The assembly code generated by a compiler is not static. It can change dramatically based on:

  • Compiler Version: Newer compilers have better optimization heuristics.
  • Microarchitectural Changes: Compilers are tuned for specific processor models. An instruction sequence that is fast on one CPU might be slow on another.
  • Optimization Flags: The code generated with -O0 (no optimization) will look very different from code generated with -O2 (full optimization).

The examples here are real, but results may vary. If performance matters, test and look at the assembly! Don’t make assumptions.

Condition Codes

Processors implement conditional logic like if (x > y) using a set of special one-bit registers called condition codes (or flags).

These flags are stored in the status register (%rsr) and are implicitly set as a side effect of most arithmetic and logical operations (but not leal).

The four most important condition codes are:

  • CF (Carry Flag): Set if an unsigned addition resulted in a carry-out from the most significant bit. Indicates unsigned overflow.
  • ZF (Zero Flag): Set if the result of the operation was zero.
  • SF (Sign Flag): Set if the result of the operation is negative (i.e., a copy of the most significant bit of the result).
  • OF (Overflow Flag): Set if a signed (two’s complement) operation resulted in overflow. This happens if (a>0 && b>0 && result<0) or (a<0 && b<0 && result>=0).

Explicitly Setting Condition Codes

Sometimes it is necessary to set the flags without performing an arithmetic operation. cmp and test are used for this.

  • cmp S2, S1 (Compare): Behaves like S1 - S2, but discards the result. It only sets the condition codes. This is the primary way to implement comparisons like a == b or a > b.
  • test S2, S1 (Test): Behaves like S1 & S2, but discards the result. It is used for bitwise tests, often with a mask. For example, testl %eax, %eax will set the ZF if %eax is zero and the SF if %eax is negative.

Reading Condition Codes

Once the flags are set, there are two main ways to use them:

  1. setX Instructions: These instructions set a single destination byte to 0 or 1 based on a combination of the condition codes. The X is a mnemonic for the condition being tested.

    For example, sete (set if equal) sets the destination byte to 1 if ZF is 1. setg (set if greater, for signed) sets the byte to 1 if ~(SF^OF)&~ZF is true.

  2. jX Instructions (Jumps): These are conditional jumps. The program counter (%rip) is changed to a new address only if the condition X is met. This is the fundamental building block for all control flow.

    For example, je (jump if equal) jumps if ZF is 1. jg (jump if greater) jumps if the same complex condition as setg is met.


With the building blocks of assembly, arithmetic, data movement, and conditional jumps established, exploration can broaden to how a C compiler translates high-level control flow structures like if-else, while, for, and switch into machine code.

if-then-else Statements

The most fundamental control flow construct is the if-then-else statement. A compiler translates this into a sequence of tests and jumps.

C Code:

#include <stdio.h>
 
int putmax(int x, int y) {
    int result;
    if (x > y) {
        result = printf("%d\n", x);
    } else {
        result = printf("%d\n", y);
    }
    return result;
}

To understand the assembly, it is helpful to first think about how to represent this logic using only goto statements, as this is much closer to how the processor operates.

goto Version:

// This is how a compiler might reason about the C code.
int putmax_goto(int x, int y) {
    int result;
    if (x <= y) { // Note the inverted condition!
        goto Else;
    }
    // This is the "Then" block
    result = printf("%d\n", x);
    goto Done;
Else:
    // This is the "Else" block
    result = printf("%d\n", y);
Done:
    return result;
}

Practice: Conditional Branching

Understanding how the processor decides whether to jump is central to reverse engineering.

Exercise: Jump Logic

Consider the following assembly:

cmpl %esi, %edi
jg .L5
  1. Under what C condition is the jump taken?
    • Answer: x > y. cmpl computes dest - src (which is edi - esi or x - y). jg is “Jump Greater”.
  2. What flags are checked by jg?
    • Answer: It checks that results are not zero (ZF=0) and that the sign matches the overflow (SF == OF).

Exercise: C to Goto Translation

Translate the following C loop into the “goto” style the compiler uses:

while (a < b) {
    a *= 2;
}

Answer:

    goto test;
loop:
    a *= 2;
test:
    if (a < b) goto loop;

This goto structure maps almost one-to-one with the generated assembly code.

Generated x86 Assembly:

putmax:
    ; --- Setup ---
    subq    $8, %rsp          // Allocate space on the stack
    
    ; --- Test ---
    cmpl    %esi, %edi        // Compare y with x (computes x - y)
    jle     .L2               // Jump if Less or Equal to label .L2 (the Else block)
 
    ; --- Body 1 (Then block) ---
    movl    %edi, %edx        // Prepare arguments for printf(x)
    movl    $.LCO, %esi
    ...
    call    __printf_chk
    jmp     .L3               // Unconditionally jump to the end (Done)
 
.L2: ; --- Body 2 (Else block) ---
    movl    %esi, %edx        // Prepare arguments for printf(y)
    ...
    call    __printf_chk
 
.L3: ; --- Finish ---
    addq    $8, %rsp          // Deallocate stack space
    ret                       // Return

The logic is clear:

  1. Test: The cmpl instruction sets the condition codes based on the result of x - y.
  2. Conditional Jump: The jle (Jump if Less or Equal) instruction checks the flags. If x <= y, it jumps directly to the code for the else block at label .L2.
  3. Then Block: If the jump is not taken, it means x > y, and execution falls through to the code for the then block.
  4. Unconditional Jump: After the then block finishes, an unconditional jmp skips over the else block to the end.

The Conditional Move (cmov) Optimization

For simple conditional assignments without side effects (like function calls), compilers can use a more efficient strategy that avoids jumps entirely: the conditional move instruction.

C Code:

int absdiff(int x, int y) {
    int result;
    if (x > y) {
        result = x - y;
    } else {
        result = y - x;
    }
    return result;
}

Assembly with cmov:

absdiff:
    movl    %edi, %eax     // eax = x (this will be the final result)
    subl    %esi, %eax     // eax = x - y (the "then" expression)
    movl    %esi, %edx     // edx = y
    subl    %edi, %edx     // edx = y - x (the "else" expression)
    cmpl    %esi, %edi     // Compare y with x
    cmovle  %edx, %eax     // Conditional Move: if (x <= y), then move edx into eax
    ret                    // Return the value in %eax

How it works:

  1. The code speculatively computes both the “then” result (x-y) and the “else” result (y-x), storing them in separate registers.
  2. It performs the comparison to set the condition codes.
  3. The cmovle (Conditional Move if Less or Equal) instruction acts as a guarded update: it copies the “else” result into the destination register only if the condition is met. Otherwise, it does nothing.
  4. The final result is now correctly in %eax regardless of which path was logically taken.
  • Pro: Avoids conditional branches. On modern pipelined processors, a mispredicted branch can be very expensive, so a straight line of code is often faster.
  • Con: It has the overhead of evaluating both branches. This optimization should not be used when the expressions are expensive or have side effects (e.g., printf).

Compiling Loops

All C loops (do-while, while, for) are ultimately compiled down to assembly using conditional jumps. The compiler often internally transforms while and for loops into a do-while structure, which is the most natural fit for assembly’s backward jumps.

do-while Loop

This is the simplest loop to compile because the condition is tested at the end.

C Code:

int fact_do(int x) {
    int result = 1;
    do {
        result *= x;
        x = x-1;
    } while (x > 1);
    return result;
}

Assembly Translation:

fact_do:
    movl    $1, %eax    // result = 1
.L2:                    // Start of loop body
    imull   %edi, %eax  // result *= x
    subl    $1, %edi    // x = x-1
    cmpl    $1, %edi    // Compare x with 1
    jg      .L2         // Jump if Greater back to the start of the loop
    ret

This is a direct translation: the loop body is followed by a test and a backward conditional jump.

while Loop

A while loop tests the condition before the first iteration. A common and efficient compiler strategy is the “jump-to-middle” transformation.

C Code:

while (<test>) {
    <body>
}

goto Version (Compiler’s thought process):

    goto Middle;
Loop:
    <body>
Middle:
    if (<test>)
        goto Loop;

Assembly Translation:

    jmp     .L34        // Unconditional jump to the test
.L35:                   // Top of the loop body
    imull   %edx, %eax  // result *= x
    decl    %edx        // x--
.L34:                   // The "middle": where the test happens
    cmpl    $1, %edx    // Compare x with 1
    jg      .L35        // If greater, jump back to the loop body

This structure ensures the test is only written once in the assembly code while still correctly checking the condition before the first iteration.

for Loop

A for loop is simply syntactic sugar for a while loop.

for (<init>; <test>; <update>) { <body> }

The compiler conceptually transforms this into:

<init>;
while (<test>) {
    <body>
    <update>;
}

This while loop is then compiled using the jump-to-middle technique seen above. The <init> part is executed once before the loop begins, and the <update> part becomes the last statement inside the loop body before the next test.

switch Statements

switch statements provide multi-way branching. Compilers use two main strategies to implement them, depending on the case values.

Compact switch and Jump Tables

If the case labels are a dense, small range of integers (e.g., 0, 1, 2, 3, 5, 6), the compiler can use a highly efficient technique called a jump table.

A jump table is an array of code addresses. The value of the switch expression is used as an index into this array to find the address of the code for the corresponding case, and the program performs an indirect jump to that address.

Example Walkthrough:

  1. Setup and Bounds Check: The compiler first checks if the switch variable (x, in %rdi) is within the range of the jump table (e.g., 0 to 6). If it is out of bounds (ja .L8), it jumps to the default case.

  2. The Indirect Jump: jmp *.L4(,%rdi,8)

    • This is the core of the jump table mechanism. It is an indirect jump that calculates a memory address and jumps to the location stored at that address.
    • .L4: The base address of the jump table array.
    • %rdi: The index (x).
    • 8: The scale factor, since each address in the table is an 8-byte pointer on x86-64.
    • The instruction computes Address = .L4 + x * 8, fetches the 8-byte code address stored there, and jumps to it.
  3. The Jump Table in Memory: The jump table itself is stored in the read-only data section (.rodata) of the program. It is just an array of addresses (labels).

  4. Code Blocks: Each label in the jump table corresponds to the start of a small chunk of assembly code that implements the body of that case. Fall-through is implemented by simply having one code block end by jumping or falling into the next.

Sparse switch and Decision Trees

If the case labels are sparse and widely distributed (e.g., 0, 111, 222, … 999), a jump table would be enormous and mostly empty, a huge waste of space.

In this scenario, the compiler could generate a simple chain of if-else if statements. However, a more clever compiler will organize the comparisons into a balanced binary search tree to achieve logarithmic performance.

Assembly Translation:

The assembly code implements this tree:

  1. It first compares the input x to a value in the middle of the range (e.g., 444).
  2. Based on the result (<, =, >), it jumps to a different part of the code.
  3. Each of those parts then compares x to the middle of its new, smaller sub-range.

This process continues until the correct case is found. For N cases, this takes roughly comparisons, which is much better than the of a simple linear search.

Procedure Call and Return

The final piece of control flow is calling and returning from functions. This is managed by the stack and a set of special instructions.

The x86-64 Stack

  • It is a region of memory that grows toward lower addresses.
  • The %rsp register is the stack pointer; it always holds the address of the “top” of the stack (the lowest address currently in use).

  • pushq src: Pushes an 8-byte value onto the stack.
    1. Decrements %rsp by 8.
    2. Writes the value from src to the memory location now pointed to by %rsp.
  • popq dest: Pops an 8-byte value from the stack.
    1. Reads the value from the memory location pointed to by %rsp.
    2. Stores that value in dest.
    3. Increments %rsp by 8.

call and ret Instructions

These instructions automate the process of saving the return location.

  • call Label:

    1. Pushes the return address onto the stack. The return address is the address of the instruction immediately following the call.
    2. Jumps (sets %rip) to Label.
  • ret:

    1. Pops the return address from the top of the stack.
    2. Jumps (sets %rip) to that popped address.

This elegant mechanism allows for nested and recursive function calls. Each call saves its return point, and each ret brings execution back to where it left off, unwinding the call chain one step at a time.


Continue here: 12 Procedures and the Stack Frame