Lecture from: 22.10.2025 | Video: Videos ETHZ
Previous lectures introduced the core concepts of an Instruction Set Architecture (ISA), the distinction between CISC and RISC, and the basics of the x86 instruction set, including registers and the mov instruction. This lecture explores how the machine effectively executes code by examining arithmetic operations, condition codes, and the fundamental mechanism for procedure calls.
x86 Integer Arithmetic
Two-Operand Instructions
Unlike RISC architectures (like MIPS) which typically use three-operand instructions (dest = src1 op src2), x86 is a two-operand architecture. The destination register also serves as one of the source operands.
The general form is: op src, dest, which computes dest = dest op src.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112150114.png)
Here are some of the most common arithmetic and logical instructions (shown with the l suffix for 32-bit “long word” operations):
addl src, dest:dest = dest + srcsubl src, dest:dest = dest - srcimull src, dest:dest = dest * src(Signed Multiply)sall src, dest:dest = dest << src(Shift Arithmetic Left)sarl src, dest:dest = dest >> src(Shift Arithmetic Right)shrl src, dest:dest = dest >> src(Shift Logical Right)xorl src, dest:dest = dest ^ src(Bitwise XOR)andl src, dest:dest = dest & src(Bitwise AND)orl src, dest:dest = dest | src(Bitwise OR)
Why is there no distinction between signed and unsigned
addl?Because of the magic of two’s complement representation! The bit-level operation for adding two’s complement numbers is identical to adding unsigned binary numbers. The hardware is the same; only the interpretation of the result and the overflow flags differs.
One-Operand Instructions
Some operations only require a single operand, which acts as both source and destination.
incl dest:dest = dest + 1decl dest:dest = dest - 1negl dest:dest = -dest(Arithmetic negation)notl dest:dest = ~dest(Bitwise NOT)
Using leal for Arithmetic Expressions
As seen previously, the leal (Load Effective Address) instruction is helpful for more than just calculating addresses. Since its address calculation hardware can perform a shift, an addition, and another addition all in one operation, compilers frequently use it as a fast way to do simple arithmetic.
Consider this C function and its compiled assembly:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112150134.png)
The logic simplifies as follows:
-
int t4 = y * 48;The compiler knows that . It implements this multiplication not with a slowimullinstruction, but with a fastlealand asall(shift):leal (%rsi,%rsi,2), %edx # edx = y + y*2 (i.e., edx = y * 3) sall $4, %edx # edx = edx << 4 (i.e., edx = edx * 16)In just two fast instructions, it has calculated
y * 48. -
int t5 = t3 + t4;wheret3 = x+4;The compiler combines these into a singleleal:leal 4(%rdi,%rdx), %ecx # ecx = 4 + x + edx (which holds t4)
This demonstrates how compilers can be incredibly creative in mapping C arithmetic onto the available machine instructions for maximum performance.
A Note on Compiling
The assembly code generated by a compiler is not static. It can change dramatically based on:
- Compiler Version: Newer compilers have better optimization heuristics.
- Microarchitectural Changes: Compilers are tuned for specific processor models. An instruction sequence that is fast on one CPU might be slow on another.
- Optimization Flags: The code generated with
-O0(no optimization) will look very different from code generated with-O2(full optimization).The examples here are real, but results may vary. If performance matters, test and look at the assembly! Don’t make assumptions.
Condition Codes
Processors implement conditional logic like if (x > y) using a set of special one-bit registers called condition codes (or flags).
These flags are stored in the status register (%rsr) and are implicitly set as a side effect of most arithmetic and logical operations (but not leal).
The four most important condition codes are:
CF(Carry Flag): Set if an unsigned addition resulted in a carry-out from the most significant bit. Indicates unsigned overflow.ZF(Zero Flag): Set if the result of the operation was zero.SF(Sign Flag): Set if the result of the operation is negative (i.e., a copy of the most significant bit of the result).OF(Overflow Flag): Set if a signed (two’s complement) operation resulted in overflow. This happens if(a>0 && b>0 && result<0)or(a<0 && b<0 && result>=0).
Explicitly Setting Condition Codes
Sometimes it is necessary to set the flags without performing an arithmetic operation. cmp and test are used for this.
cmp S2, S1(Compare): Behaves likeS1 - S2, but discards the result. It only sets the condition codes. This is the primary way to implement comparisons likea == bora > b.test S2, S1(Test): Behaves likeS1 & S2, but discards the result. It is used for bitwise tests, often with a mask. For example,testl %eax, %eaxwill set theZFif%eaxis zero and theSFif%eaxis negative.
Reading Condition Codes
Once the flags are set, there are two main ways to use them:
-
setXInstructions: These instructions set a single destination byte to0or1based on a combination of the condition codes. TheXis a mnemonic for the condition being tested./Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112150205.png)
For example,
sete(set if equal) sets the destination byte to 1 ifZFis 1.setg(set if greater, for signed) sets the byte to 1 if~(SF^OF)&~ZFis true. -
jXInstructions (Jumps): These are conditional jumps. The program counter (%rip) is changed to a new address only if the conditionXis met. This is the fundamental building block for all control flow./Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112150219.png)
For example,
je(jump if equal) jumps ifZFis 1.jg(jump if greater) jumps if the same complex condition assetgis met.
With the building blocks of assembly, arithmetic, data movement, and conditional jumps established, exploration can broaden to how a C compiler translates high-level control flow structures like if-else, while, for, and switch into machine code.
if-then-else Statements
The most fundamental control flow construct is the if-then-else statement. A compiler translates this into a sequence of tests and jumps.
C Code:
#include <stdio.h>
int putmax(int x, int y) {
int result;
if (x > y) {
result = printf("%d\n", x);
} else {
result = printf("%d\n", y);
}
return result;
}To understand the assembly, it is helpful to first think about how to represent this logic using only goto statements, as this is much closer to how the processor operates.
goto Version:
// This is how a compiler might reason about the C code.
int putmax_goto(int x, int y) {
int result;
if (x <= y) { // Note the inverted condition!
goto Else;
}
// This is the "Then" block
result = printf("%d\n", x);
goto Done;
Else:
// This is the "Else" block
result = printf("%d\n", y);
Done:
return result;
}Practice: Conditional Branching
Understanding how the processor decides whether to jump is central to reverse engineering.
Exercise: Jump Logic
Consider the following assembly:
cmpl %esi, %edi
jg .L5- Under what C condition is the jump taken?
- Answer:
x > y.cmplcomputesdest - src(which isedi - esiorx - y).jgis “Jump Greater”.
- Answer:
- What flags are checked by
jg?- Answer: It checks that results are not zero (
ZF=0) and that the sign matches the overflow (SF == OF).
- Answer: It checks that results are not zero (
Exercise: C to Goto Translation
Translate the following C loop into the “goto” style the compiler uses:
while (a < b) {
a *= 2;
}Answer:
goto test;
loop:
a *= 2;
test:
if (a < b) goto loop;This goto structure maps almost one-to-one with the generated assembly code.
Generated x86 Assembly:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151055.png)
putmax:
; --- Setup ---
subq $8, %rsp // Allocate space on the stack
; --- Test ---
cmpl %esi, %edi // Compare y with x (computes x - y)
jle .L2 // Jump if Less or Equal to label .L2 (the Else block)
; --- Body 1 (Then block) ---
movl %edi, %edx // Prepare arguments for printf(x)
movl $.LCO, %esi
...
call __printf_chk
jmp .L3 // Unconditionally jump to the end (Done)
.L2: ; --- Body 2 (Else block) ---
movl %esi, %edx // Prepare arguments for printf(y)
...
call __printf_chk
.L3: ; --- Finish ---
addq $8, %rsp // Deallocate stack space
ret // ReturnThe logic is clear:
- Test: The
cmplinstruction sets the condition codes based on the result ofx - y. - Conditional Jump: The
jle(Jump if Less or Equal) instruction checks the flags. Ifx <= y, it jumps directly to the code for theelseblock at label.L2. - Then Block: If the jump is not taken, it means
x > y, and execution falls through to the code for thethenblock. - Unconditional Jump: After the
thenblock finishes, an unconditionaljmpskips over theelseblock to the end.
The Conditional Move (cmov) Optimization
For simple conditional assignments without side effects (like function calls), compilers can use a more efficient strategy that avoids jumps entirely: the conditional move instruction.
C Code:
int absdiff(int x, int y) {
int result;
if (x > y) {
result = x - y;
} else {
result = y - x;
}
return result;
}Assembly with cmov:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151117.png)
absdiff:
movl %edi, %eax // eax = x (this will be the final result)
subl %esi, %eax // eax = x - y (the "then" expression)
movl %esi, %edx // edx = y
subl %edi, %edx // edx = y - x (the "else" expression)
cmpl %esi, %edi // Compare y with x
cmovle %edx, %eax // Conditional Move: if (x <= y), then move edx into eax
ret // Return the value in %eaxHow it works:
- The code speculatively computes both the “then” result (
x-y) and the “else” result (y-x), storing them in separate registers. - It performs the comparison to set the condition codes.
- The
cmovle(Conditional Move if Less or Equal) instruction acts as a guarded update: it copies the “else” result into the destination register only if the condition is met. Otherwise, it does nothing. - The final result is now correctly in
%eaxregardless of which path was logically taken.
- Pro: Avoids conditional branches. On modern pipelined processors, a mispredicted branch can be very expensive, so a straight line of code is often faster.
- Con: It has the overhead of evaluating both branches. This optimization should not be used when the expressions are expensive or have side effects (e.g.,
printf).
Compiling Loops
All C loops (do-while, while, for) are ultimately compiled down to assembly using conditional jumps. The compiler often internally transforms while and for loops into a do-while structure, which is the most natural fit for assembly’s backward jumps.
do-while Loop
This is the simplest loop to compile because the condition is tested at the end.
C Code:
int fact_do(int x) {
int result = 1;
do {
result *= x;
x = x-1;
} while (x > 1);
return result;
}Assembly Translation:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151133.png)
fact_do:
movl $1, %eax // result = 1
.L2: // Start of loop body
imull %edi, %eax // result *= x
subl $1, %edi // x = x-1
cmpl $1, %edi // Compare x with 1
jg .L2 // Jump if Greater back to the start of the loop
retThis is a direct translation: the loop body is followed by a test and a backward conditional jump.
while Loop
A while loop tests the condition before the first iteration. A common and efficient compiler strategy is the “jump-to-middle” transformation.
C Code:
while (<test>) {
<body>
}goto Version (Compiler’s thought process):
goto Middle;
Loop:
<body>
Middle:
if (<test>)
goto Loop;Assembly Translation:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151155.png)
jmp .L34 // Unconditional jump to the test
.L35: // Top of the loop body
imull %edx, %eax // result *= x
decl %edx // x--
.L34: // The "middle": where the test happens
cmpl $1, %edx // Compare x with 1
jg .L35 // If greater, jump back to the loop bodyThis structure ensures the test is only written once in the assembly code while still correctly checking the condition before the first iteration.
for Loop
A for loop is simply syntactic sugar for a while loop.
for (<init>; <test>; <update>) { <body> }The compiler conceptually transforms this into:
<init>;
while (<test>) {
<body>
<update>;
}This while loop is then compiled using the jump-to-middle technique seen above. The <init> part is executed once before the loop begins, and the <update> part becomes the last statement inside the loop body before the next test.
switch Statements
switch statements provide multi-way branching. Compilers use two main strategies to implement them, depending on the case values.
Compact switch and Jump Tables
If the case labels are a dense, small range of integers (e.g., 0, 1, 2, 3, 5, 6), the compiler can use a highly efficient technique called a jump table.
A jump table is an array of code addresses. The value of the switch expression is used as an index into this array to find the address of the code for the corresponding case, and the program performs an indirect jump to that address.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151217.png)
Example Walkthrough:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151227.png)
-
Setup and Bounds Check: The compiler first checks if the switch variable (
x, in%rdi) is within the range of the jump table (e.g., 0 to 6). If it is out of bounds (ja .L8), it jumps to thedefaultcase. -
The Indirect Jump:
jmp *.L4(,%rdi,8)- This is the core of the jump table mechanism. It is an indirect jump that calculates a memory address and jumps to the location stored at that address.
.L4: The base address of the jump table array.%rdi: The index (x).8: The scale factor, since each address in the table is an 8-byte pointer on x86-64.- The instruction computes
Address = .L4 + x * 8, fetches the 8-byte code address stored there, and jumps to it.
-
The Jump Table in Memory: The jump table itself is stored in the read-only data section (
.rodata) of the program. It is just an array of addresses (labels)./Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151237.png)
-
Code Blocks: Each label in the jump table corresponds to the start of a small chunk of assembly code that implements the body of that
case. Fall-through is implemented by simply having one code block end by jumping or falling into the next.
Sparse switch and Decision Trees
If the case labels are sparse and widely distributed (e.g., 0, 111, 222, … 999), a jump table would be enormous and mostly empty, a huge waste of space.
In this scenario, the compiler could generate a simple chain of if-else if statements. However, a more clever compiler will organize the comparisons into a balanced binary search tree to achieve logarithmic performance.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151255.png)
Assembly Translation:
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151310.png)
The assembly code implements this tree:
- It first compares the input
xto a value in the middle of the range (e.g.,444). - Based on the result (
<,=,>), it jumps to a different part of the code. - Each of those parts then compares
xto the middle of its new, smaller sub-range.
This process continues until the correct case is found. For N cases, this takes roughly comparisons, which is much better than the of a simple linear search.
Procedure Call and Return
The final piece of control flow is calling and returning from functions. This is managed by the stack and a set of special instructions.
The x86-64 Stack
- It is a region of memory that grows toward lower addresses.
- The
%rspregister is the stack pointer; it always holds the address of the “top” of the stack (the lowest address currently in use).
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251112151328.png)
pushq src: Pushes an 8-byte value onto the stack.- Decrements
%rspby 8. - Writes the value from
srcto the memory location now pointed to by%rsp.
- Decrements
popq dest: Pops an 8-byte value from the stack.- Reads the value from the memory location pointed to by
%rsp. - Stores that value in
dest. - Increments
%rspby 8.
- Reads the value from the memory location pointed to by
call and ret Instructions
These instructions automate the process of saving the return location.
-
call Label:- Pushes the return address onto the stack. The return address is the address of the instruction immediately following the
call. - Jumps (sets
%rip) toLabel.
- Pushes the return address onto the stack. The return address is the address of the instruction immediately following the
-
ret:- Pops the return address from the top of the stack.
- Jumps (sets
%rip) to that popped address.
This elegant mechanism allows for nested and recursive function calls. Each call saves its return point, and each ret brings execution back to where it left off, unwinding the call chain one step at a time.
Continue here: 12 Procedures and the Stack Frame