The course has thus far operated at the level of C: writing code, compiling it, and running it. The next step is to examine the transformation process that turns C code, a human-readable text file, into something the processor can execute. This requires an understanding of the machine itself.
This chapter introduces the language of the hardware: assembly language and the underlying Instruction Set Architecture (ISA). The x86 architecture serves as the primary example. It is neither the only architecture nor the simplest, but its long history has made it dominant in desktops and servers, and understanding it provides insight into how modern computers operate.
What is an Instruction Set Architecture?
Distinguishing between architecture and microarchitecture is crucial.
Architecture (or ISA): This is the programmer’s view of the processor. It is the abstract interface defining what the hardware can do. It includes the available instructions (e.g., add, mov), programmer-visible registers, and the memory model.
Microarchitecture: This is the implementation of the architecture. It is the specific arrangement of transistors, caches, and pipelines a particular chip uses to execute the instructions defined by the ISA.
The ISA as a Contract
The ISA acts as a contract between hardware and software. Software (such as a compiler) promises to generate only instructions defined in the ISA. Hardware promises that every implementation (microarchitecture) will execute those instructions correctly. This separation allows a program compiled today to run on a processor built years later, provided that processor implements the same ISA.
There are many ISAs, each with its own history and design philosophy, such as x86, ARM, RISC-V, and MIPS.
The Great Debate: CISC vs. RISC
Processor architectures have historically been divided into two main camps.
CISC: Complex Instruction Set Computer
Dominant through the mid-1980s, with x86 as its most famous example, CISC aimed to make hardware powerful and the compiler’s job easy.
Complex, Powerful Instructions: CISC ISAs have instructions performing multi-step operations. A single x86 instruction can read a value from a complex memory address, perform an arithmetic operation, and write the result back.
addl %eax, 12(%rbx,%rcx,4)
This involves a complex address calculation (%rbx + %rcx*4 + 12), a memory read, an addition, and a memory write.
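The four steps folded into that one instruction can be spelled out in C. This is only a sketch: the helper name addl_to_memory, the byte array standing in for memory, and the sample values are assumptions made for illustration, not part of any real API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of: addl %eax, 12(%rbx,%rcx,4)
   mem stands in for memory; rbx and rcx are treated as byte offsets into it. */
static void addl_to_memory(uint8_t *mem, uint64_t rbx, uint64_t rcx, uint32_t eax)
{
    uint64_t addr = rbx + rcx * 4 + 12;          /* 1. address calculation */
    uint32_t value;
    memcpy(&value, mem + addr, sizeof value);    /* 2. memory read */
    value += eax;                                /* 3. addition */
    memcpy(mem + addr, &value, sizeof value);    /* 4. memory write */
}

/* Usage sketch: rbx = 0 and rcx = 2 give byte address 0 + 2*4 + 12 = 20,
   which is element 5 of an array of 4-byte integers. */
static uint32_t addl_demo(void)
{
    uint32_t cells[8] = {0};
    cells[5] = 100;
    addl_to_memory((uint8_t *)cells, 0, 2, 7);
    assert(cells[5] == 107);
    return cells[5];
}
```

On a load-store machine, steps 2 through 4 would each need their own instruction.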
Variable-Length Instructions: Instructions are encoded using a variable number of bytes to save memory.
Memory Operands: Instructions can often operate directly on memory operands without loading them into registers first. (On x86, most instructions allow at most one memory operand.)
Philosophy: Add instructions to perform “typical” programming tasks.
RISC: Reduced Instruction Set Computer
Pioneered at IBM and popularized by researchers at Stanford (MIPS) and Berkeley (the RISC I and RISC II projects, whose lineage later led to RISC-V), RISC aimed to make hardware simple and fast, leaving complex tasks to the compiler.
Fewer, Simpler Instructions: A small set of basic operations.
Fixed-Size Instructions: Every instruction is the same length (e.g., 32 bits), simplifying decoding.
Load-Store Architecture: Only dedicated load and store instructions can access memory. Arithmetic instructions operate only on registers.
More Registers: Simpler logic allows for more general-purpose registers.
Philosophy: A simple, uniform instruction set enables faster and more efficient hardware implementation.
CISC vs. RISC Today
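The debate has largely subsided, in part because Moore’s Law made transistors plentiful enough to absorb the cost of decoding complex instructions.
Desktops and Servers: The choice of ISA is no longer a primary technical issue. Modern x86 processors translate their CISC instructions into simpler, RISC-like micro-operations internally.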
Embedded Processors: For low-power devices, RISC maintains an edge due to smaller, cheaper, and lower-power designs.
Non-Technical Factors: Factors like the software ecosystem, code compatibility, licensing models, and geopolitics are often more important than technical purity.
A Brief History of x86
The x86 architecture has a long history driven by Moore’s Law and backward compatibility.
1971: Intel 4004: The first commercial microprocessor, a 4-bit CPU.
1978: Intel 8086: Intel’s first 16-bit processor and the origin of the x86 architecture. It had a 1MB address space.
1985: Intel 80386 (i386): The first 32-bit x86 processor (IA32). It introduced “flat addressing” and was capable of running Unix.
~2003: AMD Opteron / Intel Pentium 4F: The introduction of the 64-bit extension (x86-64). AMD developed it first, and Intel adopted it soon after. The extension doubled the number of general-purpose registers from 8 to 16.
Present Day: Modern processors have tens of billions of transistors and numerous instruction set extensions (MMX, SSE, AVX).
Remarkably, code written for the 8086 can often still run on a modern CPU.
Basics of x86 Machine Code
Bridging the gap between C and the machine involves understanding how a C function becomes executable bytes.
Compiling into Assembly
Consider a simple C function:
int sum(int x, int y) {
    int t = x + y;
    return t;
}
The GCC compiler can stop after generating the assembly language file:
gcc -O0 -S code.c
This produces a human-readable text file, code.s, containing x86 assembly code.
The assembly code consists of mnemonics (like pushq, movl, addl) and operands (like %rbp, %edi).
The Assembly Programmer’s View
An assembly programmer works with a simple machine model:
Programmer-Visible State:
Program Counter (%rip on x86-64): Holds the address of the next instruction to execute.
Register File: A small, fast set of storage locations inside the CPU. x86-64 has 16 general-purpose 64-bit integer registers.
Condition Codes: Single-bit flags storing status information about the most recent arithmetic operation.
Memory: A large, byte-addressable array holding code, data, and the stack.
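One way to picture this state is as a C struct. The sketch below is a toy model, not how any real tool represents a CPU: the names machine_state and add32, the memory size, and the choice of four flags (zf, sf, cf, of, mirroring the x86 zero, sign, carry, and overflow flags, which the text above does not name) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_GP_REGS = 16, MEM_BYTES = 4096 };   /* MEM_BYTES is an arbitrary toy size */

/* Hypothetical model of the programmer-visible state. */
struct machine_state {
    uint64_t rip;                  /* program counter: address of the next instruction */
    uint64_t regs[NUM_GP_REGS];    /* register file: 16 general-purpose registers */
    unsigned zf, sf, cf, of;       /* condition codes, each 0 or 1 */
    uint8_t  memory[MEM_BYTES];    /* byte-addressable memory */
};

/* Models the effect of a 32-bit add on one register and on the condition codes. */
static void add32(struct machine_state *m, int dst, uint32_t src)
{
    uint32_t a = (uint32_t)m->regs[dst];
    uint32_t r = a + src;
    m->regs[dst] = r;                            /* 32-bit results zero-extend to 64 bits */
    m->zf = (r == 0);                            /* result was zero */
    m->sf = (r >> 31) & 1;                       /* top bit of the result */
    m->cf = (r < a);                             /* unsigned overflow (carry out) */
    m->of = (((a ^ r) & (src ^ r)) >> 31) & 1;   /* signed overflow */
}

/* Usage sketch: 0xFFFFFFFF + 1 wraps to 0, setting the zero and carry flags. */
static void add32_demo(void)
{
    struct machine_state m = {0};
    m.regs[0] = 0xFFFFFFFFu;
    add32(&m, 0, 1);
    assert(m.regs[0] == 0 && m.zf == 1 && m.cf == 1 && m.sf == 0 && m.of == 0);
}
```

The model leaves out updating rip, which a real processor advances past each instruction as it executes.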
From Assembly to Object Code
The assembly text file (.s) is fed to an assembler, which translates mnemonics into binary machine code, producing an object file (.o).
Linking resolves references between files and combines them with libraries to create the final executable. A disassembler (like objdump -d or gdb’s disassemble) translates machine code back into assembly.
A Machine Instruction Example
Consider a single line of C, its assembly, and its object code.
C Code: int t = x+y;
Assembly: addl 8(%rbp), %eax
Object Code: 03 45 08
This 3-byte instruction performs the following:
addl means “add long” (32 bits).
The operands are 8(%rbp) (source) and %eax (destination).
It adds the 32-bit integer at address %rbp + 8 to the value in %eax, storing the result in %eax.
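To see how three bytes can carry all of that, it helps to pull the middle byte apart. The sketch below relies on the standard x86 encoding, which the text above does not cover: opcode 0x03 selects a 32-bit add into a register, and the following ModRM byte packs three bit fields. The names modrm and split_modrm are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The three bit fields of an x86 ModRM byte. */
struct modrm { unsigned mod, reg, rm; };

static struct modrm split_modrm(uint8_t byte)
{
    struct modrm f;
    f.mod = (byte >> 6) & 0x3;   /* top 2 bits: addressing form */
    f.reg = (byte >> 3) & 0x7;   /* middle 3 bits: register operand */
    f.rm  = byte & 0x7;          /* low 3 bits: base register */
    return f;
}

/* Usage sketch: decode the bytes 03 45 08 by hand. */
static void decode_demo(void)
{
    const uint8_t code[3] = { 0x03, 0x45, 0x08 };
    struct modrm f = split_modrm(code[1]);
    assert(code[0] == 0x03);   /* opcode: add memory operand into a 32-bit register */
    assert(f.mod == 1);        /* an 8-bit displacement follows */
    assert(f.reg == 0);        /* register number 0 is %eax */
    assert(f.rm == 5);         /* register number 5 is %rbp */
    assert(code[2] == 8);      /* the displacement: 8 */
}
```

Read together, the fields give a displacement of 8 from %rbp as the source and %eax as the destination, matching the assembly line.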
x86-64 Architecture in Detail
Registers
The x86-64 architecture provides 16 general-purpose 64-bit integer registers.
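The 64-bit registers are named %rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp, and %r8 through %r15. The low 32 bits of each register are accessible under a separate name (e.g., %eax for %rax, %r8d for %r8).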
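Operand Types
The mov instruction copies data from a source operand to a destination operand. Operands come in three types:
Immediate: Constant integer data, written with a $ prefix (e.g., $0x400).
Register: Value in a register (e.g., movq %rax, %rbx).
Memory: Value from memory at an address given by a register (e.g., (%rax)).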
Memory-to-Memory Transfers
A single mov instruction cannot have both source and destination as memory locations. Data must be loaded into a register first.
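In C terms, an assignment such as *dst = *src therefore becomes a load followed by a store. The sketch below makes the intermediate register explicit as a local variable. The function name copy_int is hypothetical, and the instructions in the comments are what a compiler would typically emit, not a guaranteed output.

```c
#include <assert.h>

/* Copies one int from *src to *dst in two steps, mirroring the two mov instructions. */
static void copy_int(int *dst, const int *src)
{
    int tmp = *src;   /* load:  e.g. movl (%rsi), %eax */
    *dst = tmp;       /* store: e.g. movl %eax, (%rdi) */
}

/* Usage sketch. */
static void copy_demo(void)
{
    int a = 5, b = 0;
    copy_int(&b, &a);
    assert(b == 5);
}
```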
Simple Memory Addressing Modes
Normal: (%rcx)
Uses value in %rcx as memory address.
C analog: *p.
Displacement: 8(%rbp)
Adds a constant offset to the register value.
C analog: p->field or local variable.
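The two modes map onto familiar C expressions. In this sketch the struct point and the function names are made up for illustration, and the instructions in the comments are typical compiler output rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: two 8-byte fields, so y sits 8 bytes past the start. */
struct point { int64_t x; int64_t y; };

/* Normal mode: the pointer itself is the address, e.g. movq (%rdi), %rax */
static int64_t load_normal(const int64_t *p) { return *p; }

/* Displacement mode: pointer plus a constant offset, e.g. movq 8(%rdi), %rax */
static int64_t load_field(const struct point *p) { return p->y; }

/* The same field access written out as base address + displacement. */
static int64_t load_field_by_offset(const struct point *p)
{
    const unsigned char *base = (const unsigned char *)p;
    int64_t v;
    memcpy(&v, base + offsetof(struct point, y), sizeof v);
    return v;
}

/* Usage sketch. */
static void addressing_demo(void)
{
    struct point pt = { 3, 4 };
    assert(offsetof(struct point, y) == 8);
    assert(load_normal(&pt.x) == 3);
    assert(load_field(&pt) == load_field_by_offset(&pt));
}
```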
Understanding swap: A Complete Example
Tracing the unoptimized assembly for a simple C swap function:
void swap(int *xp, int *yp) {
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}
With optimizer off (-O0), the compiler uses the stack for temporary variables.
Function Prologue:
pushq %rbp
movq %rsp, %rbp
This saves the old base pointer and sets up a new stack frame.
Body:
Tracing int t0 = *xp;:
movq -24(%rbp), %rax    # Move xp (from stack) into register %rax
movl (%rax), %eax       # Dereference xp: load the value at the address in %rax into %eax
movl %eax, -8(%rbp)     # Store that value into t0's location on the stack
The compiler loads the pointer, dereferences it, and stores the result in a temporary memory location.
With the Optimizer On (-O2):
The compiler keeps everything in registers.
swap:
    movl (%rdi), %edx    # t0 = *xp (xp is in %rdi)
    movl (%rsi), %eax    # t1 = *yp (yp is in %rsi)
    movl %eax, (%rdi)    # *xp = t1
    movl %edx, (%rsi)    # *yp = t0
    retq
Temporary variables t0 and t1 reside in %edx and %eax without touching memory for storage.
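The optimized assembly can be read back into C one instruction at a time. In the sketch below, the function name swap_regs is hypothetical and the variables are named after the registers that hold them, purely to make the correspondence visible.

```c
#include <assert.h>

/* swap rewritten so that each statement mirrors one instruction of the -O2 code. */
static void swap_regs(int *rdi, int *rsi)
{
    int edx = *rdi;   /* movl (%rdi), %edx */
    int eax = *rsi;   /* movl (%rsi), %eax */
    *rdi = eax;       /* movl %eax, (%rdi) */
    *rsi = edx;       /* movl %edx, (%rsi) */
}

/* Usage sketch. */
static void swap_demo(void)
{
    int a = 1, b = 2;
    swap_regs(&a, &b);
    assert(a == 2 && b == 1);
}
```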
Complete Memory Addressing Modes
The general form is:
D(Rb,Ri,S)
This computes an address as:
Address = Reg[Rb] + Reg[Ri] × S + D
D: Constant displacement.
Rb: Base register.
Ri: Index register.
S: Scale factor (1, 2, 4, or 8).
Why these scale factors?
Scale factors 1, 2, 4, and 8 correspond to sizes of common data types (char, short, int/float, long/double/pointer). This facilitates array element address calculation: &array[i] becomes address_of_array + i * sizeof(element).
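The address formula and its link to array indexing can be checked directly in C. This is a minimal sketch: effective_address is a hypothetical helper that just evaluates the formula, and the demo assumes a 4-byte int, as on typical x86-64 systems.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluates D(Rb,Ri,S): Reg[Rb] + Reg[Ri] * S + D, given the register values. */
static uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s)
{
    assert(s == 1 || s == 2 || s == 4 || s == 8);   /* the only legal scale factors */
    return rb + ri * s + (uint64_t)d;
}

/* Usage sketch. */
static void address_demo(void)
{
    int array[8] = {0};
    uint64_t base = (uint64_t)(uintptr_t)array;

    /* &array[3] is base + 3 * sizeof(int); with a 4-byte int this is (Rb,Ri,4). */
    assert(effective_address(0, base, 3, sizeof(int)) == (uint64_t)(uintptr_t)&array[3]);

    /* 12(Rb,Ri,4) with Rb = 0x1000 and Ri = 3: 0x1000 + 3*4 + 12 = 0x1018 */
    assert(effective_address(12, 0x1000, 3, 4) == 0x1018);
}
```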
The lea Instruction: Address Calculation as Arithmetic
The lea (Load Effective Address) instruction performs the address calculation of the general addressing mode but stores the calculated address itself into the destination register.
lea Src, Dest
Main uses:
Computing addresses: Corresponds to C’s & operator (e.g., p = &x[i];).
Fast arithmetic: Computes x + k*y in a single instruction, where k is 1, 2, 4, or 8.
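The arithmetic use can be sketched in C by treating lea as a plain function of its operands. The names lea_model, times3, and times9_plus1 are hypothetical, and the instructions in the comments show one way a compiler might express each computation.

```c
#include <assert.h>
#include <stdint.h>

/* Models lea: computes D + Rb + Ri*S and returns the number itself.
   Unlike mov, nothing is read from memory. */
static uint64_t lea_model(int64_t d, uint64_t rb, uint64_t ri, unsigned s)
{
    return rb + ri * s + (uint64_t)d;
}

/* x + 2*x = 3*x, e.g. leaq (%rdi,%rdi,2), %rax */
static uint64_t times3(uint64_t x) { return lea_model(0, x, x, 2); }

/* x + 8*x + 1 = 9*x + 1, e.g. leaq 1(%rdi,%rdi,8), %rax */
static uint64_t times9_plus1(uint64_t x) { return lea_model(1, x, x, 8); }

/* Usage sketch. */
static void lea_demo(void)
{
    assert(times3(7) == 21);
    assert(times9_plus1(4) == 37);
}
```

Because the scale must be 1, 2, 4, or 8, only certain multipliers fit in a single lea; others need a combination of instructions.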
Practice: Data Movement and Addressing
Understanding which mov operations are legal is essential for reading assembly.
Exercise: Valid/Invalid mov Instructions
Consider a 64-bit system. Identify which of these are ILLEGAL and why.
movq $0x1, $0x2
movl %eax, (%rsp)
movb (%rdi), (%rsi)
movw %ax, %bx
Solutions:
movq $0x1, $0x2: Illegal. An immediate value cannot be a destination. The destination must be a register or memory.
movl %eax, (%rsp): Legal. It stores a 32-bit register value into the memory location addressed by %rsp.
movb (%rdi), (%rsi): Illegal. Memory-to-memory transfers are not allowed in a single mov instruction. The value must go through a register.
movw %ax, %bx: Legal. It copies a 16-bit register value into another 16-bit register.
Exercise: lea Arithmetic
What is the result in %rax after this instruction, if %rdx = 10?
leaq 5(%rdx, %rdx, 4), %rax