Lecture from: 26.11.2025 | Video: Videos ETHZ
The discussion on memory hierarchies concludes by looking at a specific edge case in blocking and a real-world system optimization. Then, the focus pivots to a fundamental mechanism of computer systems: Exceptions.
Cache Blocking: The Edge Case
In the previous lecture, blocked matrix multiplication was analyzed assuming the block size was a multiple of the cache block size. But what happens if the block size is smaller than the cache block size (e.g., doubles, where cache line = 8 doubles)?
If , the math becomes slightly more complex because data fetches cannot be perfectly aligned.
Miss Analysis (for cache block)
When iterating through the blocks:
- Matrix B: Elements are accessed down a column. Since the stride is large, every access to a new row in the block likely hits a new cache line.
- Matrix A: Elements are accessed across a row. Some spatial locality is gained.
The formula for total misses becomes:
This simplifies to:
The Limit
If , this formula reduces to roughly , which is the same as the naive (unblocked) implementation. If , it yields , or , matching the previous optimal analysis.
The takeaway: should be maximized such that (cache size). If is too small, the benefits of blocking are lost. If is too big, the cache is thrashed.
Real World Example: IX Operating System
Optimizing for locality is not just for matrix math. It defines modern OS design. Consider IX, a datacenter operating system designed for high-throughput networking. It uses two different strategies based on cache locality:
- Run to Completion: When a packet arrives, it is processed all the way through to the application response.
- Optimizes: Data Cache Locality (the packet data stays hot in L1).
- Adaptive Batching: If load is high, packets are grouped and the same operation is applied to all of them.
- Optimizes: Instruction Cache Locality (the code for the operation stays hot in L1).
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251214111848.png)
Exceptions: Altering Control Flow
Until now, the CPU has followed a Physical Control Flow: a sequential stream of instructions (inst1, inst2, inst3…). This flow could be altered using Program State:
- Jumps and Branches (Loops, Ifs).
- Call and Return (Functions).
But this is insufficient. A real system must react to System State:
- Hardware errors: Division by zero, memory corruption.
- I/O events: A key was pressed, a packet arrived from Spotify.
- Kernel requests: The user program wants to read a file.
A mechanism for Exceptional Control Flow is needed.
Exception
An Exception is a transfer of control to the Operating System in response to some event.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251214112742.png)
The Flow:
- Event occurs (e.g., Ctrl-C pressed).
- CPU interrupts the User Process.
- CPU switches to Kernel Mode (Ring 0).
- CPU jumps to the Exception Handler in the OS.
- OS handles the event.
- OS performs a special Return from Interrupt (e.g.,
iret) to resume the User Process (or abort it).
The Exception Table
How does the CPU know where the OS handler is? It cannot guess.
At boot time, the OS allocates a jump table called the Exception Table (or Interrupt Vector Table).
- Exception Table Base Register: A hardware register holding the physical address of the start of this table.
- Exception Number (): A unique ID for every type of event (0 to 255 on x86).
When exception occurs:
Classes of Exceptions
Exceptions are categorized based on two axes: Cause (Internal vs. External) and **Recoverability.
| Type | Source | Synchronous? | Behavior | Return Location |
|---|---|---|---|---|
| Interrupt | I/O Device | Async | Signal from external world | Next Instruction |
| Trap | Program | Sync | Intentional (Syscall) | Next Instruction |
| Fault | Program | Sync | Error (potentially fixable) | Current Instruction |
| Abort | Hardware | Sync | Fatal Error | Never (Program Dies) |
Synchronous Exceptions (Internal)
These happen because the CPU executed a specific instruction.
-
Traps (Intentional):
- The most common example is the System Call.
- A user calls
open(). The assembly instructionint 0x80(orsyscall) is executed. - This triggers a Trap. The OS runs the “file open” code and returns to the instruction after the syscall.
-
Faults (Recoverable Errors):
- Example: Page Fault.
- A user writes to
Mem[X].Mem[X]is swapped out to disk. - CPU triggers a Fault. The OS pauses the process, fetches the page from disk to RAM, and updates tables.
- Crucial: The OS returns to the Current Instruction (the one that failed). The CPU retries the memory access, and this time it succeeds.
-
Aborts (Fatal Errors):
- Example: Machine Check (hardware memory corruption detected).
- The OS kills the process or halts the machine.
Fault vs. Signal
- Page Fault: The OS fixes it silently. The user process never knows.
- Segfault (Invalid Memory): The OS detects the address is invalid. It sends a
SIGSEGVsignal to the process. The default handler forSIGSEGVis to kill the process.
Asynchronous Exceptions (External Interrupts)
These are caused by events outside the CPU. They can strike at any time, unrelated to the current instruction.
- Examples: Network packet arrival, a user pressing a key, mouse movement.
- Mechanism: Physical pins on the CPU.
Hardware Interrupts: The Pins
On a standard x86 processor, how do these external signals physically enter the chip?
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251214112807.png)
1. NMI (Non-Maskable Interrupt)
- Pin:
NMI. - Vector: Fixed at Exception #2.
- Nature: Critical. The CPU cannot ignore this.
- Usage: Catastrophic hardware failures (Memory parity error, Fan failure).
- Issues: The CPU knows something is wrong, but NMI does not carry extra data. The OS must query all hardware components to find the source.
2. INTR (Interrupt Request)
- Pin:
INTR. - Nature: Maskable. The OS can tell the CPU to ignore these (using
CLIinstruction) if it is doing something critical. - The Handshake (Interrupt Acknowledge Cycle):
- Device asserts
INTR. - If interrupts are enabled (
IF=1), CPU pauses after the current instruction. - CPU asserts
INTA(Interrupt Acknowledge) pin. - The Device (or Controller) puts its Interrupt Vector ID on the data bus.
- CPU reads the ID and jumps to that vector in the Exception Table.
- Device asserts
The Programmable Interrupt Controller (PIC)
Modern PCs have hundreds of devices. It is not possible to wire them all to the single INTR pin directly. A manager is needed.
The PIC (Programmable Interrupt Controller)
A chip (or logic block) that sits between devices and the CPU.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251214112821.png)
Functions of the PIC
- Mapping: Maps physical device lines (IRQ 0, IRQ 1…) to x86 Interrupt Vectors (e.g., Vector 50, Vector 51…). The OS programs this mapping.
- Buffering: If multiple interrupts happen at once, it holds them.
- Prioritizing: It decides who goes first (e.g., Network > Keyboard).
- Masking: The OS can tell the PIC to explicitly ignore the Network card, even if the CPU is accepting other interrupts.
Modern Evolution
- 8259 PIC: The classic chip (legacy).
- APIC (Advanced PIC): Required for Multi-core.
- Local APIC (LAPIC): Inside the CPU core. Handles timer and inter-core interrupts.
- IO APIC: On the motherboard. Routes device interrupts to specific cores.
- MSI (Message Signaled Interrupts): The modern PCIe standard. Devices do not pull a wire voltage high; they write a specific data packet to a special memory address to trigger an interrupt. This solves the scaling problem of physical wires.
Summary
- Interrupts: External, asynchronous events managed by hardware controllers (PIC/APIC) that feed vector numbers to the CPU.
Practice: Exception Classification
Understanding the four classes of exceptions is VITAL for identifying where a program failed and whether it can recover.
Exercise: Identify the Exception Class
For each scenario, identify whether it is an Interrupt, Trap, Fault, or Abort, and whether the CPU returns to the current or next instruction.
- Instruction:
div %ebxwhere%ebxis 0.- Answer: Fault. The CPU triggers a “Divide Error”. Generally, the OS kills the program (Signal), but technically it is a synchronous error.
- Instruction:
syscall.- Answer: Trap. Intentional request for OS service. Returns to the Next instruction.
- Event: A user hits
Ctrl-C.- Answer: Interrupt. Asynchronous external event. Returns to the Next instruction (or whenever the process is scheduled next).
- Instruction:
mov 0x123, %raxwhere the page is on disk.- Answer: Fault. Recoverable. Returns to the Current instruction to retry the move.
Continue here: 22 Introduction to Virtual Memory