Lecture from: 02.12.2025 | Video: Videos ETHZ

A pivotal moment in the course has been reached. Throughout the journey from assembly language to caches, everything has proceeded under a convenient delusion: the assumption that writing mov 0x4000, %eax accesses the physical storage cell at address 0x4000 in the DRAM chip.

This is a lie.

In modern systems, every address generated by the CPU (the program counter, the stack pointer, the target of a pointer dereference) is a Virtual Address. It is an artifact of the “Program State.” Before a single byte can be fetched from main memory, the hardware and the Operating System conspire to translate this virtual address into a Physical Address.

This mechanism is called Virtual Memory (VM). It is considered one of the great ideas in computer science because it solves three massive problems simultaneously: Capacity, Management, and Protection.

The Three Problems

Limited Physical Capacity

A modern 64-bit processor has a virtual address space of 16 Exabytes ($2^{64}$ bytes). A laptop typically has perhaps 16 Gigabytes of Physical RAM.

A mechanism is needed that gives every process the illusion of having a massive, contiguous memory space, while transparently using the physical RAM as a cache for the much larger (but slower) disk.

Memory Management

Consider a system running 100 processes (Spotify, Browser, VS Code, Kernel). If they all used physical addressing:

  • They would constantly fight over who gets address 0x1000.
  • The linker would need to know the exact physical location of every program before it runs.
  • The memory would become fragmented into unusable small holes.

VM gives every process its own private, linear address space. Process A sees a text segment at 0x400000, and Process B sees its text segment at 0x400000. They never collide because they are in different virtual universes.

Protection

A buggy C program must be prevented from overwriting the Kernel’s memory. A malicious script in a browser tab must be prevented from reading the memory of a banking tab. VM allows the OS to assign permission bits (Read, Write, Execute) to regions of memory.


VM as a Tool for Caching

The first way to think about Virtual Memory is as a caching system.

  • Virtual Memory (Disk): This is the large, slow storage (the “Main Memory” in cache terminology).
  • Physical Memory (DRAM): This is the small, fast cache.

Memory is divided into blocks called Pages.

  • Virtual Pages (VPs): Stored on disk.
  • Physical Pages (PPs): Cached in DRAM.
  • Page Size: Typically 4 KB ($2^{12}$ bytes).

The Cost of a Miss

In the CPU cache hierarchy (L1 vs. DRAM), a miss costs on the order of tens to hundreds of cycles. In Virtual Memory (DRAM vs. Disk), a miss costs on the order of millions of cycles.

Because the penalty for a “Page Fault” (a miss in physical memory) is so catastrophic, the design choices are different from CPU caches:

  1. Associativity: Fully Associative. A virtual page can be placed in any physical page frame; conflict misses cannot be tolerated at this penalty.
  2. Replacement Policy: Handled by software (the OS). It is worth spending thousands of cycles running a smart algorithm (like Least Recently Used) to pick the perfect victim to evict, rather than suffering a disk access.
  3. Write Policy: Write-Back. Writing to disk on every store instruction is impossible. A dirty bit is used, and a page is written back to disk only when it is evicted (see the sketch after this list).
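
A minimal sketch of this write-back policy in C, assuming hypothetical stub helpers write_page_to_disk() and read_page_from_disk() in place of real disk I/O:

```c
#include <stdbool.h>
#include <stdio.h>

/* The disk helpers are stubs for illustration, not a real OS interface. */
static void write_page_to_disk(unsigned vpn)  { printf("write back VP %u\n", vpn); }
static void read_page_from_disk(unsigned vpn) { printf("read in    VP %u\n", vpn); }

typedef struct {
    bool valid, dirty;
    unsigned vpn;   /* which virtual page currently occupies this frame */
} frame_t;

void evict_and_fill(frame_t *victim, unsigned new_vpn) {
    if (victim->valid && victim->dirty)
        write_page_to_disk(victim->vpn);   /* only dirty pages cost a disk write */
    read_page_from_disk(new_vpn);
    victim->vpn   = new_vpn;
    victim->valid = true;
    victim->dirty = false;                 /* clean until the CPU stores to it */
}

int main(void) {
    frame_t frame = { .valid = true, .dirty = true, .vpn = 7 };
    evict_and_fill(&frame, 42);
    return 0;
}
```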

VM as a Tool for Management

Virtual Memory simplifies linking and loading.

Linking

The linker can produce an executable assuming that the .text section always starts at a standardized address (e.g., 0x400000 on Linux x86-64). It does not care where the physical RAM is.

Loading and Lazy Allocation

When a large program (like a game or a browser) is run, the OS does not copy the entire executable from disk to RAM immediately. That would take seconds. Instead, the loader:

  1. Allocates page table entries for the code and data.
  2. Marks them as Invalid (not in memory).
  3. Points them to the location of the .exe file on the disk.

The program starts running immediately. As soon as it tries to execute the first instruction, it triggers a Page Fault. The OS pauses the process, fetches just that page of code, maps it, and resumes. This is Demand Paging.
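
Demand paging can be observed directly from user space with the POSIX mmap API. The sketch below (assuming a Linux/POSIX system; /bin/ls is used merely as a convenient existing file) maps a file without copying any bytes, then faults its pages in one at a time by touching them:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/bin/ls", O_RDONLY);   /* any large existing file works */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the file: no bytes are copied here. The kernel only sets up
     * mappings whose pages are marked "not present". */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The first touch of each page triggers a page fault; the kernel
     * then reads just that one page from disk and resumes the loop. */
    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i += 4096)
        sum += p[i];

    printf("touched %lld bytes, checksum %lu\n", (long long)st.st_size, sum);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```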

Sharing

Sometimes processes need to share memory. For example, the printf code lives in libc. If 100 processes are running, 100 copies of printf in RAM are undesirable. The OS maps the read-only Virtual Pages of libc in every process to the same Physical Page frames.

Copy-on-Write (COW)

What if a writable page is shared? (e.g., after fork(), the child process shares the parent’s memory). The OS marks the pages as Read-Only for both processes. If either process tries to write (a demonstration follows the list):

  1. Hardware triggers a Protection Fault.
  2. OS catches the fault.
  3. OS copies the physical page to a new location.
  4. OS updates the page table of the writing process to point to the new copy with Write permissions.
  5. OS restarts the instruction.
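
The effect is easy to demonstrate. In this minimal sketch (assuming a POSIX system), parent and child share the heap page copy-on-write after fork(); the child’s store silently receives a private copy, so the parent’s data is untouched:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *buf = malloc(4096);            /* one page of private, writable data */
    strcpy(buf, "parent data");

    pid_t pid = fork();                  /* child shares all pages copy-on-write */
    if (pid == 0) {
        strcpy(buf, "child data");       /* write -> protection fault -> private copy */
        printf("child  sees: %s\n", buf);
        exit(0);
    }
    wait(NULL);
    printf("parent sees: %s\n", buf);    /* still "parent data": the copy was private */
    free(buf);
    return 0;
}
```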

Address Translation Mechanics

How does the hardware perform this magic? It uses a Page Table. The Page Table is a data structure stored in physical memory (managed by the OS) that maps Virtual Page Numbers (VPN) to Physical Page Numbers (PPN).

The $n$-bit Virtual Address is split into two components:

  1. Virtual Page Offset (VPO): The bottom $p$ bits, where $2^p$ is the Page Size. These bits do not change: the offset within a page is the same virtually and physically.
  2. Virtual Page Number (VPN): The top $n - p$ bits. This is the index into the page table.
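
For 4 KB pages ($p = 12$), the split amounts to two bit operations. A minimal sketch with an arbitrary example address:

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12   /* 4 KB pages => p = 12 offset bits */

int main(void) {
    uint64_t va  = 0x400123;                        /* arbitrary example address */
    uint64_t vpn = va >> PAGE_BITS;                 /* top bits: page table index */
    uint64_t vpo = va & ((1ULL << PAGE_BITS) - 1);  /* bottom bits: unchanged by translation */
    printf("VPN = 0x%llx, VPO = 0x%llx\n",          /* prints 0x400 and 0x123 */
           (unsigned long long)vpn, (unsigned long long)vpo);
    return 0;
}
```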

The Page Table Entry (PTE)

Each row in the Page Table (a Page Table Entry) contains the following fields (a struct sketch follows the list):

  • Valid Bit: Is this page in DRAM? (1 = Yes, 0 = No/On Disk).
  • Physical Page Number (PPN): Where is it?
  • Permission Bits: Read, Write, Execute, User/Supervisor.
  • Dirty Bit: Has this page been modified?
  • Accessed Bit: Has this page been read recently? (Used for LRU replacement).
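
As a rough C sketch (a simplified layout for illustration; real x86-64 PTEs pack these bits in an architecture-defined format):

```c
#include <stdint.h>

/* Simplified sketch; real x86-64 PTEs pack these bits in an
 * architecture-defined layout, not this one. */
typedef struct {
    uint64_t valid    : 1;   /* 1 = resident in DRAM, 0 = on disk / unmapped */
    uint64_t read     : 1;   /* permission bits */
    uint64_t write    : 1;
    uint64_t execute  : 1;
    uint64_t user     : 1;   /* user- vs. supervisor-mode access */
    uint64_t dirty    : 1;   /* set by hardware on a store to the page */
    uint64_t accessed : 1;   /* set by hardware on any reference */
    uint64_t ppn      : 40;  /* physical page number */
} pte_t;
```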

Scenario A: Page Hit

  1. CPU sends Virtual Address to the MMU (Memory Management Unit).
  2. MMU uses the VPN to calculate the address of the PTE in main memory: PTE address = Page Table Base Register (PTBR) + VPN × sizeof(PTE).
  3. MMU fetches the PTE.
  4. MMU checks Valid Bit = 1 and Permissions = OK.
  5. MMU constructs Physical Address = PPN concatenated with VPO.
  6. MMU fetches data from the Cache/Memory.
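
Steps 2 through 5 of this hit path condense into a few lines of C. The sketch below assumes a hypothetical linear page table with 8-bit VPNs and 4 KB pages:

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12

/* Hypothetical linear page table: 8-bit VPN, 4 KB pages. */
typedef struct { unsigned valid : 1; unsigned ppn : 16; } pte_t;
static pte_t page_table[1 << 8];

/* Returns the physical address, or -1 to signal a page fault. */
int64_t translate(uint32_t va) {
    uint32_t vpn = (va >> PAGE_BITS) & 0xFF;          /* step 1: split the VA     */
    uint32_t vpo = va & ((1u << PAGE_BITS) - 1);
    pte_t pte = page_table[vpn];                      /* steps 2-3: fetch the PTE */
    if (!pte.valid)                                   /* step 4: valid bit check  */
        return -1;                                    /* would raise a page fault */
    return ((int64_t)pte.ppn << PAGE_BITS) | vpo;     /* step 5: PA = PPN : VPO   */
}

int main(void) {
    page_table[0x40] = (pte_t){ .valid = 1, .ppn = 0x2B };   /* fake mapping */
    printf("PA = 0x%llx\n", (unsigned long long)translate(0x40123)); /* 0x2b123 */
    return 0;
}
```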

Scenario B: Page Fault

  1. MMU fetches PTE. Valid Bit = 0.
  2. MMU triggers a hardware Exception (Page Fault).
  3. Control transfers to the OS Kernel’s Page Fault Handler.
  4. OS Handler:
    • Checks if the address is valid (if not, the process receives a Segfault).
    • Finds a victim page in physical memory to evict.
    • If victim is Dirty, writes it to disk.
    • Loads the requested page from disk into the empty frame.
    • Updates the Page Table (Valid = 1, new PPN).
  5. Return: The OS restarts the faulting instruction.
  6. The instruction executes again. This time, it is a Page Hit.


The TLB (Translation Lookaside Buffer)

There is a performance disaster hiding in the “Page Hit” scenario above: to fetch one piece of data (step 6), an extra memory access was required just to read the Page Table (step 3).

Every memory access now takes twice as long.

To solve this, a specialized hardware cache inside the MMU called the Translation Lookaside Buffer (TLB) is introduced.

The TLB caches Page Table Entries. It maps VPNs directly to PPNs.

The Optimized Flow (a lookup sketch in C follows the list):

  1. MMU extracts VPN.
  2. MMU checks TLB.
    • TLB Hit: Returns PPN immediately. (0 memory accesses).
    • TLB Miss: MMU fetches PTE from memory, updates TLB, and retries. (1 memory access).
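
A lookup in a small 4-way set-associative TLB reduces to indexing a set and comparing tags. This sketch uses hypothetical sizes that happen to match the toy system below:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical 4-way set-associative TLB with 4 sets (16 entries). */
#define TLB_SETS 4
#define TLB_WAYS 4

typedef struct {
    bool     valid;
    unsigned tag;   /* upper VPN bits */
    unsigned ppn;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

/* Returns true on a TLB hit and writes the PPN; false forces a page walk. */
bool tlb_lookup(unsigned vpn, unsigned *ppn) {
    unsigned index = vpn % TLB_SETS;   /* bottom VPN bits select the set */
    unsigned tag   = vpn / TLB_SETS;   /* remaining VPN bits are the tag */
    for (int way = 0; way < TLB_WAYS; way++) {
        if (tlb[index][way].valid && tlb[index][way].tag == tag) {
            *ppn = tlb[index][way].ppn;   /* hit: zero memory accesses */
            return true;
        }
    }
    return false;   /* miss: fetch the PTE from memory, refill the TLB, retry */
}

int main(void) {
    tlb[3][0] = (tlb_entry_t){ .valid = true, .tag = 0x03, .ppn = 0x0D };
    unsigned ppn;
    if (tlb_lookup(0x0F, &ppn))            /* VPN 0x0F -> set 3, tag 0x03 */
        printf("hit, PPN = 0x%02X\n", ppn);
    return 0;
}
```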

Since programs exhibit locality, the TLB hit rate is extremely high (typically above 99%), making address translation essentially free.


A Concrete “Toy” Memory System

A translation will now be walked through bit by bit. This is the best way to understand the interaction between TLBs, Page Tables, and Caches.

The System Specifications (Slide 48):

  • Virtual Address: 14 bits.
  • Physical Address: 12 bits.
  • Page Size: 64 bytes ($2^6$ bytes).
  • TLB: 16 entries, 4-way associative.
  • L1 Cache: 16 lines, 4-byte blocks, Direct Mapped.

1. Break down the addresses:

  • Page Offset (VPO/PPO): Page is 64 bytes → 6 bits.
  • VPN: 14 bits total - 6 offset = 8 bits.
  • PPN: 12 bits total - 6 offset = 6 bits.

2. Configure the TLB:

  • 16 entries / 4 ways → 4 sets.
  • TLB Index (TLBI): $\log_2 4 = 2$ bits (taken from the bottom of the VPN).
  • TLB Tag (TLBT): Remaining VPN bits ($8 - 2 = 6$ bits).

3. Configure the Cache:

  • 16 lines, Direct Mapped → 16 sets.
  • Block Offset (CO): Block is 4 bytes → 2 bits.
  • Cache Index (CI): 16 sets → 4 bits.
  • Cache Tag (CT): Remaining physical bits ($12 - 4 - 2 = 6$ bits). All of these splits are computed in the code sketch below.
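
To make the bit slicing concrete, the following program (a hypothetical helper, not from the slides) decomposes a virtual and the corresponding physical address of this toy system; its output matches Example 1 below:

```c
#include <stdio.h>

int main(void) {
    unsigned va = 0x03D4;               /* the virtual address of Example 1 */

    unsigned vpo  = va & 0x3F;          /* bottom 6 bits   -> 0x14 */
    unsigned vpn  = va >> 6;            /* top 8 bits      -> 0x0F */
    unsigned tlbi = vpn & 0x3;          /* bottom 2 of VPN -> 0x3  */
    unsigned tlbt = vpn >> 2;           /* top 6 of VPN    -> 0x03 */
    printf("VPN=0x%02X VPO=0x%02X TLBI=0x%X TLBT=0x%02X\n", vpn, vpo, tlbi, tlbt);

    unsigned ppn = 0x0D;                /* produced by the TLB hit in Example 1 */
    unsigned pa  = (ppn << 6) | vpo;    /* PA = PPN : VPO  -> 0x354 */

    unsigned co = pa & 0x3;             /* bottom 2 bits   -> 0x0  */
    unsigned ci = (pa >> 2) & 0xF;      /* next 4 bits     -> 0x5  */
    unsigned ct = pa >> 6;              /* top 6 bits      -> 0x0D */
    printf("PA=0x%03X CT=0x%02X CI=0x%X CO=0x%X\n", pa, ct, ci, co);
    return 0;
}
```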

Example 1: Translating 0x03D4

Goal: Read data at Virtual Address 0x03D4.

Step 1: Parse Virtual Address

  • Hex: 0x03D4 → Binary: 00 0011 1101 0100 (14 bits)
  • VPO (bottom 6): 010100 (0x14).
  • VPN (top 8): 00001111 (0x0F).

Step 2: TLB Lookup

The needed VPN is 0x0F (00001111).

  • TLB Index (bottom 2 of VPN): 11 (0x3).
  • TLB Tag (top 6 of VPN): 000011 (0x03).
  • Action: Check TLB Set 3. Look for Tag 03.
  • Slide 53: Set 3 contains Tag 03, Valid=1, PPN=0D.
  • Result: TLB Hit! PPN is 0x0D.

Step 3: Construct Physical Address

  • PA = PPN (0x0D) : VPO (0x14)
  • PA = 001101 : 010100 = 0011 0101 0100 → 0x354.

Step 4: Cache Lookup

Data at Physical Address 0x354 is needed.

  • Block Offset (bottom 2): 00.
  • Cache Index (next 4): 0101 (0x5).
  • Cache Tag (top 6): 001101 (0x0D).
  • Action: Check Cache Set 5. Look for Tag 0D.
  • Slide 56: Set 5 has Tag 0D, Valid=1.
  • Result: Cache Hit! The data byte at offset 0 is 0x36.

Example 2: Translating 0x0B8F

Goal: Read data at Virtual Address 0x0B8F.

Step 1: Parse Virtual Address

  • Hex: 0x0B8F → Binary: 00 1011 1000 1111 (14 bits)
  • VPO: 001111 (0x0F).
  • VPN: 00101110 (0x2E).

Step 2: TLB Lookup

The needed VPN is 0x2E (00101110).

  • TLBI (bottom 2): 10 (0x2).
  • TLBT (top 6): 001011 (0x0B).
  • Action: Check TLB Set 2. Look for Tag 0B.
  • Slide 59: Set 2 has Tags 08, 06, and 03. No 0B.
  • Result: TLB Miss.

Step 3: Page Table Lookup

  • The MMU must access memory to check the Page Table Entry for VPN 0x2E.
  • Note: The slide (Slide 60) does not explicitly show the Page Table Entry for 0x2E, but describes the process:
    • If the PTE in memory says Valid=1, the MMU loads the PPN into the TLB and retries.
    • If the PTE says Valid=0, this is a Page Fault. The OS takes over to load the page from disk.

Example 3: Translating 0x0020

Goal: Read data at Virtual Address 0x0020.

Step 1: Parse Virtual Address

  • Binary: 00 0000 0010 0000
  • VPO: 100000 (0x20).
  • VPN: 00000000 (0x00).

Step 2: TLB Lookup

The needed VPN is 0x00.

  • TLBI: 00 (0x0).
  • TLBT: 000000 (0x00).
  • Action: Check TLB Set 0 for Tag 00.
  • Slide 62: Set 0 contains tags 03 and 07. No 00.
  • Result: TLB Miss.

Step 3: Page Table Lookup

  • The MMU checks the Page Table for VPN 00 (index 0).
  • Slide 63: The PTE at index 00 has PPN 28 and Valid 1.
  • Result: Page Table Hit. (Not a Page Fault).
  • The MMU updates the TLB with this new mapping.

Step 4: Construct Physical Address

  • PA = PPN (0x28) : VPO (0x20) = 101000 : 100000 → 0x0A20.

Step 5: Cache Lookup

  • PA 0x0A20 → Binary: 1010 0010 0000 (12 bits).
  • Offset (bottom 2): 00.
  • Index (next 4): 1000 (0x8).
  • Tag (top 6): 101000 (0x28).
  • Action: Check Cache Set 8. Look for Tag 28.
  • Slide 66 Check: Set 8 has Tag 24, which does not match.
  • Therefore: Cache Miss. (Memory must be accessed.)

The Page Table Size Problem

There is a practical problem with the structure described above. Assume the Page Table is a simple linear array, and crunch the numbers for a real system like x86-64.

The Parameters (Slide 71):

  • Page Size: 4 KB ($2^{12}$ bytes).
  • Virtual Address Space: 48 bits used.
  • PTE Size: 8 Bytes ($2^3$ bytes).

The Calculation:

  1. How many entries? One entry is needed for every Virtual Page: $2^{48} / 2^{12} = 2^{36}$ entries.
  2. Total Size? $2^{36} \times 2^3 = 2^{39}$ bytes = 512 GB.
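
As a sanity check, the same arithmetic in C:

```c
#include <stdio.h>

int main(void) {
    unsigned long long entries = 1ULL << (48 - 12);  /* 2^36 virtual pages => 2^36 PTEs */
    unsigned long long bytes   = entries << 3;       /* 8 B per PTE => 2^39 bytes */
    printf("%llu entries, %llu GB\n", entries, bytes >> 30);  /* 512 GB */
    return 0;
}
```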

The Problem: To run one process (like notepad), 512 GB of contiguous RAM would need to be allocated just for its page table. This is clearly impossible. Even high-end servers rarely have 512 GB of RAM, let alone 512 GB of page table per process.

A way to store these tables more efficiently is needed. Hint: Most of the address space is empty (unallocated). PTEs for the giant gaps between the stack and the heap should not need to be stored.

The solution to this problem is explored in the next section.


Continue here: 23 Advanced Virtual Memory and TLB