Lecture from: 02.12.2025 | Video: Videos ETHZ
A pivotal moment in the course has been reached. Throughout the journey, from assembly language to caches, operations have proceeded under a convenient delusion. It has been assumed that writing `mov 0x4000, %eax` accesses the physical storage cell at address 0x4000 in the DRAM chip.
This is a lie.
In modern systems, every address generated by the CPU (the program counter, the stack pointer, the target of a pointer dereference) is a Virtual Address: an artifact of the program state. Before a single byte can be fetched from main memory, the hardware and the Operating System conspire to translate this virtual address into a Physical Address.
This mechanism is called Virtual Memory (VM). It is considered one of the great ideas in computer science because it solves three massive problems simultaneously: Capacity, Management, and Protection.
The Three Problems
Limited Physical Capacity
A modern 64-bit processor has a virtual address space of 16 Exabytes ($2^{64}$ bytes). A laptop typically has perhaps 16 Gigabytes of Physical RAM.
A mechanism is needed that gives every process the illusion of having a massive, contiguous memory space, while transparently using the physical RAM as a cache for the much larger (but slower) disk.
Memory Management
Consider a system running 100 processes (Spotify, Browser, VS Code, Kernel). If they all used physical addressing:
- They would constantly fight over who gets address `0x1000`.
- The linker would need to know the exact physical location of every program before it runs.
- The memory would become fragmented into unusable small holes.
VM gives every process its own private, linear address space. Process A sees a text segment at 0x400000, and Process B sees its text segment at 0x400000. They never collide because they are in different virtual universes.
Protection
A buggy C program must be prevented from overwriting the Kernel’s memory. A malicious script in a browser tab must be prevented from reading the memory of a banking tab. VM allows the OS to assign permission bits (Read, Write, Execute) to regions of memory.
VM as a Tool for Caching
The first way to think about Virtual Memory is as a caching system.
- Virtual Memory (Disk): This is the large, slow storage (the “Main Memory” in cache terminology).
- Physical Memory (DRAM): This is the small, fast cache.
Memory is divided into blocks called Pages.
- Virtual Pages (VPs): Stored on disk.
- Physical Pages (PPs): Cached in DRAM.
- Page Size: Typically 4 KB ($2^{12}$ bytes).
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20260106183430.png)
The Cost of a Miss
In the CPU cache hierarchy (L1 vs. DRAM), a miss costs on the order of tens to hundreds of cycles. In Virtual Memory (DRAM vs. Disk), a miss costs on the order of millions of cycles.
Because the penalty for a “Page Fault” (a miss in physical memory) is so catastrophic, the design choices are different from CPU caches:
- Associativity: Fully Associative. A virtual page can be placed in any physical page frame, because conflict misses cannot be afforded.
- Replacement Policy: Handled by software (the OS). It is worth spending thousands of cycles running a smart algorithm (like Least Recently Used) to pick the perfect victim to evict, rather than suffering a disk access.
- Write Policy: Write-Back. Writing to disk on every store instruction is impossible. A dirty bit is used, and writing to disk only happens when a page is evicted.
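These three design choices can be captured in a miniature model. The following Python sketch is purely illustrative (the `access` helper, frame count, and page size are invented here, not taken from the course): physical memory is a fully associative, write-back cache of disk pages, with a software LRU replacement policy and a dirty bit per resident page.

```python
from collections import OrderedDict

PAGE_SIZE = 64                 # tiny pages to keep the model small
NUM_FRAMES = 4                 # tiny DRAM: only 4 physical page frames

disk = {}                      # vpn -> page contents (the backing store)
frames = OrderedDict()         # vpn -> {"data", "dirty"}; insertion order = LRU order

def access(vpn, write=False, value=None):
    """Touch virtual page vpn; fault it in from disk if it is not resident."""
    if vpn not in frames:                              # page fault (miss in DRAM)
        if len(frames) == NUM_FRAMES:                  # DRAM full: software policy
            victim, entry = frames.popitem(last=False) # evicts the least recently used page
            if entry["dirty"]:                         # write-back: disk touched only on eviction
                disk[victim] = entry["data"]
        frames[vpn] = {"data": disk.get(vpn, bytes(PAGE_SIZE)), "dirty": False}
    frames.move_to_end(vpn)                            # mark as most recently used
    if write:
        frames[vpn] = {"data": value, "dirty": True}   # set dirty bit; no disk write yet
    return frames[vpn]["data"]
```

Note how a store only sets the dirty bit; the disk is written exactly once, when the page is chosen as a victim.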
VM as a Tool for Management
Virtual Memory simplifies linking and loading.
Linking
The linker can produce an executable assuming that the .text section always starts at a standardized address (e.g., 0x400000 on Linux x86-64). It does not care where the physical RAM is.
Loading and Lazy Allocation
When a large program (like a game or a browser) is run, the OS does not copy the entire executable from disk to RAM immediately. That would take seconds. Instead, the loader:
- Allocates page table entries for the code and data.
- Marks them as Invalid (not in memory).
- Points them to the location of the executable file on the disk.
The program starts running immediately. As soon as it tries to execute the first instruction, it triggers a Page Fault. The OS pauses the process, fetches just that page of code, maps it, and resumes. This is Demand Paging.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20260106183537.png)
Sharing
Sometimes processes need to share memory. For example, the printf code lives in libc. If 100 processes are running, 100 copies of printf in RAM are undesirable.
The OS maps the read-only Virtual Pages of libc in every process to the same Physical Page frames.
Copy-on-Write (COW)
What if a writable page is shared? (e.g., after `fork()`, the child process shares the parent’s memory). The OS marks the pages as Read-Only for both processes. If either process tries to write:
- Hardware triggers a Protection Fault.
- OS catches the fault.
- OS copies the physical page to a new location.
- OS updates the page table of the writing process to point to the new copy with Write permissions.
- OS restarts the instruction.
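The five steps above can be modeled in miniature. This is a hypothetical sketch (the page-table dictionaries and the `write` helper are invented for illustration, not real kernel code): both processes initially map the same physical frame read-only, and the first write triggers the copy.

```python
phys_mem = {0: bytearray(b"parent data")}   # physical frame number -> contents
next_frame = 1

# Per-process page tables: vpn -> {"frame", "writable"}.
# After fork(), both processes map the same frame, downgraded to read-only.
parent = {0: {"frame": 0, "writable": False}}
child  = {0: {"frame": 0, "writable": False}}

def write(page_table, vpn, data):
    global next_frame
    pte = page_table[vpn]
    if not pte["writable"]:                                # hardware: protection fault
        new = next_frame; next_frame += 1                  # OS: allocate a fresh frame,
        phys_mem[new] = bytearray(phys_mem[pte["frame"]])  # copy the shared page,
        page_table[vpn] = {"frame": new, "writable": True} # remap only the writer
    frame = page_table[vpn]["frame"]
    phys_mem[frame][:len(data)] = data                     # restart the faulting write
```

Only the writing process pays for the copy; a process that never writes keeps sharing the original frame for free.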
Address Translation Mechanics
How does the hardware perform this magic? It uses a Page Table. The Page Table is a data structure stored in physical memory (managed by the OS) that maps Virtual Page Numbers (VPN) to Physical Page Numbers (PPN).
The $n$-bit Virtual Address is split into two components:
- Virtual Page Offset (VPO): The bottom $p$ bits (where $2^p =$ Page Size). These bits do not change: the offset within a page is the same virtually and physically.
- Virtual Page Number (VPN): The top $n - p$ bits. This is the index into the page table.
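In code, this split is two bit operations. A minimal sketch, assuming 4 KB pages ($p = 12$):

```python
P = 12                               # log2 of the 4 KB page size

def split(vaddr):
    vpn = vaddr >> P                 # top bits: index into the page table
    vpo = vaddr & ((1 << P) - 1)     # bottom bits: unchanged by translation
    return vpn, vpo
```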
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20260106183606.png)
The Page Table Entry (PTE)
Each row in the Page Table (a Page Table Entry) contains:
- Valid Bit: Is this page in DRAM? (1 = Yes, 0 = No/On Disk).
- Physical Page Number (PPN): Where is it?
- Permission Bits: Read, Write, Execute, User/Supervisor.
- Dirty Bit: Has this page been modified?
- Accessed Bit: Has this page been read recently? (Used for LRU replacement).
Scenario A: Page Hit
- CPU sends Virtual Address to the MMU (Memory Management Unit).
- MMU uses the VPN to calculate the address of the PTE in main memory: PTE address = PTBR + VPN $\times$ sizeof(PTE), where PTBR is the Page Table Base Register.
- MMU fetches the PTE.
- MMU checks Valid Bit = 1 and Permissions = OK.
- MMU constructs Physical Address = PPN concatenated with VPO.
- MMU fetches data from the Cache/Memory.
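The hit path can be sketched as a small function. The mapping below is invented for illustration, and permission checking is reduced to a set of flag characters:

```python
P = 12
# Invented example mapping: virtual page 0x400 is resident in physical page 0x1A2.
page_table = {0x400: {"valid": True, "ppn": 0x1A2, "perm": {"r", "x"}}}

class PageFault(Exception):
    pass

def translate(vaddr, access="r"):
    vpn, vpo = vaddr >> P, vaddr & ((1 << P) - 1)
    pte = page_table.get(vpn)                # fetch the PTE
    if pte is None or not pte["valid"]:
        raise PageFault(hex(vaddr))          # control would pass to the OS handler
    if access not in pte["perm"]:
        raise PageFault("protection")        # e.g. a write to a read-only page
    return (pte["ppn"] << P) | vpo           # PA = PPN concatenated with VPO
```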
Scenario B: Page Fault
- MMU fetches PTE. Valid Bit = 0.
- MMU triggers a hardware Exception (Page Fault).
- Control transfers to the OS Kernel’s Page Fault Handler.
- OS Handler:
    - Checks if the address is valid (if not → Segfault).
    - Finds a victim page in physical memory to evict.
    - If the victim is Dirty, writes it to disk.
    - Loads the requested page from disk into the empty frame.
    - Updates the Page Table (Valid = 1, new PPN).
- Return: The OS restarts the faulting instruction.
- The instruction executes again. This time, it is a Page Hit.
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20260106183624.png)
The TLB (Translation Lookaside Buffer)
There is a performance disaster hidden in the “Page Hit” scenario above: to fetch one piece of data (step 6), an extra memory access was required just to read the Page Table (step 3). Every memory access now takes twice as long.
To solve this, a specialized hardware cache inside the MMU called the Translation Lookaside Buffer (TLB) is introduced.
The TLB caches Page Table Entries. It maps VPNs directly to PPNs.
The Optimized Flow:
- MMU extracts VPN.
- MMU checks TLB.
    - TLB Hit: Returns PPN immediately. (0 memory accesses).
    - TLB Miss: MMU fetches PTE from memory, updates TLB, and retries. (1 memory access).
Since programs exhibit locality, the TLB hit rate is extremely high (typically well over 99%), making address translation essentially free.
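The optimized flow can be sketched as follows. This is an illustrative model with an invented `page_table` mapping in which every PTE is assumed valid; the set index comes from the bottom bits of the VPN, as in a set-associative cache:

```python
SETS, WAYS = 4, 4                    # 16 entries, 4-way set associative
tlb = [dict() for _ in range(SETS)]  # each set: tag -> ppn
page_table = {0x0F: 0x0D}            # vpn -> ppn (invented; all assumed valid)

def lookup(vpn):
    index, tag = vpn % SETS, vpn // SETS         # bottom VPN bits pick the set
    if tag in tlb[index]:
        return tlb[index][tag], "hit"            # 0 extra memory accesses
    ppn = page_table[vpn]                        # TLB miss: 1 access for the PTE
    if len(tlb[index]) == WAYS:
        tlb[index].pop(next(iter(tlb[index])))   # set full: evict an entry
    tlb[index][tag] = ppn                        # cache the translation
    return ppn, "miss"
```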
![](/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20260106183646.png)
A Concrete “Toy” Memory System
A translation is now walked through bit by bit. This is the best way to understand the interaction between TLBs, Page Tables, and Caches.
The System Specifications (Slide 48):
- Virtual Address: 14 bits.
- Physical Address: 12 bits.
- Page Size: 64 bytes ($2^6$).
- TLB: 16 entries, 4-way associative.
- L1 Cache: 16 lines, 4-byte blocks, Direct Mapped.
1. Break down the addresses:
- Page Offset (VPO/PPO): Page is 64 bytes → 6 bits.
- VPN: 14 bits total − 6 offset = 8 bits.
- PPN: 12 bits total − 6 offset = 6 bits.
2. Configure the TLB:
- 16 entries, 4-way sets.
- TLB Index (TLBI): 16 entries / 4 ways = 4 sets → 2 bits. (Taken from the bottom of the VPN).
- TLB Tag (TLBT): Remaining VPN bits (8 − 2 = 6 bits).
3. Configure the Cache:
- 16 lines, Direct Mapped → 16 sets.
- Block Offset (CO): Block is 4 bytes → 2 bits.
- Cache Index (CI): 16 sets → 4 bits.
- Cache Tag (CT): Remaining Physical bits (12 − 4 − 2 = 6 bits).
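These field widths can be derived mechanically from the parameters. A quick check in Python:

```python
import math

VA_BITS, PA_BITS = 14, 12
PAGE_SIZE, TLB_ENTRIES, TLB_WAYS = 64, 16, 4
CACHE_LINES, BLOCK_SIZE = 16, 4                  # direct mapped: one line per set

offset = int(math.log2(PAGE_SIZE))               # VPO/PPO width: 6 bits
vpn_bits = VA_BITS - offset                      # 8 bits
ppn_bits = PA_BITS - offset                      # 6 bits
tlbi = int(math.log2(TLB_ENTRIES // TLB_WAYS))   # 4 sets -> 2 bits
tlbt = vpn_bits - tlbi                           # 6 bits
co = int(math.log2(BLOCK_SIZE))                  # 2 bits
ci = int(math.log2(CACHE_LINES))                 # 4 bits
ct = PA_BITS - ci - co                           # 6 bits
```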
Example 1: Translating 0x03D4
Goal: Read data at Virtual Address 0x03D4.
Step 1: Parse Virtual Address
- Hex: `0x03D4` → Binary: `00 0011 1101 0100` (14 bits).
- VPO (bottom 6): `010100` (`0x14`).
- VPN (top 8): `00001111` (`0x0F`).
Step 2: TLB Lookup
The needed VPN is 0x0F (00001111).
- TLB Index (bottom 2 of VPN): `11` (3).
- TLB Tag (top 6 of VPN): `000011` (`0x03`).
- Action: Check TLB Set 3. Look for Tag `03`.
- Slide 53: Set 3 contains Tag `03`, Valid=1, PPN=`0D`.
- Result: TLB Hit! PPN is `0x0D`.
Step 3: Construct Physical Address
- PA = PPN (`0x0D`) : VPO (`0x14`)
- PA = `001101` : `010100` = `0011 0101 0100` = `0x354`.
Step 4: Cache Lookup
Data at Physical Address 0x354 is needed.
- Block Offset (bottom 2): `00`.
- Cache Index (next 4): `0101` (5).
- Cache Tag (top 6): `001101` (`0x0D`).
- Action: Check Cache Set 5. Look for Tag `0D`.
- Slide 56: Set 5 has Tag `0D`, Valid=1.
- Result: Cache Hit! The data byte at offset 0 is `0x36`.
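The arithmetic of this example can be verified with a few shift-and-mask operations (the PPN comes from the TLB entry on the slide; everything else is computed):

```python
va = 0x03D4
vpo = va & 0x3F                     # bottom 6 bits
vpn = va >> 6                       # top 8 bits
tlbi, tlbt = vpn & 0x3, vpn >> 2    # 2-bit set index, 6-bit tag
ppn = 0x0D                          # from the TLB hit (slide 53)
pa = (ppn << 6) | vpo               # PPN concatenated with VPO
co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
```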
Example 2: Translating 0x0B8F
Goal: Read data at Virtual Address 0x0B8F.
Step 1: Parse Virtual Address
- Hex: `0x0B8F` → Binary: `00 1011 1000 1111`.
- VPO: `001111` (`0x0F`).
- VPN: `00101110` (`0x2E`).
Step 2: TLB Lookup
The needed VPN is 0x2E (00101110).
- TLBI (bottom 2): `10` (2).
- TLBT (top 6): `001011` (`0x0B`).
- Action: Check TLB Set 2. Look for Tag `0B`.
- Slide 59: Set 2 has Tags `08`, `06`, `03`. No `0B`.
- Result: TLB Miss.
Step 3: Page Table Lookup
- The MMU must access memory to check the Page Table Entry for VPN `0x2E`.
- Note: The slide (Slide 60) does not explicitly show the Page Table Entry for `0x2E`, but describes the process:
    - If the PTE in memory says Valid=1, the MMU loads the PPN into the TLB and retries.
    - If the PTE says Valid=0, this is a Page Fault. The OS takes over to load the page from disk.
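Steps 1 and 2 of this example can be checked the same way:

```python
va = 0x0B8F
vpn, vpo = va >> 6, va & 0x3F       # 8-bit VPN, 6-bit VPO
tlbi, tlbt = vpn & 0x3, vpn >> 2    # set index and tag from the VPN
```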
Example 3: Translating 0x0020
Goal: Read data at Virtual Address 0x0020.
Step 1: Parse Virtual Address
- Binary: `00 0000 0010 0000`.
- VPO: `100000` (`0x20`).
- VPN: `00000000` (`0x00`).
Step 2: TLB Lookup
The needed VPN is 0x00.
- TLBI: `00` (0).
- TLBT: `000000` (`0x00`).
- Action: Check TLB Set 0 for Tag `00`.
- Slide 62: Set 0 contains tags `03` and `07`. No `00`.
- Result: TLB Miss.
Step 3: Page Table Lookup
- The MMU checks the Page Table for VPN `00` (index 0).
- Slide 63: The PTE at index `00` has PPN `28` and Valid 1.
- Result: Page Table Hit. (Not a Page Fault).
- The MMU updates the TLB with this new mapping.
Step 4: Construct Physical Address
- PA = PPN (`0x28`) : VPO (`0x20`) = `101000` : `100000` = `0xA20`.
Step 5: Cache Lookup
- Binary: `0xA20` → `1010 0010 0000` (12 bits).
- Offset (bits 1-0): `00`.
- Index (bits 5-2): `1000` (8).
- Tag (bits 11-6): `101000` (`0x28`).
- Slide 66 Check: Set 8 has Tag `24`, not `28`.
- Therefore: Cache Miss. (Memory must be accessed).
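Recomputing this example confirms the physical address and the cache fields (the PPN is taken from the page-table walk on the slide):

```python
va = 0x0020
vpn, vpo = va >> 6, va & 0x3F       # 0x00 and 0x20
ppn = 0x28                          # from the page table (slide 63)
pa = (ppn << 6) | vpo               # 12-bit physical address
co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
```

Note that the cache tag is the top 6 bits of the 12-bit PA, which works out to `0x28`; padding the PA to 14 bits would misleadingly suggest a tag of `0x0A`.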
The Page Table Size Problem
There is a practical problem with the structure described above. Assume the Page Table is a simple linear array, and crunch the numbers for a real system like x86-64.
The Parameters (Slide 71):
- Page Size: 4 KB ($2^{12}$ bytes).
- Virtual Address Space: 48 bits used.
- PTE Size: 8 Bytes ($2^3$).
The Calculation:
- How many entries? One entry is needed for every Virtual Page: $2^{48} / 2^{12} = 2^{36}$ entries.
- Total Size? $2^{36} \times 2^3 = 2^{39}$ bytes = 512 GB.
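These two numbers can be checked directly:

```python
va_bits, page_bits, pte_bytes = 48, 12, 8
entries = 2 ** (va_bits - page_bits)    # one PTE per virtual page
total_bytes = entries * pte_bytes       # size of one flat page table
```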
The Problem:
To run one process (like notepad), 512 GB of contiguous RAM would need to be allocated just for its page table.
This is clearly impossible. Even high-end servers rarely have 512 GB of RAM, let alone enough to spend on a page table for every process.
A way to store these tables more efficiently is needed. Hint: Most of the address space is empty (unallocated). PTEs for the giant gaps between the stack and the heap should not need to be stored.
The solution to this problem is explored in the next section.
Continue here: 23 Advanced Virtual Memory and TLB