Lecture from: 16.09.2025 | Video: Videos ETHZ

This lecture introduces Systems Programming and Computer Architecture, a course designed to peel back the layers of abstraction inherent in modern computing to reveal what actually happens inside the machine. The goal is to fundamentally change how one thinks about code and computers.

Content Overview

The course focuses on “getting hands dirty with the metal,” covering several key areas:

  • Speed and Correctness: Systems code must be efficient and robust.
  • Systems vs. Application Code: Understanding the full stack leads to better code in any domain.
  • Performance Mechanics: Efficiency depends on hardware realities, not just Big O.
  • C and Assembly: These remain the foundation for controlling and understanding the machine.
  • Programs as Actors: Code interacts with a complex world of hardware and networks.
  • Hardware-Software Interface: Bridging the gap between logic and silicon.

Instructors and Context

The course is taught by Prof. Timothy Roscoe and Prof. Ana Klimovic. Both instructors bring industry experience to the academic setting, dealing with the realities of production systems. Prof. Roscoe previously worked at Intel, and Prof. Klimovic was a Research Scientist at Google Brain.

The material draws heavily from established sources:

  • Approximately half the material is based on the CS 15-213 course at Carnegie Mellon University (CMU).
  • Some C programming material is adapted from CSE333 at the University of Washington.
  • New material covering multicore systems and devices has been added specifically for this course.

What is Systems Programming?

There is a fundamental distinction between systems programmers and application programmers. Systems programmers can and do write applications, often being the most productive programmers because they understand the entire stack. Their code tends to be more efficient and robust. Conversely, pure application programmers are generally not suited to write system software because the mindset and assumptions required are fundamentally different.

System software is the foundational layer that makes all other software possible. It includes:

  • Operating systems
  • Database systems
  • Networking protocols and routing
  • Compiler design and implementation
  • Distributed systems
  • Cloud computing and online services
  • Big Data and machine learning frameworks

It encompasses everything on and above the hardware/software boundary.

Why Systems Matters

Systems software provides the foundation for all computing. It provides the abstractions application programmers rely on, such as processes, threads, virtual memory, garbage collection, files, and network sockets. It also ensures security properties like authentication and access control. Without systems software, a computer would be little more than a calculator; systems software enables I/O devices, graphical output, and multiprocessing.

Quote

“In designing an operating system one needs both theoretical insight and horse sense. Without the former, one designs an ad hoc mess; without the latter one designs an elephant in best Carrara marble (white, perfect, and immobile).” — Roger Needham and David Hartley, 1968

The field is a mix of deep theory and brutal pragmatism. It requires a different way of thinking. Most Computer Science courses emphasize abstraction, treating programs as pure mathematical objects with well-defined behaviors. Systems programmers, however, must understand the details of the underlying implementation. They must know when abstractions hold and, crucially, when they leak.

Inconvenient Truths About Computers

Several “truths” about computers challenge the clean, abstract models often taught in introductory courses.

Computers Don’t Really Deal with Numbers

Integer Arithmetic

In mathematics, the set of integers is infinite. A sum is a well-defined concept. On a computer, however, integer arithmetic behaves differently.

Consider a simple C program that sums integers from standard input:

#include <stdio.h>
#include <stdlib.h>
 
#define BUFFER_LENGTH 80
 
int main(int argc, char *argv[])
{
    char buffer[BUFFER_LENGTH];
    int total = 0;
 
    while( fgets(buffer, BUFFER_LENGTH, stdin) ) {
        total += atoi(buffer);
    }
    printf("Total is %d\n", total);
    return 0;
}

If the input is 0, 1, 2, 3, 4, the output is 10. However, if the input includes large numbers that exceed the capacity of the integer type, the result can be unexpected. For example, a specific sequence of large inputs might result in Total is 42 due to overflow.

Important

This is not a bug. It is correct behavior based on the definition of fixed-width integer arithmetic.

The int type in C is a fixed-size container (usually 32 bits) that operates using modular arithmetic.
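
As a minimal illustration of this modular behavior, the sketch below uses unsigned arithmetic, where wraparound is fully defined by the C standard (the signed case behaves the same way on typical two’s-complement hardware, but is formally undefined):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* Unsigned arithmetic in C is defined to wrap modulo 2^N,
       where N is the width of the type (typically 32 for unsigned int). */
    unsigned int u = UINT_MAX;              /* 2^32 - 1 = 4294967295 on such systems */
    printf("UINT_MAX     = %u\n", u);
    printf("UINT_MAX + 1 = %u\n", u + 1u);  /* wraps around to 0 */
    return 0;
}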

Floating-Point Arithmetic

Similarly, floating-point arithmetic does not strictly follow mathematical rules. In mathematics, addition is associative: $(x + y) + z = x + (y + z)$. In floating-point arithmetic, this property can break down due to precision limits.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
    float x = 1e20;
    float y = -1e20;
    float z = 3.14;
 
    printf("(x + y) + z = %f\n", (x + y) + z);
    printf("x + (y + z) = %f\n", x + (y + z));
    return 0;
}

Running this code yields two different answers: (x + y) + z prints 3.140000, while x + (y + z) prints 0.000000.

Summary of Computer Arithmetic

  • It does not generate random values; results are deterministic.
  • “Usual” mathematical properties cannot be assumed due to finiteness.
  • Integer operations satisfy “ring” properties (commutativity, associativity, distributivity); arithmetic is performed modulo a power of two (e.g., modulo $2^{32}$ for 32-bit ints).
  • Floating point operations satisfy “ordering” properties.

The Best Programmers Know Assembly

Even though most programmers will never write pure assembly code, understanding it is key to the machine-level execution model.

  • Debugging: Assembly reveals the reality when high-level abstractions fail.
  • Performance: Knowing what the compiler does is essential for tuning.
  • Systems Infrastructure: OS kernels must manage machine state directly.
  • Security: Exploits and malware operate at the machine level.

Example: Measuring Cycles

The Time Stamp Counter (TSC) is a 64-bit register on Intel-compatible machines that increments with every clock cycle. It serves as a precise timer. Reading it requires a small piece of assembly code embedded in C.

#include <stdint.h>

uint64_t rdtsc()
{
    uint32_t lo, hi;
    /* rdtsc places the low 32 bits of the counter in %eax and the high 32 bits in %edx */
    asm volatile("rdtsc; movl %%edx,%0; movl %%eax,%1"
        : "=r" (hi), "=r" (lo)
        :
        : "%edx", "%eax");
    return (((uint64_t)hi) << 32) | (uint64_t)lo;
}

This function allows for measuring exactly how many cycles a computation takes, which is the fundamental tool for performance measurement.
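
A minimal usage sketch (the loop and variable names are illustrative; on modern out-of-order CPUs a single measurement is noisy, so real measurements are usually repeated and averaged):

#include <stdio.h>
#include <stdint.h>

/* rdtsc() as defined above */

int main(void)
{
    uint64_t start = rdtsc();

    volatile long sum = 0;                  /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < 1000000; i++)
        sum += i;

    uint64_t end = rdtsc();
    printf("Loop took roughly %llu cycles\n", (unsigned long long)(end - start));
    return 0;
}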

Performance is More Than Asymptotic Complexity

While asymptotic complexity (Big O notation) is a powerful theoretical tool, real-world performance involves other factors:

  • Constant factors matter. An algorithm with a better asymptotic bound but a large constant factor can be slower than an asymptotically worse algorithm for realistic input sizes.
  • Implementation details matter. The same algorithm can have a 10x performance difference depending on how it is written.
  • Optimization levels. Performance must be optimized at the algorithm, data representation, procedure, and loop levels.

Example: Matrix Multiplication

Matrix multiplication is a fundamental operation. The naive implementation is $O(n^3)$.

A simple triple-loop implementation sees performance decrease as matrix size grows. However, a hand-optimized implementation (by Kazushige Goto) for the same processor is 160 times faster despite performing the exact same number of floating-point operations.
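
For reference, a minimal sketch of the naive triple loop (row-major $n \times n$ matrices of doubles; the optimized versions perform the same floating-point operations):

/* Naive O(n^3) matrix multiplication: C = A * B, all matrices n x n, row-major. */
void matmul_naive(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
    }
}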

The speedup comes from systems-level optimizations:

  • Parallelism: Using multiple cores (4x).
  • Vectorization: Using SIMD instructions (4x).
  • Memory Hierarchy: Optimizations like blocking and tiling to reduce cache misses (20x).

Memory is Not an Idealized Array

Memory is not a uniform, unbounded array.

  • It must be allocated and managed.
  • Performance is non-uniform; cache and virtual memory effects are significant.
  • Memory is typed.
  • Sometimes “memory” access is actually I/O or a device interaction.

Memory Layout

Bugs related to memory layout can be confusing. For example, writing out of bounds in an array inside a struct can overwrite adjacent variables.

typedef struct {
    int a[2];
    double d;
} struct_t;

Writing to a[2] (which is out of bounds) might corrupt d, leading to bizarre behavior or crashes.
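
The sketch below makes the effect visible; it is in the spirit of the classic CS:APP example, and the exact outcome depends on the compiler’s struct layout (the out-of-bounds write is undefined behavior):

#include <stdio.h>

typedef struct {
    int a[2];
    double d;
} struct_t;

/* Writes s.a[i] with no bounds check; i >= 2 is out of bounds. */
double fun(int i)
{
    volatile struct_t s;        /* volatile discourages the compiler from optimizing the bug away */
    s.d = 3.14;
    s.a[i] = 1073741824;        /* with a typical layout, i == 2 or i == 3 lands inside d */
    return s.d;
}

int main(void)
{
    for (int i = 0; i < 4; i++)
        printf("fun(%d) = %f\n", i, fun(i));
    return 0;
}

On a typical layout, fun(0) and fun(1) return 3.14 as expected, while fun(2) and fun(3) return corrupted values; still larger indices may crash the program.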

Memory Performance

The pattern of memory access is critical. Accessing a 2D array sequentially (src[i][j]) is much faster than accessing it with a stride (src[j][i]). On an Intel Core i7, the difference can be 5.2ms versus 162ms. The sequential access is cache-friendly, while the strided access causes constant cache misses.
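
A sketch of the two access patterns (the array size is arbitrary and the quoted timings are machine specific):

#define N 2048

/* Row-major traversal: consecutive addresses, cache friendly. */
void copyij(int src[N][N], int dst[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            dst[i][j] = src[i][j];
}

/* Column-major traversal: each access jumps N ints ahead, causing far more cache misses. */
void copyji(int src[N][N], int dst[N][N])
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            dst[i][j] = src[i][j];
}

Both functions copy exactly the same data and perform the same number of assignments; only the order of the memory accesses differs.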

The Complexity of Memory Instructions

A simple assignment *p = v compiles to a single machine instruction (e.g., movq), but this instruction can trigger a cascade of system events:

  • Cache hits or misses.
  • TLB hits or misses.
  • Page faults, potentially requiring disk I/O.
  • Protection faults.
  • I/O operations if the address maps to a device.
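
As a small illustration, the store itself really is just one instruction; everything in the list above happens (or not) beneath it. The x86-64 output shown in the comment is what a typical compiler emits, not a guarantee:

void store(long *p, long v)
{
    *p = v;     /* typically compiles to a single instruction: movq %rsi, (%rdi) */
}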

Computers Do Not Just Execute Programs

Computers must interpret data from the physical world (I/O) and communicate over networks. The simplified model of a CPU connected to memory is insufficient for modern systems.

Modern chips are complex Systems-on-Chip (SoCs) with specialized components, and research systems like Enzian integrate CPUs closely with FPGAs. Programming these requires systems software expertise.

Programs are Not Semantic Specifications

Language standards often describe behavior as “implementation-defined” or leave it undefined altogether.

  • Modern Compiler View: The compiler can do anything permitted by the standard to optimize code.
  • Systems View: The compiler maps code to the target hardware in a predictable way.

Success

A C program is not a semantic specification. A program is a set of instructions to a compiler that tell it what assembly language to generate.

Course Logistics

Goals

  • Build a mental model of the machine from silicon up.
  • Learn to find and fix bugs efficiently.
  • Understand and tune performance.
  • Prepare for advanced systems courses.

Evaluation

The exam will be online using CodeExpert. It tests C knowledge and systems programming skills. Questions often involve writing code, such as implementing a function to calculate values in a specific representation, rather than just answering multiple-choice questions.

Textbooks

  • CS:APP3e: Computer Systems: A Programmer’s Perspective (Bryant & O’Hallaron) — The main textbook.
  • K&R: The C Programming Language (Kernighan & Ritchie) — The classic C reference.

Practice: Systems Thinking

Systems programming requires shifting from mathematical ideals to physical realities.

Exercise: Integer Overflow

A 32-bit signed integer has a maximum value of $2^{31} - 1$, i.e., 2,147,483,647 (TMax). What happens when you add 1 to it in C?

  • Answer: It wraps around to $-2^{31}$, i.e., -2,147,483,648 (TMin). This is a consequence of Two’s Complement arithmetic on fixed-width hardware (strictly speaking, signed overflow is undefined behavior in C, but this is what two’s-complement hardware does).
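
A small sketch that makes the wraparound visible; because signed overflow is undefined behavior in C, the computation goes through unsigned arithmetic, which is defined to wrap:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int tmax = INT_MAX;                            /* 2^31 - 1 on a typical 32-bit int */
    int wrapped = (int)((unsigned int)tmax + 1u);  /* conversion back to int is implementation-defined,
                                                      but yields INT_MIN on common two's-complement systems */
    printf("TMax     = %d\n", tmax);
    printf("TMax + 1 = %d\n", wrapped);            /* prints -2147483648 (TMin) */
    return 0;
}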

Exercise: Floating Point Precision

Why is (1e20 + -1e20) + 3.14 different from 1e20 + (-1e20 + 3.14)?

  • Answer: Precision Loss. In the second case, 3.14 is so much smaller than 1e20 that it is essentially rounded away when added, leaving 1e20 + -1e20 = 0. In the first case, the large opposites cancel out first, leaving exactly 3.14.

Exercise: The “Leaky” Abstraction

Give an example of when the abstraction of memory as a simple array “leaks” (i.e., when you need to know the implementation).

  • Answer: Cache Performance. If you access an array with a large stride, it is mathematically identical to sequential access but physically much slower due to cache misses.

Continue here: 02 C Basics and the Mental Model