Lecture from: 17.09.2025 | Video: Videos ETHZ

Following the introduction to the mindset of a systems programmer, this lecture introduces the primary tool for the course: the C programming language. Despite its age, C remains the lingua franca of systems programming. It offers unparalleled control over the machine, albeit without the safety nets provided by modern managed languages.

History and Toolchain

History

Understanding C requires examining its origins.

C was developed between 1969 and 1972 by Dennis Ritchie at Bell Labs (Brian Kernighan later co-wrote the defining book on the language with him). It did not appear in a vacuum but evolved through a clear lineage:

  • CPL (Combined Programming Language, 1963): A massive, complex language considered “unimplementable.”
  • BCPL (Basic CPL, 1967): A radical simplification of CPL, stripping it down to a single data type (the machine word). It functioned essentially as portable assembly.
  • B (1969): Ken Thompson’s adaptation of BCPL for the nascent Unix operating system.
  • C: Ritchie’s successor to B, which added back a simple type system.

C was highly influenced by the DEC PDP-11 architecture, the machine to which Unix was being ported. Many of C’s quirks are reflections of the PDP-11 instruction set. Despite these specific origins, C was designed for portability, a key factor in the success of both the language and the Unix operating system.

Standards

  • K&R C: The original language described in Kernighan and Ritchie’s book, The C Programming Language. At this stage, the compiler source code effectively served as the specification.
  • ANSI C (C89/C90): The first formal standard.
  • C99: A major update adding many features used in this course.
  • C11, C17, C23: More recent updates. C11 added features such as atomics and threads, C17 was mainly a bug-fix release, and C23 is the latest revision.

Enduring Popularity

Decades after its creation, C remains ubiquitous. It consistently ranks at or near the top of indices like TIOBE.

Its persistence in the face of modern alternatives (Java, C#, Python, Rust) is due to specific trade-offs:

  • Speed: A good C compiler generates highly optimized machine code.
  • The Macro Preprocessor: cpp is a powerful text-substitution tool that runs before compilation.
  • Proximity to Hardware: C code maps closely to hardware operations, with few hidden mechanisms (no garbage collector, no implicit bounds checks).

These characteristics make C the choice for operating systems, embedded systems, high-performance computing, and security exploits.

What is Missing

C’s power stems from its simplicity. It lacks many features standard in high-level languages:

  • No Object-Orientation: There are no classes or methods; C is purely procedural.
  • No Managed Types: No built-in string or list types; one must construct data structures from scratch.
  • No Exception Handling: Errors are signaled via return codes (e.g., 0 for success).
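A minimal sketch of the return-code convention, using a hypothetical `safe_divide` helper: the status is the return value (0 for success, -1 for failure) and the actual result is written through a pointer argument. Nothing forces the caller to check the status, which is exactly the discipline C expects from the programmer.

```c
/* Hypothetical helper illustrating error reporting via return codes.
 * Returns 0 on success and stores a / b in *result;
 * returns -1 (leaving *result untouched) if b is 0. */
int safe_divide(int a, int b, int *result)
{
    if (b == 0) {
        return -1;      /* error: no exception is thrown, the caller must check */
    }
    *result = a / b;    /* integer division truncates toward zero */
    return 0;           /* success */
}
```

A caller writes `int q; if (safe_divide(7, 2, &q) != 0) { /* handle the error */ }`, after which `q` holds 3.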

The Fundamental Difference: Memory Management

The most critical distinction of C is the absence of automatic memory management.

  • There is no garbage collection.
  • Memory has static storage (globals), automatic storage on the stack (local variables), or dynamic storage on the heap.
  • Heap memory must be explicitly allocated and freed.

This manual management is a major source of bugs but also the key to C’s predictable performance.
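To make the stack/heap distinction concrete, here is a small sketch with a hypothetical `make_sequence` function: the loop counter lives on the stack and vanishes when the function returns, while the array comes from `malloc` and survives until the caller explicitly calls `free`.

```c
#include <stdlib.h>   /* malloc, free */
#include <stddef.h>   /* size_t */

/* Returns a heap-allocated array holding 0, 1, ..., n-1, or NULL on failure.
 * The caller owns the memory and must release it with free(). */
int *make_sequence(size_t n)
{
    int *a = malloc(n * sizeof *a);     /* heap: lives until free() is called */
    if (a == NULL) {
        return NULL;                    /* allocation failed: report it via NULL */
    }
    for (size_t i = 0; i < n; i++) {    /* i and a are automatic (stack) variables */
        a[i] = (int)i;
    }
    return a;                           /* the array outlives this stack frame */
}
```

Forgetting the matching `free` leaks memory; calling it twice, or using the array afterwards, is undefined behavior. Both are typical manual-management bugs.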

Key Takeaway

C is about directly building and manipulating structures in main memory. The mental model is not dealing with abstract objects, but arranging bytes in memory.
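A small sketch of the "bytes in memory" view, with hypothetical helpers `dump_bytes` and `byte_at`: any object can be inspected byte by byte through an `unsigned char` pointer, which exposes details such as the machine's byte order.

```c
#include <stdio.h>    /* printf */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t */

/* Print the n bytes stored at address p, lowest address first. */
void dump_bytes(const void *p, size_t n)
{
    const unsigned char *bytes = p;   /* any object can be viewed as unsigned char */
    for (size_t i = 0; i < n; i++) {
        printf("%02x ", bytes[i]);
    }
    printf("\n");
}

/* Return the byte at offset i (0..3) of a 32-bit value as laid out in memory. */
unsigned char byte_at(uint32_t v, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)&v;
    return bytes[i];
}
```

With `uint32_t v = 0x11223344;`, calling `dump_bytes(&v, sizeof v)` prints `44 33 22 11` on a little-endian machine such as x86-64 and `11 22 33 44` on a big-endian one.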

Syntax Overview

The syntax of C inspired Java, C++, and C#. Consequently, comments, identifiers, and block structures are familiar. However, differences exist:

  • The list of reserved words differs.
  • The preprocessor performs a separate text-substitution pass before compilation. C#, by contrast, supports only a limited set of directives (such as conditional compilation) and has no macro substitution.

Hello, World!

The canonical first program demonstrates the basic structure:

#include <stdio.h>
 
int main(int argc, char *argv[])
{
    printf("hello, world\n");
    return 0;
}
  1. #include <stdio.h>: A preprocessor directive that effectively pastes the contents of the standard input/output header file into the source.
  2. int main(...): The entry point. It receives command-line arguments and returns an integer status code.
  3. printf(...): A standard library function for formatted printing. Newlines \n must be explicit.
  4. return 0;: Exits main with a success status (0).

The C Toolchain

Transforming source code into a running program involves a multi-stage process called the toolchain.

  1. Preprocessing (cpp): Handles directives like #include and expands macros, producing plain C source.
  2. Compilation (cc1): Translates C into assembly language (.s).
  3. Assembly (as): Translates assembly into machine-code object files (.o).
  4. Linking (ld): Combines object files and libraries into a single executable.
  5. Loading: The OS loader reads the executable into memory at runtime.

The gcc command typically drives this entire process, but flags can stop it at intermediate stages: -E stops after preprocessing, -S after compilation (producing a .s file), and -c after assembly (producing a .o file).
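One way to watch the first stage at work is a file containing a macro. In this sketch the file name `square.c` and the macro `SQUARE` are hypothetical; running `gcc -E square.c` prints the preprocessed source, in which the `#define` line is gone and the macro use has been replaced by its expansion.

```c
/* Hypothetical file square.c.
 * `gcc -E square.c` stops after the preprocessor: the body of area()
 * then reads   return ((side) * (side));
 * `gcc -S square.c` stops after compilation and writes square.s. */
#define SQUARE(x) ((x) * (x))

int area(int side)
{
    return SQUARE(side);   /* textual substitution, not a function call */
}
```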

Summary

C is a systems programming language. Understanding it means understanding the interaction between the program, the compiler, and the underlying system.

Control Flow

Control flow in C serves as the template for most modern languages.

Conditionals

if (boolean_expression) {
    // statement_when_true
} else {
    // statement_when_false
}
 
switch (integer_expression) {
    case CONSTANT_1:
        // statement
        break;
    case CONSTANT_2:
        // statement
        break;
    default:
        // statement
        break;
}
 
return (expression);

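Note that switch operates only on integer expressions (which include char and enum values), and every case label must be an integer constant. Without a break, execution falls through into the next case.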
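A sketch that uses fall-through deliberately, with a hypothetical `days_in_month` function: several case labels share one body because the empty cases fall through to the next statement. Here `return` ends each branch, so no `break` is needed.

```c
/* Hypothetical helper: days in a month (1-12) of a non-leap year,
 * or -1 for an invalid month. */
int days_in_month(int month)
{
    switch (month) {
    case 2:
        return 28;
    case 4:            /* no statement and no break: falls through ... */
    case 6:
    case 9:
    case 11:
        return 30;     /* ... so 4, 6, 9 and 11 all end up here */
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        return 31;
    default:
        return -1;     /* not a valid month */
    }
}
```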

Loops

for (initial; test; increment) {
    // statement
}
 
while (boolean_expression) {
    // statement
}
 
do {
    // statement
} while (boolean_expression);

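The for loop is essentially syntactic sugar for while: initial runs once, test is checked before each iteration, and increment runs after each pass through the body. It is not an iterator-based loop like Java's for-each.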
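The correspondence can be shown with two hypothetical functions that compute the same sum. One caveat: the rewrite is exact only when the body contains no `continue`, because in a `for` loop `continue` still runs the increment, whereas in the `while` version it would skip `i++`.

```c
/* Sum 0 + 1 + ... + (n-1) with a for loop. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* The same loop rewritten as while: initial; while (test) { body; increment; } */
int sum_while(int n)
{
    int total = 0;
    int i = 0;           /* initial */
    while (i < n) {      /* test */
        total += i;      /* body */
        i++;             /* increment */
    }
    return total;
}
```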

Jump Statements

  • break;: Exits the innermost loop or switch.
  • continue;: Skips to the next iteration of the innermost loop.
  • goto Label;: Unconditional jump to a label within the same function.

Unlike Java, break and continue cannot target specific labels to escape nested loops.
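A short sketch of both statements in one loop, using a hypothetical `sum_until_zero` function: `continue` skips negative entries, and `break` stops at the first 0, which acts as a sentinel.

```c
/* Sum the positive values in a[0..n-1], stopping at the first 0. */
int sum_until_zero(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] == 0)
            break;       /* leave the loop entirely */
        if (a[i] < 0)
            continue;    /* skip this element; the for loop still runs i++ */
        total += a[i];
    }
    return total;
}
```

For the array `{3, -1, 4, 0, 5}` the result is 7: the -1 is skipped and the loop ends before reaching the 5.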

Functions

Functions behave like static methods in Java. main is the special entry point.

// Compute factorial function
// fact(n) = n * (n-1) * ... * 2 * 1
int fact(int n)
{
    if (n == 0) {
        return(1);
    } else {
        return(n * fact(n-1));
    }
}

The arguments to main provide access to the command line:

  • argc: The count of arguments.
  • argv: An array of strings representing the arguments. argv[0] is the program name.
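A sketch of walking the command line, written as a hypothetical helper `print_args` that `main` would call as `print_args(argc, argv);`. The arguments arrive as strings, so a numeric argument such as `"42"` still has to be converted (for example with `atoi`).

```c
#include <stdio.h>   /* printf */

/* Print every command-line argument; returns the number printed.
 * Intended to be called from main as print_args(argc, argv). */
int print_args(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++) {
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    }
    return argc;
}
```

Running a program built this way as `./prog hello 42` gives `argc == 3`, with `argv[0]` typically `"./prog"`, `argv[1]` equal to `"hello"` and `argv[2]` equal to `"42"`.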

Basic I/O: printf()

printf is a variadic function (it takes a variable number of arguments) from the standard library.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
    int i = 314;
    const char s[] = "Mothy";
    printf("My name is %s and I work in STF H %d\n", s, i);
    return 0;
}

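It uses format specifiers (starting with %) to determine how to format the subsequent arguments. The arguments must match the specifiers in type and order; a mismatch is undefined behavior, although compilers warn about many mismatches (gcc -Wall enables -Wformat).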
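A sketch of the most common specifiers, written as a hypothetical `format_demo` function. It uses `snprintf` (C99), which takes the same format string as `printf` but writes into a buffer, so the result can be inspected; the return value is the number of characters produced.

```c
#include <stdio.h>   /* snprintf, size_t */

/* Format several kinds of values into buf, each with its matching specifier.
 * Produces "Mothy 314 ff 3.14 A" and returns its length. */
int format_demo(char *buf, size_t size)
{
    const char *name = "Mothy";   /* %s   : NUL-terminated string   */
    int room = 314;               /* %d   : int in decimal          */
    unsigned int mask = 255;      /* %x   : unsigned int in hex     */
    double pi = 3.14159;          /* %.2f : double, two decimals    */
    char grade = 'A';             /* %c   : single character        */
    return snprintf(buf, size, "%s %d %x %.2f %c", name, room, mask, pi, grade);
}
```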

The Controversy of goto

The goto statement allows unconditional jumps. Since Dijkstra’s “Go To Statement Considered Harmful,” it has been generally discouraged to prevent “spaghetti code.”

However, in systems programming with C, two specific patterns justify its use:

  1. Breaking out of nested loops. Since break only exits the innermost loop, goto provides a cleaner alternative to boolean flags.

  2. Standardized error handling (Cleanup). When a function performs a sequence of resource allocations, failure at a later step requires undoing previous steps. goto allows jumping to a common error recovery block, avoiding deeply nested if statements.

    This pattern is prevalent in the Linux kernel.
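Both patterns can be sketched in a few lines. The functions `find_in_grid` and `make_buffers` are hypothetical. In the cleanup pattern the labels undo resources in reverse order of acquisition, so each failure point jumps to the label that releases exactly what has been acquired so far.

```c
#include <stdlib.h>   /* malloc, free */
#include <stddef.h>   /* size_t */

/* Pattern 1: leave two nested loops at once.
 * Searches a rows x cols grid stored row-major; returns 1 and the position
 * of the first match, or 0 if target does not occur. */
int find_in_grid(const int *grid, int rows, int cols, int target,
                 int *out_r, int *out_c)
{
    int found = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *out_r = r;
                *out_c = c;
                found = 1;
                goto done;             /* exits both loops in one jump */
            }
        }
    }
done:
    /* code that must run in both cases would go here */
    return found;
}

/* Pattern 2: centralized cleanup. Allocates two buffers of n bytes;
 * on failure, frees whatever was already allocated and returns -1. */
int make_buffers(size_t n, char **out_a, char **out_b)
{
    char *a = NULL;
    char *b = NULL;

    a = malloc(n);
    if (a == NULL)
        goto fail;                     /* nothing to undo yet */
    b = malloc(n);
    if (b == NULL)
        goto fail_free_a;              /* undo the first allocation */

    *out_a = a;
    *out_b = b;
    return 0;                          /* success: the caller frees both */

fail_free_a:
    free(a);
fail:
    return -1;
}
```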

Basic Types

C’s type system maps closely to hardware.

Declarations and Scope

Variable scope is determined by the declaration location.

  • Global: Declared outside functions. Visible to the entire program (other files reach it through an extern declaration). static restricts visibility to the file.
  • Local: Declared inside a block. static here implies permanent storage (persistence between calls) rather than visibility scope.
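A sketch showing both meanings of static side by side, with hypothetical names: at file scope, static hides `total_issued` from other source files; inside `next_ticket`, static makes `count` keep its value from one call to the next.

```c
/* File scope + static: visible to every function in this file,
 * but invisible to other .c files. */
static int total_issued = 0;

/* Returns 1 on the first call, 2 on the second, and so on. */
int next_ticket(void)
{
    static int count = 0;   /* block scope + static: initialized once,
                               keeps its value between calls */
    count++;
    total_issued++;
    return count;
}

/* Another function in the same file can read the file-scope variable. */
int tickets_issued(void)
{
    return total_issued;
}
```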

Integers and Floats

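Integer sizes in C are implementation-defined (the standard only guarantees minimum ranges), which makes portable code harder to write.

  • int is usually 32 bits.
  • long is 64 bits on most 64-bit Unix-like systems (Linux, macOS) but only 32 bits on 64-bit Windows.
  • Integer types are signed by default; unsigned must be explicit. The exception is plain char, whose signedness is implementation-defined.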

To address ambiguity, C99 introduced <stdint.h>, providing types with explicit widths (e.g., uint32_t, int64_t). Using these is recommended for precise control.
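A sketch contrasting the two kinds of types, with hypothetical helpers `print_sizes` and `bump`. The sizes of `int` and `long` printed by the first function depend on the platform, while the exact-width types (strictly speaking optional in the standard, but available on all mainstream platforms) always have the width in their name.

```c
#include <stdio.h>    /* printf */
#include <stdint.h>   /* int32_t, uint64_t, uint8_t */

/* Report the sizes chosen by the current platform for a few types. */
void print_sizes(void)
{
    /* Implementation-defined: may differ between compilers and platforms. */
    printf("int      : %zu bytes\n", sizeof(int));
    printf("long     : %zu bytes\n", sizeof(long));
    /* Exact-width types: the width is part of the name. */
    printf("int32_t  : %zu bytes\n", sizeof(int32_t));
    printf("uint64_t : %zu bytes\n", sizeof(uint64_t));
}

/* An 8-bit unsigned value wraps from 255 to 0 on every platform. */
uint8_t bump(uint8_t x)
{
    return (uint8_t)(x + 1);   /* x is promoted to int, then converted back modulo 256 */
}
```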

Booleans

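Historically, C used integers for booleans (0 is false, non-zero is true). C99 added the _Bool type, and <stdbool.h> provides bool as an alias for it together with true (1) and false (0); since C23, bool, true and false are keywords. Converting any scalar value to bool yields 1 if it is non-zero and 0 otherwise.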
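A tiny sketch of that conversion rule, using a hypothetical `as_bool` helper: storing 5 in an `int` keeps 5, but storing it in a `bool` normalizes it to 1.

```c
#include <stdbool.h>   /* bool, true, false */

/* Hypothetical helper: converting an int to bool always yields 0 or 1. */
int as_bool(int x)
{
    bool b = x;   /* zero -> false (0), any non-zero value -> true (1) */
    return b;
}
```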

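A common C idiom exploits the fact that an assignment is an expression whose value is the assigned value. The extra pair of parentheses signals that = is intended rather than ==; GCC and Clang warn (with -Wall) about a bare assignment used as a condition.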

if ((rc = test(atoi(argv[1])))) {
    // rc was assigned, and the result is non-zero (true)
}
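The snippet above leaves `test` undefined. Here is a runnable sketch in which `test` is a hypothetical stand-in (returning 0 for even numbers and 1 otherwise) and the argument is passed as a plain string instead of `argv[1]`.

```c
#include <stdlib.h>   /* atoi */

/* Hypothetical check: returns 0 (success) if n is even, 1 (error code) otherwise. */
static int test(int n)
{
    return n % 2 != 0;
}

/* Returns the error code produced by test(), or 0 if the argument passed. */
int check_arg(const char *arg)
{
    int rc;
    if ((rc = test(atoi(arg)))) {   /* assign, then use the assigned value as the condition */
        return rc;                   /* non-zero: propagate the error code */
    }
    return 0;
}
```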

Void

The void type indicates no value. It is used for procedures (functions returning nothing) and for void *, which represents a pointer to memory of unspecified type.
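A sketch that uses both forms of void, with a hypothetical `swap_bytes` procedure: it returns nothing, and its `void *` parameters let one function swap objects of any type, given their size in bytes.

```c
#include <stddef.h>   /* size_t */

/* Swap two objects of the same size, whatever their type.
 * Returns nothing (void); the work is done through the pointers. */
void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *p = a;   /* void * converts implicitly to other object pointers */
    unsigned char *q = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char t = p[i];
        p[i] = q[i];
        q[i] = t;
    }
}
```

Usage: `swap_bytes(&x, &y, sizeof x);` works equally for two `int`s or two `double`s.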

Const and Enum

  • const: Marks a variable as read-only. The compiler enforces that it cannot be modified after initialization.
  • enum: Defines a set of named integer constants.
enum { CAB, CNB, OAT } buildings; // CAB=0, CNB=1, OAT=2
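A sketch combining the two keywords, with hypothetical names: a named enum whose extra last member `NUM_BUILDINGS` counts the entries (a common idiom), and a const lookup table that the compiler refuses to modify.

```c
enum building { CAB, CNB, OAT, NUM_BUILDINGS };   /* 0, 1, 2, and the count 3 */

/* const: both the array and the strings it points to are read-only;
 * assigning to building_names[0] would be a compile-time error. */
static const char *const building_names[NUM_BUILDINGS] = { "CAB", "CNB", "OAT" };

/* Enum values are plain integers, so they can index an array. */
const char *building_name(enum building b)
{
    if ((unsigned)b >= NUM_BUILDINGS)   /* reject out-of-range values */
        return "unknown";
    return building_names[b];
}
```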

Practice: The C Environment

Understanding C is about the relationship between code and the machine.

Exercise: Compilation Stages

Which tool in the toolchain is responsible for resolving the address of a function defined in a different file?

  • Answer: The linker (ld). The compiler and assembler leave an unresolved symbol reference (a relocation entry) in the object file, and the linker fills in the actual address when it combines the object files into the executable.

Exercise: C Mentality

Why does C not have a built-in string type like Java or Python?

  • Answer: C aims for a minimal runtime. A managed string type would need automatic memory management or a substantial support library. By representing strings as NUL-terminated character arrays, C stays extremely lightweight and portable.
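The convention can be seen in a hand-written length function. `my_strlen` is a hypothetical re-implementation of the standard `strlen`: since the array carries no length field, the only way to find the end is to scan for the terminating `'\0'`.

```c
#include <stddef.h>   /* size_t */

/* A string is a char array ending in '\0'; its length is found by scanning. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

Note that `char buf[] = "hi";` occupies 3 bytes (the two letters plus the terminator), while `my_strlen(buf)` returns 2.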

Exercise: Boolean Logic

In the following code, what will be printed?

int x = 5;
if (x) {
    printf("Yes");
}
  • Answer: Yes. In C, any non-zero integer (like 5) is evaluated as “true” in a conditional.
