Lecture from: 17.09.2025 | Video: Videos ETHZ
Following the introduction to the mindset of a systems programmer, this lecture introduces the primary tool for the course: the C programming language. Despite its age, C remains the lingua franca of systems programming. It offers unparalleled control over the machine, albeit without the safety nets provided by modern managed languages.
History and Toolchain
History
Understanding C requires examining its origins.
C was developed between 1969 and 1972 by Dennis Ritchie at Bell Labs; Brian Kernighan later co-authored the book that documented it. It did not appear in a vacuum but evolved through a clear lineage:
- CPL (Combined Programming Language, 1963): A massive, complex language considered “unimplementable.”
- BCPL (Basic CPL, 1967): A radical simplification of CPL, stripping it down to a single data type (the machine word). It functioned essentially as portable assembly.
- B (1969): Ken Thompson’s adaptation of BCPL for the nascent Unix operating system.
- C: Ritchie’s successor to B, which added back a simple type system.
C was highly influenced by the DEC PDP-11 architecture, the machine to which Unix was being ported. Many of C’s quirks are reflections of the PDP-11 instruction set. Despite these specific origins, C was designed for portability, a key factor in the success of both the language and the Unix operating system.
Standards
- K&R C: The original language described in Kernighan and Ritchie’s book, The C Programming Language. At this stage, the compiler source code effectively served as the specification.
- ANSI C (C89/C90): The first formal standard.
- C99: A major update adding many features used in this course.
- C11, C17: More recent updates with minor features and bug fixes.
Enduring Popularity
Decades after its creation, C remains ubiquitous. It consistently ranks at or near the top of indices like TIOBE.
Its persistence in the face of modern alternatives (Java, C#, Python, Rust) is due to specific trade-offs:
- Speed: A good C compiler generates highly optimized machine code.
- The Macro Pre-processor: cpp is a powerful text-substitution tool that runs before compilation.
- Proximity to Hardware: C code maps directly to hardware operations with no hidden mechanisms.
These characteristics make C the choice for operating systems, embedded systems, high-performance computing, and security exploits.
What is Missing
C’s power stems from its simplicity. It lacks many features standard in high-level languages:
- No Object-Orientation: There are no classes or methods; C is purely procedural.
- No Managed Types: No built-in string or list types; one must construct data structures from scratch.
- No Exception Handling: Errors are signaled via return codes (e.g., 0 for success).
The Fundamental Difference: Memory Management
The most critical distinction of C is the absence of automatic memory management.
- There is no garbage collection.
- Memory is allocated either on the stack (automatic duration) or the heap.
- Heap memory must be explicitly allocated and freed.
This manual management is a major source of bugs but also the key to C’s predictable performance.
C is about directly building and manipulating structures in main memory. The mental model is not dealing with abstract objects, but arranging bytes in memory.
Syntax Overview
The syntax of C inspired Java, C++, and C#. Consequently, comments, identifiers, and block structures are familiar. However, differences exist:
- The list of reserved words differs.
- The preprocessor performs a separate text-substitution pass before compilation, unlike the directive-based approaches in languages like C#.
Hello, World!
The canonical first program demonstrates the basic structure:
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("hello, world\n");
    return 0;
}
- #include <stdio.h>: A preprocessor directive that effectively pastes the contents of the standard input/output header file into the source.
- int main(...): The entry point. It receives command-line arguments and returns an integer status code.
- printf(...): A standard library function for formatted printing. Newlines (\n) must be explicit.
- return 0;: Exits main with a success status (0).
The C Toolchain
Transforming source code into a running program involves a multi-stage process called the toolchain.
- Preprocessing (cpp): Handles directives like #include, producing pure C source.
- Compilation (cc1): Translates C into assembly language (.s).
- Assembly (as): Translates assembly into machine-code object files (.o).
- Linking (ld): Combines object files and libraries into a single executable.
- Loading: The OS loader reads the executable into memory at runtime.
The gcc command typically drives this entire process, but flags can stop it at intermediate stages (e.g., -E for preprocessing, -S for assembly).
Summary
C is a systems programming language. Understanding it means understanding the interaction between the program, the compiler, and the underlying system.
Control Flow
Control flow in C serves as the template for most modern languages.
Conditionals
if (boolean_expression) {
    // statement_when_true
} else {
    // statement_when_false
}
switch (integer_expression) {
case CONSTANT_1:
    // statement
    break;
case CONSTANT_2:
    // statement
    break;
default:
    // statement
    break;
}
return (expression);
Note that switch operates only on integer expressions.
Loops
for (initial; test; increment) {
    // statement
}
while (boolean_expression) {
    // statement
}
do {
    // statement
} while (boolean_expression);
The for loop is syntactic sugar for while. It is not an iterator-based loop.
Jump Statements
- break;: Exits the innermost loop or switch.
- continue;: Skips to the next iteration of the innermost loop.
- goto Label;: Jumps unconditionally to the statement marked Label within the same function.
Unlike Java, break and continue cannot target specific labels to escape nested loops.
Functions
Functions behave like static methods in Java. main is the special entry point.
// Compute factorial function
// fact(n) = n * (n-1) * ... * 2 * 1
int fact(int n)
{
    if (n == 0) {
        return(1);
    } else {
        return(n * fact(n-1));
    }
}
The arguments to main provide access to the command line:
- argc: The count of arguments.
- argv: An array of strings representing the arguments. argv[0] is the program name.
Basic I/O: printf()
printf is a variadic function in the standard library.
#include <stdio.h>
int main(int argc, char *argv[])
{
    int i = 314;
    const char s[] = "Mothy";
    printf("My name is %s and I work in STF H %d\n", s, i);
    return 0;
}
It uses format specifiers (starting with %) to determine how to format the subsequent arguments. The arguments must match the specifiers in type and order.
The Controversy of goto
The goto statement allows unconditional jumps. Since Dijkstra’s “Go To Statement Considered Harmful,” it has been generally discouraged to prevent “spaghetti code.”
However, in systems programming with C, two specific patterns justify its use:
- Breaking out of nested loops. Since break only exits the innermost loop, goto provides a cleaner alternative to boolean flags.
- Standardized error handling (Cleanup). When a function performs a sequence of resource allocations, failure at a later step requires undoing previous steps. goto allows jumping to a common error recovery block, avoiding deeply nested if statements.
This pattern is prevalent in the Linux kernel.
Basic Types
C’s type system maps closely to hardware.
Declarations and Scope
Variable scope is determined by the declaration location.
- Global: Declared outside functions. Visible to the entire program. static restricts visibility to the file.
- Local: Declared inside a block. static here implies permanent storage (persistence between calls) rather than visibility scope.
Integers and Floats
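Integer sizes in C are implementation-defined, so the same code can behave differently across platforms.
- int is usually 32 bits.
- long is often 64 bits on 64-bit systems (64-bit Windows keeps it at 32 bits).
- Integer types are signed by default; unsigned must be explicit. Plain char is the exception: whether it is signed is implementation-defined.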
To address ambiguity, C99 introduced <stdint.h>, providing types with explicit widths (e.g., uint32_t, int64_t). Using these is recommended for precise control.
Booleans
Historically, C used integers for booleans (0 is false, non-zero is true). C99 added bool via <stdbool.h>, which essentially wraps the integer behavior.
A common C idiom utilizes the fact that assignments are expressions:
if ((rc = test(atoi(argv[1])))) {
    // rc was assigned, and the result is non-zero (true)
}
Void
The void type indicates no value. It is used for procedures (functions returning nothing) and for void *, which represents a pointer to memory of unspecified type.
Const and Enum
- const: Marks a variable as read-only. The compiler enforces that it cannot be modified after initialization.
- enum: Defines a set of named integer constants.
enum { CAB, CNB, OAT } buildings; // CAB=0, CNB=1, OAT=2
Practice: The C Environment
Understanding C is about the relationship between code and the machine.
Exercise: Compilation Stages
Which tool in the toolchain is responsible for resolving the address of a function defined in a different file?
- Answer: The linker (ld). The compiler leaves a placeholder in the object file, and the linker fills in the actual address when it combines the object files into the executable.
Exercise: C Mentality
Why does C not have a built-in string type like Java or Python?
- Answer: C aims for a minimal runtime. A complex string type would require a garbage collector or a heavy support library. By using character arrays and conventions, C remains extremely lightweight and portable.
Exercise: Boolean Logic
In the following code, what will be printed?
int x = 5;
if (x) {
    printf("Yes");
}
- Answer: Yes. In C, any non-zero integer (like 5) is evaluated as “true” in a conditional.
Continue here: 03 Operators, Arrays, and the C Preprocessor