Lecture from: 23.09.2025 | Video: Videos ETHZ
This lecture delves into the core mechanics of C: operators, arrays, the preprocessor, and how to structure programs using modularity. These concepts form the bedrock of C programming and its interaction with the underlying system.
Operators
C possesses a rich set of operators, most of which are familiar to users of C-style languages (Java, C++, C#). Their behavior is governed by precedence (which operators are evaluated first) and associativity (the order in which operators of the same precedence are evaluated).
The table below outlines C’s operators from highest to lowest precedence.
[Figure: table of C operators, ordered from highest to lowest precedence]
Early Termination (Short-Circuit Evaluation)
The logical operators || (boolean-or) and && (boolean-and) exhibit a special property known as short-circuit evaluation. They do not always evaluate their second operand.
- In the expression A && B, if A evaluates to false (0), the entire expression is guaranteed to be false. Consequently, B is never evaluated.
- In the expression A || B, if A evaluates to true (non-zero), the entire expression is guaranteed to be true. Consequently, B is never evaluated.
This behavior is not just an optimization; it is a semantic guarantee often used to guard against errors, such as checking for a null pointer before dereferencing it.
Consider the following example:
#include <stdio.h>
#include <stdbool.h>
bool less_than(int x, int y) {
    printf("Checking if %d < %d\n", x, y);
    return (x < y);
}
int main(int argc, char *argv[]) {
    // This checks if 1 < argc < 4
    if (less_than(argc, 4) && less_than(1, argc)) {
        printf("Yes, 1 < argc (%d) < 4\n", argc);
    }
    return 0;
}
Running this program demonstrates short-circuiting:
$ gcc -Wall -o early early.c
$ ./early # argc is 1
Checking if 1 < 4
Checking if 1 < 1
$ ./early a # argc is 2
Checking if 2 < 4
Checking if 1 < 2
Yes, 1 < argc (2) < 4
$ ./early a b # argc is 3
Checking if 3 < 4
Checking if 1 < 3
Yes, 1 < argc (3) < 4
$ ./early a b c # argc is 4
Checking if 4 < 4
$
When argc is 4, less_than(argc, 4) is false. The && operator terminates evaluation immediately, so the second call less_than(1, argc) never executes.
Ternary Conditional Operator
The ternary operator (? :) offers a compact syntax for if-else expressions.
result = boolean_expr ? result_if_true : result_if_false;
- boolean_expr is evaluated first.
- If true (non-zero), result_if_true is evaluated and becomes the result. result_if_false is ignored.
- If false (zero), result_if_false is evaluated and becomes the result. result_if_true is ignored.
It is particularly useful for simple conditional formatting:
#include <stdio.h>
int main(int argc, char *argv[]) {
    // If argc is 2, use "", otherwise use "s"
    printf("Passed %d argument%s.\n", argc - 1, argc == 2 ? "" : "s");
    return 0;
}
Assignment Operators
In C, an assignment is an expression, not merely a statement. The value of the expression x = y is the value that was assigned to x. This allows for idioms where assignment and testing happen simultaneously, such as if ((rc = func())).
Compound assignment operators combine an operation with assignment: x += y is shorthand for x = x + y. This applies to most binary operators (-=, *=, /=, %=, <<=, &=, etc.).
Associativity
Associativity dictates grouping for operators of the same precedence.
- Left-to-right: A + B + C becomes (A + B) + C.
- Right-to-left: A += B += C becomes A += (B += C). This is intuitive for assignment but less common for other operators.
Post-increment and Pre-increment
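These operators (i++, ++i) are often attributed to the auto-increment addressing modes of the PDP-11 architecture, onto which they map naturally, although they first appeared in C's predecessor B, before the PDP-11 existed.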
- Post-increment (i++): The expression evaluates to the current value of i, and then i is incremented.
- Pre-increment (++i): i is incremented first, and the expression evaluates to the new value.
The same logic applies to i-- and --i. These work on integer types and, crucially, on pointers.
Casting
C allows explicit type conversion, or casting, by placing the target type in parentheses: (type)expression.
unsigned int ui = 0xDEADBEEF;
signed int i = (signed int)ui;
// i now has the value -559038737 (assuming a 32-bit int)
- Casting between integer types of the same size does not change the bit representation on two's-complement machines; it merely reinterprets the bits.
- Casting between different sizes or between integers and floats changes the representation.
Arrays
An array in C is a simple yet dangerous construct: a finite vector of variables of the same type, stored contiguously in memory. For an N-element array a, indices range from 0 to N-1.
#include <stdio.h>
float data[5]; // data to average and total
float total;   // total of the data items
float average; // average of the items
int main() {
    data[0] = 34.0;
    data[1] = 27.0;
    data[2] = 45.0;
    data[3] = 82.0;
    data[4] = 22.0;
    total = data[0] + data[1] + data[2] + data[3] + data[4];
    average = total / 5.0;
    printf("Total %f Average %f\n", total, average);
    return(0);
}
Danger
The C compiler does not check array bounds. Writing to data[5] in a 5-element array is syntactically valid but results in undefined behavior. In practice, the program often overwrites whatever memory lies adjacent to the array, leading to corruption or security vulnerabilities.
Multi-dimensional Arrays
Multi-dimensional arrays are essentially arrays of arrays. In memory, they are laid out contiguously in row-major order.
For int mat[3][3], the memory layout is:
mat[0][0], mat[0][1], mat[0][2], mat[1][0], mat[1][1], mat[1][2], mat[2][0], mat[2][1], mat[2][2]
This layout has significant performance implications. Iterating through the array sequentially (row by row) matches the memory layout and is cache-friendly. Jumping between rows (column by column) acts as a large stride, causing cache misses and reducing performance.
Array Initializers
Arrays can be initialized at definition using curly braces:
#include <stdio.h>
int main(int argc, char *argv[]) {
    int i, j;
    int a[3] = {3, 7, 9};
    int b[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9},
    };
    for(i = 0; i < 3; i++) {
        printf("a[%d] = %d\n", i, a[i]);
        for(j = 0; j < 3; j++) {
            printf(" b[%d][%d] = %d\n", i, j, b[i][j]);
        }
    }
    return 0;
}
Strings
C does not have a dedicated string type. Instead, a string is a convention: an array of chars terminated by a null byte (0 or '\0').
The following definitions are functionally identical:
// These strings are identical
char s1[6] = "hello";
char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };
The string literal "hello" implicitly includes the null terminator, requiring an array of size 6.
String Library Functions
The standard library <string.h> provides functions to manipulate these null-terminated arrays.
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    char name1[12], name2[12];
    char mixed[25], title[20];
    strncpy(name1, "Rosalinda", 12); // Bounded copy; unused space is filled with '\0'
    strncpy(name2, "Zeke", 12);
    strncpy(title, "This is the title.", 20);
    printf(" %s\n\n", title);
    printf("Name 1 is %s\n", name1);
    printf("Name 2 is %s\n", name2);
    // Compare strings
    if (strncmp(name1, name2, 12) > 0) {
        strncpy(mixed, name1, 25);
    } else {
        strncpy(mixed, name2, 25);
    }
    printf("The biggest name alphabetically is %s\n", mixed);
    // Concatenate strings
    strncpy(mixed, name1, 24);
    mixed[24] = '\0'; // strncpy does not terminate if the source fills the buffer
    // The third argument of strncat is the space still free, not the buffer size
    strncat(mixed, " & ", sizeof(mixed) - strlen(mixed) - 1);
    strncat(mixed, name2, sizeof(mixed) - strlen(mixed) - 1);
    printf("Both names are %s\n", mixed);
    return 0;
}
- strncpy(dest, src, n): Copies at most n characters from src to dest. If src has n or more characters, dest is not null-terminated.
- strncmp(s1, s2, n): Compares at most n characters of s1 and s2.
- strncat(dest, src, n): Appends at most n characters of src to the end of dest and then adds a null terminator, so dest needs room for n + 1 further bytes.
Warning
Always use the n versions of these functions (e.g., strncpy), and make sure the result ends up null-terminated. The older versions (strcpy, strcat) do not check bounds and are a primary cause of buffer overflow vulnerabilities.
The C Preprocessor
The preprocessor is the initial stage of the toolchain. It performs text transformation on the source code before compilation begins. It serves as the foundation for C’s modularity and enables powerful idioms.
#include
The #include directive pastes the contents of one file into another.
- #include <file.h>: Searches system include paths (e.g., /usr/include).
- #include "file.h": Searches the directory of the including file first, then falls back to the system include paths.
Here is a demonstration. A .c file includes a .h file:
[Figure: cpp_example.c including cpp_example.h]
When the preprocessor is run (gcc -E), the contents of cpp_example.h are pasted into cpp_example.c, and all macros are expanded. Lines starting with # are markers for the compiler to track original file names and line numbers for error messages.
[Figure: output of gcc -E for cpp_example.c]
Macro Definitions (#define)
Macros allow for token-based text substitution.
#define FOO BAZ
#define BAR(x) (x+3)
...
#undef FOO
#define QUX
- Any subsequent occurrence of the token FOO is replaced with the token BAZ.
- BAR(4) expands to (4+3). The preprocessor does not evaluate the math; it simply substitutes tokens.
- #undef removes a macro definition.
- #define QUX defines QUX with an empty replacement: the macro is defined, but expands to nothing.
Multi-line Macros: Complex macros can span multiple lines using backslashes.
#define SKIP_SPACES(p, limit) \
{ char *lim = (limit); \
  while (p < lim) { \
    if (*p++ != ' ') { \
      p--; break; }}}
The “Do-While(0)” Idiom:
To prevent syntax errors when a macro is used in an if statement (the “swallowing the semicolon” problem), macros are often wrapped in a do { ... } while(0) loop. This ensures the macro expands to a single statement that properly consumes the trailing semicolon.
Preprocessor Conditionals
Code blocks can be conditionally included or excluded, which is essential for cross-platform support.
#if expression
// text 1
#else
// text 2
#endif
#ifdef FOO // Shorthand for #if defined(FOO)
// ...
#endif
#ifndef BAR // Shorthand for #if !defined(BAR)
// ...
#endif
The expression is evaluated by the preprocessor at compile time. It can contain literals, operators, and other macros.
Token Manipulation
- Stringizing (#): Converts a macro argument into a string literal.
- Token Pasting (##): Concatenates two tokens into a single token.
These are used to reduce boilerplate code, such as automatically generating function names or table entries. For example, a macro can generate a command table:
struct command {
    char *name;
    void (*function)();
};
struct command commands[] = {
    { "quit", quit_command},
    { "help", help_command},
    // ...
};
This can be simplified using:
#define COMMAND(c) { #c, c ## _command }
- #c turns quit into "quit".
- c ## _command turns quit into the single token quit_command.
Predefined Macros
The preprocessor provides several useful built-in macros:
- __FILE__: The name of the current source file.
- __LINE__: The current line number in the source file.
- __DATE__: The compilation date.
- __TIME__: The compilation time.
- __STDC__: Defined as 1 if the compiler conforms to the C standard.
Modularity
C lacks built-in modules or namespaces. Modularity is achieved through conventions using headers and the linker.
Declarations vs. Definitions
- A declaration introduces a name and its type (e.g., function prototype). It says “this exists somewhere.”
char *strncpy(char *dest, const char *src, size_t n); // A "prototype"
- A definition provides the implementation or storage. It says “this is what it is.”
char *strncpy(...) { ... body ... }
Visibility
- extern: Promises that a definition exists in another compilation unit. This is the default for functions.
- static: Restricts visibility to the current compilation unit. The symbol is not exported and cannot be accessed from other files.
This applies to global variables as well:
// In a header file, a declaration might be:
extern const char *banner; // Defined in some other .c file
// In a .c file, a declaration and definition might be:
static int priv_count = 0; // Only in scope in this unit
// In some other .c file, the definition for the extern variable is provided:
const char *banner = "Welcome to Barrelfish";
Header Files
The convention for modules involves splitting code into:
- Header file (.h): The interface. Contains public declarations (prototypes, extern variables, types).
- Source file (.c): The implementation. Contains definitions and private (static) functions.
- A module foo has its public interface in foo.h.
- Clients of the module #include "foo.h".
- foo.h contains no definitions, only external declarations (function prototypes, extern variables, typedefs).
- The implementation is typically in foo.c.
- foo.c also includes its own header, foo.h, to allow the compiler to check for consistency between declarations and definitions.
- foo.c contains the definitions for the interface functions, plus any internal (static) functions and variables.
The Header Guard Idiom
To prevent compiler errors from including the same header file multiple times, every header must use a guard:
// "file.h":
#ifndef _FILE_H_
#define _FILE_H_
// Declarations...
#endif // _FILE_H_
- The first time the preprocessor sees this file, _FILE_H_ is not defined, so it defines it and processes the contents.
- The second time it sees this file in the same compilation unit, _FILE_H_ is already defined, so the #ifndef is false, and the preprocessor skips the entire contents.
- This ensures the content is processed only once per compilation unit.
Danger
Never #include a .c file. Doing so bypasses the separate compilation model and, whenever that file is also compiled on its own, leads to linker errors due to multiple definitions.
Practice: Preprocessors and Logic
C’s preprocessor and logical operators require a firm grasp of evaluation order.
Exercise: Short-Circuiting
What will this code print?
int x = 0;
if (x != 0 && (10 / x > 1)) {
    printf("Success\n");
} else {
    printf("Failure\n");
}
- Answer: Failure. The condition x != 0 is false. Because of short-circuit evaluation, the second part (10 / x > 1) is never evaluated, avoiding a division-by-zero crash.
Exercise: Macro Pitfalls
What is the value of SQUARE(3 + 1) given #define SQUARE(x) x * x?
- Answer: 7. Macro substitution is literal: 3 + 1 * 3 + 1 becomes 3 + (1 * 3) + 1 = 7.
- Lesson: Always wrap macro arguments in parentheses: #define SQUARE(x) ((x) * (x)).
Exercise: Array Decay
Given int a[5];, what is sizeof(a)?
- Answer: 20 (on a typical system where sizeof(int) == 4). Even though arrays often decay to pointers, sizeof on the array name itself returns the total size.
Continue here: 04 Strings, Assertions, and Integer Representation