Lecture from: 23.09.2025 | Video: Videos ETHZ

This lecture covers the core mechanics of C: operators, arrays, strings, the preprocessor, and how to structure programs into modules. These concepts underpin C programming and its interaction with the underlying system.

Operators

C has a rich set of operators, most of which are familiar to users of C-style languages (Java, C++, C#). How an expression is parsed is governed by precedence (which operators bind more tightly to their operands) and associativity (how operators of the same precedence are grouped).

The table below outlines C’s operators from highest to lowest precedence.

Early Termination (Short-Circuit Evaluation)

The logical operators || (boolean-or) and && (boolean-and) exhibit a special property known as short-circuit evaluation. They do not always evaluate their second operand.

  • In the expression A && B, if A evaluates to false (0), the entire expression is guaranteed to be false. Consequently, B is never evaluated.
  • In the expression A || B, if A evaluates to true (non-zero), the entire expression is guaranteed to be true. Consequently, B is never evaluated.

This behavior is not just an optimization; it is a semantic guarantee often used to guard against errors, such as checking for a null pointer before dereferencing it.

Consider the following example:

#include <stdio.h>
#include <stdbool.h>
 
bool less_than(int x, int y) {
    printf("Checking if %d < %d\n", x, y);
    return (x < y);
}
 
int main(int argc, char *argv[]) {
    // This checks if 1 < argc < 4
    if (less_than(argc, 4) && less_than(1, argc)) {
        printf("Yes, 1 < argc (%d) < 4\n", argc);
    }
    return 0;
}

Running this program demonstrates short-circuiting:

$ gcc -Wall -o early early.c
$ ./early         # argc is 1
Checking if 1 < 4
Checking if 1 < 1
$ ./early a       # argc is 2
Checking if 2 < 4
Checking if 1 < 2
Yes, 1 < argc (2) < 4
$ ./early a b     # argc is 3
Checking if 3 < 4
Checking if 1 < 3
Yes, 1 < argc (3) < 4
$ ./early a b c   # argc is 4
Checking if 4 < 4
$

When argc is 4, less_than(argc, 4) is false. The && operator terminates evaluation immediately, so the second call less_than(1, argc) never executes.

Ternary Conditional Operator

The ternary operator (? :) offers a compact syntax for if-else expressions.

result = boolean_expr ? result_if_true : result_if_false;

  1. boolean_expr is evaluated first.
  2. If true (non-zero), result_if_true is evaluated and becomes the result. result_if_false is ignored.
  3. If false (zero), result_if_false is evaluated and becomes the result. result_if_true is ignored.

It is particularly useful for simple conditional formatting:

#include <stdio.h>
 
int main(int argc, char *argv[]) {
    // If argc is 2, use "", otherwise use "s"
    printf("Passed %d argument%s.\n", argc - 1, argc == 2 ? "" : "s");
    return 0;
}

Assignment Operators

In C, an assignment is an expression, not merely a statement. The value of the expression x = y is the value that was assigned to x. This allows for idioms where assignment and testing happen simultaneously, such as if ((rc = func())).

Compound assignment operators combine an operation with assignment: x += y is shorthand for x = x + y. This applies to most binary operators (-=, *=, /=, %=, <<=, &=, etc.).

Associativity

Associativity dictates grouping for operators of the same precedence.

  • Left-to-right: A + B + C becomes (A + B) + C.
  • Right-to-left: A += B += C becomes A += (B += C). This is intuitive for assignment but less common for other operators.

Post-increment and Pre-increment

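These operators (i++, ++i) are often said to derive from the auto-increment addressing modes of the PDP-11. According to Dennis Ritchie's own account, they predate that machine and were introduced in C's predecessor B, although they do map efficiently onto such addressing modes.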

  • Post-increment (i++): The expression evaluates to the current value of i, and then i is incremented.
  • Pre-increment (++i): i is incremented first, and the expression evaluates to the new value.

The same logic applies to i-- and --i. These work on integer types and, crucially, on pointers.

Casting

C allows explicit type conversion, or casting, by placing the target type in parentheses: (type)expression.

unsigned int ui = 0xDEADBEEF;
signed int i = (signed int)ui;
// i now has the value -559038737 (on a typical two's-complement platform)

  • Casting between integer types of the same size does not change the bit representation on such platforms; it merely reinterprets the bits.
  • Casting between different sizes or between integers and floats changes the representation.

Arrays

An array in C is a simple yet dangerous construct: a finite vector of variables of the same type, stored contiguously in memory. For an N-element array a, indices range from 0 to N-1.

#include <stdio.h>
 
float data[5]; // data to average and total
float total;   // total of the data items
float average; // average of the items
 
int main() {
    data[0] = 34.0;
    data[1] = 27.0;
    data[2] = 45.0;
    data[3] = 82.0;
    data[4] = 22.0;
 
    total = data[0] + data[1] + data[2] + data[3] + data[4];
    average = total / 5.0;
    printf("Total %f Average %f\n", total, average);
    return(0);
}

Danger

The C compiler does not check array bounds. Writing to data[5] in a 5-element array is syntactically valid but results in undefined behavior. In practice, the write usually lands on whatever memory lies adjacent to the array, leading to silent corruption or security vulnerabilities.

Multi-dimensional Arrays

Multi-dimensional arrays are essentially arrays of arrays. In memory, they are laid out contiguously in row-major order.

For int mat[3][3], the memory layout is: mat[0][0], mat[0][1], mat[0][2], mat[1][0], mat[1][1], mat[1][2], mat[2][0], mat[2][1], mat[2][2]

This layout has significant performance implications. Iterating row by row visits elements in the order they are stored and is cache-friendly. Iterating column by column jumps a full row ahead on every access; for large arrays this stride causes cache misses and reduces performance.

Array Initializers

Arrays can be initialized at definition using curly braces:

#include <stdio.h>
 
int main(int argc, char *argv[]) {
    int i, j;
    int a[3] = {3, 7, 9};
    int b[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9},
    };
 
    for(i = 0; i < 3; i++) {
        printf("a[%d] = %d\n", i, a[i]);
        for(j = 0; j < 3; j++) {
            printf(" b[%d][%d] = %d\n", i, j, b[i][j]);
        }
    }
    return 0;
}

Strings

C does not have a dedicated string type. Instead, a string is a convention: an array of chars terminated by a null byte (0 or '\0').

The following definitions are functionally identical:

// These strings are identical
char s1[6] = "hello";
char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };

The string literal "hello" implicitly includes the null terminator, requiring an array of size 6.

String Library Functions

The standard library <string.h> provides functions to manipulate these null-terminated arrays.

#include <stdio.h>
#include <string.h>
 
int main(int argc, char *argv[]) {
    char name1[12], name2[12];
    char mixed[25], title[20];
 
    strncpy(name1, "Rosalinda", 12); // Safe copy
    strncpy(name2, "Zeke", 12);
    strncpy(title, "This is the title.", 20);
 
    printf(" %s\n\n", title);
    printf("Name 1 is %s\n", name1);
    printf("Name 2 is %s\n", name2);
 
    // Compare strings
    if (strncmp(name1, name2, 12) > 0) {
        strncpy(mixed, name1, 25);
    } else {
        strncpy(mixed, name2, 25);
    }
    printf("The biggest name alphabetically is %s\n", mixed);
 
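    // Concatenate strings; strncat's third argument is the number of
    // characters to append, so pass the space still left in mixed
    strncpy(mixed, name1, sizeof(mixed) - 1);
    mixed[sizeof(mixed) - 1] = '\0';
    strncat(mixed, " & ", sizeof(mixed) - strlen(mixed) - 1);
    strncat(mixed, name2, sizeof(mixed) - strlen(mixed) - 1);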
    printf("Both names are %s\n", mixed);
    return 0;
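}

  • strncpy(dest, src, n): Copies at most n characters from src to dest. If src has n or more characters, dest is not null-terminated.
  • strncmp(s1, s2, n): Compares at most n characters of s1 and s2 and returns a negative, zero, or positive value.
  • strncat(dest, src, n): Appends at most n characters of src to the end of dest and then adds a null terminator, so dest needs room for n + 1 further bytes.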

Warning

Prefer the n versions of these functions (e.g., strncpy). The older versions (strcpy, strcat) do not check bounds and are a primary cause of buffer overflow vulnerabilities. The n versions only help when n reflects the space actually available in the destination.

The C Preprocessor

The preprocessor is the initial stage of the toolchain. It performs text transformation on the source code before compilation begins. It serves as the foundation for C’s modularity and enables powerful idioms.

#include

The #include directive pastes the contents of one file into another.

  • #include <file.h>: Searches system include paths (e.g., /usr/include).
  • #include "file.h": Typically searches the directory of the including file first, then falls back to the system include paths.

Here is a demonstration. A .c file includes a .h file:

When the preprocessor is run (gcc -E), the contents of cpp_example.h are pasted into cpp_example.c, and all macros are expanded. Lines starting with # are markers for the compiler to track original file names and line numbers for error messages.

Macro Definitions (#define)

Macros allow for token-based text substitution.

#define FOO BAZ
#define BAR(x) (x+3)
...
#undef FOO
#define QUX

  • Any subsequent occurrence of the token FOO is replaced with the token BAZ.
  • BAR(4) expands to (4+3). The preprocessor does not evaluate the math; it simply substitutes tokens.
  • #undef removes a macro definition.
  • #define QUX defines QUX with an empty replacement: the token QUX disappears wherever it is used, but #ifdef QUX is true.

Multi-line Macros: Complex macros can span multiple lines using backslashes.

#define SKIP_SPACES(p, limit) \
{ char *lim = (limit);        \
  while (p < lim) {           \
    if (*p++ != ' ') {        \
      p--; break; }}}

The “Do-While(0)” Idiom: To prevent syntax errors when a macro is used in an if statement (the “swallowing the semicolon” problem), macros are often wrapped in a do { ... } while(0) loop. This ensures the macro expands to a single statement that properly consumes the trailing semicolon.

Preprocessor Conditionals

Code blocks can be conditionally included or excluded, which is essential for cross-platform support.

#if expression
    // text 1
#else
    // text 2
#endif
 
#ifdef FOO      // Shorthand for #if defined(FOO)
    // ...
#endif
 
#ifndef BAR     // Shorthand for #if !defined(BAR)
    // ...
#endif

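The expression is an integer constant expression that the preprocessor evaluates before compilation proper begins. It can contain integer literals, operators, macros, and the defined operator; any identifier that is not a macro evaluates to 0.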

Token Manipulation

  • Stringizing (#): Converts a macro argument into a string literal.
  • Token Pasting (##): Concatenates two tokens into a single token.

These are used to reduce boilerplate code, such as automatically generating function names or table entries. For example, consider a command table written out by hand:

struct command {
    char *name;
    void (*function)();
};
 
struct command commands[] = {
    { "quit", quit_command},
    { "help", help_command},
    // ...
};

This can be simplified using:

#define COMMAND(c) { #c, c ## _command }

  • #c turns quit into "quit".
  • c ## _command turns quit into the single token quit_command.

Predefined Macros

The preprocessor provides several useful built-in macros:

  • __FILE__: The name of the current source file.
  • __LINE__: The current line number in the source file.
  • __DATE__: The compilation date.
  • __TIME__: The compilation time.
  • __STDC__: Defined as 1 when the compiler conforms to the C standard.

Modularity

C lacks built-in modules or namespaces. Modularity is achieved through conventions using headers and the linker.

Declarations vs. Definitions

  • A declaration introduces a name and its type (e.g., a function prototype). It says “this exists somewhere.”

char *strncpy(char *dest, const char *src, size_t n); // A "prototype"

  • A definition provides the implementation or storage. It says “this is what it is.”

char *strncpy(...) { ... body ... }

Visibility

  • extern: Promises that a definition exists in another compilation unit. This is the default for functions.
  • static: Restricts visibility to the current compilation unit. The symbol is not exported and cannot be accessed from other files.

This applies to global variables as well:

// In a header file, a declaration might be:
extern const char *banner; // Defined in some other .c file
 
// In a .c file, a declaration and definition might be:
static int priv_count = 0; // Only in scope in this unit
 
// In some other .c file, the definition for the extern variable is provided:
const char *banner = "Welcome to Barrelfish";

Header Files

The convention for modules involves splitting code into:

  1. Header file (.h): The interface. Contains public declarations (prototypes, extern variables, types).
  2. Source file (.c): The implementation. Contains definitions and private (static) functions.

  • A module foo has its public interface in foo.h.
  • Clients of the module #include "foo.h".
  • foo.h contains no definitions, only external declarations (function prototypes, extern variables, typedefs).
  • The implementation is typically in foo.c.
  • foo.c also includes its own header, foo.h, to allow the compiler to check for consistency between declarations and definitions.
  • foo.c contains the definitions for the interface functions, plus any internal (static) functions and variables.

The Header Guard Idiom

To prevent compiler errors from including the same header file multiple times, every header must use a guard:

// "file.h":
#ifndef FILE_H
#define FILE_H
 
// Declarations...
 
#endif // FILE_H

  • The first time the preprocessor sees this file, FILE_H is not defined, so it defines it and processes the contents.
  • The second time it sees this file in the same compilation unit, FILE_H is already defined, so the #ifndef is false, and the preprocessor skips the entire contents.
  • This ensures the content is processed only once per compilation unit.
  • Guard names should not begin with an underscore followed by an uppercase letter (e.g., _FILE_H_), since such identifiers are reserved for the implementation.

Danger

Never #include a .c file. Doing so bypasses the separate compilation model, and if that file is also compiled on its own or included from more than one place, the linker reports multiple definitions.

Practice: Preprocessors and Logic

C’s preprocessor and logical operators require a firm grasp of evaluation order.

Exercise: Short-Circuiting

What will this code print?

int x = 0;
if (x != 0 && (10 / x > 1)) {
    printf("Success\n");
} else {
    printf("Failure\n");
}

  • Answer: Failure. The condition x != 0 is false. Because of short-circuit evaluation, the second part (10 / x > 1) is never evaluated, avoiding a division-by-zero crash.

Exercise: Macro Pitfalls

What is the value of SQUARE(3 + 1) given #define SQUARE(x) x * x?

  • Answer: 7. Macro substitution is literal: 3 + 1 * 3 + 1 becomes 3 + (1 * 3) + 1 = 7.
  • Lesson: Always wrap macro arguments in parentheses: #define SQUARE(x) ((x) * (x)).

Exercise: Array Decay

Given int a[5];, what is sizeof(a)?

  • Answer: 20 (on a typical system where sizeof(int) == 4). Even though arrays often decay to pointers, sizeof on the array name itself returns the total size.
