04 Strings, Assertions, and Integer Representation

Lecture from: 24.09.2025 | Video: Videos ETHZ

Strings

In many modern languages, strings are fundamental, built-in types equipped with methods, automatic memory management, and safety features. C adopts a different philosophy, providing fundamental building blocks and relying on the programmer to assemble them correctly.

The C “String”: A Convention, Not a Type

C lacks a dedicated string type. Instead, it employs a simple convention: a “string” is an array of chars terminated by a specific character, the null byte (written as \0 or simply 0).

This null terminator enables library functions to identify the end of the string. Without it, functions like printf would read memory indefinitely until encountering a zero byte or causing a crash.

The following array initializations are functionally identical:

#include <stdio.h>
 
int main(int argc, char *argv[]) {
    // The string literal is syntactic sugar for the array initialization below.
    // The compiler automatically adds the null terminator '\0'.
    char s1[6] = "hello";
 
    // Manually creating the same array of characters.
    // We must explicitly add the null terminator (0).
    char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };
 
    printf("s1 = '%s'\n", s1);
    printf("s2 = '%s'\n", s2);
}

The string literal "hello" serves as convenient shorthand; the compiler translates it into a character array and appends the \0.

Visualizing String Operations in Memory

Declaring char name1[12]; allocates a contiguous block of 12 bytes. Initially, these contain garbage values.

A string literal such as "Rosalinda" exists in a read-only data segment as an array of 10 bytes (9 characters plus \0).

The function strncpy(name1, "Rosalinda", 12); performs the following byte operations:

It copies characters from the source ("Rosalinda") to the destination (name1).
It copies the null terminator \0.
It continues to pad the destination with null bytes until 12 bytes total have been written.

The result in name1 is:

This padding ensures the destination buffer remains null-terminated if it is sufficiently large. A more complex sequence involving strncpy and strncat illustrates further behavior.

strncpy(mixed, name1, 25);
strncat(mixed, " & ", 25);
strncat(mixed, name2, 25);

After strncpy(mixed, name1, 25);: name1 is copied into mixed, padded with \0s.
After strncat(mixed, " & ", 25);: strncat locates the first \0 in mixed and begins copying ” & ” from that point.
After strncat(mixed, name2, 25);: It finds the new \0 and appends name2.

strncpy vs. strcpy

The older strcpy function is unsafe as it copies without bounds checking, leading to potential buffer overflows. Always prefer strncpy and safer alternatives.

Assertions

Programmers operate under assumptions (invariants and pre-conditions), such as pointers being non-null or counts being positive. An assertion makes these assumptions explicit. It declares that a condition must be true; otherwise, a logical error exists in the program.

How `assert` Works

Using <assert.h> allows access to the assert macro:

#include <assert.h>
 
void array_copy(int a[], int b[], size_t count) {
    // It's a bug to call this with null pointers.
    // We state this assumption explicitly.
    assert(a != NULL);
    assert(b != NULL);
 
    for(int i = 0; i < count; i++) {
        a[i] = b[i];
    }
}

The contract of assert is:

At runtime, the expression is evaluated.
If true, execution continues.
If false, the program aborts immediately, printing a diagnostic message (file, line, function) and terminating.

The Philosophy of Assertions

Assertions are for Programmers, Not Users

An assertion failure indicates a bug (an internal contradiction). It is not for handling predictable runtime errors like missing files or invalid input, which should be handled with if/else. Crashing via assertion is hostile to the user.

assert is implemented as a macro to access the source code string of the expression and the current file/line number. Consequently, the asserted expression must have no side effects (e.g., assert(x++ > 0) is dangerous), as assertions may be disabled by defining NDEBUG.

Style

Good C style prioritizes safety, clarity, and maintainability over extreme cleverness.

Core Principles

Consistency: Adhere to a set of conventions for indentation, braces, and naming.
Clarity:
- Use indentation to reveal structure.
- Use parentheses to clarify operator precedence.
- Break complex expressions into named intermediate variables.
No Magic Numbers: Use const, enum, or #define instead of raw numbers.

Obfuscated Code

C’s flexibility permits code that is valid but nearly unreadable, a quality celebrated by the International Obfuscated C Code Contest (IOCCC).

The above example compiles and generates a maze.

Tip

Studying obfuscated code can reveal the nuances of syntax and semantics, but clarity should always remain the primary goal in professional development.

Representing Integers

While int represents mathematical integers ( $Z$ ), the abstraction is leaky because computer integers are finite. Fixed sizes (like 32 or 64 bits) lead to specific, sometimes surprising, behaviors.

Encoding Integers

A $w$ -bit integer is a vector of $w$ bits. The interpretation defines the number.

Unsigned Integers: Standard binary representation.
$B 2 U (x) = i = 0 \sum w - 1 x_{i} \cdot 2^{i}$
Signed Integers (Two’s Complement): The most significant bit (MSB) carries a negative weight.
$B 2 T (x) = - x_{w - 1} \cdot 2^{w - 1} + i = 0 \sum w - 2 x_{i} \cdot 2^{i}$

Little vs. Big Endian

Endianness dictates byte order in memory.

Big Endian: The Most Significant Byte (MSB) is stored at the lowest address (left-to-right).
Little Endian (x86-64): The Least Significant Byte (LSB) is stored at the lowest address.

Bit-Level Operations and Shifts

C provides direct bit manipulation operators.

Operator	Name	Description
`&`, `\|\|`, `^`	Bitwise AND, OR, XOR	Applied bit-by-bit to operands.
`~`	Bitwise NOT	Flips every bit.
`<<`, `>>`	Left Shift, Right Shift	Moves all bits left or right.

Shift Operations in Detail

Left Shift x << y: Fills with 0s on the right. Multiplies by $2^{y}$ .
Right Shift x >> y:
- Logical Shift: (Unsigned) Moves bits and fills vacated positions with 0.
- Arithmetic Shift: (Signed) Moves bits and fills vacated positions with a copy of the sign bit (sign extension).

Signed vs. Unsigned

The Golden Rule: Implicit casting converts signed to unsigned if an operation mixes types.

Example: -1 < 0U

Operands: Signed int (-1) and unsigned int (0U).
-1 casts to unsigned int becoming UMax ( $2^{32} - 1$ ).
Comparison becomes UMax < 0, which is false.

Sign Extension

Converting to a larger type preserves the value.

Unsigned: Zero extension (pads with 0).
Signed: Sign extension (pads with sign bit).

Integer Arithmetic

C arithmetic reflects hardware behavior: efficient and predictable but distinct from mathematical integer arithmetic.

Negation and Subtraction

Subtraction x - y is implemented as x + (-y). Negation in two’s complement follows the rule:

- x =\sim x + 1

This identity ( $x + (\sim x) = - 1$ ) allows the same adder circuit to handle both addition and subtraction.

Unsigned Addition

Addition of $w$ -bit unsigned integers discards any carry-out bit.

This is arithmetic modulo $2^{w}$ :

U A d d_{w} (u, v) = (u + v) (mod 2^{w})

Visualizing this shows a “wrap around” effect.

Tip

This system forms an Abelian group (closed, commutative, associative, identity, inverse).

Two’s Complement Addition

Hardware uses the same bit-level addition for signed numbers. The difference lies in interpretation and overflow.

Positive Overflow: Sum of large positive numbers wraps to negative.
Negative Overflow: Sum of large negative numbers wraps to positive.

Multiplication

Multiplication of $w$ -bit numbers yields a $w$ -bit result; higher-order bits are truncated.

Unsigned: Equivalent to $(u \cdot v) (mod 2^{w})$ .
Signed: Bit-level operation matches unsigned; interpretation differs.

Power-of-2 Arithmetic

Shifts are faster than multiplication or division and are often used as optimizations.

Multiplication: Left shifting x << k is equivalent to $x \cdot 2^{k}$ .

x \cdot 12 = x \cdot (8 + 4) = (x ≪ 3) + (x ≪ 2)

Unsigned Division: Logical right shift u >> k corresponds to $⌊ u / 2^{k} ⌋$ .

Signed Division: A discrepancy exists for negative numbers:

Integer division rounds towards zero.
Arithmetic right shift rounds towards negative infinity.

Example: -9 / 8 yields -1, but -9 >> 3 yields -2.

To correct this, the compiler adds a bias of $(1 ≪ k) - 1$ before shifting if the number is negative.

(s < 0)? ((s + (1 ≪ k) - 1) ≫ k) : (s ≫ k)

Integer C Puzzles

Assume 32-bit, two’s complement integers.

Puzzle 1: x < 0 ==> ((x*2) < 0)

False. Fails on overflow. If x is TMin, x*2 overflows to 0.

Puzzle 2: ux >= 0

True. Unsigned integers are non-negative by definition.

Puzzle 3: x & 7 == 7 ==> (x<<30) < 0

True. x & 7 == 7 implies the last 3 bits are 111. Shifting left by 30 places the bit originally at arithmetic position 1 into the sign bit (31), resulting in a negative number.

Puzzle 4: x > y ==> -x < -y

False. Fails for y = TMin. -TMin overflows to TMin, reversing the expected logic.

Puzzle 5: x * x >= 0

False. Overflow can result in a negative number (e.g., x = 65535).

Puzzle 6: x > 0 && y > 0 ==> x + y > 0

False. Positive overflow can produce a negative sum.

Puzzle 7: ux >> 3 == ux / 8

True. Right shift is defined as division for unsigned integers.

Puzzle 8: x >> 3 == x / 8

False. Different rounding behavior for negative numbers (towards $- \infty$ vs. towards 0).

Practice: Integers and Strings

Mastering C requires comfortable movement between bits, bytes, and characters.

Exercise: Two’s Complement Conversion

Convert the 8-bit signed integer 0b11111100 to decimal.

Identify the sign: MSB is 1, so it is negative.
Apply formula: $- 2^{7} + 2^{6} + 2^{5} + 2^{4} + 2^{3} + 2^{2} = - 128 + 64 + 32 + 16 + 8 + 4 = - 4$ .
Shortcut: $\sim (11111100) + 1 = 00000011 + 1 = 00000100$ ( $4$ in decimal). Thus, the value is $- 4$ .

Exercise: String Length

What is the value of sizeof(s) and strlen(s) for char s[] = "CS61";?

Answer: sizeof(s) is 5 (includes the null terminator). strlen(s) is 4 (counts characters before the null terminator).

Exercise: Right Shift

What is -8 >> 1 (arithmetic)?

Answer: -4. -8 is ...111000. Shifting right and sign extending gives ...111100, which is -4.

Continue here: 05 Memory Segments and C Pointers

CS Notes

Explorer

04 Strings, Assertions, and Integer Representation

Strings

The C “String”: A Convention, Not a Type

Visualizing String Operations in Memory

Assertions

How `assert` Works

The Philosophy of Assertions

Style

Core Principles

Obfuscated Code

Representing Integers

Encoding Integers

Little vs. Big Endian

Bit-Level Operations and Shifts

Shift Operations in Detail

Signed vs. Unsigned

Sign Extension

Integer Arithmetic

Negation and Subtraction

Unsigned Addition

Two’s Complement Addition

Multiplication

Power-of-2 Arithmetic

Integer C Puzzles

Practice: Integers and Strings

Exercise: Two’s Complement Conversion

Exercise: String Length

Exercise: Right Shift

Table of Contents

Graph View

Backlinks

CS Notes

Explorer

04 Strings, Assertions, and Integer Representation

Strings

The C “String”: A Convention, Not a Type

Visualizing String Operations in Memory

Assertions

How assert Works

The Philosophy of Assertions

Style

Core Principles

Obfuscated Code

Representing Integers

Encoding Integers

Little vs. Big Endian

Bit-Level Operations and Shifts

Shift Operations in Detail

Signed vs. Unsigned

Sign Extension

Integer Arithmetic

Negation and Subtraction

Unsigned Addition

Two’s Complement Addition

Multiplication

Power-of-2 Arithmetic

Integer C Puzzles

Practice: Integers and Strings

Exercise: Two’s Complement Conversion

Exercise: String Length

Exercise: Right Shift

Table of Contents

Graph View

Backlinks

How `assert` Works