Lecture from: 24.09.2025 | Video: Videos ETHZ
Strings
In many modern languages, strings are fundamental, built-in types equipped with methods, automatic memory management, and safety features. C adopts a different philosophy, providing fundamental building blocks and relying on the programmer to assemble them correctly.
The C “String”: A Convention, Not a Type
C lacks a dedicated string type. Instead, it employs a simple convention: a “string” is an array of chars terminated by a specific character, the null byte (written as \0 or simply 0).
This null terminator enables library functions to identify the end of the string. Without it, functions like printf keep reading past the end of the array until they happen to hit a zero byte or the program crashes.
The following array initializations are functionally identical:
#include <stdio.h>

int main(int argc, char *argv[]) {
    // The string literal is syntactic sugar for the array initialization below.
    // The compiler automatically adds the null terminator '\0'.
    char s1[6] = "hello";

    // Manually creating the same array of characters.
    // We must explicitly add the null terminator (0).
    char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };

    printf("s1 = '%s'\n", s1);
    printf("s2 = '%s'\n", s2);
}
The string literal "hello" serves as convenient shorthand; the compiler translates it into a character array and appends the \0.
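To make the convention concrete, the sketch below walks a char array until it reaches the terminator, which is essentially what the library routines do. The function name count_chars is made up for this illustration; in real code, strlen from <string.h> does this job.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the lecture): counts the chars that
 * precede the first '\0', mirroring what strlen does. */
size_t count_chars(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {   /* the terminator is the only end marker */
        n++;
    }
    return n;
}
```

Called on an array holding 'h', 'e', 'l', 'l', 'o', 0 it returns 5. Called on an array with no terminator, it would run past the end of the array, which is the failure mode described earlier.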
Visualizing String Operations in Memory
Declaring char name1[12]; allocates a contiguous block of 12 bytes. If name1 is a local variable, these bytes initially contain garbage (indeterminate) values.
A string literal such as "Rosalinda" exists in a read-only data segment as an array of 10 bytes (9 characters plus \0).
The function strncpy(name1, "Rosalinda", 12); performs the following byte operations:
- It copies characters from the source ("Rosalinda") to the destination (name1).
- It copies the null terminator \0.
- It continues to pad the destination with null bytes until 12 bytes total have been written.
The result in name1 is the nine characters of "Rosalinda" followed by three \0 bytes, filling all 12 bytes of the array.
This padding guarantees a null-terminated destination only when the source, including its \0, fits within the given count; if the source is longer, strncpy writes no terminator at all. A more complex sequence involving strncpy and strncat illustrates further behavior.
strncpy(mixed, name1, 25);
strncat(mixed, " & ", 25);
strncat(mixed, name2, 25);
- After strncpy(mixed, name1, 25): name1 is copied into mixed, padded with \0s.
- After strncat(mixed, " & ", 25): strncat locates the first \0 in mixed and begins copying " & " from that point.
- After strncat(mixed, name2, 25): It finds the new \0 and appends name2.
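strncpy vs. strcpy
The older strcpy function is unsafe because it copies without bounds checking, which can overflow the destination buffer. Prefer strncpy or safer alternatives, but keep two caveats in mind: strncpy leaves the destination unterminated when the source does not fit within the count, and the count passed to strncat limits the number of characters appended, not the total size of the destination.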
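As a rough sketch of bounded copying and appending, the helpers below wrap strncpy and strncat so that the destination is always terminated and never overrun. The names copy_bounded and append_bounded are invented for this illustration and are not standard library functions.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes)
 * and always leaves dst null-terminated, truncating if necessary. */
void copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return;                       /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated ... */
    dst[dst_size - 1] = '\0';         /* ... so terminate explicitly */
}

/* Hypothetical helper: appends src to the string already in dst without
 * writing past dst_size bytes in total. Assumes dst is null-terminated. */
void append_bounded(char *dst, size_t dst_size, const char *src) {
    size_t used = strlen(dst);
    if (used + 1 >= dst_size) {
        return;                       /* buffer is already full */
    }
    /* strncat's count is the maximum number of chars to append;
     * it then writes one extra '\0' after them. */
    strncat(dst, src, dst_size - used - 1);
}
```

With a 12-byte buffer, copy_bounded(buf, sizeof buf, "Rosalinda") followed by append_bounded(buf, sizeof buf, " & Maria") leaves buf holding the 11-character string "Rosalinda &" instead of overflowing (the second name is an arbitrary example).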
Assertions
Programmers operate under assumptions (invariants and pre-conditions), such as pointers being non-null or counts being positive. An assertion makes these assumptions explicit. It declares that a condition must be true; otherwise, a logical error exists in the program.
How assert Works
Using <assert.h> allows access to the assert macro:
#include <assert.h>
#include <stddef.h>

void array_copy(int a[], int b[], size_t count) {
    // It's a bug to call this with null pointers.
    // We state this assumption explicitly.
    assert(a != NULL);
    assert(b != NULL);
    // A size_t index avoids a signed/unsigned comparison with count.
    for (size_t i = 0; i < count; i++) {
        a[i] = b[i];
    }
}
The contract of assert is:
- At runtime, the expression is evaluated.
- If true, execution continues.
- If false, the program aborts immediately, printing a diagnostic message (file, line, function) and terminating.
The Philosophy of Assertions
Assertions are for Programmers, Not Users
An assertion failure indicates a bug (an internal contradiction). It is not for handling predictable runtime errors like missing files or invalid input, which should be handled with if/else. Crashing via assertion is hostile to the user.
assert is implemented as a macro so that it can access the source text of the expression and the current file name and line number. Being a macro also means assertions can be compiled out: when NDEBUG is defined before <assert.h> is included, assert(expr) expands to nothing and expr is never evaluated. The asserted expression must therefore have no side effects (e.g., assert(x++ > 0) is dangerous).
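A minimal sketch of the no-side-effects rule: perform the side effect in its own statement and let the assertion only inspect the result. The function take_next_id and its counter are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: hands out increasing IDs from a caller-owned counter. */
int take_next_id(int *counter) {
    assert(counter != NULL);   /* precondition: checks only, changes nothing */

    int id = ++(*counter);     /* the side effect happens unconditionally */
    assert(id > 0);            /* safe to compile out: no state is modified */

    /* By contrast, assert(++(*counter) > 0) would stop incrementing
     * as soon as the program is built with NDEBUG defined. */
    return id;
}
```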
Style
Good C style prioritizes safety, clarity, and maintainability over extreme cleverness.
Core Principles
- Consistency: Adhere to a set of conventions for indentation, braces, and naming.
- Clarity:
- Use indentation to reveal structure.
- Use parentheses to clarify operator precedence.
- Break complex expressions into named intermediate variables.
- No Magic Numbers: Use const, enum, or #define instead of raw numbers.
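For instance, a buffer size can be named once and reused instead of repeating a bare number. The constant NAME_CAPACITY and the function below are illustrative names, not taken from the lecture.

```c
#include <string.h>

/* Hypothetical named constant replacing a bare 12 scattered through the code. */
enum { NAME_CAPACITY = 12 };

/* Returns 1 if name, plus its terminator, fits in a NAME_CAPACITY-byte buffer. */
int name_fits(const char *name) {
    return strlen(name) < (size_t)NAME_CAPACITY;
}
```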
Obfuscated Code
C’s flexibility permits code that is valid but nearly unreadable, a quality celebrated by the International Obfuscated C Code Contest (IOCCC).
[Figure: an obfuscated but valid C program that compiles and generates a maze.]
Tip
Studying obfuscated code can reveal the nuances of syntax and semantics, but clarity should always remain the primary goal in professional development.
Representing Integers
While int represents mathematical integers ($\mathbb{Z}$), the abstraction is leaky because computer integers are finite. Fixed sizes (like 32 or 64 bits) lead to specific, sometimes surprising, behaviors.
Encoding Integers
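A $w$-bit integer is a vector of $w$ bits, $X = (x_{w-1}, \dots, x_1, x_0)$. The interpretation defines the number.
- Unsigned Integers: Standard binary representation, $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$.
- Signed Integers (Two’s Complement): The most significant bit (MSB) carries a negative weight, $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$.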
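The two interpretations can be sketched in C for a small width. The helpers below decode the low w bits of a pattern both ways; the names decode_unsigned and decode_twos are invented here, and w is assumed to be between 1 and 31 so the arithmetic fits comfortably in 64-bit intermediates.

```c
#include <stdint.h>

/* Hypothetical helper: value of the low w bits read as an unsigned number. */
int64_t decode_unsigned(uint32_t bits, unsigned w) {
    int64_t value = 0;
    for (unsigned i = 0; i < w; i++) {
        int64_t bit = (bits >> i) & 1u;
        value += bit << i;              /* every bit has weight +2^i */
    }
    return value;
}

/* Hypothetical helper: value of the low w bits read as two's complement. */
int64_t decode_twos(uint32_t bits, unsigned w) {
    int64_t msb = (bits >> (w - 1)) & 1u;
    /* The lower w-1 bits keep their positive weights ... */
    int64_t value = decode_unsigned(bits, w - 1);
    /* ... while the MSB contributes -2^(w-1). */
    return value - msb * ((int64_t)1 << (w - 1));
}
```

For the 8-bit pattern 0xFC, decode_unsigned gives 252 and decode_twos gives -4.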
Little vs. Big Endian
Endianness dictates byte order in memory.
- Big Endian: The Most Significant Byte (MSB) is stored at the lowest address (left-to-right).
- Little Endian (x86-64): The Least Significant Byte (LSB) is stored at the lowest address.
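One way to see the two orders is to write the bytes of a 32-bit value into an array explicitly. The helpers store_big_endian and store_little_endian are hypothetical names for this sketch; they produce the same layout regardless of the host machine.

```c
#include <stdint.h>

/* Hypothetical helper: most significant byte goes to the lowest index. */
void store_big_endian(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);
}

/* Hypothetical helper: least significant byte goes to the lowest index. */
void store_little_endian(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value);
    out[1] = (unsigned char)(value >> 8);
    out[2] = (unsigned char)(value >> 16);
    out[3] = (unsigned char)(value >> 24);
}
```

For the value 0x01234567, the big-endian array holds 01 23 45 67 and the little-endian array holds 67 45 23 01.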
Bit-Level Operations and Shifts
C provides direct bit manipulation operators.
| Operator | Name | Description |
|---|---|---|
| &, \|, ^ | Bitwise AND, OR, XOR | Applied bit-by-bit to operands. |
| ~ | Bitwise NOT | Flips every bit. |
| <<, >> | Left Shift, Right Shift | Moves all bits left or right. |
Shift Operations in Detail
- Left Shift x << y: Fills with 0s on the right. Multiplies by $2^y$.
- Right Shift x >> y:
  - Logical Shift: (Unsigned) Moves bits and fills vacated positions with 0.
  - Arithmetic Shift: (Signed) Moves bits and fills vacated positions with a copy of the sign bit (sign extension).
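The difference between the two right shifts can be sketched on raw 32-bit patterns. In C, >> on an unsigned operand is always logical; on a negative signed operand the result is implementation-defined, although common compilers shift arithmetically. To stay portable, the hypothetical helpers below work on uint32_t patterns and build the arithmetic variant by hand, assuming k is between 0 and 31.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift, vacated bits become 0. */
uint32_t shift_right_logical(uint32_t bits, unsigned k) {
    return bits >> k;
}

/* Hypothetical helper: arithmetic right shift, vacated bits copy the sign bit. */
uint32_t shift_right_arithmetic(uint32_t bits, unsigned k) {
    uint32_t shifted = bits >> k;
    if (bits & 0x80000000u) {
        /* Mask with 1s in the top k positions, 0s elsewhere. */
        uint32_t fill = (uint32_t)~(UINT32_MAX >> k);
        shifted |= fill;
    }
    return shifted;
}
```

Shifting the pattern 0xF0000000 right by 4 gives 0x0F000000 logically and 0xFF000000 arithmetically.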
Signed vs. Unsigned
The Golden Rule: If an operation mixes signed and unsigned operands of the same width, the signed operand is implicitly converted to unsigned.
Example: -1 < 0U
- Operands: Signed int (-1) and unsigned int (0U).
- -1 casts to unsigned int, becoming UMax ($2^{32} - 1$ for a 32-bit int).
- Comparison becomes UMax < 0, which is false.
Sign Extension
Converting to a larger type preserves the value.
- Unsigned: Zero extension (pads with 0).
- Signed: Sign extension (pads with sign bit).
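A small sketch of both extensions, widening 8-bit values to 32 bits and returning the resulting bit pattern. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: widen a signed 8-bit value and return the
 * resulting 32-bit pattern. The upper 24 bits copy the sign bit. */
uint32_t widen_signed(int8_t value) {
    int32_t wide = value;        /* value-preserving conversion */
    return (uint32_t)wide;       /* expose the bit pattern */
}

/* Hypothetical helper: widen an unsigned 8-bit value. Upper bits are 0. */
uint32_t widen_unsigned(uint8_t value) {
    return value;
}
```

The 8-bit pattern 0xFC becomes 0xFFFFFFFC when treated as signed (-4) and 0x000000FC when treated as unsigned (252).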
Integer Arithmetic
C arithmetic reflects hardware behavior: efficient and predictable but distinct from mathematical integer arithmetic.
Negation and Subtraction
Subtraction x - y is implemented as x + (-y). Negation in two’s complement follows the rule: invert all bits and add one.
This identity ($-x = \sim x + 1$) allows the same adder circuit to handle both addition and subtraction.
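The negation rule can be tried on raw 32-bit patterns, where the arithmetic is well defined in C. The helper names below are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: two's complement negation on a raw 32-bit pattern. */
uint32_t negate_bits(uint32_t x) {
    return ~x + 1u;
}

/* Hypothetical helper: subtraction expressed as addition of the negation. */
uint32_t subtract_bits(uint32_t x, uint32_t y) {
    return x + negate_bits(y);
}
```

negate_bits(5) yields 0xFFFFFFFB, the pattern of -5, and subtract_bits(7, 5) yields 2. Negating 0x80000000 (TMin) returns the same pattern, since TMin has no positive counterpart.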
Unsigned Addition
Addition of $w$-bit unsigned integers discards any carry-out bit.
This is arithmetic modulo $2^w$: $UAdd_w(u, v) = (u + v) \bmod 2^w$.
Visualizing this shows a “wrap around” effect.
Tip
This system forms an Abelian group (closed, commutative, associative, identity, inverse).
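The wrap-around is easy to observe with 8-bit values. The sketch below uses hypothetical helper names and relies only on unsigned arithmetic, which C defines to be modular.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit unsigned addition, i.e. (u + v) mod 2^8. */
uint8_t add_u8(uint8_t u, uint8_t v) {
    return (uint8_t)(u + v);   /* the cast drops the carry-out bit */
}

/* Hypothetical helper: reports whether add_u8 wrapped around. */
int add_u8_wrapped(uint8_t u, uint8_t v) {
    return add_u8(u, v) < u;   /* a wrapped sum is smaller than either operand */
}
```

add_u8(200, 100) returns 44, because 300 mod 256 is 44, and add_u8_wrapped reports the wrap.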
Two’s Complement Addition
Hardware uses the same bit-level addition for signed numbers. The difference lies in interpretation and overflow.
- Positive Overflow: Sum of large positive numbers wraps to negative.
- Negative Overflow: Sum of large negative numbers wraps to positive.
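One caveat when reproducing this in C: the language standard treats signed overflow as undefined behavior, so the wrap described above is what two's complement hardware does, not something portable code may rely on. A common approach, sketched below with a hypothetical helper name, is to test for overflow before adding.

```c
#include <limits.h>

/* Hypothetical helper: returns +1 if x + y would overflow positively,
 * -1 if it would overflow negatively, and 0 if the sum fits in an int. */
int add_overflow_direction(int x, int y) {
    if (y > 0 && x > INT_MAX - y) {
        return 1;     /* sum would exceed INT_MAX */
    }
    if (y < 0 && x < INT_MIN - y) {
        return -1;    /* sum would fall below INT_MIN */
    }
    return 0;
}
```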
Multiplication
Multiplication of $w$-bit numbers yields a $2w$-bit result in general; the higher-order $w$ bits are truncated.
- Unsigned: Equivalent to $(u \cdot v) \bmod 2^w$.
- Signed: Bit-level operation matches unsigned; interpretation differs.
Power-of-2 Arithmetic
Shifts are typically cheaper than multiplication or division, so compilers often use them as optimizations.
Multiplication:
Left shifting x << k is equivalent to $x \cdot 2^k$.
Unsigned Division:
Logical right shift u >> k corresponds to $\lfloor u / 2^k \rfloor$.
Signed Division: A discrepancy exists for negative numbers:
- Integer division rounds towards zero.
- Arithmetic right shift rounds towards negative infinity.
Example: -9 / 8 yields -1, but -9 >> 3 yields -2.
To correct this, the compiler adds a bias of $2^k - 1$ before shifting if the number is negative.
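The correction can be sketched as follows. The helper name div_pow2 is invented here, and the sketch assumes that >> on a negative int is an arithmetic shift, which holds on common compilers but is implementation-defined in the C standard; k is assumed to be between 0 and 30.

```c
/* Hypothetical helper: computes x / 2^k with rounding towards zero,
 * using only an add and a shift. Assumes arithmetic >> for negative ints. */
int div_pow2(int x, unsigned k) {
    int bias = (x < 0) ? (int)((1u << k) - 1u) : 0;   /* 2^k - 1 for negatives */
    return (x + bias) >> k;
}
```

With x = -9 and k = 3, the bias is 7, so the shift is applied to -2 and yields -1, matching -9 / 8.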
Integer C Puzzles
Assume 32-bit, two’s complement integers.
Puzzle 1: x < 0 ==> ((x*2) < 0)
- False. Fails on overflow. If x is TMin, x*2 overflows to 0.
Puzzle 2: ux >= 0
- True. Unsigned integers are non-negative by definition.
Puzzle 3: (x & 7) == 7 ==> (x<<30) < 0
- True. (x & 7) == 7 implies the last 3 bits are 111. Shifting left by 30 moves bit 1 into the sign bit (bit 31), resulting in a negative number. The parentheses matter: in C, == binds tighter than &, so x & 7 == 7 would parse as x & (7 == 7).
Puzzle 4: x > y ==> -x < -y
- False. Fails for y = TMin. -TMin overflows to TMin, so -y is still the smallest value and -x < -y cannot hold.
Puzzle 5: x * x >= 0
- False. Overflow can result in a negative number (e.g., x = 65535, whose square has the 32-bit pattern 0xFFFE0001).
Puzzle 6: x > 0 && y > 0 ==> x + y > 0
- False. Positive overflow can produce a negative sum.
Puzzle 7: ux >> 3 == ux / 8
- True. For unsigned integers, a logical right shift by 3 is exactly division by 8 rounded down, which is also how unsigned division rounds.
Puzzle 8: x >> 3 == x / 8
- False. Different rounding behavior for negative numbers (towards $-\infty$ vs. towards 0).
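A few of these counterexamples can be checked on raw bit patterns. Since signed overflow is undefined behavior in C, the sketch below does the arithmetic on uint32_t, where wrap-around is well defined, and then inspects the sign bit the way a 32-bit two's complement int would be read. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if the pattern is negative as a 32-bit
 * two's complement integer, 0 otherwise. */
int is_negative32(uint32_t bits) {
    return (int)(bits >> 31);
}

/* Puzzle 1 counterexample: TMin * 2 wraps to 0, which is not negative. */
uint32_t puzzle1_product(void) {
    uint32_t tmin = 0x80000000u;
    return tmin * 2u;
}

/* Puzzle 5 counterexample: 65535 * 65535 wraps to a negative pattern. */
uint32_t puzzle5_square(void) {
    uint32_t x = 65535u;
    return x * x;
}

/* Puzzle 6 counterexample: TMax + 1 wraps to TMin. */
uint32_t puzzle6_sum(void) {
    uint32_t tmax = 0x7FFFFFFFu;
    return tmax + 1u;
}
```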
Practice: Integers and Strings
Mastering C requires comfortable movement between bits, bytes, and characters.
Exercise: Two’s Complement Conversion
Convert the 8-bit signed integer 0b11111100 to decimal.
- Identify the sign: MSB is 1, so it is negative.
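- Apply formula: $-2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 = -128 + 124 = -4$.
- Shortcut: Invert the bits and add one: 00000011 + 1 = 00000100 ($4$ in decimal). Thus, the value is $-4$.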
Exercise: String Length
What is the value of sizeof(s) and strlen(s) for char s[] = "CS61";?
- Answer: sizeof(s) is 5 (includes the null terminator). strlen(s) is 4 (counts characters before the null terminator).
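The answer can be confirmed with a short sketch; the two wrapper functions are hypothetical and exist only to make the values easy to inspect.

```c
#include <string.h>

/* Returns sizeof(s) for the array from the exercise. */
size_t exercise_sizeof(void) {
    char s[] = "CS61";
    return sizeof(s);      /* array size in bytes, terminator included */
}

/* Returns strlen(s) for the same array. */
size_t exercise_strlen(void) {
    char s[] = "CS61";
    return strlen(s);      /* chars before the first '\0' */
}
```

Note that sizeof reports the array size only when applied to the array itself; applied to a char * pointing at the same data, it yields the size of the pointer.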
Exercise: Right Shift
What is -8 >> 1 (arithmetic)?
- Answer: -4. -8 is ...111000. Shifting right and sign extending gives ...111100, which is -4.
Continue here: 05 Memory Segments and C Pointers