Lecture from: 11.11.2025 | Video: Videos ETHZ
Code Vulnerabilities
The focus now shifts from how code is constructed to how code can be broken. Understanding vulnerabilities is essential not just for security professionals, but for anyone writing systems code.
Worms and Viruses
Historically, there is a distinction between two broad categories of malicious code:
- Worm: A program that can run by itself and can propagate a fully working version of itself to other computers.
- Virus: Code that cannot run independently; it must add itself to other programs to execute.
The terminology is somewhat arbitrary, but the concept of self-propagating code dates back to science fiction, specifically John Brunner’s 1975 novel The Shockwave Rider. In the mid-1970s, researchers at Xerox PARC (where personal computing and Ethernet were pioneered) experimented with benign worms to manage distributed systems.
The Morris Worm (1988)
The pivotal moment for malware was November 1988, with the release of the Robert Morris Worm. Robert Morris, a graduate student (and son of an NSA scientist), wrote a program intended to gauge the size of the internet.
- The Intent: It was supposed to copy itself, report back, and stop.
- The Bug: The rate-limiting mechanism failed. It infected machines repeatedly, bringing the internet (then about 60,000 nodes) to a halt.
- The Tech: It was sophisticated. It targeted Sun and VAX machines and used multiple vectors:
- Guessed passwords.
- Exploited a debug feature in
sendmail. - Exploited a buffer overflow in the
fingerd(finger daemon).
Morris became the first person convicted under the Computer Fraud and Abuse Act. He is now a professor at MIT.
Stack Overflow Bugs
The Morris worm utilized a specific vulnerability class that remains relevant today: the Stack Buffer Overflow.
The Vulnerable Function: gets()
Consider the standard C library function gets().
char *gets(char *dest) {
int c = getchar();
char *p = dest;
while (c != EOF && c != '\n') {
*p++ = c; // No check for destination size!
c = getchar();
}
*p = '\0';
return dest;
}The Problem: gets() has no way to know the size of the buffer dest points to. It will happily keep reading and writing past the end of the array until it hits a newline.
Never use
gets()Other functions like
strcpy(copy string),strcat(concatenate), andscanfwith%sshare this same flaw. They do not accept a limit on the number of bytes to write.
Anatomy of an Overflow
A vulnerable program bufdemo is examined.
void echo() {
char buf[4]; /* Way too small! */
gets(buf);
puts(buf);
}Normal Execution: If “123” is input, it fits (3 chars + null terminator).
The Crash: If “12345678…” is input, the program will eventually Segfault. Why?
The Stack Layout:
When echo is called, the stack looks like this:
- Stack Frame for
main(top of memory) - Return Address (points back to instruction in
mainafter the call) - Saved Stack Pointer (optional/compiler dependent)
- Buffer
buf(low memory, grows upward towards the return address)
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251130075808.png)
If too many bytes are written into buf, the saved stack pointer and eventually the Return Address are overwritten.
When the function echo executes the ret instruction, it pops the value currently at the “Return Address” location into the instruction pointer (%rip). If this has been overwritten with garbage (e.g., 0x333231 from the string “123…”), the processor tries to fetch an instruction from that garbage address. This usually causes a Segmentation Fault.
Exploiting Buffer Overflows
A crash is annoying, but an exploit is dangerous.
Code Injection
Instead of crashing the program with random characters, an attacker can:
- Input a string containing executable machine code (exploit code).
- Pad the string to reach the return address location.
- Overwrite the return address with the address of the buffer on the stack.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251130075904.png)
When the function returns, it does not go back to main. It jumps to the stack and executes the injected code. In the case of the Morris Worm attacking fingerd, this code spawned a shell (/bin/sh) running as root.
Another Vector: Integer Overflow (XDR)
Buffer overflows are not limited to the stack or gets. They can happen on the heap, often triggered by integer overflows.
Example: Sun XDR Library The XDR (External Data Representation) library used a function to allocate memory for an array of elements to be copied.
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251130075922.png)
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
// Vulnerable line:
void *result = malloc(ele_cnt * ele_size);
if (result == NULL) return NULL;
// Loop copying data into result...
}The Vulnerability: On a 32-bit machine, if a user provides massive values:
ele_cnt = 2^20 + 1ele_size = 4096()
The multiplication ele_cnt * ele_size theoretically equals . However, in 32-bit arithmetic, overflows to 0. The result is just 4096.
malloc allocates only 4096 bytes. The subsequent loop, however, iterates ele_cnt times (1 million times), copying massive amounts of data into a tiny heap buffer.
Mitigations and Defenses
Writing secure code requires defensiveness at multiple levels.
1. Safer Functions
Replace vulnerable functions with bounds-checking alternatives:
getsfgets(specify size)strcpystrncpyscanf("%s")scanf("%ns")
2. System-Level Protections
Modern operating systems and compilers make exploitation harder (though not impossible):
- ASLR (Address Space Layout Randomization): The stack is placed at a random address each time the program runs. The attacker cannot easily guess the address of their injected buffer.
- NX Bit (No-Execute): Hardware support to mark memory pages (like the stack) as non-executable. If the processor tries to jump to code on the stack, it throws an exception instead of running it.
- Stack Canaries: The compiler inserts a special value (canary) between the buffer and the return address. Before returning, the function checks if the canary is intact. If it has been overwritten, the program aborts immediately.
Return Oriented Programming (ROP)
Attackers have evolved. If the stack is non-executable (NX), they use ROP. Instead of injecting new code, they overwrite the stack with a chain of return addresses that point to existing snippets of code (“gadgets”) already present in the program’s executable memory (like
libc). By chaining these gadgets, they can perform arbitrary computation without injecting a single instruction.
Floating Point
The discussion now leaves the world of malware and enters the world of representing non-integer numbers.
Use: https://float.exposed/ visualize numbers…
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251130082723.png)
Representing Fractional Numbers
How are or represented in a computer? The binary system is extended to the right of the “binary point.”
- represents
- represents
- represents
Limitation: It is only possible to exactly represent numbers of the form . Just as repeats in decimal (), numbers like () repeat in binary ().
IEEE 754 Standard
Before 1985, every computer manufacturer had its own ad-hoc floating point format. This made portable numerical software impossible. The IEEE 754 Standard (1985) unified this. It is supported by all major CPUs.
The Numerical Form
A floating point number represents a value:
- s (Sign): Determines negative or positive.
- M (Significand/Mantissa): A fractional value, usually in the range .
- E (Exponent): Weights the value by a power of two.
Bit Encoding
The bits are divided into three fields:
- s (1 bit): The sign bit.
- exp: Encodes the exponent .
- frac: Encodes the significand .
/Semester-3/Systems-Programming-and-Computer-Architecture/Lecture-Notes/attachments/Pasted-image-20251130080108.png)
Standard Precisions:
- Single Precision (float): 32 bits total. 1 sign, 8 exp, 23 frac. Bias = 127.
- Double Precision (double): 64 bits total. 1 sign, 11 exp, 52 frac. Bias = 1023.
AI and Reduced Precision
Recently, formats like bfloat16 (Brain Floating Point) or FP16 have emerged for Machine Learning. ML models often tolerate lower precision in exchange for higher throughput (more calculations per second) and lower memory usage.
C Data Types
floatis guaranteed to be IEEE Single Precision.doubleis guaranteed to be IEEE Double Precision.- Casting: Casting between
int,float, anddoublechanges the bit representation.inttofloat: Rounds the value (not always exact).doubletoint: Truncates (rounds toward zero).
Practice: Buffer Overflow Mechanics
Understanding how data corrupts the stack is the first step toward writing secure code.
Exercise: Stack Layout Analysis
Consider the following function:
void auth() {
int authenticated = 0;
char password[8];
gets(password);
if (authenticated) {
grant_access();
}
}Scenario: The compiler places authenticated immediately AFTER password on the stack (at a higher address).
- How many bytes are needed to flip the
authenticatedflag?- Answer: 9 bytes. The first 8 fill the buffer, the 9th byte (the null terminator of a 9-character input) will overwrite the first byte of the
authenticatedinteger. If that byte is non-zero, theifcheck will succeed.
- Answer: 9 bytes. The first 8 fill the buffer, the 9th byte (the null terminator of a 9-character input) will overwrite the first byte of the
- How would a Canary detect this?
- Answer: The compiler would place a random value between
passwordand the return address. Upon returning, it checks if that value changed. However, in this specific example, notice thatauthenticatedis before the canary/return address. The canary might not protect local variables from each other, only the return address!
- Answer: The compiler would place a random value between
Categories of Floating Point Values
The interpretation of and depends on the exp field.
1. Normalized Values
This is the most common case. Occurs when exp is neither all 0s nor all 1s.
- Exponent: .
expis treated as an unsigned number and a Bias () is subtracted. This allows negative exponents without a sign bit for the exponent itself.
- Significand: .
- Note: The assumption is that in binary scientific notation, the leading digit is always 1. So it is not stored. An extra bit of precision is gained for free. The
fracfield stores only the fractional part.
- Note: The assumption is that in binary scientific notation, the leading digit is always 1. So it is not stored. An extra bit of precision is gained for free. The
2. Denormalized Values (Denorms)
Occurs when exp is all 0s.
- Exponent: .
- Significand: (Implicit leading 0).
Purpose:
- Representing Zero: Normalized math always assumes a leading 1, so it cannot represent 0. In denorms, if
fracis all 0s, the value is 0.0. Note that because of the sign bit, there are and . - Gradual Underflow: They fill the gap between 0 and the smallest normalized number, allowing for calculations with very small numbers to degrade in precision gracefully rather than snapping instantly to zero.
3. Special Values
Occurs when exp is all 1s.
- Infinity: If
fracis all 0s. Represents overflow (e.g., ). - NaN (Not a Number): If
fracis nonzero. Represents invalid operations (e.g., , ). NaN propagates through calculations.
Rounding
Since the range of real numbers is infinite and bits are finite, rounding is necessary. IEEE 754 defines four modes:
- Round towards 0 (Truncate).
- Round down ().
- Round up ().
- Round to Nearest, Ties to Even (Default).
Why Ties to Even? If 0.5 is always rounded up, statistical bias upwards is introduced. Rounding to the nearest even number (e.g., 1.5 2, 2.5 2) ensures that over many calculations, the rounding errors average out to zero.
FP Arithmetic Properties
Floating point arithmetic does not obey standard algebra rules completely.
- Associativity: is NOT the same as .
- On the left: rounds to (precision loss), then subtracting gives .
- On the right: is , so the result is .
- Conclusion: FP addition and multiplication are not associative.
This makes writing compilers and scientific code tricky, as reordering operations for performance might change the result.
SSE Floating Point
On Intel x86-64 architectures, floating point is handled by the SSE (Streaming SIMD Extensions) unit (specifically SSE3 and later).
- Registers: 16 registers, named
%xmm0through%xmm15. Each is 128 bits wide. - Scalar Operations: The compiler uses these registers for standard scalar
floatanddoublevariables.addss(Add Scalar Single)addsd(Add Scalar Double)
- SIMD (Vector) Operations: Because the registers are 128 bits, they can hold four
floatsor twodoublesat once. Instructions likeaddps(Add Packed Single) can perform four additions in a single cycle.
Vectorization
While GCC can sometimes auto-vectorize loops (using
-O3or-ftree-vectorize), manual vectorization using C intrinsics is often required for maximum performance in numerical code.
Summary
- IEEE 754 is the universal standard.
- Normalized numbers use an implicit leading 1 for precision.
- Denormalized numbers handle the underflow gap near zero.
- Rounding to even prevents statistical bias.
- FP Arithmetic violates associativity, which requires care in scientific computing.
Continue here: 17 Floating Point Rounding and Compiler Basics