Lecture from: 11.11.2025 | Video: Videos ETHZ

Code Vulnerabilities

The focus now shifts from how code is constructed to how code can be broken. Understanding vulnerabilities is essential not just for security professionals, but for anyone writing systems code.

Worms and Viruses

Historically, there is a distinction between two broad categories of malicious code:

  • Worm: A program that can run by itself and can propagate a fully working version of itself to other computers.
  • Virus: Code that cannot run independently; it must add itself to other programs to execute.

The terminology is somewhat arbitrary, but the concept of self-propagating code dates back to science fiction, specifically John Brunner’s 1975 novel The Shockwave Rider. In the mid-1970s, researchers at Xerox PARC (where personal computing and Ethernet were pioneered) experimented with benign worms to manage distributed systems.

The Morris Worm (1988)

The pivotal moment for malware was November 1988, with the release of the Robert Morris Worm. Robert Morris, a graduate student (and son of an NSA scientist), wrote a program intended to gauge the size of the internet.

  • The Intent: It was supposed to copy itself, report back, and stop.
  • The Bug: The rate-limiting mechanism failed. It infected machines repeatedly, bringing the internet (then about 60,000 nodes) to a halt.
  • The Tech: It was sophisticated. It targeted Sun and VAX machines and used multiple vectors:
  • Guessed passwords.
  • Exploited a debug feature in sendmail.
  • Exploited a buffer overflow in the fingerd (finger daemon).

Morris became the first person convicted under the Computer Fraud and Abuse Act. He is now a professor at MIT.

Stack Overflow Bugs

The Morris worm utilized a specific vulnerability class that remains relevant today: the Stack Buffer Overflow.

The Vulnerable Function: gets()

Consider the standard C library function gets().

char *gets(char *dest) {
    int c = getchar();
    char *p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c; // No check for destination size!
        c = getchar();
    }
    *p = '\0';
    return dest;
}

The Problem: gets() has no way to know the size of the buffer dest points to. It will happily keep reading and writing past the end of the array until it hits a newline.

Never use gets()

Other functions like strcpy (copy string), strcat (concatenate), and scanf with %s share this same flaw. They do not accept a limit on the number of bytes to write.

Anatomy of an Overflow

A vulnerable program bufdemo is examined.

void echo() {
    char buf[4];  /* Way too small! */
    gets(buf);
    puts(buf);
}

Normal Execution: If “123” is input, it fits (3 chars + null terminator).

The Crash: If “12345678…” is input, the program will eventually Segfault. Why?

The Stack Layout: When echo is called, the stack looks like this:

  1. Stack Frame for main (top of memory)
  2. Return Address (points back to instruction in main after the call)
  3. Saved Stack Pointer (optional/compiler dependent)
  4. Buffer buf (low memory, grows upward towards the return address)

If too many bytes are written into buf, the saved stack pointer and eventually the Return Address are overwritten.

When the function echo executes the ret instruction, it pops the value currently at the “Return Address” location into the instruction pointer (%rip). If this has been overwritten with garbage (e.g., 0x333231 from the string “123…”), the processor tries to fetch an instruction from that garbage address. This usually causes a Segmentation Fault.

Exploiting Buffer Overflows

A crash is annoying, but an exploit is dangerous.

Code Injection

Instead of crashing the program with random characters, an attacker can:

  1. Input a string containing executable machine code (exploit code).
  2. Pad the string to reach the return address location.
  3. Overwrite the return address with the address of the buffer on the stack.

When the function returns, it does not go back to main. It jumps to the stack and executes the injected code. In the case of the Morris Worm attacking fingerd, this code spawned a shell (/bin/sh) running as root.

Another Vector: Integer Overflow (XDR)

Buffer overflows are not limited to the stack or gets. They can happen on the heap, often triggered by integer overflows.

Example: Sun XDR Library The XDR (External Data Representation) library used a function to allocate memory for an array of elements to be copied.

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
    // Vulnerable line:
    void *result = malloc(ele_cnt * ele_size);
    if (result == NULL) return NULL;
    
    // Loop copying data into result...
}

The Vulnerability: On a 32-bit machine, if a user provides massive values:

  • ele_cnt = 2^20 + 1
  • ele_size = 4096 ()

The multiplication ele_cnt * ele_size theoretically equals . However, in 32-bit arithmetic, overflows to 0. The result is just 4096. malloc allocates only 4096 bytes. The subsequent loop, however, iterates ele_cnt times (1 million times), copying massive amounts of data into a tiny heap buffer.

Mitigations and Defenses

Writing secure code requires defensiveness at multiple levels.

1. Safer Functions

Replace vulnerable functions with bounds-checking alternatives:

  • gets fgets (specify size)
  • strcpy strncpy
  • scanf("%s") scanf("%ns")

2. System-Level Protections

Modern operating systems and compilers make exploitation harder (though not impossible):

  • ASLR (Address Space Layout Randomization): The stack is placed at a random address each time the program runs. The attacker cannot easily guess the address of their injected buffer.
  • NX Bit (No-Execute): Hardware support to mark memory pages (like the stack) as non-executable. If the processor tries to jump to code on the stack, it throws an exception instead of running it.
  • Stack Canaries: The compiler inserts a special value (canary) between the buffer and the return address. Before returning, the function checks if the canary is intact. If it has been overwritten, the program aborts immediately.

Return Oriented Programming (ROP)

Attackers have evolved. If the stack is non-executable (NX), they use ROP. Instead of injecting new code, they overwrite the stack with a chain of return addresses that point to existing snippets of code (“gadgets”) already present in the program’s executable memory (like libc). By chaining these gadgets, they can perform arbitrary computation without injecting a single instruction.


Floating Point

The discussion now leaves the world of malware and enters the world of representing non-integer numbers.

Use: https://float.exposed/ visualize numbers…

Representing Fractional Numbers

How are or represented in a computer? The binary system is extended to the right of the “binary point.”

  • represents
  • represents
  • represents

Limitation: It is only possible to exactly represent numbers of the form . Just as repeats in decimal (), numbers like () repeat in binary ().

IEEE 754 Standard

Before 1985, every computer manufacturer had its own ad-hoc floating point format. This made portable numerical software impossible. The IEEE 754 Standard (1985) unified this. It is supported by all major CPUs.

The Numerical Form

A floating point number represents a value:

  • s (Sign): Determines negative or positive.
  • M (Significand/Mantissa): A fractional value, usually in the range .
  • E (Exponent): Weights the value by a power of two.

Bit Encoding

The bits are divided into three fields:

  1. s (1 bit): The sign bit.
  2. exp: Encodes the exponent .
  3. frac: Encodes the significand .

Standard Precisions:

  • Single Precision (float): 32 bits total. 1 sign, 8 exp, 23 frac. Bias = 127.
  • Double Precision (double): 64 bits total. 1 sign, 11 exp, 52 frac. Bias = 1023.

AI and Reduced Precision

Recently, formats like bfloat16 (Brain Floating Point) or FP16 have emerged for Machine Learning. ML models often tolerate lower precision in exchange for higher throughput (more calculations per second) and lower memory usage.

C Data Types

  • float is guaranteed to be IEEE Single Precision.
  • double is guaranteed to be IEEE Double Precision.
  • Casting: Casting between int, float, and double changes the bit representation.
    • int to float: Rounds the value (not always exact).
    • double to int: Truncates (rounds toward zero).

Practice: Buffer Overflow Mechanics

Understanding how data corrupts the stack is the first step toward writing secure code.

Exercise: Stack Layout Analysis

Consider the following function:

void auth() {
    int authenticated = 0;
    char password[8];
    gets(password);
    if (authenticated) { 
        grant_access(); 
    }
}

Scenario: The compiler places authenticated immediately AFTER password on the stack (at a higher address).

  1. How many bytes are needed to flip the authenticated flag?
    • Answer: 9 bytes. The first 8 fill the buffer, the 9th byte (the null terminator of a 9-character input) will overwrite the first byte of the authenticated integer. If that byte is non-zero, the if check will succeed.
  2. How would a Canary detect this?
    • Answer: The compiler would place a random value between password and the return address. Upon returning, it checks if that value changed. However, in this specific example, notice that authenticated is before the canary/return address. The canary might not protect local variables from each other, only the return address!

Categories of Floating Point Values

The interpretation of and depends on the exp field.

1. Normalized Values

This is the most common case. Occurs when exp is neither all 0s nor all 1s.

  • Exponent: .
    • exp is treated as an unsigned number and a Bias () is subtracted. This allows negative exponents without a sign bit for the exponent itself.
  • Significand: .
    • Note: The assumption is that in binary scientific notation, the leading digit is always 1. So it is not stored. An extra bit of precision is gained for free. The frac field stores only the fractional part.

2. Denormalized Values (Denorms)

Occurs when exp is all 0s.

  • Exponent: .
  • Significand: (Implicit leading 0).

Purpose:

  1. Representing Zero: Normalized math always assumes a leading 1, so it cannot represent 0. In denorms, if frac is all 0s, the value is 0.0. Note that because of the sign bit, there are and .
  2. Gradual Underflow: They fill the gap between 0 and the smallest normalized number, allowing for calculations with very small numbers to degrade in precision gracefully rather than snapping instantly to zero.

3. Special Values

Occurs when exp is all 1s.

  • Infinity: If frac is all 0s. Represents overflow (e.g., ).
  • NaN (Not a Number): If frac is nonzero. Represents invalid operations (e.g., , ). NaN propagates through calculations.

Rounding

Since the range of real numbers is infinite and bits are finite, rounding is necessary. IEEE 754 defines four modes:

  1. Round towards 0 (Truncate).
  2. Round down ().
  3. Round up ().
  4. Round to Nearest, Ties to Even (Default).

Why Ties to Even? If 0.5 is always rounded up, statistical bias upwards is introduced. Rounding to the nearest even number (e.g., 1.5 2, 2.5 2) ensures that over many calculations, the rounding errors average out to zero.

FP Arithmetic Properties

Floating point arithmetic does not obey standard algebra rules completely.

  • Associativity: is NOT the same as .
    • On the left: rounds to (precision loss), then subtracting gives .
    • On the right: is , so the result is .
    • Conclusion: FP addition and multiplication are not associative.

This makes writing compilers and scientific code tricky, as reordering operations for performance might change the result.

SSE Floating Point

On Intel x86-64 architectures, floating point is handled by the SSE (Streaming SIMD Extensions) unit (specifically SSE3 and later).

  • Registers: 16 registers, named %xmm0 through %xmm15. Each is 128 bits wide.
  • Scalar Operations: The compiler uses these registers for standard scalar float and double variables.
    • addss (Add Scalar Single)
    • addsd (Add Scalar Double)
  • SIMD (Vector) Operations: Because the registers are 128 bits, they can hold four floats or two doubles at once. Instructions like addps (Add Packed Single) can perform four additions in a single cycle.

Vectorization

While GCC can sometimes auto-vectorize loops (using -O3 or -ftree-vectorize), manual vectorization using C intrinsics is often required for maximum performance in numerical code.

Summary

  1. IEEE 754 is the universal standard.
  2. Normalized numbers use an implicit leading 1 for precision.
  3. Denormalized numbers handle the underflow gap near zero.
  4. Rounding to even prevents statistical bias.
  5. FP Arithmetic violates associativity, which requires care in scientific computing.

Continue here: 17 Floating Point Rounding and Compiler Basics