C++ Systems · 45 min

Undefined Behavior Essentials

Why use-after-free, signed overflow, and uninitialized reads let C++ compilers assume impossible cases away, plus the sanitizer builds that catch them.

Why This Matters

On common optimizing C++ compilers, x + 1 > x for a signed int compiles to the constant true, even though two's-complement hardware wraps 2147483647 + 1 to -2147483648. The source program has no defined execution at that overflow, so the compiler does not have to preserve the wrapped result.

The same rule turns a use-after-free into more than a stale pointer bug. Once a C++ execution performs undefined behavior, the standard places no constraint on the program's behavior. Optimizers, linkers, runtimes, and CPUs do not coordinate to produce a diagnostic. Sanitizers add checks because the language itself does not.

Core Definitions

Definition

Undefined behavior

Undefined behavior, or UB, is behavior for which the C++ standard imposes no requirements. A conforming implementation may compile the program to code that crashes, prints any value, removes branches, or seems to work in one build and fail in another. The compiler may assume UB never happens, so it only has to preserve the behavior of executions that avoid it.

Definition

Implementation-defined behavior

Implementation-defined behavior is a choice the implementation makes and documents. For example, the size of long differs across common ABIs (32 bits on 64-bit Windows, 64 bits on 64-bit Linux), but each compiler and target combination documents its choice.
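A minimal sketch of the difference between what the standard guarantees and what the implementation chooses, assuming a C++14-or-later compiler. long_bits is a hypothetical helper; std::numeric_limits and CHAR_BIT are standard.

```cpp
#include <climits>
#include <limits>

// Width of long in bits on this implementation (illustrative helper):
// typically 32 on 64-bit Windows (LLP64) and 64 on 64-bit Linux (LP64).
constexpr int long_bits() {
    return std::numeric_limits<long>::digits + 1;  // value bits plus the sign bit
}

// What portable code may rely on: the standard requires at least 32 bits.
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(long_bits() >= 32, "same guarantee, via numeric_limits");
```

Code that needs an exact width should use a fixed-width type such as std::int64_t rather than depend on either documented choice.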

Definition

Unspecified behavior

Unspecified behavior gives the implementation a choice among valid alternatives, without requiring documentation for each instance. The order in which the arguments of a function call are evaluated is a standard example: since C++17 each argument is evaluated completely before the next one starts, but which argument goes first is still unspecified. Any chosen alternative must still be a valid C++ execution.
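A small sketch that makes the unspecified argument order observable. tag, combine, and argument_order are hypothetical names; the only claim is that the result is one of the two valid orders.

```cpp
#include <string>

// Hypothetical helper: appends a tag to the log and returns it, so the log
// records which argument the implementation evaluated first.
static int tag(std::string& log, char c) {
    log += c;
    return c;
}

static int combine(int, int) { return 0; }

// Returns "ab" or "ba". Both are valid executions; no interleaving is
// possible because each argument is a separate function call.
std::string argument_order() {
    std::string log;
    combine(tag(log, 'a'), tag(log, 'b'));
    return log;
}
```

A test can only accept both strings. Code that needs a fixed order should evaluate the arguments into named locals first.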

Definition

Indeterminate value

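An object holds an indeterminate value when it has storage but has not been initialized. Producing an indeterminate int, for example by comparing it or copying it into another int, is UB. The narrow exception is unsigned char and std::byte: an indeterminate value of those types may be copied into another object of the same kind, but using it in arithmetic or a comparison is still UB. Inspecting the bytes of an object that does have a value, through unsigned char, is always permitted.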
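A sketch of the permitted byte-level access, on an int that does have a value. It assumes 8-bit bytes and a 4-byte int; representation_bytes is a hypothetical name.

```cpp
#include <array>
#include <cstring>

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

// Copies the object representation of an initialized int into an array of
// unsigned char. Byte order is whatever the machine uses.
std::array<unsigned char, sizeof(int)> representation_bytes(int value) {
    std::array<unsigned char, sizeof(int)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}
```

On a little-endian machine representation_bytes(0x01020304) yields 04 03 02 01. The byte copy does not rescue an uninitialized int: passing it by value already reads the indeterminate value, which is UB.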

What UB Means To An Optimizer

The optimizer reasons about all executions that the standard defines. If a branch is reachable only after UB, the optimizer may delete it.

#include <limits.h>

int grows(int x) {
    return x + 1 > x;
}

On a 32-bit two's-complement machine, the hardware add for INT_MAX + 1 produces 0x80000000, which reads as -2147483648. But signed overflow is UB in C++; even C++20, which mandates two's-complement representation, leaves overflowing arithmetic undefined. The compiler only has to handle defined executions, where x cannot be INT_MAX at the addition, so x + 1 > x is always true.

A typical optimized x86-64 body is just:

mov eax, 1
ret

The unsigned version is different.

unsigned grows_u(unsigned x) {
    return x + 1 > x;
}

For x = 0xffffffff (a 32-bit unsigned), x + 1 wraps to 0x00000000, so the result is false. Unsigned arithmetic is defined modulo $2^N$, where N is the bit width, and the optimizer must preserve that behavior.
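A sketch of the two behaviors side by side: the unsigned increment is allowed to wrap, while the signed addition is guarded before it can overflow. grows_unsigned and add_overflows are hypothetical names; the guard is the usual INT_MAX - b idiom.

```cpp
#include <climits>

// Unsigned: wraps modulo 2^N, so UINT_MAX + 1 == 0 and this returns false.
bool grows_unsigned(unsigned x) {
    return x + 1u > x;
}

// Signed: test the precondition before adding, so no execution overflows.
bool add_overflows(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```

add_overflows never performs the overflowing addition itself, so it stays defined for every input.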

The "time-travel" feeling comes from this rule. A check written after an invalid operation does not repair the program.

int load_after_check(int *p) {
    int x = *p;
    if (p == 0) return 0;
    return x;
}

The dereference *p has UB when p is null. In every defined execution that reaches the if, p is not null. The compiler may remove the null check or move code across it. The check came too late.

A correct shape checks before dereference.

int checked_load(int *p) {
    if (p == 0) return 0;
    return *p;
}

Memory UB At Byte Level

Spatial memory UB means accessing memory outside the bounds of an object. An array declared as int a[4] has valid indices 0 through 3.

int sum_bad(void) {
    int a[4] = {10, 20, 30, 40};
    return a[4];
}

A possible stack layout with 4-byte little-endian int values is:

address     bytes          object
0x7ff0      0a 00 00 00    a[0]
0x7ff4      14 00 00 00    a[1]
0x7ff8      1e 00 00 00    a[2]
0x7ffc      28 00 00 00    a[3]
0x8000      ?? ?? ?? ??    not part of a

Even if address 0x8000 happens to contain 99, a[4] does not mean "read the next 4 bytes on the stack" in C++. The expression reads outside the array object, so the program has UB and no particular result is required.
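For contrast with sum_bad, a sketch of the valid way to walk the array. It relies on the rule that a pointer one past the last element may be formed and compared, as long as it is never dereferenced; sum_ok is a hypothetical name.

```cpp
// Forming a + 4 (one past the last element) is valid pointer arithmetic;
// dereferencing it is not. The loop only compares against that end pointer.
int sum_ok() {
    int a[4] = {10, 20, 30, 40};
    int total = 0;
    for (const int* p = a; p != a + 4; ++p) {
        total += *p;  // p is always in [a, a + 4) here
    }
    return total;
}
```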

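Temporal memory UB means accessing an object after its lifetime has ended. A use-after-free often passes small tests because freed memory usually stays mapped and allocators quickly hand the same chunk out again, so the stale access reads or writes live data instead of faulting.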

#include <stdlib.h>

int uaf(void) {
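    int *p = (int *)malloc(sizeof(int));  // cast required in C++, harmless in C
    *p = 0x11223344;
    free(p);                              // the int's lifetime ends here

    int *q = (int *)malloc(sizeof(int));  // the allocator may hand back p's chunk
    *q = 0x55667788;

    *p = 0xaabbccdd;                      // UB: write through a dangling pointer
    int r = *q;                           // may observe the stale write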
    free(q);
    return r;
}

If malloc returns the same address both times, little-endian bytes change like this:

after *p        44 33 22 11
after free      allocator owns the chunk
after *q        88 77 66 55
after stale *p  dd cc bb aa

In this run the function might return the bit pattern 0xaabbccdd, but the program already has UB at the write *p = 0xaabbccdd after free(p). A double-free is another temporal error: the allocator updates its metadata twice for a lifetime that ended once.
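One way to make the end of a lifetime visible in C++ is to keep the only owning pointer in a std::unique_ptr. This is a sketch with hypothetical helper names (read_or, owned_lifetime_demo), not a general fix for use-after-free.

```cpp
#include <memory>

// Returns *p while p still owns an int, otherwise fallback.
int read_or(const std::unique_ptr<int>& p, int fallback) {
    return p ? *p : fallback;
}

// Mirrors the first half of uaf() with an owning pointer instead of malloc/free.
bool owned_lifetime_demo() {
    auto p = std::make_unique<int>(0x11223344);
    const int before = read_or(p, -1);  // 0x11223344
    p.reset();                           // deletes the int; p is now null
    const int after = read_or(p, -1);    // -1: the null check sees the ended lifetime
    return before == 0x11223344 && after == -1;
}
```

This only protects accesses that go through the owner. A raw pointer copied out with p.get() before reset() dangles exactly like p in uaf().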

Uninitialized reads are also source-level errors, not just reads of "random stack bytes."

int maybe_42(void) {
    int x;
    if (x == 42) return 1;
    return 0;
}

The stack slot might contain bytes 2a 00 00 00, but x holds no initialized int value, so the comparison itself is UB. MemorySanitizer tracks such uninitialized flows instead of treating the bytes as a lottery.

Integer, Pointer, And Concurrency UB

Signed overflow is one of several arithmetic traps.

int bad_shift(int n) {
    return 1 << n;
}

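For 32-bit int, n = 32 uses a shift count equal to the width, and any n >= 32 or n < 0 is UB in both C and C++. n = 31 produces the bit pattern 0x80000000, which does not fit in a signed int: that is UB in C and in C++ before C++14, while C++20 defines the result as -2147483648. For bit manipulation, use an unsigned operand and still check the shift count.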

unsigned ok_shift(unsigned n) {
    if (n >= 32) return 0;
    return 1u << n;
}
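ok_shift hard-codes a 32-bit unsigned. A sketch that takes the width from the type instead, via std::numeric_limits<unsigned>::digits (the number of value bits of an unsigned type); shift_or_zero is a hypothetical name.

```cpp
#include <limits>

// Same contract as ok_shift, but the width comes from the type.
unsigned shift_or_zero(unsigned n) {
    constexpr int width = std::numeric_limits<unsigned>::digits;
    if (n >= static_cast<unsigned>(width)) {
        return 0;  // a shift count >= width would be UB, so report 0 instead
    }
    return 1u << n;
}
```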

Strict aliasing is a pointer rule. An object may be accessed through its own type, some related types, and character types. Reinterpreting storage through an unrelated pointer type can let the optimizer cache values that source code seems to overwrite.

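float clobber(int *ip, float *fp) {
    *fp = 1.0f;
    *ip = 0;      // if ip and fp alias, this appears to overwrite the float
    return *fp;   // the optimizer may still return 1.0f without reloading
}

If ip and fp point to the same 4 bytes, the program accesses one object through both an int lvalue and a float lvalue, which is UB. The optimizer relies on that rule to assume the int store cannot change *fp, so it may return 1.0f. To inspect representation bytes, use memcpy, unsigned char, or C++20 std::bit_cast, not a type-punned pointer.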

#include <string.h>
#include <stdint.h>

uint32_t bits_of_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);  // copies the representation; no aliasing violation
    return u;
}

For f = 1.0f, IEEE 754 single precision bytes are typically 00 00 80 3f on little-endian machines, and the returned integer is 0x3f800000.
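C++20 adds std::bit_cast for exactly this job. A sketch that uses it when the library provides it (checked through the __cpp_lib_bit_cast feature-test macro) and falls back to memcpy otherwise; float_bits is a hypothetical name, and the expected values assume IEEE 754 binary32 floats.

```cpp
#include <cstdint>
#include <cstring>
#if __has_include(<bit>)
#include <bit>
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Returns the bit pattern of f without a type-punned pointer.
std::uint32_t float_bits(float f) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<std::uint32_t>(f);  // C++20
#else
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);           // pre-C++20 equivalent
    return u;
#endif
}
```

Under IEEE 754, float_bits(1.0f) is 0x3f800000 and float_bits(-2.0f) is 0xc0000000, independent of byte order.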

Data races are UB in C++. Two threads race when they access the same memory location, at least one access is a write, at least one access is not atomic, and neither access happens before the other.

int counter;

void worker(void) {
    for (int i = 0; i < 1000000; ++i) {
        counter++;
    }
}

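Two threads running worker do not merely risk losing increments. The program has UB because both threads write counter with no synchronization. On typical hardware counter++ compiles to a separate load, add, and store, which is where lost increments come from, but the compiler may also keep counter in a register for the whole loop because it may assume no other thread accesses it concurrently. Use std::atomic<int> or a mutex.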
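A race-free version of the counter, as a sketch: std::atomic<int> makes each increment an indivisible read-modify-write. atomic_counter, atomic_worker, and run_two_workers are hypothetical names.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> atomic_counter{0};

void atomic_worker() {
    for (int i = 0; i < 1000000; ++i) {
        // Relaxed order suffices: only the final count matters, and join()
        // orders both threads' increments before the final load.
        atomic_counter.fetch_add(1, std::memory_order_relaxed);
    }
}

int run_two_workers() {
    atomic_counter.store(0);
    std::thread t1(atomic_worker);
    std::thread t2(atomic_worker);
    t1.join();
    t2.join();
    return atomic_counter.load();
}
```

run_two_workers() returns 2000000 on every run. A std::mutex around a plain int would also be correct, at higher cost per increment.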

Sanitizers And Hardening Builds

Sanitizers compile extra checks into the program. They change layout, timing, and ABI details, so use them for testing and fuzzing builds rather than assuming a sanitized binary behaves exactly like production.

AddressSanitizer detects many spatial and temporal memory errors. It poisons red zones around objects and quarantines freed chunks.

#include <stdlib.h>

int main(void) {
    int *p = malloc(4);
    free(p);
    return *p;
}

Build:

clang -O1 -g -fsanitize=address uaf.c

A typical report names heap-use-after-free, gives the allocation and free stack traces, and prints shadow-byte context. The exact shadow encoding is an implementation detail, but the idea is simple: application bytes map to metadata bytes that say whether each address is addressable.

UndefinedBehaviorSanitizer catches categories such as signed overflow, invalid shifts, null member access, some alignment violations, and invalid enum values.

clang++ -O1 -g -fsanitize=undefined ub.cc

ThreadSanitizer instruments memory accesses and synchronization operations to detect data races. It cannot be combined with AddressSanitizer in the same build, so give it a separate build.

clang++ -O1 -g -fsanitize=thread race.cc

MemorySanitizer tracks uninitialized values through the program. It reports false positives unless all code in the process, including libraries such as the C++ standard library, is instrumented.

clang++ -O1 -g -fsanitize=memory uninit.cc

Hardening flags change failure modes or add local checks. -ftrapv traps some signed overflows at runtime. -fwrapv makes GCC and Clang define signed overflow as two's-complement wraparound, so the optimizer must preserve the wrapped result. -fstack-protector-strong inserts canaries into functions that have local arrays or take the address of locals. -D_FORTIFY_SOURCE=2, which takes effect only with optimization enabled, replaces selected C library calls with checked versions when the compiler knows the object sizes.
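Rather than changing the rules for the whole translation unit with -fwrapv or -ftrapv, code can ask for checked arithmetic at the operations that need it. A sketch using __builtin_add_overflow, a GCC and Clang extension rather than ISO C++; checked_add and saturating_add are hypothetical names.

```cpp
#include <climits>

// Stores a + b in *out and returns true when the sum fits in int;
// returns false on overflow and leaves *out unchanged.
bool checked_add(int a, int b, int* out) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return false;
    }
    *out = sum;
    return true;
}

// Saturating variant: clamps to the int range instead of wrapping.
int saturating_add(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return b > 0 ? INT_MAX : INT_MIN;  // overflow direction follows b's sign
    }
    return sum;
}
```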

Modern C++ reduces exposure by making bounds and lifetimes explicit. Use vector::at where an out-of-range index should throw std::out_of_range instead of invoking UB. Pass contiguous ranges as std::span (C++20) so pointer and length travel together. Prefer owning containers and smart pointers over raw new and delete. Fuzzing with sanitizer builds turns rare UB paths into reproducible crashes.
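A sketch of the vector::at pattern, with element_or as a hypothetical helper: the checked accessor turns an out-of-range index into an exception the caller can handle, where v[i] would be UB.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns v.at(i), or fallback when i is out of range.
int element_or(const std::vector<int>& v, std::size_t i, int fallback) {
    try {
        return v.at(i);  // bounds-checked: throws std::out_of_range
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

In hot loops an explicit i < v.size() test avoids exception overhead; the exception form only keeps the sketch short.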

The Model

The operational rule is short.

If execution reaches UB, the source program stops constraining the implementation. Optimizers reason as if every executed operation is defined.

A useful invariant is:

$$\text{defined execution} \implies \text{no signed overflow, no out-of-bounds access, no invalid lifetime access, no data race}$$

For 32-bit signed addition, the compiler may use:

$$-2147483648 \leq x \leq 2147483646 \implies x + 1 > x$$

It need not preserve any behavior for x = 2147483647 in x + 1, because that execution is absent from the defined set.

Security bugs grow out of the same gap. Many CVE-class memory corruption bugs start as C or C++ UB: out-of-bounds writes overwrite control data, use-after-free writes corrupt a reused object, and uninitialized reads expose stale data. The machine executes bytes, but the language contract has already been broken.

Common Confusions

Watch Out

The CPU wrapped, so C++ must wrap

The CPU instruction may wrap, but the source expression is signed overflow before it becomes a hardware instruction. C++ defines unsigned wrap modulo $2^N$ and leaves signed overflow undefined. Use unsigned types, checked arithmetic, or compiler flags such as -fwrapv when wraparound is part of the specification.

Watch Out

A null check after dereference protects the dereference

A null check must dominate the dereference. If *p happens first, the p == 0 case has already invoked UB. The optimizer may delete the later check because every defined execution reaching it has non-null p.

Watch Out

Sanitizer clean means UB free

A sanitizer checks only the executions your tests actually run, and only for the bug classes it instruments. ASan will not find data races. TSan will not find signed overflow. Paths that no test or fuzzer reaches stay unchecked.

Exercises

Exercise · Core

Problem

For this function, give the return value for x = 2147483647 under two models: normal optimized C++ with signed-overflow UB, and a build where signed overflow is specified to wrap with -fwrapv.

int f(int x) {
    if (x + 1 < x) return 7;
    return 9;
}

Exercise · Core

Problem

Classify each bug by sanitizer. Choose from ASan, UBSan, TSan, and MSan. More than one answer may be useful for a full test build, but give the most direct one.

int a[2];
int x = a[2];

int y;
if (y) puts("yes");

int z = 1 << 32;

global_counter++;

Assume global_counter++ is run concurrently by two threads without synchronization.

Exercise · Advanced

Problem

A 4-byte heap chunk at address 0x1000 is allocated as int *p, written with 0x01020304, freed, then immediately reallocated as int *q at the same address and written with 0x0a0b0c0d. The program then writes 0x11121314 through p. Give the byte sequence at 0x1000 after each write on a little-endian machine, and name the first UB operation.
