Skip to main content

Memory · 35 min

Stack vs Heap

Two regions, two lifetimes. Stack frames push and pop with calls; heap allocations live until freed. The model behind every undefined-behavior bug.

Why This Matters

A single call instruction on x86-64 pushes an 8-byte return address onto the stack. A single malloc(32) may inspect allocator metadata, round the request size, find or split a free block, update boundary tags, and sometimes ask the kernel for more virtual memory.

The mismatch matters because the two regions encode different lifetimes. A pointer to a stack local becomes invalid when the function returns. A heap pointer remains valid until free, and becomes invalid immediately after. Most C and C++ memory bugs are one of these lifetime errors with a plausible-looking address.

Core Definitions

Definition

Process address space

A process address space is the virtual memory layout seen by one running program. User code usually sees regions for machine instructions, initialized globals, zero-initialized globals, heap allocations, and stack frames.

Definition

Stack frame

A stack frame is the per-call record created for a function invocation. In a conventional compiled C call it contains some local variables, saved registers, spilled temporaries, and a return address used by ret.

Definition

Heap allocation

A heap allocation is a block returned by an allocator such as malloc, calloc, realloc, or C++ new. Its lifetime is independent of the call tree and ends only when the program passes it to the matching deallocation operation.

Definition

Fragmentation

Fragmentation is wasted heap capacity caused by allocator granularity and the placement of live blocks. External fragmentation means the free space exists, but no free run is large enough for the request.

Process Address Space

A simplified 64-bit process layout looks like this, with low addresses at the bottom:

0x0000000000400000  text: machine code, read-only constants
0x0000000000600000  data: initialized globals and statics
0x0000000000610000  BSS: zero-initialized globals and statics
0x0000000000800000  heap: grows upward as allocator obtains pages
        ...
0x00007fffffffe000  stack: grows downward as calls nest

Real systems add shared libraries, memory-mapped files, loader data, randomized base addresses, and per-thread stacks. The simplified layout is still the right first model. Each address is virtual. The operating system and MMU translate it to a physical frame or raise a fault.

Consider this C file:

#include <stdlib.h>

int global_init = 7;          // data
int global_zero;              // BSS
static int static_zero;       // BSS
static int static_init = 9;   // data

int *make_array(int n) {
    int local = 42;           // stack
    int fixed[3] = {1, 2, 3}; // stack
    int *p = malloc(n * sizeof *p); // p on stack, pointed-to block on heap

    if (p) {
        p[0] = local;
        p[1] = fixed[2];
    }
    return p;
}

If make_array(4) is called, local, fixed, and p are stored in the current frame or registers. The heap block returned by malloc contains at least 16 usable bytes. On a little-endian machine the bytes for p[0] = 42 and p[1] = 3 are:

heap address h+0:  2a 00 00 00   int 42
heap address h+4:  03 00 00 00   int 3
heap address h+8:  unspecified until written
heap address h+12: unspecified until written

p itself is not the heap block. It is a local variable whose value is an address. Returning p copies that address back to the caller. Returning &local would copy an address into a dead frame.

Stack Frames and Calls

The stack is a LIFO structure. If main calls f, and f calls g, then g must return before f can return normally. Hardware and calling conventions encode this nesting with a stack pointer register.

A typical x86-64 function compiled with a frame pointer has this shape:

f:
    push rbp          ; save caller frame pointer
    mov  rbp, rsp     ; start this frame
    sub  rsp, 32      ; reserve 32 bytes for locals/spills

    mov  DWORD PTR [rbp-4], 42

    leave             ; mov rsp, rbp; pop rbp
    ret               ; pop return address into rip

Suppose rsp = 0x7fffffffe000 immediately before the caller executes call f, and the return address is 0x000000000040118a.

after call:
0x7fffffffdff8: 8a 11 40 00 00 00 00 00   return address

after push rbp, assuming old rbp = 0x00007fffffffdf20:
0x7fffffffdff0: 20 df ff ff ff 7f 00 00   saved rbp
0x7fffffffdff8: 8a 11 40 00 00 00 00 00   return address

after sub rsp, 32:
0x7fffffffdfd0 .. 0x7fffffffdfef          locals and spills
0x7fffffffdff0: saved rbp
0x7fffffffdff8: return address

The byte order is little-endian. The low byte of 0x40118a appears first. Compilers may omit the frame pointer, keep locals only in registers, or reuse stack slots whose lifetimes do not overlap. The abstract lifetime rule remains: automatic local storage ends when the block or function invocation ends.

A stack allocation is often just pointer arithmetic. Reserving 32 bytes is one sub rsp, 32; releasing it is one add rsp, 32 or leave. This is why small automatic arrays and structs are cheap when the compiler keeps them on the stack.

Heap Allocation and Fragmentation

The heap has no LIFO rule. The program may allocate A, then B, then C, and free B while A and C remain live. That freedom is why heap allocation supports trees, graphs, variable-length tensors, request objects, and buffers whose owners outlive the creating function.

A toy heap with 1024 bytes shows external fragmentation:

start:
[ free 1024 ]

allocate A=256, B=256, C=256, D=256:
[ A 256 ][ B 256 ][ C 256 ][ D 256 ]

free B and D:
[ A 256 ][ free 256 ][ C 256 ][ free 256 ]

The total free space is 512 bytes. A request for 384 bytes fails in this fixed toy heap because the largest contiguous free run is 256 bytes. An allocator with more virtual address space might extend the heap or map a new region, but inside the existing arena the request cannot fit.

Real allocators also add metadata and alignment. If an allocator stores an 8-byte header and rounds payloads to 16-byte alignment, a request for 13 bytes may consume 32 bytes:

8-byte header + 13-byte payload + 11 bytes padding = 32 bytes consumed

The metadata might record the block size and allocation bit:

address h-8:  21 00 00 00 00 00 00 00   size 32 plus alloc bit 1
address h+0:  user payload starts here

This is a model, not a promise about glibc, jemalloc, mimalloc, or the C standard. The standard only specifies the usable behavior of the returned pointer, alignment for suitable object types, and the requirement to release with the matching deallocator.

Stack Overflow and Guard Pages

A stack overflow happens when stack growth reaches an unmapped guard page or collides with another mapped region. Operating systems commonly place a guard page next to a stack so one more push or local allocation produces a page fault instead of silent corruption.

Deep recursion is one cause:

void recurse(int depth) {
    char buf[1024];
    buf[0] = (char)depth;
    recurse(depth + 1);
}

If each call consumes about 1040 bytes after saved registers and alignment, an 8 MiB stack holds roughly:

8×1024×10241040=8065\left\lfloor \frac{8 \times 1024 \times 1024}{1040} \right\rfloor = 8065

frames before other startup frames and guard spacing are counted. The actual crash depth varies with compiler options, optimization, ABI, signal handlers, and operating-system limits.

Huge locals are another cause:

void bad(void) {
    double x[2 * 1024 * 1024]; // 16 MiB if double is 8 bytes
    x[0] = 1.0;
}

On an 8 MiB stack this cannot fit. The same 16 MiB object can live on the heap if allocation succeeds:

double *x = malloc(2 * 1024 * 1024 * sizeof *x);
if (x) {
    x[0] = 1.0;
    free(x);
}

Heap allocation is not free. malloc may touch allocator bins, coalesce adjacent free blocks, split a larger block, acquire a lock, or request pages through mmap or brk. Stack allocation usually changes one register, but the LIFO lifetime is the price.

The Model

Two invariants cover most cases.

For stack frames, lifetimes are properly nested. If call B starts after call A starts, and B uses A as its caller, then B ends before A ends:

start(A)<start(B)    end(B)<end(A)start(A) < start(B) \implies end(B) < end(A)

For heap blocks, lifetime is an interval chosen by the program:

lifetime(block)=[time(malloc),time(free))lifetime(block) = [time(malloc), time(free))

No nesting relation is required. If malloc returns address h, the valid byte addresses for a requested n-byte object are:

h,h+1,,h+n1h, h+1, \ldots, h+n-1

After free(h), using any pointer value that designates that block has undefined behavior in C and C++. The numerical address may still print the same. The lifetime, not the bit pattern, is what changed.

A fragmentation invariant for a fixed arena is:

request(r) succeeds only if max_free_runr+overheadrequest(r)\ succeeds \ only\ if\ max\_free\_run \geq r + overhead

Total free space alone is insufficient. In the earlier 1024-byte heap, total_free = 512, but max_free_run = 256, so a 384-byte request cannot be placed without moving live objects. C allocators do not move allocated objects; tracing garbage collectors in other languages sometimes do, but their algorithms are outside this page.

Common Confusions

Watch Out

The Pointer Variable Is Not the Object It Points To

In int *p = malloc(16);, the variable p has automatic storage if it is a local. The 16-byte block has heap storage. Returning p returns an address value. Returning &p returns the address of the local pointer variable, which dies at function return.

Watch Out

A Printed Address Does Not Prove Validity

A use-after-free pointer may still print as 0x5555555592a0. That does not mean the object exists. The allocator may reuse the same address for a different allocation, so stale writes can corrupt an unrelated object.

Watch Out

Stack Is Not Always Faster for the Whole Program

Creating a stack slot is cheap. Copying a 4 MiB object into that slot is not cheap, and overflowing the stack is a fault. For large or variable-size buffers, heap allocation plus explicit ownership is often the correct engineering choice.

Exercises

ExerciseCore

Problem

Classify each object in this program as text, data, BSS, stack, or heap. Also state which returned pointer is valid after the function returns.

#include <stdlib.h>

int a = 5;
int b;
static int c = 8;
static int d;

int *f(void) {
    int x = 11;
    static int y = 12;
    int *p = malloc(2 * sizeof *p);
    p[0] = x;
    p[1] = y;
    return p;
}
ExerciseCore

Problem

Assume a call starts with rsp = 0x7fffffffe000. The call instruction pushes return address 0x0000000000401234, then the callee executes push rbp with old rbp = 0x00007fffffffdf80, then sub rsp, 16. Give the final rsp and the bytes at the saved return address and saved frame pointer slots.

ExerciseAdvanced

Problem

A toy allocator has a fixed 2048-byte arena, 8-byte headers, and 16-byte alignment for the total block size. It allocates payload requests of 200, 300, 200, and 300 bytes in order, then frees the second and fourth blocks. Compute the consumed block sizes and decide whether a 480-byte payload request fits without extending the arena.

References

Canonical:

  • Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau, Operating Systems: Three Easy Pieces (2018), ch. 13, 14, 15, 17, covers address spaces, memory APIs, translation, and free-space management
  • Randal E. Bryant and David R. O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd ed. (2016), ch. 3 and ch. 9, covers machine-level stack frames and virtual memory allocators
  • Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language, 2nd ed. (1988), ch. 5 and ch. 8, covers pointers, arrays, storage, and a simple allocator
  • W. Richard Stevens and Stephen A. Rago, Advanced Programming in the UNIX Environment, 3rd ed. (2013), ch. 7, covers process memory layout and C allocation routines
  • Michael Kerrisk, The Linux Programming Interface (2010), ch. 6 and ch. 7, covers process memory layout, heap management, and stack limits

Accessible:

  • Beej Jorgensen, Beej's Guide to C Programming, sections on pointers, arrays, and manual memory allocation
  • cppreference.com, C and C++ pages on storage duration, dynamic allocation, malloc, free, new, and delete
  • Stanford CS107 course notes, guides on x86-64 stack frames and C memory diagrams

Next Topics