
Memory · 35 min

Pointers And Addresses

A pointer is a virtual address with a C type. Pointer arithmetic, null traps, aliasing rules, and callbacks all follow from that pairing.

Why This Matters

In a 64-bit Linux process, printing p from int *p = &x with %p, where x is a local variable, usually shows a value such as 0x7ffc2f8a1b4c. That number is not a physical DRAM location. It is a virtual address that the MMU translates through page tables before any load or store reaches memory.

A pointer also carries a C type. The numeric address says where the first byte is; the type says how many bytes an expression reads, how arithmetic scales, and which aliases the optimizer may assume cannot exist. A wrong pointer can trap, silently corrupt a heap object, or let the compiler remove a load you expected to happen.

Core Definitions

Definition

Pointer

A pointer value is the address of an object or function, plus a static type such as int *, double *, or void (*)(int). In ordinary user programs that address is a virtual address.

Definition

Object

In C, an object is a region of data storage with a size, alignment, lifetime, and effective type. An int object commonly occupies 4 bytes aligned to a 4-byte boundary, but the exact size is implementation-defined.

Definition

Dereference

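The expression *p forms an lvalue naming the object pointed to by p. Evaluating *p for reading or writing has defined behavior only when p is suitably aligned and points to a live object whose effective type is compatible with p's pointed-to type, or when the pointed-to type is one the aliasing rules permit for any object, such as a character type.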

Definition

Null pointer

A null pointer is a pointer value that compares unequal to a pointer to any object or function. A null pointer constant (the integer constant 0, or the macro NULL) converts to a null pointer; the C abstract machine does not require the null pointer's object representation to be all zero bits.

Addresses, Loads, And Stores

The address-of operator & takes an lvalue and produces a pointer. The dereference operator * goes the other direction.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int x = 0x01020304;
    int *p = &x;

    printf("&x = %p\n", (void *)&x);
    printf("p  = %p\n", (void *)p);
    printf("*p = 0x%x\n", *p);
}

On a little-endian machine with 4-byte int, the bytes of x are stored least significant byte first. Assuming x lives at 0x7ffc1000:

address        byte
0x7ffc1000     04
0x7ffc1001     03
0x7ffc1002     02
0x7ffc1003     01

If p == (int *)0x7ffc1000, then *p reads 4 bytes starting at 0x7ffc1000 and interprets them as an int. A char * to the same address reads one byte.

unsigned char *b = (unsigned char *)&x;
printf("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);

Access through unsigned char * is special. C permits the bytes of any object to be read and written through a character-type lvalue, so this is the normal way to view raw bytes without violating the aliasing rules.
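As a small, self-contained sketch of that byte view, the hypothetical helpers below copy a uint32_t's bytes out through a const unsigned char * and use the first byte to tell whether the host stores the least significant byte first. uint32_t is optional in ISO C but present on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the object representation of x into out[], reading it through a
   character-type pointer, which C allows for any object. */
void bytes_of_u32(uint32_t x, unsigned char out[sizeof(uint32_t)]) {
    const unsigned char *b = (const unsigned char *)&x;
    for (size_t i = 0; i < sizeof x; i++) {
        out[i] = b[i];
    }
}

/* Returns 1 if the host stores the least significant byte at the lowest address. */
int host_is_little_endian(void) {
    unsigned char b[sizeof(uint32_t)];
    bytes_of_u32(0x01020304u, b);
    return b[0] == 0x04;
}
```

On the little-endian layout pictured above, bytes_of_u32(0x01020304u, b) fills b with 04 03 02 01.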

Pointer Arithmetic And Arrays

Pointer arithmetic is scaled by the pointed-to type. If p has type T *, then p + k means the address $p + k \cdot \text{sizeof}(T)$, provided the result stays inside one array object or one element past it.

#include <stdio.h>

int main(void) {
    int a[4] = {10, 20, 30, 40};
    int *p = a;

    printf("%zu\n", sizeof a);      // 16 when int is 4 bytes
    printf("%zu\n", sizeof p);      // 8 on a common x86-64 ABI
    printf("%d\n", *(p + 2));       // 30
}
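The scaling can be observed directly by converting p and p + 1 to char * and subtracting, which counts bytes instead of elements. A minimal sketch with made-up helper names; the distances equal sizeof(T) on whatever ABI it is compiled for (commonly 4 for int and 8 for double on x86-64). Forming p + 1 is allowed even for a single object, because a non-array object counts as an array of one element.

```c
#include <stddef.h>

/* Byte distance between p and p + 1, measured through char * so the
   subtraction counts bytes rather than elements. */
ptrdiff_t int_step_bytes(int *p)       { return (char *)(p + 1) - (char *)p; }
ptrdiff_t double_step_bytes(double *p) { return (char *)(p + 1) - (char *)p; }

/* For char *, element steps and byte steps coincide. */
ptrdiff_t char_step_bytes(char *p)     { return (p + 1) - p; }
```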

For int a[4] at virtual address 0x1000, with 4-byte int, the array is one contiguous 16-byte object.

element   expression   address range
a[0]      *(a + 0)     0x1000..0x1003
a[1]      *(a + 1)     0x1004..0x1007
a[2]      *(a + 2)     0x1008..0x100b
a[3]      *(a + 3)     0x100c..0x100f
one past  a + 4        0x1010, not dereferenceable

In most expressions a decays to &a[0], an int *. The conversion does not happen for sizeof a, for unary &a, or for a string literal used to initialize an array. The type of &a is int (*)[4], pointer to an array of four int, so with 4-byte int, &a + 1 advances by 16 bytes, not by 4.
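A sketch of the same distinction, measuring both steps in bytes from the start of an int[4] (array_steps is a hypothetical helper). a + 1 moves by sizeof(int), while &a + 1 moves past the whole array and is therefore a one-past pointer that must not be dereferenced.

```c
#include <stddef.h>

/* Report how far a + 1 and &a + 1 land from the start of an int[4]. */
void array_steps(ptrdiff_t *elem_step, ptrdiff_t *array_step) {
    int a[4] = {10, 20, 30, 40};
    int (*pa)[4] = &a;            /* pointer to an array of four int */
    char *base = (char *)a;

    *elem_step  = (char *)(a + 1) - base;   /* one int: sizeof a[0] */
    *array_step = (char *)(pa + 1) - base;  /* whole array: sizeof a, one past the end */
}
```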

Function parameters hide this distinction. These two declarations declare the same parameter type.

void f(int p[]);
void f(int *p);

Inside f, sizeof p is the size of a pointer. The array length has been lost unless passed separately.

void sum_bad(int a[]) {
    // On x86-64 this is 8 / 4 = 2, not the caller's array length.
    size_t n = sizeof a / sizeof a[0];
}
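Two common fixes, sketched under the assumption that the caller still sees the real array type: pass the count alongside the pointer, or take a pointer to a fixed-size array so the bound stays in the parameter type. sum_n, sum4, and ARRAY_LEN are hypothetical names.

```c
#include <stddef.h>

/* Compute the length only where the array type is still visible. */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* The caller supplies n, because inside the function a is only an int *. */
long sum_n(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) total += a[i];
    return total;
}

/* A pointer to int[4] keeps the bound: sizeof *p is the size of the whole array. */
long sum4(int (*p)[4]) {
    long total = 0;
    for (size_t i = 0; i < sizeof *p / sizeof (*p)[0]; i++) total += (*p)[i];
    return total;
}
```

A caller holding int a[4] would write sum_n(a, ARRAY_LEN(a)) or sum4(&a). The pointer-to-array form only accepts arrays of exactly four int, which is the price of keeping the bound in the type.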

void *, Pointer-To-Pointer, And Callbacks

A void * is a generic object pointer. It has no element size, so standard C does not define vp + 1. Cast it to a typed pointer or to unsigned char * before arithmetic.

#include <stdlib.h>

int main(void) {
    void *vp = malloc(16);
    if (!vp) return 1;

    // Non-standard as C; GCC and Clang accept it as a byte increment (extension).
    // vp = vp + 1;

    unsigned char *bp = vp;   // byte view: bp + 1 is the next byte
    bp[0] = 0xaa;
    bp[1] = 0xbb;

    int *ip = vp;        // conversion from void * to object pointer is implicit in C
    *ip = 123;           // needs alignment and enough storage for int; malloc provides both
    free(vp);
    return 0;
}

A pointer-to-pointer stores the address of a pointer object. It is common when a function must update the caller's pointer.

#include <stdlib.h>

int grow_to_8(int **out) {
    int *q = malloc(8 * sizeof *q);
    if (!q) return -1;

    for (int i = 0; i < 8; i++) q[i] = i;
    *out = q;                 // writes the caller's pointer variable
    return 0;
}

int main(void) {
    int *p = NULL;
    if (grow_to_8(&p) == 0) {
        free(p);
    }
}

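Byte picture in main after grow_to_8 returns, assuming a little-endian 64-bit ABI, p stored at address 0x7000, and the heap block starting at 0x5000.

expression   type     value
&p           int **   0x7000 (also the value of out inside grow_to_8)
p            int *    0x5000, stored at 0x7000 as bytes 00 50 00 00 00 00 00 00
*(&p)        int *    0x5000, the same object as p
*p           int      0, the first heap element q[0]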
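Why the extra level is needed shows up when a parameter of type int * is contrasted with one of type int **. In this sketch (both function names are hypothetical), the first function only changes its own copy of the pointer, while the second stores through the caller's address, as grow_to_8 does.

```c
#include <stdlib.h>

/* Receives a copy of the caller's pointer: assigning to p changes only the
   local copy, so the caller never sees the block. It is freed here because
   nothing else can reach it. */
void alloc_by_value(int *p) {
    p = malloc(sizeof *p);
    free(p);
}

/* Receives the address of the caller's pointer: *pp names the caller's
   variable, so the store survives the return. */
int alloc_by_address(int **pp) {
    int *q = malloc(sizeof *q);
    if (!q) return -1;
    *q = 42;
    *pp = q;
    return 0;
}
```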

Function pointers hold code addresses, not object addresses. They drive callbacks and are one implementation ingredient for C++ virtual dispatch.

#include <stdio.h>

static int twice(int x) { return 2 * x; }
static int square(int x) { return x * x; }

int apply(int x, int (*fn)(int)) {
    return fn(x);
}

int main(void) {
    int (*f)(int) = square;
    printf("%d\n", apply(7, f));    // 49
    f = twice;
    printf("%d\n", apply(7, f));    // 14
}
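The C standard library's qsort is a familiar callback consumer: it takes a comparison function of type int (*)(const void *, const void *) and calls it on pairs of elements. A minimal sketch of sorting ints with it; cmp_int_asc and sort_ints are illustrative names, not library functions.

```c
#include <stdlib.h>

/* Comparison callback in the shape qsort expects: it receives const void *
   and converts back to the real element type before comparing. */
static int cmp_int_asc(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

void sort_ints(int *xs, size_t n) {
    /* qsort calls cmp_int_asc through the function pointer it is given. */
    qsort(xs, n, sizeof *xs, cmp_int_asc);
}
```

Returning (a > b) - (a < b) instead of a - b keeps the comparison correct for values near INT_MIN and INT_MAX, where the subtraction would overflow.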

Do not cast between object pointers and function pointers in portable C. Some ABIs use the same width for both, but the language does not promise that representation.

Null, Wild, And Dangling Pointers

A null dereference is undefined behavior in C. On mainstream Unix-like systems, the OS leaves page zero unmapped, so a load from address 0x0 raises a page fault that the process sees as SIGSEGV.

int *p = NULL;
int x = *p;       // undefined behavior, commonly a trap

The hardware mechanism is address translation. The CPU asks the MMU to translate virtual page number 0. If the process page table has no present mapping for that page, the access faults before any data byte is returned.
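The lookup key is just the high bits of the address. Assuming 4 KiB pages (common on x86-64 Linux, though other page sizes exist), this sketch splits a virtual address into a virtual page number and an in-page offset. Address 0 and small offsets from it, such as a field access through a null struct pointer, all land in page 0. Real x86-64 hardware further splits the page number into several page-table indices, which is omitted here.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits are the offset within the page. */
#define DEMO_PAGE_SHIFT 12u

/* Virtual page number: the part of the address the MMU looks up. */
uint64_t vpn_of(uint64_t vaddr) {
    return vaddr >> DEMO_PAGE_SHIFT;
}

/* Offset within the page: carried through translation unchanged. */
uint64_t offset_of(uint64_t vaddr) {
    return vaddr & ((UINT64_C(1) << DEMO_PAGE_SHIFT) - 1);
}
```

For the stack address 0x7ffc2f8a1b4c from the opening example, vpn_of gives 0x7ffc2f8a1 and offset_of gives 0xb4c.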

A wild pointer is uninitialized or derived from invalid data.

int *p;           // indeterminate value
*p = 1;           // could hit a mapped page, could trap, always undefined

A dangling pointer once pointed to a live object, but the object's lifetime has ended.

int *bad(void) {
    int x = 12;
    return &x;    // dangling after return
}

int *p = malloc(sizeof *p);
*p = 9;
free(p);
printf("%d\n", *p);   // use after free

After free(p), the allocator may keep the chunk in a free list, hand it to a later allocation, or poison metadata in debug builds. The numeric address in p often remains unchanged, which is why use-after-free bugs can pass tests and fail after a small allocator change.
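One common mitigation, sketched here as a hypothetical helper built on the pointer-to-pointer idiom, is to clear the caller's pointer at the moment of freeing. A later accidental use then becomes a null dereference, which usually traps, instead of a silent read of recycled memory. It does not make such an access defined, and it cannot clear other copies of the same address.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block and null out the caller's pointer.
   Other variables still holding the old address are NOT cleared. */
void free_and_null(int **pp) {
    free(*pp);      /* free(NULL) is a no-op, so repeated calls are harmless */
    *pp = NULL;
}
```

Run-time checkers such as AddressSanitizer (-fsanitize=address in GCC and Clang) catch many use-after-free bugs that this pattern cannot.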

Strict Aliasing And Optimizer Assumptions

C's effective type rules let the compiler assume that pointers to unrelated non-character types do not name the same object. This is the strict aliasing rule. It allows load reuse and store reordering that would be invalid if every pointer could alias every other pointer.

int f(int *ip, float *fp) {
    *ip = 1;
    *fp = 0.0f;
    return *ip;
}

With strict aliasing, a compiler may return 1 without reloading *ip, because an int * and a float * are assumed not to point at the same object. If the caller passes aliased storage through casts, the program has undefined behavior.

int x;
int r = f(&x, (float *)&x);     // violates aliasing and possibly alignment

A byte-level view through unsigned char * is allowed.

int x = 0x11223344;
unsigned char *c = (unsigned char *)&x;
c[0] = 0x55;                   // defined byte write

For type punning, prefer memcpy. Optimizers recognize fixed-size memcpy and usually compile it to a register move.

#include <string.h>
#include <stdint.h>

float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
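The reverse direction follows the same pattern. A minimal sketch, assuming a 32-bit float in IEEE 754 binary32 format, which holds on mainstream ABIs (the _Static_assert only checks the size, not the format). Under that assumption, 1.0f has the bit pattern 0x3f800000.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the float's object representation into an integer. This is defined,
   unlike reading *(uint32_t *)&f, which breaks the effective type rules. */
uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```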

Compiling with -fno-strict-aliasing tells GCC and Clang not to make these type-based alias assumptions. It is common in legacy C code that uses casts for type punning. It does not repair dangling pointers, out-of-bounds arithmetic, misalignment, or null dereferences.

The Model

A C pointer expression is governed by two invariants.

$$p + k \equiv (T *)((\text{char} *)p + k \cdot \text{sizeof}(T))$$

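This address calculation is valid for T *p only when p and p + k both lie within the same array object or at its one-past position; a non-array object behaves like an array of one element for this purpose. The one-past value is useful for loop bounds but must not be dereferenced.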

$$\text{dereference}(p) \text{ is defined} \implies p \text{ names a live object of compatible type and alignment}$$

The virtual-memory system checks page permissions and mappings. The C abstract machine checks lifetime, bounds, type, and alignment. Passing the hardware check does not make a C access defined.

Example loop over an array.

int a[3] = {5, 6, 7};

for (int *p = a; p != a + 3; p++) {
    printf("%d\n", *p);
}

If a starts at 0x2000, the loop dereferences 0x2000, 0x2004, and 0x2008. It computes 0x200c for the final comparison and stops before dereferencing it.
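The one-past rule also shapes reverse loops. A loop that runs p from a + n - 1 down while p >= a would eventually form a - 1, which is undefined even if it is never dereferenced. A minimal sketch (reverse_copy is a hypothetical name) that starts at the one-past pointer and decrements before each dereference, so every pointer it forms stays within a through a + n:

```c
#include <stddef.h>

/* Copy a[0..n-1] into out[] in reverse order. */
void reverse_copy(const int *a, size_t n, int *out) {
    const int *p = a + n;   /* one past the end: valid to form, not to dereference */
    while (p != a) {
        --p;                /* now points at a live element */
        *out++ = *p;
    }
}
```

For int a[3] = {5, 6, 7}, reverse_copy(a, 3, out) fills out with 7, 6, 5, and n == 0 performs no dereference at all.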

Common Confusions

Watch Out

`T p[]` in a parameter is not an array object

void g(int p[]) receives an int *. The caller's array does not get copied, and sizeof p is the pointer size. To preserve the bound, pass n or use a pointer to array such as int (*p)[4].

Watch Out

A non-null pointer is not necessarily valid

A dangling pointer after free is usually non-null. An out-of-bounds pointer may numerically point into a mapped page. C validity depends on object lifetime, array bounds, effective type, and alignment, not only on the OS page map.

Watch Out

`void *` is not a byte pointer in ISO C

void * stores a generic object address, but void has no size. Use unsigned char * for byte stepping. Some compilers accept arithmetic on void * as an extension; portable C should not rely on it.

Exercises

Exercise (Core)

Problem

Assume a common x86-64 ABI where sizeof(int) == 4 and sizeof(int *) == 8. An int a[5] starts at virtual address 0x4000. Compute the addresses of a, a + 3, &a + 1, and state the values of sizeof a and sizeof &a.

Exercise (Core)

Problem

Given this code, identify the defined and undefined accesses. Assume malloc succeeds and int is 4 bytes.

int *p = malloc(2 * sizeof *p);
p[0] = 10;
p[1] = 20;
int *q = p + 2;
int x = *q;
free(p);
int y = p[0];

Exercise (Advanced)

Problem

Explain why the compiler may compile return *ip; as return 1; in the function below when strict aliasing is on.

int f(int *ip, float *fp) {
    *ip = 1;
    *fp = 0.0f;
    return *ip;
}

References

Canonical:

  • Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd ed. (2016), ch. 3 and ch. 9, especially §3.10 and §9.7-§9.11; machine-level data, arrays, pointers, and virtual memory
  • Arpaci-Dusseau and Arpaci-Dusseau, Operating Systems: Three Easy Pieces (2018), ch. 13-23; address spaces, page tables, TLBs, swapping, and paging policy
  • Kernighan and Ritchie, The C Programming Language, 2nd ed. (1988), ch. 5; pointers, arrays, addresses, and function pointers
  • ISO/IEC 9899:2018, §6.2.4, §6.3.2.3, §6.5, §6.7.6.3; object lifetime, pointer conversions, expressions, and function declarators
  • Patterson and Hennessy, Computer Organization and Design RISC-V Edition, 2nd ed. (2020), §5.7; virtual memory and address translation

Accessible:

  • cppreference, "Pointer declaration" and "Objects and alignment"; compact reference for pointer syntax and object rules
  • Beej, Beej's Guide to C Programming, ch. 5 and ch. 10; practical pointer examples and memory allocation
  • GCC documentation, "Optimize Options", entries for -fstrict-aliasing and -fno-strict-aliasing; compiler behavior tied to alias analysis

Next Topics

  • /computationpath/references-and-ownership-in-cpp
  • /computationpath/heap-allocation-and-free
  • /computationpath/virtual-memory-and-page-tables
  • /computationpath/data-alignment-and-padding