Building an ALU from Gates

Why This Matters

A 32-bit integer add in a simple processor can be built from 32 nearly identical 1-bit circuits. Each slice computes logic functions, a sum bit, and a carry-out bit. A small mux then chooses which value becomes the output bit.

That circuit is on the critical path for common instructions such as add, sub, and, or, slt, address calculation for lw, and branch comparison. Replacing a ripple-carry adder with a carry-lookahead or parallel-prefix adder reduces the gate depth of addition from growing linearly with $n$ to growing with $\log n$, at the cost of more wires and gates.

Core Definitions

Definition

1-bit ALU slice

A 1-bit ALU slice is a combinational circuit that accepts operand bits $a_i$ and $b_i$, an incoming carry $c_i$, operation-select lines, and optional inversion controls. It produces one result bit $r_i$ and a carry-out $c_{i+1}$.

Definition

Full adder

A full adder maps three input bits $a_i$, $b_i$, and $c_i$ to a sum bit and a carry-out:

$s_i = a_i \oplus b_i \oplus c_i$

$c_{i+1} = (a_i b_i) \lor (a_i c_i) \lor (b_i c_i)$
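The two equations can be checked against ordinary integer addition: for single-bit inputs, $2c_{i+1} + s_i$ should equal $a_i + b_i + c_i$. The minimal C sketch below checks all eight input rows; full_add and full_add_matches_arithmetic are hypothetical helper names, and inputs are assumed to be single bits.

```c
#include <stdint.h>

/* Outputs of one full adder (hypothetical helper type). */
struct FullAddOut {
    uint8_t s;    /* sum bit s_i */
    uint8_t cout; /* carry-out c_{i+1} */
};

/* One full adder; only the low bit of each input is used. */
struct FullAddOut full_add(uint8_t a, uint8_t b, uint8_t cin) {
    a &= 1; b &= 1; cin &= 1;
    struct FullAddOut out;
    out.s = a ^ b ^ cin;                         /* XOR of all three inputs */
    out.cout = (a & b) | (a & cin) | (b & cin);  /* majority of the three inputs */
    return out;
}

/* Returns 1 if 2*cout + s == a + b + cin for all 8 input combinations. */
int full_add_matches_arithmetic(void) {
    for (unsigned v = 0; v < 8; v++) {
        uint8_t a = v & 1, b = (v >> 1) & 1, c = (v >> 2) & 1;
        struct FullAddOut o = full_add(a, b, c);
        if (2u * o.cout + o.s != (unsigned)(a + b + c)) return 0;
    }
    return 1;
}
```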

Definition

Two's-complement subtraction

For fixed width $n$, subtraction is implemented as $A - B = A + \overline{B} + 1 \pmod{2^n}$. The ALU reuses the same adder by XORing each $b_i$ with a subtract control bit, which inverts $b_i$ when that bit is 1, and by feeding the same control bit in as the initial carry-in $c_0$, so $c_0 = 1$ during subtraction.

Definition

Status flags

Status flags are 1-bit summaries of the ALU result. Common flags are zero $Z$, sign $N$, carry $C$, and overflow $V$. They do not all refer to the same interpretation of the bits: $C$ is meaningful for unsigned arithmetic, while $V$ is meaningful for signed two's-complement arithmetic.

The 1-Bit Slice

A minimal ALU slice computes several candidate outputs in parallel, then uses a mux to pick one. For operation select bits op1 op0, one common mapping is 00 = AND, 01 = OR, 10 = XOR, 11 = SUM.

inputs:  a_i, b_i, c_i, op1, op0
logic:   and_i = a_i & b_i
         or_i  = a_i | b_i
         xor_i = a_i ^ b_i
         sum_i = a_i ^ b_i ^ c_i
mux:     r_i = MUX4(op1 op0, and_i, or_i, xor_i, sum_i)
carry:   c_{i+1} = majority(a_i, b_i, c_i)

For a concrete bit, take $a_i = 1$, $b_i = 0$, and $c_i = 1$.

AND = 0
OR  = 1
XOR = 1
SUM = 0, since 1 ^ 0 ^ 1 = 0
Cout = 1, since two of the three adder inputs are 1

The mux is itself gates. A 4-to-1 mux for one result bit can be written as:

$r_i = (\overline{o_1}\overline{o_0} x_0) \lor (\overline{o_1}o_0 x_1) \lor (o_1\overline{o_0} x_2) \lor (o_1o_0 x_3)$

where $o_1 o_0$ are the select bits op1 op0 and $x_0, x_1, x_2, x_3$ are the candidate outputs (AND, OR, XOR, and SUM in the mapping above). This form uses more transistors than a tuned cell-library mux, but it is the correct Boolean model.
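The sum-of-products form can be checked directly against a table lookup over every select value and every combination of data bits. In this minimal C sketch, mux4_sop and mux4_sop_matches_lookup are illustrative names, and all inputs are treated as single bits.

```c
#include <stdint.h>

/* Sum-of-products 4-to-1 mux:
   r = ~o1~o0 x0 | ~o1 o0 x1 | o1 ~o0 x2 | o1 o0 x3 (all single bits). */
uint8_t mux4_sop(uint8_t o1, uint8_t o0,
                 uint8_t x0, uint8_t x1, uint8_t x2, uint8_t x3) {
    o1 &= 1; o0 &= 1;
    x0 &= 1; x1 &= 1; x2 &= 1; x3 &= 1;
    uint8_t n1 = o1 ^ 1, n0 = o0 ^ 1;  /* inverted select lines */
    return (uint8_t)((n1 & n0 & x0) | (n1 & o0 & x1) |
                     (o1 & n0 & x2) | (o1 & o0 & x3));
}

/* Compares the gate form with an array lookup for 4 selects x 16 data patterns. */
int mux4_sop_matches_lookup(void) {
    for (unsigned sel = 0; sel < 4; sel++) {
        for (unsigned d = 0; d < 16; d++) {
            uint8_t x[4] = {(uint8_t)(d & 1), (uint8_t)((d >> 1) & 1),
                            (uint8_t)((d >> 2) & 1), (uint8_t)((d >> 3) & 1)};
            uint8_t got = mux4_sop((uint8_t)(sel >> 1), (uint8_t)(sel & 1),
                                   x[0], x[1], x[2], x[3]);
            if (got != x[sel]) return 0;
        }
    }
    return 1;
}
```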

A small C model makes the slice behavior testable:

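#include <stdint.h>

/* Outputs of one ALU slice. */
struct SliceOut {
    uint8_t r;     /* result bit r_i */
    uint8_t cout;  /* carry-out c_{i+1} */
};

/* One 1-bit ALU slice. op selects the result: 0 = AND, 1 = OR, 2 = XOR, 3 = SUM. */
struct SliceOut alu1(uint8_t a, uint8_t b, uint8_t cin, uint8_t op) {
    a &= 1; b &= 1; cin &= 1; op &= 3;   /* keep only the meaningful bits */

    /* All candidate outputs are computed in parallel, as in the hardware. */
    uint8_t x0 = a & b;
    uint8_t x1 = a | b;
    uint8_t x2 = a ^ b;
    uint8_t x3 = a ^ b ^ cin;

    uint8_t cout = (a & b) | (a & cin) | (b & cin);  /* majority(a, b, cin) */
    uint8_t table[4] = {x0, x1, x2, x3};              /* plays the role of the 4-to-1 mux */

    struct SliceOut out = {table[op], cout};
    return out;
}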

The adder carry is usually computed even for logical operations. A real design can gate unused paths, but the conceptual ALU slice is simpler when all candidates are present all the time.

Cascading Slices into an n-Bit ALU

An $n$-bit ALU places $n$ slices side by side. Slice 0 receives the external carry-in $c_0$. Slice $i$ receives $c_i$ from slice $i-1$. The result bits form $R = r_{n-1}\dots r_1 r_0$.

a0,b0,c0 -> slice 0 -> r0,c1
a1,b1,c1 -> slice 1 -> r1,c2
a2,b2,c2 -> slice 2 -> r2,c3
...
a31,b31,c31 -> slice 31 -> r31,c32
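The cascade can be modeled as a loop in which the carry out of slice $i$ becomes the carry-in of slice $i+1$. This is a behavioral sketch, not a hardware description: ripple_add is a hypothetical name, and the width $n$ is assumed to be between 1 and 32.

```c
#include <stdint.h>

/* Ripple-carry n-bit adder built from 1-bit full-adder steps (1 <= n <= 32).
   Bit i uses carry c_i from bit i-1; *carry_out receives c_n. */
uint32_t ripple_add(uint32_t a, uint32_t b, unsigned cin, unsigned n,
                    unsigned *carry_out) {
    uint32_t r = 0;
    unsigned c = cin & 1;                      /* c_0 */
    for (unsigned i = 0; i < n; i++) {
        unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
        unsigned s = ai ^ bi ^ c;              /* sum bit r_i, using c_i */
        c = (ai & bi) | (ai & c) | (bi & c);   /* c_{i+1} = majority(a_i, b_i, c_i) */
        r |= (uint32_t)s << i;
    }
    *carry_out = c;                            /* c_n */
    return r;
}
```

Called as ripple_add(0x7, 0x1, 0, 4, &c), it reproduces the 4-bit table that follows: result 0x8 with carry-out 0.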

As a 4-bit addition example, compute $0111_2 + 0001_2$.

i    a_i  b_i  c_i  sum  c_{i+1}
0     1    1    0     0      1
1     1    0    1     0      1
2     1    0    1     0      1
3     0    0    1     1      0

The result is $1000_2$. As an unsigned 4-bit number this is $8$, with carry-out $0$. As a signed 4-bit two's-complement number, $0111_2$ is $7$ and $0001_2$ is $1$, but $1000_2$ is $-8$. That is signed overflow, even though there is no unsigned carry-out.

At the byte level, an 8-bit ALU sees only bit patterns:

  0x7f = 0111 1111
+ 0x01 = 0000 0001
-------------------
  0x80 = 1000 0000

Unsigned interpretation gives $127 + 1 = 128$. Under the signed two's-complement interpretation, the true sum $128$ lies outside the range $[-128, 127]$, so the stored bit pattern wraps to $-128$ and the overflow flag is set.
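The two readings of one bit pattern can be written as small helpers. This sketch assumes patterns of 1 to 32 bits; as_unsigned and as_signed are illustrative names, and the signed value is computed arithmetically (unsigned value minus $2^n$ when the top bit is set) rather than by a cast, so it does not depend on how a C compiler converts out-of-range values to signed types.

```c
#include <stdint.h>

/* Interpret the low n bits of a pattern (1 <= n <= 32) as an unsigned value. */
uint32_t as_unsigned(uint32_t pattern, unsigned n) {
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return pattern & mask;
}

/* Interpret the same n bits as two's complement: if the sign bit is set,
   the value is the unsigned value minus 2^n. int64_t holds every result. */
int64_t as_signed(uint32_t pattern, unsigned n) {
    int64_t u = (int64_t)as_unsigned(pattern, n);
    int64_t sign_weight = (int64_t)1 << (n - 1);  /* weight of bit n-1 */
    return (u & sign_weight) ? u - ((int64_t)1 << n) : u;
}
```

With these helpers, the pattern 0x80 reads as 128 unsigned and -128 signed at 8 bits.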

Subtraction by Inverting B

A separate subtractor is unnecessary. Let sub = 1 for subtraction and sub = 0 for addition. Each slice receives $b'_i = b_i \oplus sub$, and the low carry-in is $c_0 = sub$.

sub = 0: b'_i = b_i,       c0 = 0, so A + B
sub = 1: b'_i = not b_i,   c0 = 1, so A + not(B) + 1

Example: compute $5 - 3$ in 4 bits.

A         = 0101
B         = 0011
not B     = 1100
not B + 1 = 1101
A + not B + 1:
  0101
+ 1101
------
  0010 with carry-out 1

The carry-out is $1$ because no unsigned borrow occurred ($5 \ge 3$). Instruction sets differ in how they expose this bit. Some architectures, such as Arm, define the carry flag after subtraction as "not borrow"; others, such as x86, set the carry flag to indicate a borrow. Always read the ISA manual before using condition codes.

Example: compute $3 - 5$ in 4 bits.

A         = 0011
B         = 0101
not B     = 1010
not B + 1 = 1011
  0011
+ 1011
------
  1110 with carry-out 0

The bit pattern $1110_2$ is unsigned $14$ but signed $-2$. The carry-out of $0$ means an unsigned borrow occurred, since $3 < 5$.
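The add/subtract path, with $b'_i = b_i \oplus sub$ and $c_0 = sub$, fits in a few lines of C and reproduces both worked examples. The name add_sub and the 1-to-32-bit width range are assumptions made for this sketch.

```c
#include <stdint.h>

/* Add/subtract unit sketch for n bits (1 <= n <= 32): every b_i is XORed
   with sub and c_0 = sub, so sub = 1 computes A + not(B) + 1. */
uint32_t add_sub(uint32_t a, uint32_t b, unsigned sub, unsigned n,
                 unsigned *carry_out) {
    sub &= 1;
    uint32_t r = 0;
    unsigned c = sub;                           /* c_0 = sub */
    for (unsigned i = 0; i < n; i++) {
        unsigned ai = (a >> i) & 1;
        unsigned bi = ((b >> i) & 1) ^ sub;     /* b'_i = b_i XOR sub */
        r |= (uint32_t)(ai ^ bi ^ c) << i;      /* sum bit */
        c = (ai & bi) | (ai & c) | (bi & c);    /* carry into the next slice */
    }
    *carry_out = c;  /* c_n; with sub = 1, 1 means "no borrow" */
    return r;
}
```

In this sketch, add_sub(5, 3, 1, 4, &c) returns 2 with c = 1, and add_sub(3, 5, 1, 4, &c) returns 14 (1110) with c = 0, matching the hand calculations.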

Zero, Sign, Carry, and Overflow Flags

The zero flag is the NOR of every result bit:

$Z = \overline{r_0 \lor r_1 \lor \cdots \lor r_{n-1}}$

The sign flag is just the top result bit:

$N = r_{n-1}$

The carry flag for addition is the carry-out of the top slice:

$C = c_n$

For subtraction implemented as $A + \overline{B} + 1$, the same $c_n$ means "no borrow" under the common convention used by several processors. Some ISAs invert this meaning in their exposed flags.

Signed overflow is different. For addition, overflow occurs when both inputs have the same sign and the result has the opposite sign:

$V = \overline{(a_{n-1} \oplus b_{n-1})} \land (a_{n-1} \oplus r_{n-1})$

For subtraction $A - B$, overflow occurs when $A$ and $B$ have different signs and the result sign differs from $A$:

$V = (a_{n-1} \oplus b_{n-1}) \land (a_{n-1} \oplus r_{n-1})$

The carry chain gives one compact test that covers both addition and subtraction, provided $B$ has already been inverted for subtraction:

$V = c_{n-1} \oplus c_n$

Here $c_{n-1}$ is the carry into the sign bit, and $c_n$ is the carry out of the sign bit.

Worked 4-bit cases:

0111 + 0001 = 1000
carry into sign bit c3 = 1
carry out  of sign bit c4 = 0
V = 1 ^ 0 = 1
C = 0

1111 + 0001 = 0000
unsigned: 15 + 1 wraps to 0, so C = 1
signed: -1 + 1 = 0, so V = 0
carry into sign bit c3 = 1
carry out  of sign bit c4 = 1

These two examples separate the two flags. Carry-out is not a signed-overflow flag.
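All four flags can be produced alongside the sum. The sketch below records the carry into the sign bit while rippling, so it forms $V = c_{n-1} \oplus c_n$ directly; struct Flags and alu_flags are hypothetical names, and the width is assumed to be 1 to 32 bits.

```c
#include <stdint.h>

/* Hypothetical flag bundle: zero, sign (negative), carry, overflow. */
struct Flags { unsigned z, n, c, v; };

/* n-bit add (sub = 0) or subtract (sub = 1) that also reports Z, N, C, V. */
uint32_t alu_flags(uint32_t a, uint32_t b, unsigned sub, unsigned width,
                   struct Flags *f) {
    sub &= 1;
    uint32_t r = 0;
    unsigned c = sub, c_into_sign = 0;
    for (unsigned i = 0; i < width; i++) {
        if (i == width - 1) c_into_sign = c;   /* c_{n-1} */
        unsigned ai = (a >> i) & 1, bi = ((b >> i) & 1) ^ sub;
        r |= (uint32_t)(ai ^ bi ^ c) << i;
        c = (ai & bi) | (ai & c) | (bi & c);
    }
    f->z = (r == 0);                  /* NOR of all result bits */
    f->n = (r >> (width - 1)) & 1;    /* sign bit r_{n-1} */
    f->c = c;                         /* c_n */
    f->v = c_into_sign ^ c;           /* c_{n-1} XOR c_n */
    return r;
}
```

For 0111 + 0001 this sketch reports N = 1, C = 0, V = 1; for 1111 + 0001 it reports Z = 1, C = 1, V = 0, as in the worked cases.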

MIPS R-Type Execution Connection

A MIPS R-type instruction carries two source register numbers, one destination register number, a shift amount, and a function field. For integer ALU operations, the datapath reads two registers, sends their values into the ALU, and writes the ALU result back to the register file.

# $t0 = $t1 + $t2
add $t0, $t1, $t2

# $t0 = $t1 & $t2
and $t0, $t1, $t2

# $t0 = $t1 - $t2
sub $t0, $t1, $t2

For a textbook single-cycle datapath, the main control recognizes an R-type opcode and routes the funct field into ALU control. The ALU control then emits operation-select bits and a subtract control. A compact table is:

funct   operation   sub  op1 op0
0x20    add          0    1   1
0x22    sub          1    1   1
0x24    and          0    0   0
0x25    or           0    0   1
0x26    xor          0    1   0
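The table translates directly into a decoder. This sketch encodes op as the two-bit value op1 op0 from the slice mapping (0 = AND, 1 = OR, 2 = XOR, 3 = SUM); alu_control and struct AluCtl are hypothetical names, and funct codes outside the five listed rows are simply flagged as invalid rather than handled as a complete control unit would.

```c
#include <stdint.h>

/* Control bits for the slice ALU: sub, plus op = 2*op1 + op0. */
struct AluCtl { unsigned sub; unsigned op; int valid; };

/* Sketch of the ALU-control table for the five R-type funct codes listed. */
struct AluCtl alu_control(uint8_t funct) {
    struct AluCtl ctl = {0, 0, 1};
    switch (funct) {
    case 0x20: ctl.sub = 0; ctl.op = 3; break;  /* add: SUM */
    case 0x22: ctl.sub = 1; ctl.op = 3; break;  /* sub: SUM with B inverted, c0 = 1 */
    case 0x24: ctl.sub = 0; ctl.op = 0; break;  /* and */
    case 0x25: ctl.sub = 0; ctl.op = 1; break;  /* or */
    case 0x26: ctl.sub = 0; ctl.op = 2; break;  /* xor */
    default:   ctl.valid = 0; break;            /* not covered by this table */
    }
    return ctl;
}
```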

The same ALU also calculates addresses for load and store instructions. For lw $t0, 12($s1), the ALU adds the base register $s1 and the sign-extended immediate 12. A branch-equal datapath often subtracts the two registers and tests the zero flag.
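Address calculation and branch-equal both reuse the adder. A minimal sketch, assuming 32-bit registers and a 16-bit immediate; sign_extend16, effective_address, and branch_equal are illustrative names, and unsigned C arithmetic stands in for the wrapping 32-bit adder.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits by copying bit 15 upward. */
uint32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? ((uint32_t)imm | 0xFFFF0000u) : (uint32_t)imm;
}

/* Load/store effective address: base register + sign-extended offset, mod 2^32. */
uint32_t effective_address(uint32_t base, uint16_t imm) {
    return base + sign_extend16(imm);
}

/* Branch-equal style test: subtract as A + not(B) + 1, then check for zero. */
int branch_equal(uint32_t rs, uint32_t rt) {
    uint32_t diff = rs + ~rt + 1u;
    return diff == 0;  /* zero flag */
}
```

For a base register holding 0x10000000, an offset of 12 gives 0x1000000C, and an offset encoded as 0xFFFC (that is, -4) gives 0x0FFFFFFC.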

Adder Performance Families

Ripple-carry is the direct cascade. It has small area and simple wiring, but the worst-case carry must pass through all $n$ slices. With one gate delay for propagate/generate formation and about two gate delays per carry step, the worst case for a 32-bit ripple adder is roughly $1 + 2 \cdot 32 = 65$ gate delays, and the delay grows linearly with $n$.

Define per-bit propagate and generate:

$p_i = a_i \oplus b_i$

$g_i = a_i b_i$

$c_{i+1} = g_i \lor (p_i c_i)$

Carry-lookahead expands these equations so carries can be computed in groups. For four bits:

$c_1 = g_0 \lor p_0 c_0$

$c_2 = g_1 \lor p_1 g_0 \lor p_1 p_0 c_0$

$c_3 = g_2 \lor p_2 g_1 \lor p_2 p_1 g_0 \lor p_2 p_1 p_0 c_0$

$c_4 = g_3 \lor p_3 g_2 \lor p_3 p_2 g_1 \lor p_3 p_2 p_1 g_0 \lor p_3 p_2 p_1 p_0 c_0$
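The four lookahead equations can be coded exactly as written and compared with ordinary addition over every input. In this sketch, cla4 and cla4_matches_addition are hypothetical names; the sum bits use $s_i = p_i \oplus c_i$, which follows from $s_i = a_i \oplus b_i \oplus c_i$ and $p_i = a_i \oplus b_i$.

```c
#include <stdint.h>

/* 4-bit carry-lookahead block: each carry comes straight from p_i, g_i, c_0. */
uint8_t cla4(uint8_t a, uint8_t b, unsigned c0, unsigned *c4_out) {
    unsigned p[4], g[4];
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
        p[i] = ai ^ bi;  /* propagate */
        g[i] = ai & bi;  /* generate */
    }
    c0 &= 1;
    unsigned c1 = g[0] | (p[0] & c0);
    unsigned c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    unsigned c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c0);
    unsigned c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0);
    unsigned c[4] = {c0, c1, c2, c3};
    uint8_t s = 0;
    for (int i = 0; i < 4; i++)
        s |= (uint8_t)((p[i] ^ c[i]) << i);  /* s_i = p_i XOR c_i */
    *c4_out = c4;
    return s;
}

/* Exhaustive comparison with integer addition for all a, b in [0,15], c_0 in {0,1}. */
int cla4_matches_addition(void) {
    for (unsigned a = 0; a < 16; a++)
        for (unsigned b = 0; b < 16; b++)
            for (unsigned c0 = 0; c0 < 2; c0++) {
                unsigned c4;
                uint8_t s = cla4((uint8_t)a, (uint8_t)b, c0, &c4);
                unsigned total = a + b + c0;
                if (s != (total & 0xFu) || c4 != (total >> 4)) return 0;
            }
    return 1;
}
```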

Carry-select computes two versions of each block, one assuming carry-in $0$ and one assuming carry-in $1$, then muxes in the right answer when the real carry arrives. It trades extra adders for shorter delay.
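A carry-select split can be sketched for 32 bits with two 16-bit halves. Several simplifications are assumed: ordinary C addition stands in for each 16-bit sub-adder, the 16-bit split point is an illustrative choice, and carry_select_add32 is a hypothetical name.

```c
#include <stdint.h>

/* 32-bit carry-select sketch: the upper half is added for both possible
   carry-ins, and the real carry out of the lower half selects one result. */
uint32_t carry_select_add32(uint32_t a, uint32_t b, unsigned cin,
                            unsigned *carry_out) {
    uint32_t lo = (a & 0xFFFFu) + (b & 0xFFFFu) + (cin & 1u);  /* up to 17 bits */
    unsigned c16 = (lo >> 16) & 1u;        /* real carry into the upper half */

    uint32_t ah = a >> 16, bh = b >> 16;
    uint32_t hi0 = ah + bh;                /* upper half assuming carry-in 0 */
    uint32_t hi1 = ah + bh + 1u;           /* upper half assuming carry-in 1 */

    uint32_t hi = c16 ? hi1 : hi0;         /* the select mux */
    *carry_out = (hi >> 16) & 1u;          /* c_32 */
    return (hi << 16) | (lo & 0xFFFFu);
}
```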

Kogge-Stone is a parallel-prefix adder. It combines adjacent generate/propagate pairs into larger ranges with an associative operator:

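$(G,P) \circ (g,p) = (G \lor Pg, Pp)$

Here $(G, P)$ summarizes the more significant of two adjacent bit ranges and $(g, p)$ the less significant one. The operator is associative but not commutative, so operand order matters.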

For 32 bits, an ideal prefix tree needs $\lceil \log_2 32 \rceil = 5$ prefix levels after local generate/propagate formation. The price is heavy wiring, which matters in physical layout.
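The prefix operator maps naturally onto whole-word bit operations: at step $d$, every bit position combines its $(G, P)$ pair with the pair $d$ positions lower, and the distances $d = 1, 2, 4, 8, 16$ give the five levels. This is a software model of the Kogge-Stone structure, not a netlist; kogge_stone_add32 is a hypothetical name, and folding $c_0$ into bit 0 as $g_0 \lor p_0 c_0$ is a design choice made here so the prefix tree itself needs no extra input.

```c
#include <stdint.h>

/* Bit-parallel Kogge-Stone sketch for 32 bits. After the step with distance d,
   bit i holds (G, P) for the range of up to 2d bits ending at bit i, using
   (G,P) o (g,p) = (G | P&g, P&p) with the more significant range on the left. */
uint32_t kogge_stone_add32(uint32_t a, uint32_t b, unsigned cin,
                           unsigned *carry_out, unsigned *levels) {
    uint32_t p = a ^ b;                       /* local propagate p_i */
    uint32_t G = (a & b) | (p & (cin & 1u));  /* g_i, with c_0 folded into bit 0 */
    uint32_t P = p;
    unsigned count = 0;
    for (unsigned d = 1; d < 32; d <<= 1) {
        G = G | (P & (G << d));  /* combine with the range ending d bits lower */
        P = P & (P << d);
        count++;
    }
    /* Bit i of G is now c_{i+1}; shift in c_0 to get c_i for every bit. */
    uint32_t carries = (G << 1) | (cin & 1u);
    *carry_out = (G >> 31) & 1u;              /* c_32 */
    *levels = count;
    return p ^ carries;                       /* s_i = p_i XOR c_i */
}
```

The levels output is 5, matching $\lceil \log_2 32 \rceil$.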

Key Result

Proposition

Two's-Complement Subtraction Reuses the Adder

Statement

For all $n$-bit patterns $A$ and $B$, feeding $B' = \overline{B}$ and $c_0 = 1$ into an $n$-bit adder produces the low $n$ bits of $A - B$. The unsigned carry-out reports no-borrow under the common carry convention, while signed overflow is detected by $c_{n-1} \oplus c_n$.

Intuition

In $n$ bits, $\overline{B}$ represents $2^n - 1 - B$. Adding one gives $2^n - B$. Keeping only the low $n$ bits removes the added $2^n$, leaving $A - B \pmod{2^n}$.

Proof Sketch

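Treat $A$ and $B$ as integers in $[0,2^n-1]$. The adder computes $A + (2^n - 1 - B) + 1 = A - B + 2^n$. Modulo $2^n$, this equals $A - B$, and the full sum reaches $2^n$ (so $c_n = 1$) exactly when $A \ge B$, which is the no-borrow case. For signed overflow, the adder's effective operands are $A$ and $\overline{B}$, and overflow can occur only when they have the same sign and the result has the opposite sign. Let $a_{n-1}$ and $b'_{n-1} = \overline{b_{n-1}}$ be their sign bits, so $r_{n-1} = a_{n-1} \oplus b'_{n-1} \oplus c_{n-1}$. If $a_{n-1} = b'_{n-1} = 0$, then $c_n = 0$ and $r_{n-1} = c_{n-1}$, so the result wrongly turns negative exactly when $c_{n-1} = 1 \ne c_n$. If $a_{n-1} = b'_{n-1} = 1$, then $c_n = 1$ and $r_{n-1} = c_{n-1}$, so the result wrongly turns nonnegative exactly when $c_{n-1} = 0 \ne c_n$. If $a_{n-1} \ne b'_{n-1}$, overflow is impossible and $c_n = c_{n-1}$. In every case overflow coincides with $c_{n-1} \ne c_n$, so $V = c_{n-1} \oplus c_n$.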
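Because the claim ranges over a finite set for any fixed width, it can also be checked exhaustively at a small width. This sketch covers $n = 4$ only: it feeds $\overline{B}$ and $c_0 = 1$ into a ripple loop and compares the result, the carry-out, and $c_3 \oplus c_4$ with direct integer arithmetic. The name check_subtract_proposition4 is hypothetical, and a check at one width supports the argument without replacing it.

```c
#include <stdint.h>

/* Exhaustive n = 4 check of the subtraction proposition.
   Returns the number of (A, B) pairs that violate any of the three claims. */
int check_subtract_proposition4(void) {
    int failures = 0;
    for (unsigned A = 0; A < 16; A++) {
        for (unsigned B = 0; B < 16; B++) {
            unsigned c = 1, c3 = 0, r = 0;            /* c_0 = 1 */
            for (unsigned i = 0; i < 4; i++) {
                if (i == 3) c3 = c;                   /* carry into the sign bit */
                unsigned ai = (A >> i) & 1;
                unsigned bi = ((B >> i) & 1) ^ 1u;    /* b'_i = not b_i */
                r |= (ai ^ bi ^ c) << i;
                c = (ai & bi) | (ai & c) | (bi & c);
            }
            int sa = (A & 8) ? (int)A - 16 : (int)A;  /* signed value of A */
            int sb = (B & 8) ? (int)B - 16 : (int)B;  /* signed value of B */
            int diff = sa - sb;
            unsigned v_expected = (diff < -8 || diff > 7);
            if (r != ((A - B) & 0xFu)             /* low 4 bits of A - B */
                || c != (unsigned)(A >= B)        /* carry-out = no borrow */
                || (c3 ^ c) != v_expected)        /* V = c_3 XOR c_4 */
                failures++;
        }
    }
    return failures;
}
```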

Why It Matters

A single adder path implements add, sub, address addition, comparisons through subtraction, and branch equality checks. Hardware is saved because subtraction does not require a second arithmetic circuit.

Failure Mode

Do not use carry-out as signed overflow. In 4 bits, $0111 + 0001 = 1000$ has $C = 0$ and $V = 1$. Also, $1111 + 0001 = 0000$ has $C = 1$ and $V = 0$.


Common Confusions

Watch Out

Carry-out is not signed overflow

Carry-out belongs to unsigned arithmetic. Signed overflow depends on whether the mathematical signed result fits in the interval $[-2^{n-1}, 2^{n-1}-1]$. The two flags agree in some cases by coincidence, not by definition.

Watch Out

Zero flag is not produced by the carry chain

The zero flag is a reduction over the result bits. A 32-bit ALU computes it with a tree of OR gates followed by a final inversion, or with an equivalent tree that alternates NOR and NAND levels. A carry-out of zero does not mean the result is zero.
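The OR-tree structure can be mimicked in software by folding the word in half repeatedly. This sketch assumes a 32-bit result; zero_flag32 is an illustrative name, and each shift-and-OR line corresponds to one level of a 5-level tree of 2-input OR gates, followed by the final inversion.

```c
#include <stdint.h>

/* Zero flag as a log-depth OR reduction followed by inversion (a NOR). */
unsigned zero_flag32(uint32_t r) {
    r |= r >> 16;      /* level 1: bits 0..15 hold ORs of bit pairs */
    r |= r >> 8;       /* level 2 */
    r |= r >> 4;       /* level 3 */
    r |= r >> 2;       /* level 4 */
    r |= r >> 1;       /* level 5: bit 0 holds the OR of all 32 bits */
    return (~r) & 1u;  /* invert: Z = 1 only when every bit was 0 */
}
```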

Watch Out

Subtraction carry convention differs across ISAs

The internal adder emits a carry-out. Whether the programmer-visible flag means carry, not-borrow, or borrow is an ISA convention. The gate-level subtract path stays $A + \overline{B} + 1$.

Exercises

Exercise (Core)

Problem

Build the flags for the 4-bit addition $1011_2 + 0110_2$. Report the result, $Z$, $N$, $C$, and $V$.

Exercise (Core)

Problem

Use two's-complement subtraction to compute $0100_2 - 0111_2$ in 4 bits. Report the result and the unsigned borrow status under the convention that $C = 1$ means no borrow.

Exercise (Advanced)

Problem

For a 16-bit ALU, compare worst-case carry depth for ripple-carry and an ideal Kogge-Stone prefix adder. Count only carry-combine levels after local $p_i,g_i$ formation.

