Skip to main content

Logic Gates · 35 min

Building An ALU From Gates

How a chip-scale arithmetic-logic unit is assembled from 1-bit slices, carry chains, two's-complement subtraction, flags, and operation-select muxes.

Why This Matters

A 32-bit integer add in a simple processor can be built from 32 nearly identical 1-bit circuits. Each slice computes logic functions, a sum bit, and a carry-out bit. A small mux then chooses which value becomes the output bit.

That circuit is on the critical path for common instructions such as add, sub, and, or, slt, address calculation for lw, and branch comparison. Replacing a ripple-carry adder with a carry-lookahead or parallel-prefix adder changes the gate-depth of addition from proportional to nn to proportional to logn\log n, at the cost of more wires and gates.

Core Definitions

Definition

1-bit ALU slice

A 1-bit ALU slice is a combinational circuit that accepts operand bits aia_i and bib_i, an incoming carry cic_i, operation-select lines, and optional inversion controls. It produces one result bit rir_i and a carry-out ci+1c_{i+1}.

Definition

Full adder

A full adder maps three input bits aia_i, bib_i, and cic_i to a sum bit and carry-out: si=aibicis_i = a_i \oplus b_i \oplus c_i ci+1=(aibi)(aici)(bici)c_{i+1} = (a_i b_i) \lor (a_i c_i) \lor (b_i c_i)

Definition

Two's-complement subtraction

For fixed width nn, subtraction is implemented as AB=A+B+1(mod2n)A - B = A + \overline{B} + 1 \pmod {2^n}. The ALU reuses the same adder by XORing each bib_i with a subtract control bit and setting the initial carry-in to 11.

Definition

Status flags

Status flags are 1-bit summaries of the ALU result. Common flags are zero ZZ, sign NN, carry CC, and overflow VV. They are not all about the same number system: CC is for unsigned arithmetic, while VV is for signed two's-complement arithmetic.

The 1-Bit Slice

A minimal ALU slice computes several candidate outputs in parallel, then uses a mux to pick one. For operation select bits op1 op0, one common mapping is 00 = AND, 01 = OR, 10 = XOR, 11 = SUM.

inputs:  a_i, b_i, c_i, op1, op0
logic:   and_i = a_i & b_i
         or_i  = a_i | b_i
         xor_i = a_i ^ b_i
         sum_i = a_i ^ b_i ^ c_i
mux:     r_i = MUX4(op1 op0, and_i, or_i, xor_i, sum_i)
carry:   c_{i+1} = majority(a_i, b_i, c_i)

For a concrete bit, take ai=1a_i = 1, bi=0b_i = 0, and ci=1c_i = 1.

AND = 0
OR  = 1
XOR = 1
SUM = 0, since 1 ^ 0 ^ 1 = 0
Cout = 1, since two of the three adder inputs are 1

The mux is itself gates. A 4-to-1 mux for one result bit can be written as:

ri=(o1o0x0)(o1o0x1)(o1o0x2)(o1o0x3)r_i = (\overline{o_1}\overline{o_0} x_0) \lor (\overline{o_1}o_0 x_1) \lor (o_1\overline{o_0} x_2) \lor (o_1o_0 x_3)

where x0,x1,x2,x3x_0, x_1, x_2, x_3 are the candidate outputs. This form is wasteful in transistor count compared with tuned cell-library muxes, but it is the right Boolean model.

A small C model makes the slice behavior testable:

#include <stdint.h>

struct SliceOut {
    uint8_t r;
    uint8_t cout;
};

struct SliceOut alu1(uint8_t a, uint8_t b, uint8_t cin, uint8_t op) {
    a &= 1; b &= 1; cin &= 1; op &= 3;

    uint8_t x0 = a & b;
    uint8_t x1 = a | b;
    uint8_t x2 = a ^ b;
    uint8_t x3 = a ^ b ^ cin;

    uint8_t cout = (a & b) | (a & cin) | (b & cin);
    uint8_t table[4] = {x0, x1, x2, x3};

    struct SliceOut out = {table[op], cout};
    return out;
}

The adder carry is usually computed even for logical operations. A real design can gate unused paths, but the conceptual ALU slice is simpler when all candidates are present all the time.

Cascading Slices into an n-Bit ALU

An nn-bit ALU places nn slices side by side. Slice 0 receives the external carry-in c0c_0. Slice ii receives cic_i from slice i1i-1. The result bits form R=rn1r1r0R = r_{n-1}\dots r_1r_0.

a0,b0,c0 -> slice 0 -> r0,c1
a1,b1,c1 -> slice 1 -> r1,c2
a2,b2,c2 -> slice 2 -> r2,c3
...
a31,b31,c31 -> slice 31 -> r31,c32

For 4-bit addition, compute 01112+000120111_2 + 0001_2.

i    a_i  b_i  c_i  sum  c_{i+1}
0     1    1    0     0      1
1     1    0    1     0      1
2     1    0    1     0      1
3     0    0    1     1      0

The result is 100021000_2. As an unsigned 4-bit number this is 88, with carry-out 00. As a signed 4-bit two's-complement number, 011120111_2 is 77 and 000120001_2 is 11, but 100021000_2 is 8-8. That is signed overflow, even though there is no unsigned carry-out.

At the byte level, an 8-bit ALU sees only bit patterns:

  0x7f = 0111 1111
+ 0x01 = 0000 0001
-------------------
  0x80 = 1000 0000

Unsigned interpretation gives 127+1=128127 + 1 = 128. Signed two's-complement interpretation gives 127+1127 + 1 outside the range [128,127][-128,127], so the stored bit pattern wraps to 128-128 and sets overflow.

Subtraction by Inverting B

A separate subtractor is unnecessary. Let sub = 1 for subtraction and sub = 0 for addition. Each slice receives bi=bisubb'_i = b_i \oplus sub, and the low carry-in is c0=subc_0 = sub.

sub = 0: b'_i = b_i,       c0 = 0, so A + B
sub = 1: b'_i = not b_i,   c0 = 1, so A + not(B) + 1

Example: compute 535 - 3 in 4 bits.

A     = 0101
B     = 0011
not B = 1100
+ 1   = 1101
A + not B + 1:
  0101
+ 1101
------
  0010 with carry-out 1

The carry-out is 11 because no unsigned borrow occurred. Many instruction sets expose this differently. Some architectures define a carry flag after subtraction as "not borrow"; others define a borrow flag. Always read the ISA manual before using condition codes.

Example: compute 353 - 5 in 4 bits.

A     = 0011
B     = 0101
not B = 1010
+ 1   = 1011
  0011
+ 1011
------
  1110 with carry-out 0

The bit pattern 111021110_2 is unsigned 1414 but signed 2-2. The missing carry-out means an unsigned borrow happened.

Zero, Sign, Carry, and Overflow Flags

The zero flag is the NOR of every result bit:

Z=r0r1rn1Z = \overline{r_0 \lor r_1 \lor \cdots \lor r_{n-1}}

The sign flag is just the top result bit:

N=rn1N = r_{n-1}

The carry flag for addition is the carry-out of the top slice:

C=cnC = c_n

For subtraction implemented as A+B+1A + \overline{B} + 1, the same cnc_n means "no borrow" under the common convention used by several processors. Some ISAs invert this meaning in their exposed flags.

Signed overflow is different. For addition, overflow occurs when both inputs have the same sign and the result has the opposite sign:

V=(an1bn1)(an1rn1)V = \overline{(a_{n-1} \oplus b_{n-1})} \land (a_{n-1} \oplus r_{n-1})

For subtraction ABA - B, overflow occurs when AA and BB have different signs and the result sign differs from AA:

V=(an1bn1)(an1rn1)V = (a_{n-1} \oplus b_{n-1}) \land (a_{n-1} \oplus r_{n-1})

The carry-chain view gives a compact test for signed overflow in addition and subtraction after BB inversion has already been applied:

V=cn1cnV = c_{n-1} \oplus c_n

Here cn1c_{n-1} is the carry into the sign bit, and cnc_n is the carry out of the sign bit.

Worked 4-bit cases:

0111 + 0001 = 1000
carry into sign bit c3 = 1
carry out  of sign bit c4 = 0
V = 1 ^ 0 = 1
C = 0
1111 + 0001 = 0000
unsigned: 15 + 1 wraps to 0, so C = 1
signed: -1 + 1 = 0, so V = 0
carry into sign bit c3 = 1
carry out  of sign bit c4 = 1

These two examples separate the two flags. Carry-out is not a signed-overflow flag.

MIPS R-Type Execution Connection

A MIPS R-type instruction carries two source register numbers, one destination register number, a shift amount, and a function field. For integer ALU operations, the datapath reads two registers, sends their values into the ALU, and writes the ALU result back to the register file.

# $t0 = $t1 + $t2
add $t0, $t1, $t2

# $t0 = $t1 & $t2
and $t0, $t1, $t2

# $t0 = $t1 - $t2
sub $t0, $t1, $t2

For a textbook single-cycle datapath, the main control recognizes an R-type opcode and routes the funct field into ALU control. The ALU control then emits operation-select bits and a subtract control. A compact table is:

funct   operation   sub  op1 op0
0x20    add          0    1   1
0x22    sub          1    1   1
0x24    and          0    0   0
0x25    or           0    0   1
0x26    xor          0    1   0

The same ALU also calculates addresses for load and store instructions. For lw $t0, 12($s1), the ALU adds the base register $s1 and the sign-extended immediate 12. A branch-equal datapath often subtracts the two registers and tests the zero flag.

Adder Performance Families

Ripple-carry is the direct cascade. It has small area and simple wiring, but the worst-case carry must pass through all nn slices. With one gate delay for propagate/generate formation and about two gate delays per carry step, a 32-bit ripple adder is roughly linear in 32.

Define per-bit propagate and generate:

pi=aibip_i = a_i \oplus b_i gi=aibig_i = a_i b_i ci+1=gi(pici)c_{i+1} = g_i \lor (p_i c_i)

Carry-lookahead expands these equations so carries can be computed in groups. For four bits:

c1=g0p0c0c_1 = g_0 \lor p_0 c_0 c2=g1p1g0p1p0c0c_2 = g_1 \lor p_1 g_0 \lor p_1p_0c_0 c3=g2p2g1p2p1g0p2p1p0c0c_3 = g_2 \lor p_2g_1 \lor p_2p_1g_0 \lor p_2p_1p_0c_0 c4=g3p3g2p3p2g1p3p2p1g0p3p2p1p0c0c_4 = g_3 \lor p_3g_2 \lor p_3p_2g_1 \lor p_3p_2p_1g_0 \lor p_3p_2p_1p_0c_0

Carry-select computes two versions of each block, one assuming carry-in 00 and one assuming carry-in 11, then muxes the right answer when the real carry arrives. It trades extra adders for shorter delay.

Kogge-Stone is a parallel-prefix adder. It combines adjacent generate/propagate pairs into larger ranges with an associative operator:

(G,P)(g,p)=(GPg,Pp)(G,P) \circ (g,p) = (G \lor Pg, Pp)

For 32 bits, an ideal prefix tree needs about log232=5\lceil \log_2 32 \rceil = 5 prefix levels after local generate/propagate formation. The price is heavy wiring, which matters in physical layout.

Key Result

Proposition

Two's-Complement Subtraction Reuses the Adder

Statement

For every nn-bit pattern AA and BB, feeding B=BB' = \overline{B} and c0=1c_0 = 1 into an nn-bit adder produces the low nn bits of ABA - B. The unsigned carry-out reports no-borrow under the common carry convention, while signed overflow is detected by cn1cnc_{n-1} \oplus c_n.

Intuition

In nn bits, B\overline{B} represents 2n1B2^n - 1 - B. Adding one gives 2nB2^n - B. Keeping only the low nn bits removes the added 2n2^n, leaving AB(mod2n)A - B \pmod {2^n}.

Proof Sketch

Treat AA and BB as integers in [0,2n1][0,2^n-1]. The adder computes A+(2n1B)+1=AB+2nA + (2^n - 1 - B) + 1 = A - B + 2^n. Modulo 2n2^n, this equals ABA - B. For signed overflow, the only invalid cases are adding two effective operands with the same sign and obtaining the opposite sign. In a full-adder chain, that condition is equivalent to the carry into the sign bit differing from the carry out of the sign bit, so V=cn1cnV = c_{n-1} \oplus c_n.

Why It Matters

A single adder path implements add, sub, address addition, comparisons through subtraction, and branch equality checks. Hardware is saved because subtraction does not require a second arithmetic circuit.

Failure Mode

Do not use carry-out as signed overflow. In 4 bits, 0111+0001=10000111 + 0001 = 1000 has C=0C = 0 and V=1V = 1. Also, 1111+0001=00001111 + 0001 = 0000 has C=1C = 1 and V=0V = 0.

Common Confusions

Watch Out

Carry-out is not signed overflow

Carry-out belongs to unsigned arithmetic. Signed overflow depends on whether the mathematical signed result fits in the interval [2n1,2n11][-2^{n-1}, 2^{n-1}-1]. The two flags agree in some cases by accident, not by definition.

Watch Out

Zero flag is not produced by the carry chain

The zero flag is a reduction over the result bits. A 32-bit ALU computes it with a tree of OR gates followed by inversion, or with a NOR tree. A carry-out of zero does not mean the result is zero.

Watch Out

Subtraction carry convention differs across ISAs

The internal adder emits a carry-out. Whether the programmer-visible flag means carry, not-borrow, or borrow is an ISA convention. The gate-level subtract path stays A+B+1A + \overline{B} + 1.

Exercises

ExerciseCore

Problem

Build the flags for the 4-bit addition 10112+011021011_2 + 0110_2. Report the result, ZZ, NN, CC, and VV.

ExerciseCore

Problem

Use two's-complement subtraction to compute 01002011120100_2 - 0111_2 in 4 bits. Report the result and the unsigned borrow status under the convention C=1C = 1 means no borrow.

ExerciseAdvanced

Problem

For a 16-bit ALU, compare worst-case carry depth for ripple-carry and an ideal Kogge-Stone prefix adder. Count only carry-combine levels after local pi,gip_i,g_i formation.

References

Canonical:

  • Charles Petzold, Code: The Hidden Language of Computer Hardware and Software, 2nd ed. (2022), ch. 10-14, builds adders, binary arithmetic, and machine structure from relays and gates
  • David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 5th ed. (2014), Appendix B, covers Boolean logic, ALU design, carry-lookahead, and datapath control
  • John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 6th ed. (2019), Appendix A and §3.3, covers instruction execution context and arithmetic datapath costs
  • J. Clark Scott, But How Do It Know? The Basic Principles of Computers for Everyone (2009), ch. 5-12, gives a gate-to-CPU construction with ALU and control ideas
  • Neil H. E. Weste and David Money Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. (2011), ch. 11, covers adder circuit families and prefix adders

Accessible:

  • Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd ed. (2016), ch. 2, connects integer encodings, overflow, and C-level behavior
  • MIT OpenCourseWare, 6.004 Computation Structures, arithmetic and ALU lecture notes
  • University of California Berkeley CS61C notes, single-cycle datapath and ALU control lectures

Next Topics

  • /computationpath/binary-and-bits
  • /computationpath/carry-lookahead-adders
  • /computationpath/single-cycle-datapath
  • /computationpath/simd-vector-alu
  • /topics/boolean-algebra