Why This Matters
A 32-bit integer add in a simple processor can be built from 32 nearly identical 1-bit circuits. Each slice computes logic functions, a sum bit, and a carry-out bit. A small mux then chooses which value becomes the output bit.
That circuit is on the critical path for common instructions such as add, sub, and, or, slt, address calculation for lw, and branch comparison. Replacing a ripple-carry adder with a carry-lookahead or parallel-prefix adder changes the gate-depth of addition from proportional to to proportional to , at the cost of more wires and gates.
Core Definitions
1-bit ALU slice
A 1-bit ALU slice is a combinational circuit that accepts operand bits and , an incoming carry , operation-select lines, and optional inversion controls. It produces one result bit and a carry-out .
Full adder
A full adder maps three input bits , , and to a sum bit and carry-out:
Two's-complement subtraction
For fixed width , subtraction is implemented as . The ALU reuses the same adder by XORing each with a subtract control bit and setting the initial carry-in to .
Status flags
Status flags are 1-bit summaries of the ALU result. Common flags are zero , sign , carry , and overflow . They are not all about the same number system: is for unsigned arithmetic, while is for signed two's-complement arithmetic.
The 1-Bit Slice
A minimal ALU slice computes several candidate outputs in parallel, then uses a mux to pick one. For operation select bits op1 op0, one common mapping is 00 = AND, 01 = OR, 10 = XOR, 11 = SUM.
inputs: a_i, b_i, c_i, op1, op0
logic: and_i = a_i & b_i
or_i = a_i | b_i
xor_i = a_i ^ b_i
sum_i = a_i ^ b_i ^ c_i
mux: r_i = MUX4(op1 op0, and_i, or_i, xor_i, sum_i)
carry: c_{i+1} = majority(a_i, b_i, c_i)
For a concrete bit, take , , and .
AND = 0
OR = 1
XOR = 1
SUM = 0, since 1 ^ 0 ^ 1 = 0
Cout = 1, since two of the three adder inputs are 1
The mux is itself gates. A 4-to-1 mux for one result bit can be written as:
where are the candidate outputs. This form is wasteful in transistor count compared with tuned cell-library muxes, but it is the right Boolean model.
A small C model makes the slice behavior testable:
#include <stdint.h>
struct SliceOut {
uint8_t r;
uint8_t cout;
};
struct SliceOut alu1(uint8_t a, uint8_t b, uint8_t cin, uint8_t op) {
a &= 1; b &= 1; cin &= 1; op &= 3;
uint8_t x0 = a & b;
uint8_t x1 = a | b;
uint8_t x2 = a ^ b;
uint8_t x3 = a ^ b ^ cin;
uint8_t cout = (a & b) | (a & cin) | (b & cin);
uint8_t table[4] = {x0, x1, x2, x3};
struct SliceOut out = {table[op], cout};
return out;
}
The adder carry is usually computed even for logical operations. A real design can gate unused paths, but the conceptual ALU slice is simpler when all candidates are present all the time.
Cascading Slices into an n-Bit ALU
An -bit ALU places slices side by side. Slice 0 receives the external carry-in . Slice receives from slice . The result bits form .
a0,b0,c0 -> slice 0 -> r0,c1
a1,b1,c1 -> slice 1 -> r1,c2
a2,b2,c2 -> slice 2 -> r2,c3
...
a31,b31,c31 -> slice 31 -> r31,c32
For 4-bit addition, compute .
i a_i b_i c_i sum c_{i+1}
0 1 1 0 0 1
1 1 0 1 0 1
2 1 0 1 0 1
3 0 0 1 1 0
The result is . As an unsigned 4-bit number this is , with carry-out . As a signed 4-bit two's-complement number, is and is , but is . That is signed overflow, even though there is no unsigned carry-out.
At the byte level, an 8-bit ALU sees only bit patterns:
0x7f = 0111 1111
+ 0x01 = 0000 0001
-------------------
0x80 = 1000 0000
Unsigned interpretation gives . Signed two's-complement interpretation gives outside the range , so the stored bit pattern wraps to and sets overflow.
Subtraction by Inverting B
A separate subtractor is unnecessary. Let sub = 1 for subtraction and sub = 0 for addition. Each slice receives , and the low carry-in is .
sub = 0: b'_i = b_i, c0 = 0, so A + B
sub = 1: b'_i = not b_i, c0 = 1, so A + not(B) + 1
Example: compute in 4 bits.
A = 0101
B = 0011
not B = 1100
+ 1 = 1101
A + not B + 1:
0101
+ 1101
------
0010 with carry-out 1
The carry-out is because no unsigned borrow occurred. Many instruction sets expose this differently. Some architectures define a carry flag after subtraction as "not borrow"; others define a borrow flag. Always read the ISA manual before using condition codes.
Example: compute in 4 bits.
A = 0011
B = 0101
not B = 1010
+ 1 = 1011
0011
+ 1011
------
1110 with carry-out 0
The bit pattern is unsigned but signed . The missing carry-out means an unsigned borrow happened.
Zero, Sign, Carry, and Overflow Flags
The zero flag is the NOR of every result bit:
The sign flag is just the top result bit:
The carry flag for addition is the carry-out of the top slice:
For subtraction implemented as , the same means "no borrow" under the common convention used by several processors. Some ISAs invert this meaning in their exposed flags.
Signed overflow is different. For addition, overflow occurs when both inputs have the same sign and the result has the opposite sign:
For subtraction , overflow occurs when and have different signs and the result sign differs from :
The carry-chain view gives a compact test for signed overflow in addition and subtraction after inversion has already been applied:
Here is the carry into the sign bit, and is the carry out of the sign bit.
Worked 4-bit cases:
0111 + 0001 = 1000
carry into sign bit c3 = 1
carry out of sign bit c4 = 0
V = 1 ^ 0 = 1
C = 0
1111 + 0001 = 0000
unsigned: 15 + 1 wraps to 0, so C = 1
signed: -1 + 1 = 0, so V = 0
carry into sign bit c3 = 1
carry out of sign bit c4 = 1
These two examples separate the two flags. Carry-out is not a signed-overflow flag.
MIPS R-Type Execution Connection
A MIPS R-type instruction carries two source register numbers, one destination register number, a shift amount, and a function field. For integer ALU operations, the datapath reads two registers, sends their values into the ALU, and writes the ALU result back to the register file.
# $t0 = $t1 + $t2
add $t0, $t1, $t2
# $t0 = $t1 & $t2
and $t0, $t1, $t2
# $t0 = $t1 - $t2
sub $t0, $t1, $t2
For a textbook single-cycle datapath, the main control recognizes an R-type opcode and routes the funct field into ALU control. The ALU control then emits operation-select bits and a subtract control. A compact table is:
funct operation sub op1 op0
0x20 add 0 1 1
0x22 sub 1 1 1
0x24 and 0 0 0
0x25 or 0 0 1
0x26 xor 0 1 0
The same ALU also calculates addresses for load and store instructions. For lw $t0, 12($s1), the ALU adds the base register $s1 and the sign-extended immediate 12. A branch-equal datapath often subtracts the two registers and tests the zero flag.
Adder Performance Families
Ripple-carry is the direct cascade. It has small area and simple wiring, but the worst-case carry must pass through all slices. With one gate delay for propagate/generate formation and about two gate delays per carry step, a 32-bit ripple adder is roughly linear in 32.
Define per-bit propagate and generate:
Carry-lookahead expands these equations so carries can be computed in groups. For four bits:
Carry-select computes two versions of each block, one assuming carry-in and one assuming carry-in , then muxes the right answer when the real carry arrives. It trades extra adders for shorter delay.
Kogge-Stone is a parallel-prefix adder. It combines adjacent generate/propagate pairs into larger ranges with an associative operator:
For 32 bits, an ideal prefix tree needs about prefix levels after local generate/propagate formation. The price is heavy wiring, which matters in physical layout.
Key Result
Two's-Complement Subtraction Reuses the Adder
Statement
For every -bit pattern and , feeding and into an -bit adder produces the low bits of . The unsigned carry-out reports no-borrow under the common carry convention, while signed overflow is detected by .
Intuition
In bits, represents . Adding one gives . Keeping only the low bits removes the added , leaving .
Proof Sketch
Treat and as integers in . The adder computes . Modulo , this equals . For signed overflow, the only invalid cases are adding two effective operands with the same sign and obtaining the opposite sign. In a full-adder chain, that condition is equivalent to the carry into the sign bit differing from the carry out of the sign bit, so .
Why It Matters
A single adder path implements add, sub, address addition, comparisons through subtraction, and branch equality checks. Hardware is saved because subtraction does not require a second arithmetic circuit.
Failure Mode
Do not use carry-out as signed overflow. In 4 bits, has and . Also, has and .
Common Confusions
Carry-out is not signed overflow
Carry-out belongs to unsigned arithmetic. Signed overflow depends on whether the mathematical signed result fits in the interval . The two flags agree in some cases by accident, not by definition.
Zero flag is not produced by the carry chain
The zero flag is a reduction over the result bits. A 32-bit ALU computes it with a tree of OR gates followed by inversion, or with a NOR tree. A carry-out of zero does not mean the result is zero.
Subtraction carry convention differs across ISAs
The internal adder emits a carry-out. Whether the programmer-visible flag means carry, not-borrow, or borrow is an ISA convention. The gate-level subtract path stays .
Exercises
Problem
Build the flags for the 4-bit addition . Report the result, , , , and .
Problem
Use two's-complement subtraction to compute in 4 bits. Report the result and the unsigned borrow status under the convention means no borrow.
Problem
For a 16-bit ALU, compare worst-case carry depth for ripple-carry and an ideal Kogge-Stone prefix adder. Count only carry-combine levels after local formation.
References
Canonical:
- Charles Petzold, Code: The Hidden Language of Computer Hardware and Software, 2nd ed. (2022), ch. 10-14, builds adders, binary arithmetic, and machine structure from relays and gates
- David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 5th ed. (2014), Appendix B, covers Boolean logic, ALU design, carry-lookahead, and datapath control
- John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 6th ed. (2019), Appendix A and §3.3, covers instruction execution context and arithmetic datapath costs
- J. Clark Scott, But How Do It Know? The Basic Principles of Computers for Everyone (2009), ch. 5-12, gives a gate-to-CPU construction with ALU and control ideas
- Neil H. E. Weste and David Money Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. (2011), ch. 11, covers adder circuit families and prefix adders
Accessible:
- Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd ed. (2016), ch. 2, connects integer encodings, overflow, and C-level behavior
- MIT OpenCourseWare, 6.004 Computation Structures, arithmetic and ALU lecture notes
- University of California Berkeley CS61C notes, single-cycle datapath and ALU control lectures
Next Topics
/computationpath/binary-and-bits/computationpath/carry-lookahead-adders/computationpath/single-cycle-datapath/computationpath/simd-vector-alu/topics/boolean-algebra