Why This Matters
A 32-bit MIPS add $t0,$t1,$t2 instruction is the bit string 00000001001010100100000000100000, stored as hex 0x012A4020. The CPU does not execute the word add. It fetches those 32 bits, slices them into fields, decodes the opcode and function bits, reads two registers, writes one register, and advances the program counter.
That fact is still visible in ML systems work. A slow reduction kernel, a miscompiled vector loop, an interrupt handler, or a bootloader failure often reduces to a short sequence of instructions. You do not need to memorize a full instruction set. You do need to know what an instruction encoding is, what assembly hides, and what an assembler actually computes.
Core Definitions
Machine code
Machine code is the byte representation accepted by a processor's instruction decoder. For a fixed-width ISA such as classic MIPS, each instruction is 32 bits. For x86-64, instructions vary in length from 1 byte to 15 bytes, with prefixes, opcode bytes, ModR/M fields, immediate bytes, and displacement bytes.
Assembly language
Assembly language is a textual notation for machine instructions and assembler directives. A real instruction such as lw $t0,16($sp) maps to one machine instruction. A pseudo-instruction such as move $t0,$t1 is assembler syntax that expands into a real instruction such as addu $t0,$t1,$zero.
Instruction format
An instruction format partitions a fixed number of bits into fields. Typical fields are opcode, source register, destination register, immediate value, shift amount, function code, and branch or jump offset.
Addressing mode
An addressing mode is the rule used to compute an operand value or memory address. Common cases are immediate, register, base plus offset, and PC-relative addressing.
Machine Code As Fields, Not Text
MIPS is a clean first ISA because every base instruction is 32 bits. The decoder can fetch one word and split it by position. Three formats cover the main cases.
R-format
 31     26 25   21 20   16 15   11 10     6 5      0
+---------+-------+-------+-------+--------+--------+
| opcode  |  rs   |  rt   |  rd   | shamt  | funct  |
+---------+-------+-------+-------+--------+--------+
  6 bits   5 bits  5 bits  5 bits   5 bits   6 bits
I-format
 31     26 25   21 20   16 15                      0
+---------+-------+-------+-------------------------+
| opcode  |  rs   |  rt   |        immediate        |
+---------+-------+-------+-------------------------+
  6 bits   5 bits  5 bits           16 bits
J-format
 31     26 25                                      0
+---------+-----------------------------------------+
| opcode  |              target index               |
+---------+-----------------------------------------+
  6 bits                    26 bits
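For add $t0,$t1,$t2, MIPS uses R-format. Register numbers are $t0=8, $t1=9, $t2=10. R-format arithmetic instructions share opcode 0, and the funct field selects addition with value 32. The assembly operand order is rd, rs, rt, but the encoded field order is rs, rt, rd, so the destination $t0 lands in the third register field.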
add $t0,$t1,$t2
opcode  rs     rt     rd     shamt  funct
000000  01001  01010  01000  00000  100000
binary word:
00000001001010100100000000100000
hex word:
0x012A4020
bytes, big-endian:
01 2A 40 20
bytes, little-endian:
20 40 2A 01
Endianness changes the byte order in memory, not the logical field positions after the instruction word has been fetched and presented to the decoder.
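The packing step can be written as a few shifts and ORs. The following is a minimal sketch in C, not an assembler implementation; the names encode_r, store_big_endian, and store_little_endian are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit MIPS instruction word. */
uint32_t encode_r(uint32_t opcode, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    /* Each field must fit its width: 6, 5, 5, 5, 5, 6 bits. */
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(rd < 32 && shamt < 32 && funct < 64);
    return (opcode << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}

/* Write the word to memory most significant byte first. */
void store_big_endian(uint32_t word, uint8_t out[4]) {
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* Write the word to memory least significant byte first. */
void store_little_endian(uint32_t word, uint8_t out[4]) {
    out[0] = (uint8_t)word;
    out[1] = (uint8_t)(word >> 8);
    out[2] = (uint8_t)(word >> 16);
    out[3] = (uint8_t)(word >> 24);
}
```

Calling encode_r(0, 9, 10, 8, 0, 32) yields 0x012A4020. The two store functions produce different byte sequences from the same word, which is the whole effect of endianness here.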
For lw $t0,16($sp), MIPS uses I-format. The load-word opcode is 35, $sp=29, $t0=8, and the immediate is 16.
lw $t0,16($sp)
opcode  rs     rt     immediate
100011  11101  01000  0000000000010000
binary word:
10001111101010000000000000010000
hex word:
0x8FA80010
bytes, big-endian:
8F A8 00 10
bytes, little-endian:
10 00 A8 8F
The semantic operation is R[8] = Memory[R[29] + sign_extend(16)]. The 16-bit immediate is signed for this addressing mode, so lw $t0,-4($sp) encodes an immediate of 0xFFFC.
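The I-format packing and the sign extension can be sketched the same way. This is an illustrative C fragment under the field layout given above; encode_i and sign_extend16 are hypothetical names, not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Pack I-format fields; a signed immediate is stored as its low 16 bits. */
uint32_t encode_i(uint32_t opcode, uint32_t rs, uint32_t rt, int32_t imm) {
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);
    return (opcode << 26) | (rs << 21) | (rt << 16) |
           ((uint32_t)imm & 0xFFFFu);
}

/* Recover the signed value that the CPU adds to the base register. */
int32_t sign_extend16(uint32_t word) {
    uint32_t low16 = word & 0xFFFFu;
    /* Bit 15 set means the 16-bit value is negative in two's complement. */
    return (low16 & 0x8000u) ? (int32_t)low16 - 0x10000 : (int32_t)low16;
}
```

With these definitions, encode_i(35, 29, 8, 16) gives 0x8FA80010, and encode_i(35, 29, 8, -4) gives 0x8FA8FFFC, whose low half sign-extends back to -4.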
Addressing Modes In Small Examples
Immediate addressing places a constant inside the instruction.
addi $t0,$zero,37 # R[8] = 0 + 37
Register addressing takes operands from registers.
addu $t0,$t1,$t2 # R[8] = R[9] + R[10]; addu never traps on signed overflow
Base plus offset addressing computes a memory address from a base register and a signed immediate.
lw $t0,12($sp) # load 4 bytes from R[29] + 12
sw $t0,0($sp) # store 4 bytes to R[29] + 0
PC-relative addressing computes a target using the program counter. In MIPS branches, the offset is measured in instructions from PC + 4.
Suppose this branch is stored at address 0x00400020, and label done is at 0x00400034.
0x00400020: beq $t0,$zero,done
0x00400024: addi $s0,$s0,1
0x00400028: addi $s1,$s1,1
0x0040002C: addi $s2,$s2,1
0x00400030: addi $s3,$s3,1
0x00400034: done:
The branch immediate is:
(0x00400034 - (0x00400020 + 4)) / 4 = 0x10 / 4 = 4
The encoded instruction is:
beq $t0,$zero,done
opcode  rs     rt     immediate
000100  01000  00000  0000000000000100
hex word:
0x11000004
The assembler computes this offset from labels. The CPU sees only 0x11000004.
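The label arithmetic is small enough to write out. The sketch below assumes word-aligned addresses and the PC + 4 base described above; branch_immediate and encode_beq are hypothetical helper names, and the range check reflects the 16-bit signed field.

```c
#include <assert.h>
#include <stdint.h>

/* Signed word offset from the instruction after the branch to the target. */
int32_t branch_immediate(uint32_t branch_addr, uint32_t target_addr) {
    assert((branch_addr & 3u) == 0 && (target_addr & 3u) == 0);
    int64_t byte_delta = (int64_t)target_addr - ((int64_t)branch_addr + 4);
    /* A 16-bit signed word offset reaches -32768..32767 instructions. */
    assert(byte_delta >= -131072 && byte_delta <= 131068);
    return (int32_t)(byte_delta / 4);
}

/* Encode beq rs,rt,offset: opcode 4, then the offset's low 16 bits. */
uint32_t encode_beq(uint32_t rs, uint32_t rt, int32_t imm) {
    assert(rs < 32 && rt < 32);
    return (4u << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}
```

For the example, branch_immediate(0x00400020, 0x00400034) returns 4, and encode_beq(8, 0, 4) returns 0x11000004. A backward branch produces a negative immediate through the same formula.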
Labels, Pseudo-Instructions, And The Assembler
An assembler reads assembly text, assigns addresses to instructions and data, expands pseudo-instructions, resolves labels within the assembled file, and emits bytes plus relocation records when external names remain. Linking and loading are separate stages and are not covered here.
Labels are names for addresses.
loop:
lw $t0,0($s0)
addu $s1,$s1,$t0
addi $s0,$s0,4
bne $s0,$s2,loop
If loop is at 0x00400000, and the bne is at 0x0040000C, the branch target offset is:
(0x00400000 - (0x0040000C + 4)) / 4 = -0x10 / 4 = -4
The immediate field is the 16-bit two's-complement value 0xFFFC.
Pseudo-instructions are conveniences. They do not have to preserve one input line per output instruction.
move $t0,$t1
A typical MIPS assembler emits:
addu $t0,$t1,$zero
Loading a 32-bit constant often expands into two real instructions.
li $t0,0x12345678
One possible expansion is:
lui $t0,0x1234 # upper 16 bits
ori $t0,$t0,0x5678 # lower 16 bits
This matters when counting instructions, inspecting branch distances, or matching a profiler sample to an assembly listing.
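The li expansion shows one source line becoming two instruction words. The sketch below follows the lui/ori expansion given above, using opcode 15 for lui and opcode 13 for ori; expand_li is a hypothetical helper, and a real assembler may choose a shorter expansion when the constant fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Emit the lui/ori pair that loads a 32-bit constant into register rt.
   Returns the number of instruction words written to out. */
int expand_li(uint32_t rt, uint32_t value, uint32_t out[2]) {
    assert(rt < 32);
    uint32_t upper = value >> 16;       /* immediate for lui */
    uint32_t lower = value & 0xFFFFu;   /* immediate for ori */
    /* lui rt,upper: opcode 15, rs field is 0 */
    out[0] = (15u << 26) | (rt << 16) | upper;
    /* ori rt,rt,lower: opcode 13, rs and rt are the same register */
    out[1] = (13u << 26) | (rt << 21) | (rt << 16) | lower;
    return 2;
}
```

For li $t0,0x12345678, expand_li(8, 0x12345678, out) writes 0x3C081234 and 0x35085678, so every address after that line shifts by one extra word.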
From One C Statement To Instructions
Consider a C statement over 32-bit signed integers.
a = b + c * 4;
Assume a, b, and c live in a stack frame at offsets 0($sp), 4($sp), and 8($sp). A simple MIPS lowering is:
lw $t0,4($sp) # t0 = b
lw $t1,8($sp) # t1 = c
sll $t1,$t1,2 # t1 = c << 2
addu $t0,$t0,$t1 # t0 = b + 4*c
sw $t0,0($sp) # a = t0
With little-endian 32-bit words and initial values b = 7, c = -3, the stack bytes might be:
offset  bytes        value
0       00 00 00 00  a, old value
4       07 00 00 00  b = 7
8       FD FF FF FF  c = -3, two's complement
Step by step:
after lw $t0,4($sp):  $t0 = 0x00000007
after lw $t1,8($sp):  $t1 = 0xFFFFFFFD
after sll $t1,$t1,2:  $t1 = 0xFFFFFFF4
after addu:           $t0 = 0xFFFFFFFB
after sw:             bytes at offset 0 are FB FF FF FF
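The trace can be replayed in C by treating the stack frame as a byte array. This is a sketch that mirrors the five instructions one for one, assuming little-endian words; load_word_le, store_word_le, and run_lowering are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian word, like lw on a little-endian MIPS. */
uint32_t load_word_le(const uint8_t *mem, uint32_t offset) {
    assert(offset % 4 == 0);
    return (uint32_t)mem[offset] |
           ((uint32_t)mem[offset + 1] << 8) |
           ((uint32_t)mem[offset + 2] << 16) |
           ((uint32_t)mem[offset + 3] << 24);
}

/* Write a 32-bit little-endian word, like sw on a little-endian MIPS. */
void store_word_le(uint8_t *mem, uint32_t offset, uint32_t word) {
    assert(offset % 4 == 0);
    mem[offset]     = (uint8_t)word;
    mem[offset + 1] = (uint8_t)(word >> 8);
    mem[offset + 2] = (uint8_t)(word >> 16);
    mem[offset + 3] = (uint8_t)(word >> 24);
}

/* a = b + c * 4 with a at offset 0, b at offset 4, c at offset 8. */
void run_lowering(uint8_t *frame) {
    uint32_t t0 = load_word_le(frame, 4);  /* lw   $t0,4($sp)   */
    uint32_t t1 = load_word_le(frame, 8);  /* lw   $t1,8($sp)   */
    t1 <<= 2;                              /* sll  $t1,$t1,2    */
    t0 += t1;                              /* addu $t0,$t0,$t1  */
    store_word_le(frame, 0, t0);           /* sw   $t0,0($sp)   */
}
```

Unsigned 32-bit arithmetic wraps modulo 2^32, which matches the behavior of sll and addu on the register values.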
The stored result is -5. A compiler may choose a different sequence if the variables already live in registers:
sll $t1,$s2,2 # c * 4
addu $s0,$s1,$t1 # a = b + c * 4
On x86-64, the same expression might use a scaled addressing form. If b is in esi, c is in edx, and a is in eax, one legal sequence is:
lea eax,[rsi + rdx*4] # eax = b + 4*c, no memory load
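The address arithmetic uses the 64-bit registers rsi and rdx, but writing to eax keeps only the low 32 bits of the result, and those depend only on the low 32 bits of b and c. The syntax differs, but the same idea remains: the machine instruction encodes registers, an operation, and sometimes a scale, displacement, or immediate constant.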
Why Assembly Still Matters
Assembly is the inspection layer for generated code. If a hot loop fails to vectorize, the generated instructions show whether the compiler emitted scalar loads and adds or SIMD operations.
#include <immintrin.h>
void saxpy8(float *y, const float *x, float a) {
    __m256 av = _mm256_set1_ps(a);
    __m256 xv = _mm256_loadu_ps(x);
    __m256 yv = _mm256_loadu_ps(y);
    yv = _mm256_fmadd_ps(av, xv, yv);
    _mm256_storeu_ps(y, yv);
}
A typical AVX2/FMA lowering includes instructions such as the following, where [a], [x], and [y] stand in for the actual register or memory operands:
vbroadcastss ymm0, DWORD PTR [a]
vmovups ymm1, YMMWORD PTR [x]
vmovups ymm2, YMMWORD PTR [y]
vfmadd213ps ymm1, ymm0, ymm2
vmovups YMMWORD PTR [y], ymm1
The intrinsic name _mm256_fmadd_ps is not machine code, but it constrains the compiler toward a fused multiply-add instruction. This level matters when comparing one load per element with one vector load per eight float values.
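For comparison, a scalar baseline of the same computation is sketched below. The function name saxpy_scalar and the length parameter n are assumptions added for illustration. Without vectorization, a compiler would typically emit one scalar load of x[i] and one of y[i] per iteration, though an optimizing compiler may vectorize this loop on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: y[i] = a * x[i] + y[i], one element per iteration. */
void saxpy_scalar(float *y, const float *x, float a, size_t n) {
    assert(y != NULL && x != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling saxpy_scalar(y, x, a, 8) computes the same eight elements as saxpy8. Results can differ in the last bit when the multiply and add are rounded separately rather than fused, so comparing the two versions in the generated assembly is more reliable than guessing from the source.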
Assembly also appears where C has no portable operation. A kernel context switch must save registers and install a new stack pointer. A bootloader starts with a small set of architectural guarantees and cannot assume a language runtime.
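# sketch only: save callee-saved registers, switch stack, restore another task
# assumes $a0 points to the outgoing task's save area and $a1 to the incoming task's
sw $s0,0($a0)
sw $s1,4($a0)
sw $sp,8($a0)
sw $ra,12($a0) # return address of the outgoing task
lw $s0,0($a1)
lw $s1,4($a1)
lw $sp,8($a1)
lw $ra,12($a1) # resume the incoming task where it last switched out
jr $ra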
The exact register set and calling convention are ISA and operating-system specific. The point is concrete: context switching is register and memory traffic.
Key Result
The fetch-decode-execute invariant is the useful model.
For fixed-width MIPS instructions:
instruction = Memory[PC]
nextPC = PC + 4
Then the decoded instruction may read registers, write a register, access memory, and replace nextPC with a branch or jump target.
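For the branch example above:
target = (PC + 4) + (sign_extend(immediate) << 2) = 0x00400024 + (4 << 2) = 0x00400034
For base plus offset loads:
address = R[rs] + sign_extend(immediate)
R[rt] = Memory[address]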
The assembler's invariant is the inverse mapping for real instructions: choose field values, pack them into the ISA format, and emit the corresponding bytes in the object file's chosen byte order. Pseudo-instructions break the one-line assumption but not the machine-code invariant.
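The decode side of the model is the same shifts run in reverse. The sketch below extracts every field from a fetched word and computes nextPC for sequential execution or a taken beq. Fields, decode, and next_pc are hypothetical names, and only beq (opcode 4) is handled to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t opcode, rs, rt, rd, shamt, funct;
    int32_t imm;  /* sign-extended 16-bit immediate */
} Fields;

/* Slice a 32-bit instruction word into its positional fields. */
Fields decode(uint32_t word) {
    Fields f;
    f.opcode = word >> 26;
    f.rs     = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;
    f.rd     = (word >> 11) & 0x1Fu;
    f.shamt  = (word >> 6) & 0x1Fu;
    f.funct  = word & 0x3Fu;
    uint32_t low16 = word & 0xFFFFu;
    f.imm = (low16 & 0x8000u) ? (int32_t)low16 - 0x10000 : (int32_t)low16;
    return f;
}

/* nextPC: PC + 4, or the branch target when a beq is taken. */
uint32_t next_pc(uint32_t pc, uint32_t word, int branch_taken) {
    assert((pc & 3u) == 0);
    Fields f = decode(word);
    uint32_t fallthrough = pc + 4u;
    if (f.opcode == 4u && branch_taken) {
        /* Unsigned wraparound handles negative offsets correctly. */
        return fallthrough + ((uint32_t)f.imm << 2);
    }
    return fallthrough;
}
```

Decoding 0x012A4020 recovers rs=9, rt=10, rd=8, funct=32, and next_pc(0x00400020, 0x11000004, 1) returns 0x00400034. The decoder extracts all fields unconditionally; which ones are meaningful depends on the opcode.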
Common Confusions
Assembly is not the same as machine code
add $t0,$t1,$t2 is text. 0x012A4020 is the 32-bit MIPS machine-code word that encodes it. The assembler maps text to bytes. The CPU fetches bytes, not mnemonics.
Labels are not stored as strings in branch instructions
A branch instruction stores an offset or target field, not the characters loop. The assembler computes the numeric field. If loop moves, the offset changes even though the assembly source still says loop.
Pseudo-instruction counts are not instruction counts
li $t0,0x12345678 may become two instructions. In a tight loop, a pseudo-instruction can change code size, branch reach, and cycle counts.
Endianness does not reorder instruction fields
Little-endian memory stores the least significant byte first. It does not mean the opcode field moves from bits 31:26 to bits 5:0 in the logical instruction word.
Exercises
Problem
Encode add $s0,$s1,$s2 as a 32-bit MIPS instruction. Use $s0=16, $s1=17, $s2=18, opcode 0, shamt 0, and funct 32. Give the binary fields, hex word, and little-endian bytes.
Problem
A beq $t0,$zero,target instruction is at address 0x00400040. The label target is at 0x00400030. Compute the 16-bit branch immediate.
Problem
For the C statement a = b + c * 4, assume b is already in $s1, c is in $s2, and a must be stored to 0($sp). Write a MIPS sequence with no loads. Then evaluate it for b=100 and c=6.
References
Canonical:
- Hennessy and Patterson, Computer Architecture: A Quantitative Approach (6th ed., 2017), ch. 1-3. Quantitative CPU design context, ISA cost model, and processor-memory interaction.
- Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective (3rd ed., 2016), ch. 3. x86-64 machine-level representation and compiler output inspection.
- Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective (3rd ed., 2016), ch. 4-6. Processor architecture, performance, and memory hierarchy context for instruction execution.
- Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface (5th ed., 2014), ch. 2. MIPS instructions, formats, addressing, and assembly examples.
- Intel, Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 2A, ch. 2. x86 instruction format, opcode bytes, ModR/M, SIB, displacement, and immediate fields.
Accessible:
- Cornell CS3410, MIPS Instruction Reference. Compact reference for MIPS registers, encodings, and examples.
- UC Berkeley CS61C, Great Ideas in Computer Architecture lecture notes. Machine structures, C-to-assembly examples, and RISC instruction formats.
- MIPS Technologies, MIPS32 Architecture for Programmers Volume II. Open architecture manual for instruction encodings and semantics.
Next Topics
- /computationpath/instruction-pipelining
- /computationpath/calling-conventions-and-stack-frames
- /computationpath/linking-and-loading
- /computationpath/simd-and-vectorization