Machine Language and Assembly

Why This Matters

A MIPS add $t0,$t1,$t2 instruction is the 32-bit string 00000001001010100100000000100000, written in hex as 0x012A4020. The CPU does not execute the word add. It fetches those 32 bits, slices them into fields, decodes the opcode and function bits, reads two registers, writes one register, and advances the program counter.

That fact is still visible in ML systems work. A slow reduction kernel, a miscompiled vector loop, a misbehaving interrupt handler, or a bootloader failure often comes down to a short sequence of instructions. You do not need to memorize a full instruction set. You do need to know what an instruction encoding is, what assembly hides, and what an assembler actually computes.

Core Definitions

Definition

Machine code

Machine code is the byte representation accepted by a processor's instruction decoder. For a fixed-width ISA such as classic MIPS, each instruction is 32 bits. For x86-64, instructions vary in length from 1 byte to 15 bytes: optional prefixes, one or more opcode bytes, and optional ModR/M, SIB, displacement, and immediate fields.

Definition

Assembly language

Assembly language is a textual notation for machine instructions and assembler directives. A real instruction such as lw $t0,16($sp) maps to one machine instruction. A pseudo-instruction such as move $t0,$t1 is assembler syntax that expands into a real instruction such as addu $t0,$t1,$zero.

Definition

Instruction format

An instruction format partitions a fixed number of bits into fields. Typical fields are opcode, source register, destination register, immediate value, shift amount, function code, and branch or jump offset.

Definition

Addressing mode

An addressing mode is the rule used to compute an operand value or memory address. Common cases are immediate, register, base plus offset, and PC-relative addressing.

Machine Code As Fields, Not Text

MIPS is a clean first ISA because every base instruction is 32 bits. The decoder can fetch one word and split it by position. Three formats cover the main cases.

R-format
31      26 25   21 20   16 15   11 10    6 5      0
+---------+-------+-------+-------+--------+--------+
| opcode  |  rs   |  rt   |  rd   | shamt  | funct  |
+---------+-------+-------+-------+--------+--------+
   6 bits  5 bits  5 bits  5 bits   5 bits   6 bits

I-format
31      26 25   21 20   16 15                      0
+---------+-------+-------+-------------------------+
| opcode  |  rs   |  rt   |       immediate         |
+---------+-------+-------+-------------------------+
   6 bits  5 bits  5 bits          16 bits

J-format
31      26 25                                      0
+---------+-----------------------------------------+
| opcode  |              target index               |
+---------+-----------------------------------------+
   6 bits                 26 bits
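Decoding starts as pure bit slicing by position. The following minimal C sketch (the names mips_fields and decode_fields are illustrative, not from any library) extracts every field from a fetched 32-bit word. Which fields are meaningful depends on the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Every field the three classic formats can define. Bit 31 is the most
   significant bit of the fetched instruction word, as in the diagrams. */
typedef struct {
    uint32_t opcode, rs, rt, rd, shamt, funct; /* R-format fields */
    uint32_t imm16;                            /* I-format immediate, raw 16 bits */
    uint32_t target26;                         /* J-format target index, raw 26 bits */
} mips_fields;

/* Extract bits [hi:lo] of word, inclusive on both ends. */
static uint32_t bits(uint32_t word, unsigned hi, unsigned lo) {
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* Slice all fields. In the base integer subset, opcode 0 means R-format,
   opcodes 2 (j) and 3 (jal) are J-format, and most others are I-format. */
mips_fields decode_fields(uint32_t word) {
    mips_fields f;
    f.opcode   = bits(word, 31, 26);
    f.rs       = bits(word, 25, 21);
    f.rt       = bits(word, 20, 16);
    f.rd       = bits(word, 15, 11);
    f.shamt    = bits(word, 10, 6);
    f.funct    = bits(word, 5, 0);
    f.imm16    = bits(word, 15, 0);
    f.target26 = bits(word, 25, 0);
    return f;
}
```

For 0x012A4020 this yields opcode 0, rs 9, rt 10, rd 8, shamt 0, and funct 32, which are the fields of add $t0,$t1,$t2.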

For add $t0,$t1,$t2, MIPS uses R-format. Register numbers are $t0=8, $t1=9, $t2=10. R-format arithmetic instructions share opcode 0, and the funct field selects the operation; addition is funct 32.

add $t0,$t1,$t2

opcode rs    rt    rd    shamt funct
000000 01001 01010 01000 00000 100000

binary word:
00000001001010100100000000100000

hex word:
0x012A4020

bytes, big-endian:
01 2A 40 20

bytes, little-endian:
20 40 2A 01

Endianness changes the byte order in memory, not the logical field positions after the instruction word has been fetched and presented to the decoder.
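The packing and byte-order steps above reduce to shifts and masks. This is a minimal C sketch; encode_r and store_word are illustrative names, not a library API.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an R-format MIPS instruction: opcode | rs | rt | rd | shamt | funct. */
uint32_t encode_r(uint32_t opcode, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    /* Each value must fit its field width or it would spill into a neighbor. */
    assert(opcode < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* Write a 32-bit word as 4 bytes. Only the byte order in memory changes;
   the word value, and therefore every field position, stays the same. */
void store_word(uint32_t word, uint8_t out[4], int little_endian) {
    for (int i = 0; i < 4; i++) {
        int shift = little_endian ? 8 * i : 8 * (3 - i);
        out[i] = (uint8_t)(word >> shift);
    }
}
```

With these helpers, encode_r(0, 9, 10, 8, 0, 32) returns 0x012A4020, and store_word writes 20 40 2A 01 in little-endian mode or 01 2A 40 20 in big-endian mode.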

For lw $t0,16($sp), MIPS uses I-format. The load-word opcode is 35, $sp=29, $t0=8, and the immediate is 16.

lw $t0,16($sp)

opcode rs    rt    immediate
100011 11101 01000 0000000000010000

binary word:
10001111101010000000000000010000

hex word:
0x8FA80010

bytes, big-endian:
8F A8 00 10

bytes, little-endian:
10 00 A8 8F

The semantic operation is R[8] = Memory[R[29] + sign_extend(16)]. The 16-bit immediate is signed for this addressing mode, so lw $t0,-4($sp) encodes an immediate of 0xFFFC.
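Packing an I-format word and sign-extending its immediate can be sketched the same way. Here encode_i and sign_extend16 are illustrative names, and the range check reflects the signed 16-bit field width.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an I-format MIPS instruction. The immediate is a signed value that
   must fit in 16 bits; only its low 16 bits (two's complement) are stored. */
uint32_t encode_i(uint32_t opcode, uint32_t rs, uint32_t rt, int32_t imm) {
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);
    return (opcode << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}

/* What the hardware does with imm16 for lw and sw: replicate bit 15 upward. */
int32_t sign_extend16(uint32_t imm16) {
    imm16 &= 0xFFFFu;
    return (imm16 & 0x8000u) ? (int32_t)imm16 - 0x10000 : (int32_t)imm16;
}
```

encode_i(35, 29, 8, 16) reproduces 0x8FA80010, and encode_i(35, 29, 8, -4) stores 0xFFFC in the immediate field, which sign_extend16 turns back into -4.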

Addressing Modes In Small Examples

Immediate addressing places a constant inside the instruction.

addi $t0,$zero,37     # R[8] = 0 + 37

Register addressing takes every operand from a register.

addu $t0,$t1,$t2      # R[8] = R[9] + R[10]; addu never traps on overflow

Base plus offset addressing computes a memory address from a base register and a signed immediate.

lw   $t0,12($sp)      # load 4 bytes from R[29] + 12
sw   $t0,0($sp)       # store 4 bytes to R[29] + 0

PC-relative addressing computes a target using the program counter. In MIPS branches, the offset counts instructions (4-byte words) and is measured from PC + 4, the address of the instruction after the branch.

Suppose this branch is stored at address 0x00400020, and label done is at 0x00400034.

0x00400020: beq $t0,$zero,done
0x00400024: addi $s0,$s0,1
0x00400028: addi $s1,$s1,1
0x0040002C: addi $s2,$s2,1
0x00400030: addi $s3,$s3,1
0x00400034: done:

The branch immediate is:

\frac{0x00400034 - 0x00400024}{4} = \frac{0x10}{4} = 4

The encoded instruction is:

beq $t0,$zero,done

opcode rs    rt    immediate
000100 01000 00000 0000000000000100

hex word:
0x11000004

The assembler computes this offset from labels. The CPU sees only 0x11000004.
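The assembler's arithmetic for this branch fits in a few lines of C. This is a sketch with hypothetical helper names (branch_offset, encode_beq). It rejects misaligned targets and offsets that do not fit the signed 16-bit field, which is where the branch-reach limit comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Distance in 4-byte instructions from PC + 4 (the instruction after
   the branch) to the label, as the assembler computes it. */
int32_t branch_offset(uint32_t branch_addr, uint32_t label_addr) {
    int64_t delta = (int64_t)label_addr - ((int64_t)branch_addr + 4);
    assert(delta % 4 == 0);                      /* instructions are word aligned */
    int64_t words = delta / 4;
    assert(words >= -32768 && words <= 32767);   /* must fit the 16-bit field */
    return (int32_t)words;
}

/* beq rs, rt, label: opcode 4, immediate = low 16 bits of the offset. */
uint32_t encode_beq(uint32_t rs, uint32_t rt, uint32_t branch_addr, uint32_t label_addr) {
    assert(rs < 32 && rt < 32);
    int32_t off = branch_offset(branch_addr, label_addr);
    return (4u << 26) | (rs << 21) | (rt << 16) | ((uint32_t)off & 0xFFFFu);
}
```

For the example, branch_offset(0x00400020, 0x00400034) gives 4, and encode_beq(8, 0, 0x00400020, 0x00400034) gives 0x11000004.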

Labels, Pseudo-Instructions, And The Assembler

An assembler reads assembly text, assigns addresses to instructions and data, expands pseudo-instructions, resolves labels within the assembled file, and emits bytes plus relocation records when external names remain. Linking and loading are separate stages and are not covered here.

Labels are names for addresses.

loop:
    lw   $t0,0($s0)
    addu $s1,$s1,$t0
    addi $s0,$s0,4
    bne  $s0,$s2,loop

If loop is at 0x00400000 and the bne is at 0x0040000C, then PC + 4 = 0x00400010 and the branch offset is:

\frac{0x00400000 - 0x00400010}{4} = \frac{-0x10}{4} = -4

The immediate field is the 16-bit two's-complement value 0xFFFC.
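Backward references like this one, and forward references like done in the beq example, are why an assembler makes two passes. The sketch below is a deliberately simplified model: line_t is a hypothetical pre-parsed line format, each instruction line is assumed to emit exactly one 4-byte instruction, and parsing and encoding are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pre-parsed source line. */
typedef struct {
    const char *label;   /* label defined on this line, or NULL */
    int is_instr;        /* 1 if the line emits one 4-byte instruction, else 0 */
    const char *target;  /* label named by a branch on this line, or NULL */
} line_t;

#define MAX_LINES 64

/* Pass 1 assigns an address to every line (label-only lines take no space).
   Pass 2 looks up each branch target and stores (target - (pc + 4)) / 4 as a
   16-bit two's-complement immediate. Returns -1 for a label not defined in
   this file, which a real assembler would turn into a relocation or an error. */
int resolve_branches(const line_t *lines, int n, uint32_t base, uint16_t imm_out[]) {
    uint32_t addr[MAX_LINES];
    assert(n <= MAX_LINES);

    uint32_t pc = base;
    for (int i = 0; i < n; i++) {                 /* pass 1: addresses */
        addr[i] = pc;
        if (lines[i].is_instr) pc += 4;
    }

    for (int i = 0; i < n; i++) {                 /* pass 2: offsets */
        imm_out[i] = 0;
        if (!lines[i].is_instr || lines[i].target == NULL) continue;
        int def = -1;
        for (int j = 0; j < n; j++)
            if (lines[j].label != NULL && strcmp(lines[j].label, lines[i].target) == 0)
                def = j;
        if (def < 0) return -1;
        int64_t words = ((int64_t)addr[def] - ((int64_t)addr[i] + 4)) / 4;
        assert(words >= -32768 && words <= 32767);  /* must fit 16 bits */
        imm_out[i] = (uint16_t)(int32_t)words;
    }
    return 0;
}
```

Fed the four-line loop at base 0x00400000, this assigns the bne the immediate 0xFFFC. Fed the beq example at base 0x00400020, with done as a label-only line, it assigns 4.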

Pseudo-instructions are conveniences. One pseudo-instruction line may expand into one or several real instructions, so source lines and machine instructions need not correspond one to one.

move $t0,$t1

A typical MIPS assembler emits:

addu $t0,$t1,$zero

Loading a 32-bit constant often expands into two real instructions.

li $t0,0x12345678

One possible expansion is:

lui $t0,0x1234       # upper 16 bits
ori $t0,$t0,0x5678   # lower 16 bits

This matters when counting instructions, inspecting branch distances, or matching a profiler sample to an assembly listing.
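Whether li needs one instruction or two depends on the constant. The sketch below assumes selection rules of the kind MIPS assemblers commonly apply: ori when the upper half is zero, addiu when the value is a small negative number, and lui alone when the lower half is zero. The exact rules are assembler-specific, and expand_li is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many real instructions li needs under the assumed rules,
   and always fills hi/lo with the upper and lower 16 bits of value. */
int expand_li(uint32_t value, uint16_t *hi, uint16_t *lo) {
    *hi = (uint16_t)(value >> 16);
    *lo = (uint16_t)(value & 0xFFFFu);
    if (*hi == 0) return 1;              /* ori   $t0,$zero,lo            */
    if (value >= 0xFFFF8000u) return 1;  /* addiu $t0,$zero,imm (imm < 0) */
    if (*lo == 0) return 1;              /* lui   $t0,hi                  */

    /* Otherwise: lui $t0,hi ; ori $t0,$t0,lo. Replay both to check the split. */
    uint32_t t0 = (uint32_t)*hi << 16;   /* lui sets the upper half, clears the lower */
    t0 |= *lo;                           /* ori zero-extends its immediate */
    assert(t0 == value);
    return 2;
}
```

Under these rules 0x12345678 needs two instructions (hi = 0x1234, lo = 0x5678), while 37, 0xFFFFFFFC, and 0x00010000 each need one.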

From One C Statement To Instructions

Consider a C statement over 32-bit signed integers.

a = b + c * 4;

Assume a, b, and c live in a stack frame at offsets 0($sp), 4($sp), and 8($sp). A simple MIPS lowering is:

lw   $t0,4($sp)      # t0 = b
lw   $t1,8($sp)      # t1 = c
sll  $t1,$t1,2       # t1 = c << 2
addu $t0,$t0,$t1     # t0 = b + 4*c
sw   $t0,0($sp)      # a = t0

With little-endian 32-bit words and initial values b = 7, c = -3, the stack bytes might be:

offset  bytes          value
0       00 00 00 00    a, old value
4       07 00 00 00    b = 7
8       FD FF FF FF    c = -3, two's complement

Step by step:

after lw $t0,4($sp):   $t0 = 0x00000007
after lw $t1,8($sp):   $t1 = 0xFFFFFFFD
after sll $t1,$t1,2:   $t1 = 0xFFFFFFF4
after addu:            $t0 = 0xFFFFFFFB
after sw:              bytes at offset 0 are FB FF FF FF
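The trace can be replayed in C on a 12-byte array standing in for the frame at $sp. This is a sketch: the array models little-endian memory, unsigned 32-bit arithmetic models addu's wraparound, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian 32-bit load and store on a byte array, like lw/sw on a
   little-endian MIPS. lw and sw require 4-byte-aligned addresses. */
static uint32_t lw_le(const uint8_t *mem, unsigned off) {
    assert(off % 4 == 0);
    return (uint32_t)mem[off] | ((uint32_t)mem[off + 1] << 8) |
           ((uint32_t)mem[off + 2] << 16) | ((uint32_t)mem[off + 3] << 24);
}

static void sw_le(uint8_t *mem, unsigned off, uint32_t value) {
    assert(off % 4 == 0);
    for (unsigned i = 0; i < 4; i++)
        mem[off + i] = (uint8_t)(value >> (8 * i));
}

/* Replays the five-instruction lowering of a = b + c * 4
   with a at offset 0, b at 4, and c at 8 of the frame. */
void run_lowering(uint8_t frame[12]) {
    uint32_t t0 = lw_le(frame, 4);  /* lw   $t0,4($sp)                  */
    uint32_t t1 = lw_le(frame, 8);  /* lw   $t1,8($sp)                  */
    t1 = t1 << 2;                   /* sll  $t1,$t1,2                   */
    t0 = t0 + t1;                   /* addu $t0,$t0,$t1, wraps mod 2^32 */
    sw_le(frame, 0, t0);            /* sw   $t0,0($sp)                  */
}
```

Starting from the table's bytes, run_lowering leaves FB FF FF FF at offset 0.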

The stored result is -5. A compiler may choose a different sequence if the variables already live in registers, for example b in $s1, c in $s2, and a in $s0:

sll  $t1,$s2,2       # c * 4
addu $s0,$s1,$t1     # a = b + c * 4

On x86-64, the same expression might use a scaled addressing form. If b is in esi, c is in edx, and a is in eax, one legal sequence is:

lea eax,[rsi + rdx*4]  # eax = b + 4*c, no memory load

The syntax differs, but the same idea remains: the machine instruction encodes registers, an operation, and sometimes a scale, displacement, or immediate constant.
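One detail of the lea line deserves a check: it names the 64-bit registers rsi and rdx even though b and c are 32-bit. The address arithmetic is done in 64 bits, but writing eax keeps only the low 32 bits (and zeroes the upper half of rax). The low 32 bits of a sum or product depend only on the low 32 bits of the operands, so whatever sits in the upper halves of rsi and rdx cannot change eax. A minimal C model, with the illustrative name lea_eax:

```c
#include <assert.h>
#include <stdint.h>

/* Models lea eax,[rsi + rdx*4]: 64-bit effective-address arithmetic,
   then truncation to the 32-bit destination register. */
uint32_t lea_eax(uint64_t rsi, uint64_t rdx) {
    uint64_t addr = rsi + rdx * 4u;   /* computed like an address, never dereferenced */
    return (uint32_t)addr;            /* only the low 32 bits reach eax */
}
```

With b = 7 and c = -3 (0xFFFFFFFD) in the low halves, lea_eax returns 0xFFFFFFFB, that is -5, even when the upper halves of both inputs hold unrelated bits.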

Why Assembly Still Matters

Assembly is the inspection layer for generated code. If a hot loop fails to vectorize, the generated instructions show whether the compiler emitted scalar loads and adds or SIMD operations.

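#include <immintrin.h>

/* y[0..7] = a * x[0..7] + y[0..7]; build with AVX and FMA enabled (e.g. -mavx2 -mfma) */
void saxpy8(float *y, const float *x, float a) {
    __m256 av = _mm256_set1_ps(a);      /* broadcast a into all 8 lanes */
    __m256 xv = _mm256_loadu_ps(x);     /* unaligned load of 8 floats */
    __m256 yv = _mm256_loadu_ps(y);
    yv = _mm256_fmadd_ps(av, xv, yv);   /* av * xv + yv with a single rounding */
    _mm256_storeu_ps(y, yv);            /* unaligned store of 8 floats */
}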

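A typical AVX2/FMA lowering contains instructions like these. The bracketed operands [a], [x], and [y] are schematic: under the x86-64 System V calling convention, y and x arrive as pointers in rdi and rsi and a arrives in xmm0, so real compiler output addresses memory through those registers and broadcasts a from xmm0.

vbroadcastss ymm0, DWORD PTR [a]      # ymm0 = a in all 8 lanes
vmovups      ymm1, YMMWORD PTR [x]    # ymm1 = x[0..7]
vmovups      ymm2, YMMWORD PTR [y]    # ymm2 = y[0..7]
vfmadd213ps  ymm1, ymm0, ymm2         # ymm1 = ymm0 * ymm1 + ymm2, one rounding
vmovups      YMMWORD PTR [y], ymm1    # y[0..7] = ymm1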

The intrinsic name _mm256_fmadd_ps is not machine code, but it requests a fused multiply-add, which an FMA-capable target implements with a vfmadd instruction. Reading at this level is how you distinguish a loop that issues one scalar load per element from one that issues one vector load per eight float values.
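For comparison, here is a scalar version of the same computation; saxpy8_scalar is a hypothetical name. Written this way, the source asks for eight separate multiply-adds with a load of x[i] and y[i] per element. Whether a given compiler vectorizes it, or contracts a * x[i] + y[i] into FMA instructions, depends on flags such as the optimization level and -ffp-contract, so its assembly is worth inspecting next to saxpy8's.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: one multiply and one add per element, eight iterations. */
void saxpy8_scalar(float *y, const float *x, float a) {
    for (size_t i = 0; i < 8; i++)
        y[i] = a * x[i] + y[i];   /* rounds twice unless the compiler contracts it to an FMA */
}
```

With small integer inputs both versions produce identical results; with general inputs the fused and unfused forms can differ in the last bit.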

Assembly also appears where C has no portable operation. A kernel context switch must save registers and install a new stack pointer. A bootloader starts with a small set of architectural guarantees and cannot assume a language runtime.

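# sketch only: $a0 points at the current task's save area, $a1 at the next task's
# (other callee-saved registers, such as $s2-$s7 and $fp, are omitted for brevity)
sw   $s0,0($a0)
sw   $s1,4($a0)
sw   $sp,8($a0)
sw   $ra,12($a0)      # where the current task resumes later

lw   $s0,0($a1)
lw   $s1,4($a1)
lw   $sp,8($a1)
lw   $ra,12($a1)      # where the next task last entered the switch
jr   $ra              # continue running the next task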

The exact register set and calling convention are ISA and operating-system specific. The point is concrete: context switching is register and memory traffic.

Key Result

The fetch-decode-execute invariant is the useful model.

For fixed-width MIPS instructions:

instruction = Memory[PC]

opcode = instruction[31:26]

nextPC = PC + 4

Then the decoded instruction may read registers, write a register, access memory, and replace nextPC with a branch or jump target.

For the branch example above:

target = (PC + 4) + (sign\_extend(imm16) \times 4)

For base plus offset loads:

address = R[rs] + sign\_extend(imm16)
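The fetch-decode-execute invariant and both formulas fit in a small interpreter. This is a hypothetical toy (cpu_t, step, and run are illustrative names): it supports only add/addu, sll, addi/addiu, lw, sw, beq, and bne; keeps code and data in one 1 KiB word array starting at address 0; models no overflow traps; and ignores the branch delay slot that classic MIPS hardware has.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t reg[32];
    uint32_t pc;
    uint32_t mem[256];   /* 1 KiB, byte addresses 0..1023; code and data share it */
} cpu_t;

static int32_t sext16(uint32_t x) {
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

static uint32_t *word_at(cpu_t *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < 256);   /* aligned and in range */
    return &c->mem[addr / 4];
}

/* Execute one instruction. Returns 0 on success, -1 if unsupported. */
int step(cpu_t *c) {
    uint32_t ins    = *word_at(c, c->pc);      /* instruction = Memory[PC] */
    uint32_t opcode = ins >> 26;               /* opcode = instruction[31:26] */
    uint32_t rs = (ins >> 21) & 31, rt = (ins >> 16) & 31, rd = (ins >> 11) & 31;
    uint32_t shamt = (ins >> 6) & 31, funct = ins & 63;
    int32_t  imm = sext16(ins);
    uint32_t next = c->pc + 4;                 /* nextPC = PC + 4 */

    switch (opcode) {
    case 0:                                    /* R-format */
        if (funct == 0x20 || funct == 0x21)    /* add / addu, no trap modeled */
            c->reg[rd] = c->reg[rs] + c->reg[rt];
        else if (funct == 0x00)                /* sll */
            c->reg[rd] = c->reg[rt] << shamt;
        else
            return -1;
        break;
    case 8: case 9:                            /* addi / addiu */
        c->reg[rt] = c->reg[rs] + (uint32_t)imm;
        break;
    case 35:                                   /* lw: address = R[rs] + sign_extend(imm16) */
        c->reg[rt] = *word_at(c, c->reg[rs] + (uint32_t)imm);
        break;
    case 43:                                   /* sw */
        *word_at(c, c->reg[rs] + (uint32_t)imm) = c->reg[rt];
        break;
    case 4: case 5: {                          /* beq / bne */
        int taken = (c->reg[rs] == c->reg[rt]) == (opcode == 4);
        if (taken)                             /* target = (PC + 4) + sign_extend(imm16) * 4 */
            next = (c->pc + 4) + (uint32_t)imm * 4u;
        break;
    }
    default:
        return -1;
    }
    c->reg[0] = 0;                             /* $zero always reads as zero */
    c->pc = next;
    return 0;
}

/* Run until PC reaches end_pc (one past the last instruction). */
int run(cpu_t *c, uint32_t end_pc, int max_steps) {
    for (int i = 0; i < max_steps && c->pc != end_pc; i++)
        if (step(c) != 0) return -1;
    return c->pc == end_pc ? 0 : -1;
}
```

Loading the four-instruction summation loop from the labels section at address 0 (its encoding does not change, because the branch offset is relative), with $s0 pointing at three words 10, 20, 30 and $s2 just past them, leaves 60 in $s1 after three iterations.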

The assembler's invariant is the inverse mapping for real instructions: choose field values, pack them into the ISA format, and emit the corresponding bytes in the object file's chosen byte order. Pseudo-instructions break the one-line-per-instruction assumption but not the machine-code invariant.

Common Confusions

Watch Out

Assembly is not the same as machine code

add $t0,$t1,$t2 is text. 0x012A4020 is machine code for one MIPS encoding of that operation. The assembler maps text to bytes. The CPU fetches bytes, not mnemonics.

Watch Out

Labels are not stored as strings in branch instructions

A branch instruction stores an offset or target field, not the characters loop. The assembler computes the numeric field. If loop moves relative to the branch, the offset changes even though the assembly source still says loop.

Watch Out

Pseudo-instruction counts are not instruction counts

li $t0,0x12345678 may become two instructions. In a tight loop, a pseudo-instruction can change code size, branch reach, and cycle counts.

Watch Out

Endianness does not reorder instruction fields

Little-endian memory stores the least significant byte first. It does not mean the opcode field moves from bits 31:26 to bits 5:0 in the logical instruction word.
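The same point as a small C sketch (fetch_word and opcode_of are hypothetical names): byte order only controls how four bytes are combined into the instruction word, and the opcode is then bits 31:26 of that word in either case.

```c
#include <assert.h>
#include <stdint.h>

/* Combine 4 bytes from memory into the logical 32-bit instruction word. */
uint32_t fetch_word(const uint8_t b[4], int little_endian) {
    if (little_endian)
        return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* The opcode field is always the top six bits of the word. */
uint32_t opcode_of(uint32_t word) { return word >> 26; }
```

Reading the little-endian bytes 20 40 2A 01 with the correct order gives 0x012A4020 (opcode 0). Reading them as if they were big-endian gives 0x20402A01, whose top six bits are 8, the addi opcode: a different instruction altogether.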

Exercises

Exercise (Core)

Problem

Encode add $s0,$s1,$s2 as a 32-bit MIPS instruction. Use $s0=16, $s1=17, $s2=18, opcode 0, shamt 0, and funct 32. Give the binary fields, hex word, and little-endian bytes.

Exercise (Core)

Problem

A beq $t0,$zero,target instruction is at address 0x00400040. The label target is at 0x00400030. Compute the 16-bit branch immediate.

Exercise (Advanced)

Problem

For the C statement a = b + c * 4, assume b is already in $s1, c is in $s2, and a must be stored to 0($sp). Write a MIPS sequence with no loads. Then evaluate it for b=100 and c=6.

