Skip to main content

Bits and Codes · 45 min

Binary and Bits

Why computers represent everything as bits, how integers are encoded in two's complement, and how hexadecimal compresses bytes for human use.

Why This Matters

A single byte has 256 possible patterns. The pattern 11111111 can mean unsigned 255, signed negative 1, a bit mask with eight flags set, or the low byte of a machine instruction. The hardware stores the same eight bits in all cases; the program and instruction choose the interpretation.

Binary wins in digital hardware because two voltage ranges are easier to separate than ten. A circuit can treat 0.0 to 0.8 volts as 0 and 2.0 to 5.0 volts as 1, leaving a noise margin between them. Ten voltage levels would pack decisions closer together, making heat, manufacturing variation, and wire coupling harder to tolerate at high clock rates.

Core Definitions

Definition

Bit

A bit is a variable with two possible values, written 0 and 1. In hardware it is represented by a physical state, such as a low or high voltage on a wire, a charge in a memory cell, or a magnetic orientation on storage media.

Definition

Byte

A byte is 8 bits. It has 28=2562^8 = 256 possible patterns, from 00000000 through 11111111. Modern processors address memory in bytes, so an address usually names one byte location.

Definition

Unsigned n-bit integer

An unsigned nn-bit integer with bits bn1b1b0b_{n-1}\dots b_1b_0 has value i=0n1bi2i\sum_{i=0}^{n-1} b_i2^i. Its range is 00 through 2n12^n - 1.

Definition

Two's complement signed integer

An nn-bit two's complement integer with bits bn1b0b_{n-1}\dots b_0 has value bn12n1+i=0n2bi2i-b_{n-1}2^{n-1} + \sum_{i=0}^{n-2} b_i2^i. Its range is 2n1-2^{n-1} through 2n112^{n-1}-1.

Binary Place Value and Conversion

Binary is ordinary positional notation with base 2. Decimal uses powers of 10. Binary uses powers of 2. The rightmost bit has weight 1, then 2, 4, 8, and so on.

For the byte 10110110, the weights are:

bit10110110
weight1286432168421

The value is 128+32+16+4+2=182128 + 32 + 16 + 4 + 2 = 182.

To convert decimal 182 back to binary, subtract powers of 2 from high to low:

weighttake?remainder
128yes54
64no54
32yes22
16yes6
8no6
4yes2
2yes0
1no0

So 182 is 10110110.

Hexadecimal uses base 16, with digits 0 through 9 and A through F. One hex digit stores exactly 4 bits because 16=2416 = 2^4. That 4-bit group is called a nibble.

binaryhexdecimal
000000
000111
100199
1010A10
1111F15

A byte is two hex digits. Split 10110110 into nibbles:

1011 0110
  B    6

So 10110110 is 0xB6. Hex is not a different storage format. It is a compact way to write bits for humans.

Signed Integers and Why Two's Complement Wins

A byte can store 256 patterns. If we want signed integers, we must assign those patterns to negative and nonnegative values. Three historical schemes matter.

Sign-magnitude uses the top bit as a sign and the remaining bits as magnitude. For 8 bits, 00000101 is positive 5 and 10000101 is negative 5. The range is negative 127 through positive 127, plus two zeros: 00000000 and 10000000.

Ones' complement represents negative xx by flipping every bit of positive xx. For 8 bits, positive 5 is 00000101, so negative 5 is 11111010. It also has two zeros: 00000000 and 11111111.

Two's complement represents negative xx by computing 2nx2^n - x in nn bits. For 8 bits, negative 5 is 2565=251256 - 5 = 251, which is 11111011. It has one zero and range negative 128 through positive 127.

The usual shortcut for negating in two's complement is invert bits and add 1:

+5       00000101
invert   11111010
add 1    11111011   this is -5

The main hardware advantage is that addition uses the same bit-level adder for signed and unsigned integers. Add 5 and negative 5 as bytes:

  00000101
+ 11111011
----------
1 00000000

The carry out of the top bit is discarded for 8-bit arithmetic, leaving 00000000.

This C snippet prints the byte patterns on a typical implementation with 8-bit bytes and two's complement signed integers:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t a = 5;
    int8_t b = -5;
    uint8_t ua = (uint8_t)a;
    uint8_t ub = (uint8_t)b;

    printf("+5  = 0x%02X\n", ua);
    printf("-5  = 0x%02X\n", ub);
    printf("sum = 0x%02X\n", (uint8_t)(ua + ub));
}

A run prints:

+5  = 0x05
-5  = 0xFB
sum = 0x00

Overflow, Wrap-Around, and Machine Integers

With a fixed number of bits, arithmetic cannot grow forever. For unsigned 8-bit arithmetic, values are computed modulo 256.

  00000001    1
+ 11111111  255
----------
1 00000000  carry discarded, result 0

So uint8_t addition wraps from 255 to 0. In C, unsigned integer arithmetic is defined modulo 2n2^n for the width after the usual integer conversions. To see exact 8-bit behavior, cast the result back to uint8_t:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t x = 255;
    uint8_t y = 1;
    uint8_t z = (uint8_t)(x + y);
    printf("%u + %u -> %u, hex 0x%02X\n", x, y, z, z);
}

Output:

255 + 1 -> 0, hex 0x00

For signed 8-bit two's complement, the bit pattern after adding 127 and 1 is 10000000.

  01111111   127
+ 00000001     1
----------
  10000000

If interpreted as int8_t, 10000000 is negative 128. In C, signed overflow is undefined behavior for the abstract machine, even though common processors produce this bit pattern in hardware. Languages differ. Java specifies two's complement wrap for int and long. Rust checks in debug builds for ordinary signed overflow and wraps only when requested with methods such as wrapping_add.

A common hardware test for signed overflow in addition is: the inputs have the same sign and the result has a different sign. For 127 + 1, both inputs have sign bit 0 and the result has sign bit 1, so signed overflow occurred.

Bytes, Words, and Endianness

A word is the natural integer size for a machine design or instruction set context. On current x86-64 systems, general-purpose registers are 64 bits, but the architecture still uses names such as byte, word, doubleword, and quadword in instruction descriptions.

A multi-byte integer must choose which byte goes at the lower memory address. The integer 0x12345678 has four bytes:

0x12 0x34 0x56 0x78

Little-endian order stores the least significant byte first. x86 and x86-64 use little-endian order for integer data:

address offsetbyte
00x78
10x56
20x34
30x12

Big-endian order stores the most significant byte first:

address offsetbyte
00x12
10x34
20x56
30x78

Network byte order for Internet protocols is big-endian. That is why systems code uses conversion functions such as htonl and ntohl when writing binary integers into packet headers.

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t host = 0x12345678;
    uint32_t net = htonl(host);
    unsigned char bytes[4];

    memcpy(bytes, &net, 4);
    printf("%02X %02X %02X %02X\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
}

On a little-endian x86 machine, this prints:

12 34 56 78

The value was converted before copying, so the bytes in memory match big-endian network order.

Bitwise Operators and Flags

Bitwise operators act on each bit position independently. For the 8-bit inputs x = 0b10101100 and y = 0b11001010:

expressionresult
x & y10001000
`xy`
x ^ y01100110
~x01010011 within 8 bits
x << 210110000 within 8 bits
x >> 200101011 for unsigned right shift

Flags pack several Boolean values into one integer. Suppose bit 0 means readable, bit 1 means writable, and bit 2 means executable.

#include <stdint.h>
#include <stdio.h>

enum {
    FLAG_READ  = 1u << 0,
    FLAG_WRITE = 1u << 1,
    FLAG_EXEC  = 1u << 2
};

int main(void) {
    uint8_t flags = 0;

    flags |= FLAG_READ;          // set read
    flags |= FLAG_WRITE;         // set write
    flags &= (uint8_t)~FLAG_EXEC; // clear exec

    if (flags & FLAG_WRITE) {
        printf("write flag set, flags = 0x%02X\n", flags);
    }

    flags ^= FLAG_WRITE;         // toggle write
    printf("after toggle = 0x%02X\n", flags);
}

The state changes are byte-level:

start            00000000  0x00
set READ         00000001  0x01
set WRITE        00000011  0x03
clear EXEC       00000011  0x03
toggle WRITE     00000001  0x01

Use unsigned types for bit manipulation in C and C++. Left shifting into or past the sign bit of a signed integer has rules that are easy to violate. An unsigned mask such as 1u << k is the usual starting point.

Key Result

Proposition

Fixed Width Addition Is Modular

Statement

For any two nn-bit patterns representing unsigned values aa and bb, the stored sum is (a+b)mod2n(a + b) \bmod 2^n. The same stored bit pattern is also the correct two's complement sum whenever the mathematical signed result lies in the range 2n1-2^{n-1} through 2n112^{n-1}-1.

Intuition

A ripple-carry or carry-lookahead adder computes low-order sum bits the same way regardless of type. Signedness changes how software reads the top bit, not how the low nn sum bits are formed.

Proof Sketch

Write a+b=q2n+ra + b = q2^n + r with 0r<2n0 \leq r < 2^n. Fixed-width hardware stores exactly the low nn bits, which encode rr. Two's complement assigns each bit pattern either its unsigned value rr when r<2n1r < 2^{n-1} or r2nr - 2^n when r2n1r \geq 2^{n-1}. If the exact signed sum is in range, that mapping gives the exact signed sum.

Why It Matters

The same adder circuit serves uint32_t and int32_t. The difference is overflow detection and interpretation, not a separate signed addition unit.

Failure Mode

Do not infer that a language permits signed wrap. The hardware result for int8_t 127 plus 1 is usually 0x80, but C signed overflow is undefined behavior.

A numeric instance with n=8n = 8 is 200+100=300200 + 100 = 300, and 300mod256=44300 \bmod 256 = 44. The stored byte is 00101100, or 0x2C. Interpreted as unsigned it is 44. Interpreted as signed two's complement it is also 44, because the top bit is 0.

For signed values, 100 plus 27 fits and stores 01111111, which is 127. But 100 plus 28 produces 10000000. The stored modular result is valid as bits, while the exact signed result 128 is outside the int8 range.

Common Confusions

Watch Out

Hex is not a third kind of machine storage

Memory stores bits. Hex is a notation for humans. The byte written as 0xAF is the same byte as 10101111 and the same unsigned value as 175.

Watch Out

The top bit is not always a sign bit

The top bit of 0x80 means 128 for uint8_t, negative 128 for int8_t, and simply flag bit 7 for a mask. The type or instruction gives the meaning.

Watch Out

Endianness does not reverse bits inside a byte

Little-endian order reverses byte order for multi-byte values. The byte 0x78 is still stored with the same eight bit positions inside that byte.

Watch Out

Right shift depends on signedness

For unsigned integers, right shift inserts zeros on the left. For negative signed integers in C, the result is implementation-defined. Use unsigned types when you need exact bit movement.

Exercises

ExerciseCore

Problem

Convert decimal 213 to 8-bit binary and hex. Show the weights used.

ExerciseCore

Problem

For 8-bit two's complement, compute the byte pattern for negative 37. Then add it to positive 37 and show the stored byte.

ExerciseAdvanced

Problem

A little-endian machine stores the four bytes EF BE AD DE starting at address 1000. What is the 32-bit unsigned integer value loaded from address 1000? What bytes should be sent on the wire in network byte order for that value?

ExerciseAdvanced

Problem

Given uint8_t flags = 0x2A, where bit 0 is A, bit 1 is B, bit 3 is D, and bit 5 is F, compute the result of setting A, clearing D, and toggling F. Give binary and hex after each step.

References

Canonical:

  • Charles Petzold, Code: The Hidden Language of Computer Hardware and Software, 2nd ed. (2022), ch. 7-9, binary arithmetic, bytes, and logic circuits
  • Andrew S. Tanenbaum and Todd Austin, Structured Computer Organization, 6th ed. (2013), ch. 2, data representation and computer arithmetic
  • David A. Patterson and John L. Hennessy, Computer Organization and Design RISC-V Edition, 2nd ed. (2021), §2.4 and §2.5, signed and unsigned numbers plus arithmetic
  • Randal E. Bryant and David R. O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd ed. (2016), ch. 2, integer representations, bit operations, and byte ordering
  • Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd ed. (1997), §4.1, positional number systems and arithmetic

Accessible:

  • Eric Roberts, The Art and Science of C, ch. 5, integer types and bitwise operators
  • Yale N. Patt and Sanjay J. Patel, Introduction to Computing Systems, 3rd ed., ch. 2, bits, data types, and two's complement
  • Henry S. Warren Jr., Hacker's Delight, 2nd ed., ch. 2, bit manipulation identities and examples

Next Topics

  • /computationpath/boolean-logic-and-gates
  • /computationpath/character-encodings
  • /computationpath/floating-point
  • /computationpath/machine-words-and-memory
  • /topics/modular-arithmetic