Why This Matters
An Ethernet frame commonly carries at most 1500 bytes of payload, and that payload is the entire IP packet, headers included. If an application writes 4096 bytes to a TCP socket, that data crosses several boundaries before the NIC transmits it: TCP segmentation, IP packetization, link-layer framing, checksums, route lookup, and possibly offload in hardware.
Layering is why the same HTTP server can run over Wi-Fi, Ethernet, or a loopback interface without recompiling the application. It is also why debugging networks requires care. A timeout seen by an application might come from DNS, TCP retransmission, an MTU black hole, a firewall rule, a NAT binding, or TLS certificate validation.
Core Definitions
Protocol layer
A protocol layer is a module boundary that offers a service to the layer above and uses a service from the layer below. The boundary hides representation details where possible. For example, TCP offers an ordered byte stream to an application while using IP datagrams underneath.
Encapsulation
Encapsulation is the process of placing one protocol data unit inside another. An HTTP request becomes TCP payload, the TCP segment becomes IP payload, and the IP packet becomes Ethernet payload.
MTU
The maximum transmission unit, or MTU, is the largest payload a link-layer frame can carry for a given network-layer protocol. Ethernet commonly has an IPv4 MTU of 1500 bytes, so an IPv4 packet larger than 1500 bytes cannot cross that link without fragmentation or segmentation before transmission.
PDU
A protocol data unit is the named unit at a layer. Ethernet uses frames, IP uses packets or datagrams, TCP uses segments, and applications define messages such as HTTP requests.
The OSI Seven-Layer Reference Model
The OSI model is a teaching model, not the protocol suite deployed on the Internet. Its seven names are still useful because they give engineers shared labels for locating failures.
| Layer | Name | Canonical example | Main unit |
|---|---|---|---|
| 7 | Application | HTTP, DNS, SMTP | message |
| 6 | Presentation | TLS record encryption, ASN.1, UTF-8 | encoded message |
| 5 | Session | RPC session, login session | dialog |
| 4 | Transport | TCP, UDP, SCTP | segment or datagram |
| 3 | Network | IPv4, IPv6, ICMP | packet |
| 2 | Data link | Ethernet, Wi-Fi, ARP-related LAN behavior | frame |
| 1 | Physical | 1000BASE-T, fiber, radio | bits and symbols |
Layer 1 turns bits into physical signals. Layer 2 frames data for one local link and uses link-local addresses such as Ethernet MAC addresses. Layer 3 moves packets across networks using IP addresses. Layer 4 gives process-to-process communication, usually with ports. Layers 5 through 7 are not cleanly separated in most Internet software. TLS, JSON encoding, HTTP, gRPC, and authentication often live in one application library.
The engineering value is independent change. Ethernet can move from copper to fiber while IP routing remains the same. IPv4 and IPv6 can carry TCP without changing the TCP state machine. An HTTP server can use TCP today and QUIC tomorrow while much of the handler code remains unchanged.
The Deployed TCP/IP Model
Most operating systems and textbooks use a four-layer model:
| TCP/IP layer | Rough OSI mapping | Examples |
|---|---|---|
| Link | OSI 1 and 2 | Ethernet, Wi-Fi, loopback, VLAN |
| Internet | OSI 3 | IPv4, IPv6, ICMP |
| Transport | OSI 4 | TCP, UDP, SCTP, QUIC (carried over UDP) |
| Application | OSI 5 to 7 | HTTP, DNS, SSH, TLS, NTP |
Some engineers split the link layer into physical and data-link, giving a five-layer model. That split is useful when debugging NIC speed negotiation, optical links, Wi-Fi modulation, or checksum offload.
The kernel typically owns link, IP, and TCP/UDP. A user process calls the socket API. For a TCP connection, the process sees file-descriptor operations such as connect, read, write, and close, while the kernel manages sequence numbers, retransmission timers, congestion control, receive windows, and packet output.
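```c
// Minimal TCP client shape. Error handling omitted for space.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);   // TCP over IPv4
    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(80);                  // port in network byte order
    inet_pton(AF_INET, "93.184.216.34", &peer.sin_addr);
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));
    const char req[] =
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Connection: close\r\n\r\n";
    write(fd, req, sizeof(req) - 1);            // exclude the trailing NUL
    close(fd);
    return 0;
}
```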
That write does not send an HTTP packet. It appends bytes to a TCP send buffer. TCP decides how to segment those bytes, IP decides the route, and the link layer emits frames.
Encapsulation by Bytes
Take this HTTP request:
```
GET / HTTP/1.1\r\nHost: example.com\r\n\r\n
```
It is 37 bytes if counted as ASCII bytes. Hexadecimal:
```
47 45 54 20 2f 20 48 54 54 50 2f 31 2e 31 0d 0a
48 6f 73 74 3a 20 65 78 61 6d 70 6c 65 2e 63 6f
6d 0d 0a 0d 0a
```
A minimal IPv4-over-TCP-over-Ethernet frame, without TCP options or VLAN tags, has these headers:
| Component | Bytes |
|---|---|
| Ethernet header | 14 |
| IPv4 header | 20 |
| TCP header | 20 |
| HTTP payload | 37 |
| Ethernet frame check sequence | 4 |
The transmitted frame from destination MAC through FCS is 95 bytes. Ethernet also has preamble and inter-frame gap on the wire, but those are not part of the frame delivered to the OS.
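The arithmetic in the table can be checked with a few constants. The sketch below assumes the minimal header sizes listed above and ignores Ethernet's minimum-frame padding, which does not apply to a 37-byte payload. The function names `ipv4_total_length` and `ethernet_frame_length` are illustrative, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes assumed from the table: no IP options, no TCP options, no VLAN tag. */
enum {
    ETH_HEADER_LEN  = 14,
    ETH_FCS_LEN     = 4,
    IPV4_HEADER_LEN = 20,
    TCP_HEADER_LEN  = 20
};

/* Value of the IPv4 total-length field for a TCP payload of `payload` bytes. */
size_t ipv4_total_length(size_t payload) {
    size_t total = IPV4_HEADER_LEN + TCP_HEADER_LEN + payload;
    assert(total <= 65535);   /* the field is 16 bits wide */
    return total;
}

/* Frame length from destination MAC through FCS, excluding preamble and
   inter-frame gap. Minimum-frame padding is not modeled here. */
size_t ethernet_frame_length(size_t payload) {
    return ETH_HEADER_LEN + ipv4_total_length(payload) + ETH_FCS_LEN;
}
```

For the 37-byte request, `ipv4_total_length(37)` gives 77 and `ethernet_frame_length(37)` gives 95, matching the table.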
A simplified byte layout:
```
Ethernet
  dst mac:      6 bytes
  src mac:      6 bytes
  ethertype:    2 bytes   0x0800 for IPv4
IPv4
  version/IHL:  1 byte    0x45 means IPv4, 20-byte header
  total length: 2 bytes   0x004d means 77 bytes
  protocol:     1 byte    0x06 means TCP
  src ip:       4 bytes
  dst ip:       4 bytes
TCP
  src port:     2 bytes
  dst port:     2 bytes   0x0050 for port 80
  seq number:   4 bytes
  ack number:   4 bytes
  data offset:  4 bits    5 means 20-byte header
  flags:        bits      ACK and PSH are common here
  checksum:     2 bytes
HTTP
  payload:      37 bytes
```
The IPv4 total length covers only the IP packet: 20 bytes of IPv4 header plus 20 bytes of TCP header plus 37 bytes of TCP payload, so $20 + 20 + 37 = 77$ bytes. Ethernet's 14-byte header and 4-byte FCS are outside that field.
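Reading those fields back out of raw bytes follows the same offsets. The sketch below is one possible parser for the fixed 20-byte part of an IPv4 header; the struct `ipv4_summary` and the function `parse_ipv4` are hypothetical names introduced for illustration, and the sketch skips checksum validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields pulled from the fixed part of an IPv4 header (illustrative struct). */
struct ipv4_summary {
    unsigned version;       /* upper 4 bits of byte 0 */
    unsigned header_len;    /* bytes, IHL * 4 */
    unsigned total_length;  /* bytes, IPv4 header plus everything it carries */
    unsigned protocol;      /* 6 = TCP, 17 = UDP */
};

/* Returns 0 on success, -1 if the buffer is too short or is not IPv4. */
int parse_ipv4(const uint8_t *buf, size_t len, struct ipv4_summary *out) {
    assert(buf != NULL && out != NULL);
    if (len < 20) return -1;
    out->version = buf[0] >> 4;
    out->header_len = (buf[0] & 0x0fu) * 4u;
    if (out->version != 4 || out->header_len < 20 || len < out->header_len)
        return -1;
    /* Multi-byte fields are big-endian on the wire. */
    out->total_length = ((unsigned)buf[2] << 8) | buf[3];
    out->protocol = buf[9];
    return 0;
}
```

Given a header that starts `45 00 00 4d`, the parser reports version 4, a 20-byte header, and a total length of 77.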
MTU, Segmentation, and Fragmentation
On a 1500-byte Ethernet MTU, the largest TCP payload in one IPv4 packet with no IP or TCP options is:
$$1500 - 20 - 20 = 1460 \text{ bytes}$$
This number is the TCP maximum segment size (MSS) often advertised for Ethernet paths over IPv4. For IPv6 with a 40-byte base header, the analogous value is:
$$1500 - 40 - 20 = 1440 \text{ bytes}$$
If an application writes 4096 bytes to a TCP socket over IPv4 Ethernet with MSS 1460, TCP can split it into three segments:
```
segment 1 payload: 1460 bytes
segment 2 payload: 1460 bytes
segment 3 payload: 1176 bytes
total:             4096 bytes
```
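The split above can be reproduced with a short calculation. This is a simplified model, assuming every segment except the last is filled to the MSS; a real TCP stack may segment differently because of options, congestion and receive windows, or hardware offload. The names `tcp_mss` and `split_into_segments` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* MSS for a given MTU: what is left after the IP and TCP headers. */
size_t tcp_mss(size_t mtu, size_t ip_header_len, size_t tcp_header_len) {
    assert(mtu > ip_header_len + tcp_header_len);
    return mtu - ip_header_len - tcp_header_len;
}

/* Fill sizes[] with per-segment payload lengths. Returns the number of
   segments needed, even when that exceeds max_segments. */
size_t split_into_segments(size_t write_len, size_t mss,
                           size_t *sizes, size_t max_segments) {
    assert(mss > 0);
    size_t count = 0;
    while (write_len > 0) {
        size_t chunk = write_len < mss ? write_len : mss;
        if (count < max_segments) sizes[count] = chunk;
        count++;
        write_len -= chunk;
    }
    return count;
}
```

With `tcp_mss(1500, 20, 20)` as the MSS, a 4096-byte write yields three segments of 1460, 1460, and 1176 bytes.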
IP fragmentation is different. Fragmentation splits one IP datagram into multiple IP fragments. IPv4 routers can fragment when the Don't Fragment bit is not set. IPv6 routers do not fragment transit packets; the source must size packets correctly, using ICMPv6 Packet Too Big feedback.
Path MTU discovery finds the largest packet size that can cross the full path without fragmentation. In IPv4, a sender sets Don't Fragment. If a router cannot forward a packet because the next-link MTU is smaller, it returns ICMP Destination Unreachable, fragmentation needed, with MTU information when available. A broken firewall that drops this ICMP can create an MTU black hole: small packets pass, large packets stall.
Example: Ethernet MTU 1500 at the sender, PPPoE link MTU 1492 in the middle. A 1500-byte IPv4 packet with DF set reaches the PPPoE hop. The router cannot forward it. Correct behavior is an ICMP error reporting 1492. The TCP sender then uses MSS $1492 - 20 - 20 = 1452$ bytes.
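The feedback loop in that example can be simulated in a few lines. The sketch below is a toy model, not a network implementation: it assumes every router reports its next-link MTU in the ICMP error and that no ICMP is lost. The function name `discover_path_mtu` and the array-of-link-MTUs representation are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated path MTU discovery with DF set.
   link_mtus[0] is the sender's own link; later entries are links along the path.
   Returns the packet size that crosses every link and stores how many
   ICMP "fragmentation needed" errors the sender received on the way. */
size_t discover_path_mtu(const size_t *link_mtus, size_t hop_count,
                         size_t *icmp_errors) {
    assert(link_mtus != NULL && icmp_errors != NULL && hop_count > 0);
    size_t packet = link_mtus[0];   /* start from the local link MTU */
    *icmp_errors = 0;
    for (;;) {
        size_t hop;
        for (hop = 0; hop < hop_count; hop++) {
            if (packet > link_mtus[hop]) break;   /* this link cannot carry it */
        }
        if (hop == hop_count) return packet;      /* crossed every link */
        packet = link_mtus[hop];                  /* ICMP reported this MTU */
        (*icmp_errors)++;
    }
}
```

For links of 1500, 1492, and 1500 bytes the simulation converges on 1492 after one ICMP error, which leaves 1452 bytes of TCP payload once the two 20-byte headers are subtracted.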
Common Protocols by Layer
Layer 2 protocols include Ethernet, 802.11 Wi-Fi, and VLAN tagging. ARP is usually discussed alongside them: it maps an IPv4 address on a local network to a MAC address and does not route across the Internet. A host sending to its default gateway first needs the gateway's MAC address on the local link.
Layer 3 protocols include IPv4, IPv6, and ICMP. IP handles addressing and forwarding. ICMP reports control information such as TTL exceeded, echo request and reply, and packet-too-big errors. A router decrements TTL or hop limit at each hop; when it reaches zero, the router drops the packet and often sends ICMP Time Exceeded.
Layer 4 protocols include TCP and UDP. TCP provides an ordered byte stream with retransmission, congestion control, and flow control. UDP provides datagrams with ports and a checksum but no built-in retransmission or ordering. DNS often uses UDP for small queries and TCP for large responses or zone transfers.
Layer 7 protocols include HTTP, DNS, SSH, SMTP, and NTP. The line between layer 6 and layer 7 is blurry in deployed systems. TLS records encrypt application bytes and authenticate peers, but many applications configure TLS directly and treat it as part of their application protocol.
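The TTL rule described for layer 3 is simple enough to model directly. The sketch below assumes one decrement per router and ignores everything else a router does; `router_hop` and `routers_crossed` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum hop_result { HOP_FORWARD, HOP_TIME_EXCEEDED };

/* One router hop: decrement TTL and drop the packet when it reaches zero. */
enum hop_result router_hop(uint8_t *ttl) {
    assert(ttl != NULL);
    if (*ttl <= 1) {
        *ttl = 0;
        return HOP_TIME_EXCEEDED;   /* a router would often send ICMP Time Exceeded here */
    }
    (*ttl)--;
    return HOP_FORWARD;
}

/* Number of routers that forward the packet before it is dropped
   or has passed every router on the path. */
int routers_crossed(uint8_t initial_ttl, int routers_on_path) {
    uint8_t ttl = initial_ttl;
    int crossed = 0;
    while (crossed < routers_on_path && router_hop(&ttl) == HOP_FORWARD)
        crossed++;
    return crossed;
}
```

A packet sent with TTL 3 is forwarded by two routers and dropped by the third. Tools such as traceroute rely on this behavior by sending probes with increasing TTL values.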
Where Layering Leaks
NAT modifies IP addresses and often transport ports. A TCP connection from 10.0.0.7:51514 to 93.184.216.34:443 might leave the home router as 198.51.100.9:40001 to 93.184.216.34:443. To share one public address among many flows, the NAT device must read and usually rewrite layer 4 ports as well as layer 3 addresses. It keeps per-flow state, so the mapping can expire even when both endpoints still have open sockets.
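The state a NAT keeps can be pictured as a small table of bindings. The sketch below is deliberately simplified and is not how any particular NAT is implemented: it keys bindings only on the inside address and port, ignores the remote endpoint and protocol, and never expires entries. All type and function names (`endpoint`, `nat_table`, `nat_outbound`, `nat_inbound`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))
#define NAT_TABLE_SIZE 8

struct endpoint { uint32_t addr; uint16_t port; };   /* host byte order for readability */

struct nat_binding {
    struct endpoint inside;    /* private address and port */
    struct endpoint outside;   /* public address and port chosen by the NAT */
    int in_use;
};

struct nat_table {
    struct nat_binding slots[NAT_TABLE_SIZE];
    uint32_t public_addr;
    uint16_t next_port;        /* next public port to hand out */
};

/* Outbound packet: rewrite the source endpoint, creating a binding if needed.
   Returns 0 on success, -1 if the table is full. */
int nat_outbound(struct nat_table *t, struct endpoint *src) {
    assert(t != NULL && src != NULL);
    struct nat_binding *free_slot = NULL;
    for (int i = 0; i < NAT_TABLE_SIZE; i++) {
        struct nat_binding *b = &t->slots[i];
        if (b->in_use && b->inside.addr == src->addr && b->inside.port == src->port) {
            *src = b->outside;   /* existing flow: reuse the binding */
            return 0;
        }
        if (!b->in_use && free_slot == NULL) free_slot = b;
    }
    if (free_slot == NULL) return -1;
    free_slot->inside = *src;
    free_slot->outside.addr = t->public_addr;
    free_slot->outside.port = t->next_port++;
    free_slot->in_use = 1;
    *src = free_slot->outside;
    return 0;
}

/* Inbound packet: rewrite the destination back to the inside endpoint.
   Returns -1 when no binding exists, in which case the packet is dropped. */
int nat_inbound(const struct nat_table *t, struct endpoint *dst) {
    assert(t != NULL && dst != NULL);
    for (int i = 0; i < NAT_TABLE_SIZE; i++) {
        const struct nat_binding *b = &t->slots[i];
        if (b->in_use && b->outside.addr == dst->addr && b->outside.port == dst->port) {
            *dst = b->inside;
            return 0;
        }
    }
    return -1;
}
```

With `public_addr` set to 198.51.100.9 and `next_port` set to 40001, an outbound packet from 10.0.0.7:51514 leaves as 198.51.100.9:40001, and a reply addressed to that public endpoint is rewritten back. A reply to a port with no binding fails the lookup, which is what an endpoint observes after a mapping expires.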
TLS termination moves encryption boundaries. If a load balancer terminates TLS, the client has a TLS session with the load balancer, not with the origin server behind it. The backend link might use plain HTTP, a separate TLS session, or a local Unix socket. Application logs and certificate identity must be interpreted with that boundary in mind.
QUIC fuses transport and application concerns above UDP. It implements streams, retransmission, flow control, congestion control, and TLS 1.3 handshake integration, typically in user space. Middleboxes see UDP packets, while endpoints run a transport protocol that overlaps with TCP in function. The four-layer model still names the pieces, but the implementation boundary changed.
Virtual networking adds another leak. Linux network namespaces give processes separate interface lists, routing tables, firewall rules, and loopback devices. A container's eth0 is often one end of a veth pair, with the other end attached to a bridge in the host namespace. Packets cross layers and namespaces before reaching a physical NIC.
```
container ns                        host ns
------------                        ----------------
eth0 10.0.3.2  <== veth pair ==>    vethX
                                      |
                                    bridge br0
                                      |
                                    physical eth0
```
Key Result
The central invariant of layering is service replacement under a stable interface. If layer $n$ exposes the same service contract upward, layer $n+1$ should not depend on whether layer $n$ used Ethernet, Wi-Fi, IPv4, IPv6, TCP, or QUIC internally.
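Two formulas anchor the byte costs:
$$\text{IPv4 total length} = \text{IPv4 header} + \text{TCP header} + \text{TCP payload}$$
$$\text{MSS} = \text{MTU} - \text{IP header} - \text{TCP header}$$
These formulas are small, but they prevent common mistakes. Ethernet bytes are not counted in the IPv4 total-length field. TCP payload size is not the same as application write size. The usable MTU is a property of the whole path, not just of the local NIC.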
Common Confusions
Thinking OSI layer 5 and layer 6 are separate libraries in most Internet programs
Most deployed software does not have a clean session layer and presentation layer. A web service often has HTTP parsing, TLS configuration, compression, authentication, and serialization in one process. The OSI names still help classify duties, but they are not a required implementation layout.
Calling every unit a packet
A frame is link-layer, a packet is network-layer, a segment is TCP, and a message is application-layer. Engineers often say packet casually. In byte accounting, the distinction matters because each unit has different headers and length fields.
Assuming TCP preserves write boundaries
TCP is a byte stream. One write(fd, buf, 100) can be read as 40 and 60 bytes, or combined with another write and read as 200 bytes. Application protocols need delimiters, fixed lengths, or framing fields.
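One common way to add framing is a fixed-size length prefix. The sketch below assumes a 4-byte big-endian length followed by the message body, which is one convention among several; the helper names `read_exact`, `write_all`, `send_message`, and `recv_message` are illustrative and not part of the sockets API. It works on any stream file descriptor.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes, looping over short reads.
   Returns 0 on success, -1 on error or early end of stream. */
int read_exact(int fd, void *buf, size_t n) {
    assert(buf != NULL || n == 0);
    uint8_t *p = buf;
    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got < 0 && errno == EINTR) continue;
        if (got <= 0) return -1;
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/* Write exactly n bytes, looping over short writes. */
int write_all(int fd, const void *buf, size_t n) {
    assert(buf != NULL || n == 0);
    const uint8_t *p = buf;
    while (n > 0) {
        ssize_t put = write(fd, p, n);
        if (put < 0 && errno == EINTR) continue;
        if (put <= 0) return -1;
        p += put;
        n -= (size_t)put;
    }
    return 0;
}

/* Send one message: 4-byte big-endian length, then the body. */
int send_message(int fd, const void *body, uint32_t len) {
    uint32_t prefix = htonl(len);
    if (write_all(fd, &prefix, sizeof prefix) < 0) return -1;
    return write_all(fd, body, len);
}

/* Receive one message into buf. Returns the body length,
   or -1 on error or if the message does not fit in cap bytes. */
long recv_message(int fd, void *buf, uint32_t cap) {
    uint32_t prefix;
    if (read_exact(fd, &prefix, sizeof prefix) < 0) return -1;
    uint32_t len = ntohl(prefix);
    if (len > cap) return -1;
    if (read_exact(fd, buf, len) < 0) return -1;
    return (long)len;
}
```

The receiver recovers message boundaries from the prefix regardless of how the bytes were split or merged in transit. Two back-to-back `send_message` calls therefore come out as two separate messages from `recv_message`, even if a single `read` would have returned both at once.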
Exercises
Problem
An HTTP request has 900 bytes of payload. It is sent over TCP, IPv4, and Ethernet with no IP or TCP options and no VLAN tag. Compute the IPv4 total length and the Ethernet frame length excluding preamble and inter-frame gap but including FCS.
Problem
A host has Ethernet MTU 1500. The path contains a tunnel link with effective MTU 1400. For IPv4 TCP with no options, what MSS should the sender use after path MTU discovery succeeds? How many TCP segments carry a 5000-byte application write at that MSS?
Problem
A NAT rewrites an outbound flow from inside host 10.1.2.3:53000 to public address 203.0.113.10:41001. The server is 198.51.100.20:443. Write the 4-tuples before NAT, after NAT on the outbound packet, before NAT on the inbound reply, and after NAT on the inbound reply.
References
Canonical:
- W. Richard Stevens and Kevin R. Fall, TCP/IP Illustrated, Volume 1: The Protocols (2nd ed., 2011), ch. 1-3 and 10-13, covers layering, IP, TCP, UDP, and link behavior
- Andrew S. Tanenbaum and David J. Wetherall, Computer Networks (5th ed., 2011), ch. 1 and 3-6, covers the OSI reference model and the Internet protocol suite
- W. Richard Stevens, Bill Fenner, and Andrew M. Rudoff, UNIX Network Programming, Volume 1: The Sockets Networking API (3rd ed., 2004), ch. 1-2, 4, and 6, covers sockets and TCP client-server structure
- James F. Kurose and Keith W. Ross, Computer Networking: A Top-Down Approach (8th ed., 2021), ch. 1-5, covers application, transport, network, and link layers
- Douglas E. Comer, Internetworking with TCP/IP, Volume 1 (6th ed., 2013), ch. 3-8, covers IP addressing, forwarding, ARP, UDP, and TCP
Accessible:
- Larry L. Peterson and Bruce S. Davie, Computer Networks: A Systems Approach, open online edition
- Beej Jorgensen, Beej's Guide to Network Programming
- IBM, TCP/IP Tutorial and Technical Overview, Redbooks
Next Topics
- /computationpath/tcp-state-machine
- /computationpath/ip-routing-and-subnets
- /computationpath/network-namespaces-and-veth
- /computationpath/tls-and-certificates
- /computationpath/quic-transport-over-udp