

Python for ML Research

Reference card for the Python ML stack: package managers, framework choices, editor setup, and common pitfalls.

Core · Tier 3 · Current · Reference · ~12 min

What It Is

Python has been the default language of ML research since roughly 2015, when the combination of NumPy, scikit-learn, and Theano displaced MATLAB for academic deep-learning work. PyTorch (Meta, released 2016) and TensorFlow (Google, 2015) cemented that position. The language is interpreted, dynamically typed, and single-threaded at the bytecode level by default (the GIL), and it delegates almost all tensor and compute-heavy work to compiled C, C++, or Rust extensions.

A practical ML stack is layered: a numerical kernel (NumPy / PyTorch / JAX) wrapped by domain libraries (Hugging Face Transformers, scikit-learn, Lightning) wrapped by experiment scaffolding (Hydra configs, wandb logging, Modal or SLURM job submission). Python serves as glue; the heavy work happens in compiled kernels.

Package management has consolidated around uv (Astral, 2024) as the new default for most greenfield ML projects. uv is a Rust reimplementation of pip and virtualenv that is roughly 10-100x faster than pip for environment creation, with a pyproject.toml-first workflow. Conda still dominates anywhere CUDA / cuDNN system bindings need to be reproducible across platforms (genomics, scientific Python). Poetry has lost ground to uv since 2024. Plain pip + venv remains the lowest-common-denominator option.
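A minimal sketch of the uv workflow described above, assuming uv is installed; the project name and dependency choices here are illustrative, not prescribed by any standard:

```shell
# Create a new pyproject.toml-based project (name is illustrative)
uv init my-ml-project
cd my-ml-project

# Resolve dependencies into a managed .venv and write uv.lock
uv add torch numpy

# Run a script inside the project environment without activating it
uv run python train.py
```

The lockfile (`uv.lock`) plus `pyproject.toml` is what makes the environment reproducible; committing both is the usual practice.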

When You'd Use It

PyTorch is the dominant research framework: Papers With Code numbers from 2024 put PyTorch at roughly 75-80% of new research-paper implementations, JAX around 10-15%, TensorFlow under 10% and falling. JAX wins for projects that need explicit functional transforms (jit, vmap, pmap, grad) or work at Google / DeepMind / Anthropic; PyTorch wins everywhere else. TensorFlow remains in production at companies that adopted it pre-2020.
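The functional transforms that distinguish JAX can be sketched in a few lines; this is a toy example (the quadratic "loss" is invented for illustration) and assumes JAX is installed:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # toy quadratic loss, purely for illustration
    return jnp.sum((w * x) ** 2)

grad_fn = jax.grad(loss)                         # derivative w.r.t. first argument
batched = jax.vmap(grad_fn, in_axes=(None, 0))   # map over a batch of x values
fast = jax.jit(batched)                          # compile the pipeline with XLA

w = 2.0
xs = jnp.array([1.0, 2.0, 3.0])
grads = fast(w, xs)  # d/dw of (w*x)^2 is 2*w*x^2 for each x
```

Composing `grad`, `vmap`, and `jit` as plain function transforms, rather than mutating module state, is the core design difference from PyTorch's eager style.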

The editor stack: VS Code with the Pylance extension is the most common setup. Cursor (a VS Code fork with stronger LLM integration) has gained share since 2024 among solo researchers. Jupyter notebooks remain dominant for exploration, but most production training code now lives in plain .py files invoked by a launcher; long-lived notebooks are an anti-pattern for anything that will be re-run.

For type checking, pyright (Microsoft, used by Pylance) is faster than mypy and has better inference, but mypy still has the larger plugin ecosystem (e.g. mypy --strict is a common CI gate).
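As a small example of the class of bug a strict checker catches (the function and paths here are invented for illustration):

```python
from typing import Optional

def last_checkpoint(paths: list[str]) -> Optional[str]:
    """Return the lexicographically last path, or None if there are none."""
    return max(paths) if paths else None

# pyright or mypy --strict flags calling .endswith() on this value
# without first narrowing away None:
ckpt = last_checkpoint(["ckpt_001.pt", "ckpt_002.pt"])
if ckpt is not None:
    print(ckpt.endswith("002.pt"))
```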

Definition

Python ML Stack

The Python ML stack is the layered system that turns research code into model runs: interpreter, environment manager, numerical kernel, framework, configuration layer, experiment logger, and job launcher. Most speed comes from the compiled kernel; most reproducibility failures come from the surrounding glue.

Proposition

Compiled Kernel Boundary

Statement

For tensor-heavy ML code, Python is usually not the performance bottleneck when the hot path stays inside compiled kernels.

Intuition

Python schedules the operation, but BLAS, CUDA, XLA, or ATen performs the numerical work. The expensive mistake is crossing the boundary thousands of times with tiny operations.
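A minimal NumPy sketch of the boundary cost (absolute timings are machine-dependent; the ratio is the point):

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

# One boundary crossing: the reduction runs entirely in compiled code.
t0 = time.perf_counter()
fast = x.sum()
t_vectorized = time.perf_counter() - t0

# ~1e6 boundary crossings: each element is boxed into a Python float.
t0 = time.perf_counter()
slow = 0.0
for v in x:
    slow += v
t_loop = time.perf_counter() - t0

print(f"vectorized: {t_vectorized:.6f}s  loop: {t_loop:.6f}s")
```

Both paths compute the same sum; only the number of times control passes between the interpreter and the kernel differs.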

Failure Mode

The proposition fails for Python loops over examples, per-token object manipulation, data-loading bottlenecks, and any code that repeatedly moves tensors between CPU and GPU.

ExerciseCore

Problem

A training script is slow. The model forward pass is vectorized in PyTorch, but preprocessing loops through examples in Python and calls .item() inside the loop. Which boundary is being crossed too often?
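A hedged NumPy sketch of the pattern in question (the exercise describes PyTorch, but `ndarray` scalars have the same `.item()` boundary behavior):

```python
import numpy as np

scores = np.random.rand(10_000)

# Per-example loop: one .item() call per element, each of which
# boxes a compiled-kernel scalar into a Python float.
total = 0.0
for i in range(len(scores)):
    total += scores[i].item()

# Vectorized: one crossing for the whole batch.
total_vec = float(scores.sum())

assert abs(total - total_vec) < 1e-6
```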

Notable Gotchas

Watch Out

Mutable default arguments

def f(x, cache=[]): shares the same list across every call to f. The default object is created once at function definition. Use cache=None and initialize inside the body. This bites every Python programmer at least once.
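The behavior and the standard fix, runnable as-is:

```python
def f_buggy(x, cache=[]):
    # 'cache' is evaluated once, at definition time, and shared across calls
    cache.append(x)
    return cache

print(f_buggy(1))  # [1]
print(f_buggy(2))  # [1, 2]  <- state from the previous call leaks in

def f_fixed(x, cache=None):
    if cache is None:
        cache = []  # fresh list on every call
    cache.append(x)
    return cache

print(f_fixed(1))  # [1]
print(f_fixed(2))  # [2]
```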

Watch Out

GIL, threading, and multiprocessing

The Global Interpreter Lock means Python threads cannot execute pure-Python bytecode in parallel. Threads still help for I/O (network, disk, subprocess waits) because the GIL is released during blocking syscalls. For CPU-bound parallelism use multiprocessing or run inside a C extension that releases the GIL (NumPy, PyTorch DataLoader workers). Python 3.13 introduced an experimental no-GIL build but production ML is still on the GIL-default interpreter as of 2026.
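A minimal sketch of why threads still help for I/O: `time.sleep` releases the GIL the way a blocking syscall does, so four 0.2 s waits overlap instead of serializing.

```python
import threading
import time

def blocking_io():
    # stands in for a network or disk wait; releases the GIL
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Total is ~0.2s, not 0.8s, because the waits run concurrently.
print(f"{elapsed:.2f}s")
```

A CPU-bound function in place of `blocking_io` would show the opposite: the threads serialize and total time matches the sequential run.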


Last reviewed: April 18, 2026
