
Spiking Neural Networks

Discrete-event neuron models trained with surrogate gradients. Energy-efficient on neuromorphic hardware, but rarely competitive with ANNs on standard benchmarks.

Advanced · Tier 3 · Current · Reference · ~15 min

Why This Matters

Standard artificial neurons emit continuous activations every forward pass. Biological neurons emit binary spikes asynchronously and stay quiet otherwise. Spiking neural networks (SNNs) preserve that sparsity: a neuron contributes energy only when it fires. On event-driven neuromorphic chips like Intel Loihi 2 and SpiNNaker 2, this asymmetry yields inference energy one to three orders of magnitude lower than that of a comparable ANN running on a GPU, especially for streaming sensor data from event cameras.

The catch is training. Spike functions are non-differentiable, so vanilla backprop does not apply. A decade of progress (surrogate gradients, ANN-to-SNN conversion, time-to-first-spike coding) has narrowed but not closed the accuracy gap on static-image benchmarks like ImageNet. SNNs remain the right tool when the substrate is event-driven, the power budget is tight, or the input is intrinsically temporal. They are usually the wrong tool when you have a GPU and a static dataset.

Core Ideas

Leaky integrate-and-fire (LIF). The canonical neuron integrates input current $I(t)$ into a membrane potential $V(t)$ that leaks toward rest with time constant $\tau_m$:

$$\tau_m \frac{dV}{dt} = -(V - V_{\text{rest}}) + R\, I(t).$$

When $V$ crosses the threshold $V_{\text{th}}$, the neuron emits a spike and resets to $V_{\text{rest}}$. Discretizing in time gives a recurrent unit with binary output $s_t \in \{0, 1\}$ and hidden state $V_t$.
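The discretized update can be sketched in a few lines of NumPy. This is a minimal Euler-step illustration, not a reference implementation; the parameter values (threshold 1.0, $\tau_m = 20$ steps, constant drive 1.5) are illustrative choices, not values from the text.

```python
import numpy as np

def lif_step(v, i_in, tau_m=20.0, dt=1.0, v_rest=0.0, v_th=1.0, r=1.0):
    """One Euler step of tau_m dV/dt = -(V - V_rest) + R*I(t),
    followed by spike emission and reset to rest."""
    v = v + (dt / tau_m) * (-(v - v_rest) + r * i_in)  # leaky integration
    s = (v > v_th).astype(float)                       # binary spike s_t in {0, 1}
    v = np.where(s > 0, v_rest, v)                     # reset neurons that fired
    return v, s

# Drive one neuron with a constant suprathreshold current and count spikes.
v = np.zeros(1)
spikes = []
for _ in range(100):
    v, s = lif_step(v, i_in=np.array([1.5]))
    spikes.append(s[0])
print(f"spikes emitted over 100 steps: {int(sum(spikes))}")
```

Note how the hidden state `v` is continuous while the emitted signal `s` is binary, which is exactly the structure the definition below formalizes.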

Surrogate gradients. The spike $s_t = \mathbb{1}[V_t > V_{\text{th}}]$ has zero derivative almost everywhere. Neftci, Mostafa, and Zenke (2019, IEEE Signal Process. Mag. 36(6)) replace the derivative with a smooth surrogate (a fast sigmoid, a triangular pulse) only in the backward pass. The forward pass stays binary, so inference remains spike-driven; the backward pass behaves like training a recurrent net through time.
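The forward/backward split can be made concrete with a small sketch: the forward function is the true hard threshold, while the quantity used in the backward pass is the derivative of a fast sigmoid centered at the threshold. The sharpness parameter `beta` is an illustrative choice, not a value from the paper.

```python
import numpy as np

V_TH = 1.0

def spike_forward(v):
    """Forward pass: the true binary spike function 1[v > V_TH]."""
    return (v > V_TH).astype(float)

def surrogate_grad(v, beta=10.0):
    """Backward pass: derivative of a fast sigmoid centered at V_TH,
    substituted for the true derivative (which is 0 almost everywhere)."""
    return 1.0 / (beta * np.abs(v - V_TH) + 1.0) ** 2

v = np.linspace(0.0, 2.0, 5)  # membrane potentials around threshold
print("forward spikes:  ", spike_forward(v))
print("surrogate grads: ", surrogate_grad(v))
# The surrogate peaks at v = V_TH and decays smoothly on both sides, so
# credit flows to near-threshold neurons the exact derivative would ignore.
```

In a framework like PyTorch this pair would live in a custom autograd function; the point here is only that the forward signal stays binary while the backward signal is dense and smooth.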

ANN-to-SNN conversion. Rueckauer et al. (2017, Front. Neurosci. 11) showed that a ReLU ANN can be mapped to a rate-coded SNN by interpreting each ReLU activation as a firing rate and weight-normalizing per layer. Conversion preserves accuracy on CIFAR-10 and ImageNet within a few percent but requires hundreds of timesteps to integrate stable rates, eroding the energy advantage. Direct SNN training tends to need fewer timesteps but more training compute.
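The rate-coding idea behind conversion is easy to verify numerically: a non-leaky integrate-and-fire neuron with a soft reset (subtract the threshold rather than zeroing), driven by a constant input, fires at a rate that approximates the ReLU of that input. This is an illustrative sketch of the principle, not the full per-layer weight-normalization pipeline of Rueckauer et al.

```python
def if_rate(a, T=200, v_th=1.0):
    """Simulate a non-leaky IF neuron with soft reset under constant
    input `a`; return its firing rate over T timesteps."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += a
        if v >= v_th:
            v -= v_th       # soft reset: subtract the threshold
            spikes += 1
    return spikes / T

for a in [-0.3, 0.0, 0.25, 0.6, 0.9]:
    print(f"input {a:+.2f}  ReLU {max(a, 0.0):.2f}  SNN rate {if_rate(a):.3f}")
```

Negative inputs never fire (rate 0, matching ReLU), and positive inputs below threshold are recovered as rates, but only because T is large: at small T the rate estimate is coarse, which is exactly the latency-versus-accuracy tension the text describes.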

Time-coded versus rate-coded. Rate codes encode information in firing rate over a window; they are robust but slow. Temporal codes (time-to-first-spike, phase coding) encode in spike timing and can decide a class in a single spike per neuron. Temporal codes are closer to the biological story and to the energy promise, but harder to train.
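A time-to-first-spike code can be sketched with a toy latency encoder: stronger inputs fire earlier, and a readout can commit to a class as soon as the first spike arrives. The encoding formula, the window length, and the class names here are illustrative assumptions, not part of any cited scheme.

```python
def ttfs_encode(x, T=20):
    """Toy latency code: intensity x in [0, 1] -> timestep of the single
    spike (stronger input = earlier spike); x <= 0 means no spike."""
    if x <= 0:
        return None
    return int(round((1.0 - x) * (T - 1)))

logits = {"cat": 0.9, "dog": 0.4, "car": 0.1}
spike_times = {k: ttfs_encode(v) for k, v in logits.items()}
print(spike_times)

# Earliest spike wins: the readout decides after one spike per neuron.
winner = min((t, k) for k, t in spike_times.items() if t is not None)[1]
print("decision:", winner)
```

The contrast with the rate-coded sketch above is the point: a rate code needs a long window to estimate frequencies, whereas this readout stops at the first event.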

Definition

Leaky Integrate-and-Fire Neuron

A leaky integrate-and-fire neuron is a stateful unit whose membrane potential $V_t$ integrates incoming current, decays toward rest, emits a binary spike $s_t$ when it crosses threshold, and then resets. The hidden state is continuous; the communication event is discrete.

Proposition

Event-Driven Efficiency Principle

Statement

If activity is sparse and the hardware only performs work when spikes occur, then the expected inference cost of an SNN scales with the number of emitted spikes rather than with the dense layer width at every timestep.

Intuition

Dense ANNs pay for every activation whether it matters or not. Event-driven SNN hardware pays mostly when a neuron fires, so silence becomes computationally valuable.

Failure Mode

The advantage shrinks when firing rates are high, when conversion requires hundreds of timesteps, or when the model runs on dense GPU kernels that do not exploit sparse events.
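The proposition and its failure mode can both be seen in a back-of-the-envelope operation count. The layer sizes, firing rates, and timestep counts below are illustrative assumptions; the sketch also ignores that a spike-triggered accumulate is cheaper than a multiply-accumulate, which would further favor the sparse case.

```python
def ann_macs(n_in, n_out):
    """Dense layer: n_in * n_out multiply-accumulates per inference,
    paid whether or not the activations carry information."""
    return n_in * n_out

def snn_acs(n_in, n_out, rate, timesteps):
    """Event-driven layer: each emitted spike triggers n_out accumulates,
    so cost scales with spike count (rate * n_in * timesteps), not width."""
    return int(rate * n_in * timesteps) * n_out

n_in, n_out = 1024, 1024
dense = ann_macs(n_in, n_out)
sparse_fast = snn_acs(n_in, n_out, rate=0.02, timesteps=20)   # sparse code, few steps
busy_slow = snn_acs(n_in, n_out, rate=0.10, timesteps=200)    # rate-coded conversion
print(f"dense: {dense}  sparse SNN: {sparse_fast}  busy SNN: {busy_slow}")
```

The sparse, short-window SNN does a fraction of the dense work, while the high-rate, long-window one does many times more, which is the failure mode stated above in arithmetic form.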

ExerciseCore

Problem

Suppose a rate-coded SNN needs 200 timesteps to match an ANN's accuracy, while a temporal-code SNN reaches the same decision in 20 timesteps with the same spike rate per step. Which system has the better energy story, and what assumption did you use?

Common Confusions

Watch Out

Biological plausibility is not benchmark dominance

Biological plausibility and task accuracy are different axes. SNNs match ANN accuracy on small static-image benchmarks but lag on ImageNet, language, and most modern benchmarks. The case for SNNs is energy-per-inference on neuromorphic hardware, not representational power.

Watch Out

Surrogate gradients are useful but not exact gradients

Surrogate gradients are a heuristic that works empirically. The surrogate is not the gradient of the spike, and convergence guarantees from smooth optimization do not carry over directly. Treat them as a useful trick, not a derivation.

References

Foundational neuron models:

  • Hodgkin & Huxley, "A quantitative description of membrane current and its application to conduction and excitation in nerve" (J. Physiol. 117(4), 1952). The original conductance-based spiking model.
  • Maass, "Networks of spiking neurons: The third generation of neural network models" (Neural Networks 10(9), 1997). Computational complexity case for spiking models.
  • Gerstner & Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity (Cambridge, 2002). Canonical textbook treatment of LIF and integrate-and-fire variants.
  • Gerstner, Kistler, Naud & Paninski, Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition (Cambridge, 2014). Modern successor textbook covering LIF networks and learning.

Surrogate-gradient training:

  • Neftci, Mostafa & Zenke, "Surrogate Gradient Learning in Spiking Neural Networks" (IEEE Signal Process. Mag. 36(6), 2019; arXiv:1901.09948). Canonical reference for surrogate-gradient backprop-through-time.
  • Shrestha & Orchard, "SLAYER: Spike Layer Error Reassignment in Time" (NeurIPS 2018; arXiv:1810.08646). An exact-temporal-credit-assignment alternative to vanilla surrogate BPTT.
  • Wunderlich & Pehle, "Event-based backpropagation can compute exact gradients for spiking neural networks" (Sci. Rep. 11, 2021; arXiv:2009.08378). EventProp: gradients without surrogates by exploiting spike-event structure.
  • Eshraghian, Ward, Neftci, Wang, Lenz, Dwivedi, Bennamoun, Jeong & Lu, "Training Spiking Neural Networks Using Lessons From Deep Learning" (Proc. IEEE 111(9), 2023; arXiv:2109.12894). Modern training reference; basis for the snnTorch library.

ANN-to-SNN conversion:

  • Rueckauer, Lungu, Hu, Pfeiffer & Liu, "Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification" (Front. Neurosci. 11, 2017). The canonical rate-coded conversion result.
  • Bu, Fang, Ding, Dai, Yu & Huang, "Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks" (ICLR 2022; arXiv:2303.04347). Conversion at very low timestep counts.

Neuromorphic hardware:

  • Davies et al., "Loihi: A Neuromorphic Manycore Processor with On-Chip Learning" (IEEE Micro 38(1), 2018). The original Loihi architecture paper.
  • Davies et al., "Advancing Neuromorphic Computing with Loihi: A Survey of Results and Outlook" (Proc. IEEE 109(5), 2021). Loihi 2 results and the energy comparison numbers.
  • Furber, Galluppi, Temple & Plana, "The SpiNNaker Project" (Proc. IEEE 102(5), 2014). The SpiNNaker neuromorphic platform.

Modern frontier (2023-2024):

  • Zhou, Zhu, He, Wang, Ma, Zhang, Tian & Yuan, "Spikformer: When Spiking Neural Network Meets Transformer" (ICLR 2023; arXiv:2209.15425). First competitive transformer-style SNN.
  • Yao, Hu, Zhou, Yuan, Tian, Xu & Li, "Spike-driven Transformer" (NeurIPS 2023; arXiv:2307.01694). Direct spike-driven attention without intermediate float operations.
  • Zhu, Zhao, Ororbia, Wang, Wu & Eshraghian, "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks" (TMLR 2024; arXiv:2302.13939). GPT-style language model in the SNN regime.


Last reviewed: April 27, 2026
