Longitudinal Surveys and Panel Data

Sneiderman, Robby

Analysis of data where the same units are measured repeatedly over time: fixed effects, random effects, difference-in-differences, and the problems of attrition and time-varying confounding.

Advanced · Tier 3 · Stable · Supporting · ~50 min

Prerequisites

Linear Regression
Causal Inference for Policy Evaluation
Nonresponse and Missing Data
Small Area Estimation

Why This Matters

Cross-sectional data gives you a snapshot: differences between people at one point in time. Longitudinal data gives you a movie: changes within the same person over time. This distinction is critical for causal inference because cross-sectional differences confound within-person changes with between-person differences.

If you observe that people who exercise more earn more, is it because exercise increases earnings, or because healthier people (who exercise more) also tend to be better educated? Cross-sectional data cannot separate these explanations. Longitudinal data can, by tracking the same person over time and asking: when this person starts exercising more, do their earnings change?

Mental Model

You observe $N$ units (people, firms, countries) at $T$ time points. The data is $\{y_{it}, x_{it}\}$ for unit $i = 1, \ldots, N$ and time $t = 1, \ldots, T$. Each unit has unobserved characteristics $\alpha_i$ (ability, motivation, genetics) that are constant over time but vary across units. The question is how to handle these unobserved unit-specific effects.

Core Definitions

Definition

Panel Data

Panel data (also called longitudinal data) consists of observations on the same set of units across multiple time periods. A balanced panel has observations for all $N$ units at all $T$ time periods ($NT$ observations total). An unbalanced panel has some missing observations due to attrition, late entry, or intermittent nonresponse.
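As a quick illustration of the balanced/unbalanced distinction, the sketch below counts observations and compares them with $N \times T$. The function name `panel_balance` and the toy records are hypothetical, not part of any library.

```python
from collections import defaultdict

def panel_balance(observations):
    """Summarize a panel given (unit, period) pairs that were actually observed."""
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for unit, period in observations:
        periods_by_unit[unit].add(period)
        all_periods.add(period)
    n_units = len(periods_by_unit)
    n_periods = len(all_periods)
    n_obs = sum(len(periods) for periods in periods_by_unit.values())
    return {
        "N": n_units,
        "T": n_periods,
        "n_obs": n_obs,
        # Balanced means every unit is observed in every period.
        "balanced": n_obs == n_units * n_periods,
    }

# Hypothetical records: unit "c" drops out after period 1 (attrition).
observed = [("a", 1), ("a", 2), ("a", 3),
            ("b", 1), ("b", 2), ("b", 3),
            ("c", 1)]
summary = panel_balance(observed)
print(summary)
```

Here 7 observations fall short of $3 \times 3 = 9$, so the toy panel is unbalanced.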

Definition

Cross-Sectional vs. Longitudinal Design

A cross-sectional design samples units at a single time point; a repeated cross-sectional design draws a fresh sample of units at each time point. Repeated cross-sections can track population-level changes but cannot identify individual-level changes. A longitudinal design follows the same units over time, so it can separate within-unit change from between-unit differences.

Repeated cross-sections (like the Current Population Survey) sample different people each month. Panel surveys (like the PSID or NLSY) follow the same people for years or decades.

The Panel Data Model

The standard linear panel data model is:

$y_{it} = x_{it}^T \beta + \alpha_i + \epsilon_{it}$

where $y_{it}$ is the outcome for unit $i$ at time $t$, $x_{it}$ are observed time-varying covariates, $\alpha_i$ is the unobserved unit-specific effect, and $\epsilon_{it}$ is the idiosyncratic error with $\mathbb{E}[\epsilon_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i] = 0$.

The central question: is $\alpha_i$ correlated with $x_{it}$?

Fixed Effects

Definition

Fixed Effects Model

The fixed effects (FE) model treats $\alpha_i$ as an arbitrary unit-specific constant that may be correlated with $x_{it}$. Estimation proceeds by removing $\alpha_i$ through the within transformation: subtract the unit mean from each variable.

$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)^T \beta + (\epsilon_{it} - \bar{\epsilon}_i)$

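where $\bar{y}_i = \frac{1}{T}\sum_t y_{it}$, and $\bar{x}_i$ and $\bar{\epsilon}_i$ are defined analogously. This "demeans" the data and eliminates $\alpha_i$, because a term that is constant over time equals its own unit mean. OLS on the demeaned data gives the within estimator $\hat{\beta}_{\text{FE}}$.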
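The within transformation can be sketched in a few lines. This is a minimal illustration with a single regressor and a hypothetical noiseless panel in which units with larger $\alpha_i$ also have larger $x_{it}$; the helper names (`slope`, `within_estimator`) and the data are assumptions made for the example.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """OLS slope of y on x with an intercept (single regressor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def within_estimator(panel):
    """FE slope: OLS on unit-demeaned data. panel maps unit -> list of (x, y)."""
    num = den = 0.0
    for rows in panel.values():
        x_bar = mean([x for x, _ in rows])
        y_bar = mean([y for _, y in rows])
        for x, y in rows:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

# Hypothetical data: y = BETA * x + alpha_i, with alpha_i correlated with x.
BETA = 2.0
alphas = {"u1": 0.0, "u2": 10.0, "u3": 20.0}
x_paths = {"u1": [1.0, 2.0, 3.0], "u2": [4.0, 5.0, 6.0], "u3": [7.0, 8.0, 9.0]}
panel = {u: [(x, BETA * x + alphas[u]) for x in xs] for u, xs in x_paths.items()}

pooled_x = [x for rows in panel.values() for x, _ in rows]
pooled_y = [y for rows in panel.values() for _, y in rows]
beta_pooled = slope(pooled_x, pooled_y)   # ignores alpha_i
beta_fe = within_estimator(panel)         # removes alpha_i by demeaning
print(beta_pooled, beta_fe)
```

In this constructed example pooled OLS attributes the between-unit differences in $\alpha_i$ to $x$ and overstates the slope, while the within estimator recovers `BETA`.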

Random Effects

Definition

Random Effects Model

The random effects (RE) model treats $\alpha_i$ as a random variable that is exogenous with respect to the entire covariate history: $\mathbb{E}[\alpha_i \mid x_{i1}, \ldots, x_{iT}] = 0$ (often paired with $\mathbb{E}[\epsilon_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i] = 0$). The shorthand $\text{Cov}(\alpha_i, x_{it}) = 0$ is too weak as a stand-alone condition: zero contemporaneous covariance does not rule out dependence between $\alpha_i$ and the history of covariates, nonlinear dependence, or correlation with future or past covariates, any of which can break RE consistency. Under the conditional-mean exogeneity condition, the model is a linear model with a compound error $\alpha_i + \epsilon_{it}$, and GLS exploits the error structure to produce an estimator that is more efficient than FE because it uses both within and between variation.

The RE estimator is a matrix-weighted average of the within (FE) and between estimators. It is more efficient than FE when the RE assumption holds, but inconsistent when it does not.
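One standard way to see RE as sitting between pooled OLS and FE is the quasi-demeaning representation for a balanced panel: RE GLS is OLS on $y_{it} - \theta \bar{y}_i$ and $x_{it} - \theta \bar{x}_i$, with $\theta = 1 - \sqrt{\sigma_\epsilon^2 / (\sigma_\epsilon^2 + T\sigma_\alpha^2)}$. The sketch below only computes $\theta$; the function name `re_theta` is hypothetical, and the variance components are taken as known here, whereas in practice they are estimated.

```python
import math

def re_theta(sigma2_eps, sigma2_alpha, n_periods):
    """Quasi-demeaning weight for random effects GLS in a balanced panel."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + n_periods * sigma2_alpha))

# No unit-level variance: theta = 0, so RE collapses to pooled OLS.
theta_pooled = re_theta(sigma2_eps=1.0, sigma2_alpha=0.0, n_periods=5)
# Equal variance components with T = 3: partial demeaning.
theta_partial = re_theta(sigma2_eps=1.0, sigma2_alpha=1.0, n_periods=3)
# Unit-level variance dominates: theta approaches 1, so RE approaches FE.
theta_near_fe = re_theta(sigma2_eps=1.0, sigma2_alpha=1e6, n_periods=5)
print(theta_pooled, theta_partial, theta_near_fe)
```

The same formula shows that $\theta \to 1$ as $T$ grows, which is why RE and FE estimates tend to converge in long panels.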

Main Theorems

Theorem

Consistency of the Fixed Effects Estimator

Statement

Under the panel model $y_{it} = x_{it}^T\beta + \alpha_i + \epsilon_{it}$ with strict exogeneity $\mathbb{E}[\epsilon_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i] = 0$ and $\text{rank}(\sum_t \mathbb{E}[\ddot{x}_{it}\ddot{x}_{it}^T]) = k$ (where $\ddot{x}_{it} = x_{it} - \bar{x}_i$, $\ddot{y}_{it} = y_{it} - \bar{y}_i$, and $k$ is the dimension of $x_{it}$), the within estimator is consistent for $\beta$ as $N \to \infty$ with $T$ fixed:

$\hat{\beta}_{\text{FE}} = \left(\sum_{i=1}^N \sum_{t=1}^T \ddot{x}_{it}\ddot{x}_{it}^T\right)^{-1} \sum_{i=1}^N \sum_{t=1}^T \ddot{x}_{it}\ddot{y}_{it} \xrightarrow{p} \beta$

This holds regardless of whether $\alpha_i$ is correlated with $x_{it}$.

Intuition

By subtracting unit means, the within transformation removes all time-invariant confounders (observed or unobserved). What remains is purely within-unit variation: how changes in $x_{it}$ for a given unit $i$ relate to changes in $y_{it}$ for that same unit. This eliminates selection bias due to time-invariant unobservables.

Proof Sketch

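After the within transformation, $\ddot{y}_{it} = \ddot{x}_{it}^T\beta + \ddot{\epsilon}_{it}$, where $\ddot{\epsilon}_{it} = \epsilon_{it} - \bar{\epsilon}_i$. Since $\alpha_i$ has been removed by demeaning, OLS on the demeaned equation identifies $\beta$ as long as the demeaned regressors are uncorrelated with the demeaned errors. Strict exogeneity delivers $\mathbb{E}[\ddot{x}_{it}\ddot{\epsilon}_{it}] = 0$: because the unit means involve every period, exogeneity must hold across all periods, not just contemporaneously. By the law of large numbers (applied as $N \to \infty$), $\hat{\beta}_{\text{FE}} \xrightarrow{p} \beta$.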

Why It Matters

Fixed effects identification is a workhorse of applied economics and social science. It controls for all time-invariant confounders without needing to observe or measure them. This is why panel data is so valuable for causal inference: if the confounders are fixed characteristics of units, FE eliminates them.

Failure Mode

FE cannot identify the effect of time-invariant variables (gender, race, country of birth) because these are absorbed into $\alpha_i$. FE requires strict exogeneity, which fails with lagged dependent variables ($x_{it}$ includes $y_{i,t-1}$) or feedback effects. With small $T$, the incidental parameters problem biases nonlinear models that estimate the unit effects directly, such as probit or logit with unit dummies; conditional logit and FE Poisson are notable exceptions that remain consistent with fixed $T$. FE is also inefficient if $\alpha_i$ is actually uncorrelated with $x_{it}$, in which case RE is better.
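The first failure mode is easy to see numerically: demeaning a time-invariant variable leaves a column of zeros, so there is no variation left to estimate from. The variables below are hypothetical.

```python
def demean(values):
    """Subtract the unit mean from each observation of one unit."""
    unit_mean = sum(values) / len(values)
    return [v - unit_mean for v in values]

# One hypothetical unit observed for four periods.
tenure = [1.0, 2.0, 3.0, 4.0]     # time-varying covariate
birth_year = [1980.0] * 4         # time-invariant covariate

tenure_within = demean(tenure)
birth_year_within = demean(birth_year)
print(tenure_within)       # within-unit variation survives
print(birth_year_within)   # every entry is zero after demeaning
```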

Difference-in-Differences

Definition

Difference-in-Differences (DiD)

Difference-in-differences is a method for estimating causal effects from panel data with a treatment that affects some units but not others at a specific time. With two periods ($t = 0, 1$) and two groups (treated, control):

$\hat{\delta}_{\text{DiD}} = (\bar{y}_{1,\text{treated}} - \bar{y}_{0,\text{treated}}) - (\bar{y}_{1,\text{control}} - \bar{y}_{0,\text{control}})$

The first difference removes unit-specific time-invariant confounders. The second difference removes common time trends. The identifying assumption is parallel trends: absent treatment, the treated and control groups would have had the same time trend.

In a two-period, two-group setting, DiD is equivalent to a regression with unit (or group) fixed effects, a period effect, and a treatment dummy equal to one for treated units in the post-treatment period. It generalizes to multiple periods and staggered treatment adoption, though recent research shows the generalization requires care (see de Chaisemartin & D'Haultfoeuille, 2020).
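The two-by-two formula can be sketched directly from group means. The numbers below are hypothetical and chosen only to make the arithmetic easy to follow; `did_2x2` is an illustrative helper, not a library function.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Two-period, two-group difference-in-differences from group means."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change

# Hypothetical group means.
estimate = did_2x2(treated_pre=20.0, treated_post=27.0,
                   control_pre=15.0, control_post=18.0)

# Under parallel trends, the treated group's counterfactual post-period mean
# is its pre-period mean plus the control group's change.
counterfactual_post = 20.0 + (18.0 - 15.0)
print(estimate, counterfactual_post)
```

The treated group rose by 7 and the control group by 3, so the estimate is 4: the part of the treated group's change that the common trend does not explain.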

Staggered Difference-in-Differences

When treatment is adopted at different times across units, the classic two-way fixed effects (TWFE) estimator with a treatment dummy is biased under heterogeneous treatment effects. Goodman-Bacon (2021) decomposed the TWFE estimand into a weighted average of all possible $2 \times 2$ DiD comparisons, including comparisons that use already-treated units as "controls" for later-treated units. When treatment effects change over time or across cohorts, these forbidden comparisons contaminate the estimate and can produce the wrong sign.
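A small constructed example shows how a forbidden comparison can flip the sign. It assumes noiseless outcomes with no time trend, so each outcome equals the treatment effect itself; the cohorts and effect sizes are hypothetical.

```python
# Early cohort: treated from period 2, effect grows from 1 to 3.
early = {1: 0.0, 2: 1.0, 3: 3.0}
# Late cohort: treated in period 3, true effect on adoption is +1.
late = {1: 0.0, 2: 0.0, 3: 1.0}

def did(treated, control, pre, post):
    """2x2 DiD using one cohort as 'treated' and the other as 'control'."""
    return (treated[post] - treated[pre]) - (control[post] - control[pre])

# Clean comparison: the late cohort is still untreated in periods 1 and 2.
clean = did(early, late, pre=1, post=2)

# Forbidden comparison: the already-treated early cohort serves as the
# "control" for the late cohort, so its growing effect is subtracted off.
forbidden = did(late, early, pre=2, post=3)
print(clean, forbidden)
```

The clean comparison recovers the early cohort's adoption effect of +1, while the forbidden comparison returns -1 for a cohort whose true effect is +1.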

The recent literature provides robust estimators that avoid this problem:

Callaway & Sant'Anna (2021) estimate group-time average treatment effects $\text{ATT}(g, t)$ for each cohort $g$ and period $t$, then aggregate with transparent weights.
Sun & Abraham (2021) correct event-study specifications by using an interaction-weighted estimator that isolates cohort-specific dynamic effects.
de Chaisemartin & D'Haultfoeuille (2020) propose the $\text{DID}_M$ estimator based on instantaneous treatment switchers.
Borusyak, Jaravel & Spiess (2024) develop an imputation estimator: fit unit and time effects using untreated observations only, impute counterfactuals, and average treatment-minus-counterfactual differences. The estimator is efficient under homoskedasticity.

When treatment effects are homogeneous, all four estimators target the same parameter as TWFE. When effects vary by cohort or over time since adoption, they can diverge from it.
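To make the group-time idea concrete, the sketch below computes $\text{ATT}(g, t)$ in its simplest unconditional form: the change from the last pre-adoption period $g - 1$ to period $t$ for cohort $g$, minus the same change for never-treated units. This is an illustrative simplification (no covariates, one unit per cohort, never-treated controls only), not the full estimator, and the data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def att_gt(outcomes, cohort, g, t):
    """ATT(g, t) with never-treated controls and base period g - 1.

    outcomes: unit -> {period: y}; cohort: unit -> adoption period or None.
    """
    base = g - 1
    treated = [y[t] - y[base] for u, y in outcomes.items() if cohort[u] == g]
    never = [y[t] - y[base] for u, y in outcomes.items() if cohort[u] is None]
    return mean(treated) - mean(never)

# Hypothetical data: common trend of +1 per period plus unit-specific levels.
# Cohort 2's effect grows (1, 2, 3); cohort 3's effect is a constant 5.
outcomes = {
    "a": {1: 1.0, 2: 3.0, 3: 5.0, 4: 7.0},      # adopts in period 2
    "b": {1: 11.0, 2: 12.0, 3: 18.0, 4: 19.0},  # adopts in period 3
    "c": {1: 6.0, 2: 7.0, 3: 8.0, 4: 9.0},      # never treated
}
cohort = {"a": 2, "b": 3, "c": None}

att = {(g, t): att_gt(outcomes, cohort, g, t)
       for g in (2, 3) for t in range(g, 5)}
print(att)
```

Each cohort-period cell is estimated separately and only against untreated units, so the growing effect in cohort 2 and the constant effect in cohort 3 are both recovered without one contaminating the other.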

Synthetic Control Methods

When only a single unit (or a few units) is treated and pre-treatment periods are long, synthetic control constructs a weighted combination of control units whose pre-treatment trajectory matches the treated unit. The treatment effect is the gap between the treated unit and its synthetic counterpart in the post-treatment period.

Abadie & Gardeazabal (2003) introduced the method to study the economic cost of terrorism in the Basque Country. Abadie, Diamond & Hainmueller (2010) formalized it and studied California's tobacco control program. Weights $w_i \geq 0$ with $\sum_i w_i = 1$ are chosen to minimize pre-treatment outcome discrepancy, optionally also matching pre-treatment covariates. Inference uses placebo tests: apply the method to each control unit and compare the treated gap to the distribution of placebo gaps. Abadie (2021) gives a comprehensive survey including identifying assumptions, extensions to multiple treated units, and the relationship to DiD and matrix completion.
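The weight-fitting step can be sketched for the smallest possible donor pool. With two donors the simplex constraint leaves a single free weight $w \in [0, 1]$, so a grid search suffices. Real applications use constrained optimization over many donors; the data, the true weight, and the helper name below are hypothetical.

```python
def fit_two_donor_weight(treated, donor_a, donor_b, n_pre, grid=100):
    """Grid search for w in [0, 1] minimizing pre-treatment squared error."""
    def pre_loss(w):
        return sum(
            (treated[t] - (w * donor_a[t] + (1.0 - w) * donor_b[t])) ** 2
            for t in range(n_pre)
        )
    return min((k / grid for k in range(grid + 1)), key=pre_loss)

# Hypothetical series: 4 pre-treatment periods, 2 post-treatment periods.
donor_a = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
donor_b = [20.0, 19.0, 21.0, 20.0, 22.0, 21.0]
N_PRE = 4
TRUE_W, TRUE_EFFECT = 0.3, 5.0
treated = [
    TRUE_W * a + (1.0 - TRUE_W) * b + (TRUE_EFFECT if t >= N_PRE else 0.0)
    for t, (a, b) in enumerate(zip(donor_a, donor_b))
]

w_hat = fit_two_donor_weight(treated, donor_a, donor_b, N_PRE)
synthetic = [w_hat * a + (1.0 - w_hat) * b for a, b in zip(donor_a, donor_b)]
# Estimated effect: treated minus synthetic in the post-treatment periods.
post_gaps = [treated[t] - synthetic[t] for t in range(N_PRE, len(treated))]
print(w_hat, post_gaps)
```

Because the treated series was built as an exact convex combination of the donors, the pre-treatment fit is perfect and the post-treatment gap equals the injected effect. With real data the fit is approximate, which is why the placebo comparison matters.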

Clustered Standard Errors

Panel data violates iid sampling: observations within the same unit are correlated over time, and treatment often varies only at the unit level. Standard errors must cluster at the level of treatment assignment (typically the panel unit, or state/firm when policy varies at that level). Failure to cluster can understate standard errors by factors of 2 to 10.

Cameron & Miller (2015) is the canonical practitioner reference. The key diagnostic is the number of clusters $G$. The standard cluster-robust variance estimator is consistent as $G \to \infty$, but in finite samples it performs poorly when $G$ is small. For $G$ below roughly 30 to 50, use the wild cluster bootstrap (Cameron, Gelbach & Miller 2008) or subcluster bootstraps. With two-way clustering (by unit and by time), $G$ is effectively the minimum of the two dimensions.
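The effect of clustering is easiest to see in the intercept-only model, where the estimator is the sample mean. The sketch below compares the heteroskedasticity-robust sandwich variance, which treats observations as independent, with the cluster-robust version, which sums residuals within each cluster before squaring. No finite-sample correction is applied, and the residuals are a hypothetical extreme case of perfect within-cluster correlation.

```python
def variance_of_mean(residuals_by_cluster):
    """Return (naive, cluster_robust) sandwich variances for a sample mean."""
    clusters = list(residuals_by_cluster.values())
    n = sum(len(r) for r in clusters)
    # Treats every observation as independent.
    naive = sum(e ** 2 for r in clusters for e in r) / n ** 2
    # Sums residuals within each cluster first, then squares.
    clustered = sum(sum(r) ** 2 for r in clusters) / n ** 2
    return naive, clustered

# Hypothetical residuals: 4 clusters of 5 observations, identical within cluster.
residuals = {
    "g1": [1.0] * 5,
    "g2": [-1.0] * 5,
    "g3": [2.0] * 5,
    "g4": [-2.0] * 5,
}
naive_var, cluster_var = variance_of_mean(residuals)
se_ratio = (cluster_var / naive_var) ** 0.5
print(naive_var, cluster_var, se_ratio)
```

With perfect within-cluster correlation the variance ratio equals the cluster size, so the naive standard error is too small by a factor of $\sqrt{5}$ here. Weaker correlation gives a smaller but still nonzero understatement.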

Attrition

Attrition is the defining practical problem of longitudinal studies. People move, die, refuse to participate, or become unreachable. If attrition is related to the outcome, the remaining sample is not representative of the original sample.

Diagnosing attrition: compare baseline characteristics of stayers vs. leavers. If they differ, attrition is selective on those observed characteristics, but this is a diagnostic, not a proof. Equality of baseline means does not establish that attrition is ignorable (attrition can still depend on unobserved outcomes, future shocks, latent health, motivation, or income instability), and unequal baseline means do not by themselves prove bias in the target estimator after appropriate adjustment. The substantive question is whether missingness is ignorable conditional on the observed information used by the estimator. Common corrections include inverse probability weighting (weight remaining observations by the inverse of their estimated probability of staying), multiple imputation, and selection models (Heckman 1979) that explicitly model the missingness mechanism.
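Inverse probability weighting can be sketched with two hypothetical strata that differ in both outcome and retention. The retention probabilities are treated as known here, which is an assumption made for illustration; in practice they are estimated, for example from a model of staying on baseline covariates.

```python
def ipw_mean(stayers):
    """Weighted mean of outcomes with weights 1 / P(stay).

    stayers: list of (outcome, p_stay) for units still in the sample.
    """
    weight_sum = sum(1.0 / p for _, p in stayers)
    return sum(y / p for y, p in stayers) / weight_sum

# Hypothetical baseline sample: 100 units with outcome 10 and 100 with outcome 20,
# so the baseline mean is 15. The low-outcome stratum is retained at 80%,
# the high-outcome stratum at only 40%.
stayers = [(10.0, 0.8)] * 80 + [(20.0, 0.4)] * 40

naive_mean = sum(y for y, _ in stayers) / len(stayers)
weighted_mean = ipw_mean(stayers)
print(naive_mean, weighted_mean)
```

The unweighted stayer mean is pulled toward the stratum that stays, while the weighted mean recovers the baseline mean. That recovery depends on attrition being ignorable given the stratum, which is exactly the assumption discussed above.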

Major Panel Surveys

PSID (Panel Study of Income Dynamics): U.S. families, since 1968. The longest-running household panel survey in the world.
NLSY (National Longitudinal Survey of Youth): two cohorts (1979, 1997) of U.S. youth tracked into adulthood.
LISS (Longitudinal Internet Studies for the Social Sciences): Dutch probability-based internet panel.
BHPS/Understanding Society: UK households. The BHPS sample was incorporated into Understanding Society, the UK Household Longitudinal Study.
SOEP (German Socio-Economic Panel): German households since 1984.

Common Confusions

Watch Out

Fixed effects does not mean the effects are fixed

The name is confusing. "Fixed effects" means the unit-specific intercepts $\alpha_i$ are treated as fixed (non-random) parameters. It does not mean the regression coefficients $\beta$ are fixed or non-varying. The alternative, "random effects," treats $\alpha_i$ as draws from a distribution.

Watch Out

The Hausman test is not a test of whether to use FE or RE

The Hausman test checks whether the RE and FE estimates are statistically different. If they are, this suggests $\text{Cov}(\alpha_i, x_{it}) \neq 0$ and RE is inconsistent. But a non-significant Hausman test does not prove $\text{Cov}(\alpha_i, x_{it}) = 0$. It may just lack power. In practice, if you have reason to believe there are unobserved confounders correlated with regressors, use FE regardless of the Hausman test.
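For a single coefficient the Hausman statistic reduces to a squared difference scaled by the difference in variances, compared against a chi-squared distribution with one degree of freedom. The sketch below uses hypothetical estimates and variances; the function name is illustrative, and the one-degree-of-freedom tail probability is computed with `math.erfc`.

```python
import math

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (chi-squared, 1 df)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is then undefined.
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / diff_var
    # For 1 df, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# Hypothetical estimates: FE is less precise than RE, as expected under the RE assumption.
h_stat, p_val = hausman_scalar(b_fe=0.05, b_re=0.09, var_fe=0.0004, var_re=0.0003)
print(h_stat, p_val)
```

A large statistic, as here, is evidence against the RE assumption. A small one is weak evidence in its favor, since the denominator shrinks and power falls when FE is imprecise.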

Watch Out

Panel data does not automatically solve endogeneity

FE controls for time-invariant confounders. It does not control for time-varying confounders. If an omitted variable changes over time and is correlated with $x_{it}$, FE does not eliminate the bias. Panel data helps, but it is not a cure-all for endogeneity.

Summary

Panel data tracks the same units over time, enabling within-unit comparisons
Fixed effects removes all time-invariant confounders by demeaning
Random effects is more efficient but requires $\alpha_i$ to be mean-independent of the full covariate history
Difference-in-differences uses parallel trends to identify causal effects
Attrition is the major practical threat: dropouts are rarely random
FE cannot identify effects of time-invariant variables

Exercises

ExerciseCore

Problem

You have a panel of 500 workers observed over 5 years. You regress log wages on years of education using OLS, FE, and RE. The OLS coefficient is 0.10, the RE coefficient is 0.08, and the FE coefficient is 0.04. Interpret the differences. Why is the FE estimate smallest?

ExerciseAdvanced

Problem

A policy is implemented in state A in 2020 but not in state B. Average outcomes are: State A pre-2020: 50, State A post-2020: 58, State B pre-2020: 45, State B post-2020: 48. Compute the DiD estimate. State the parallel trends assumption in plain English. Give one reason it might fail.

References

Canonical (panel data textbooks):

Wooldridge, Econometric Analysis of Cross Section and Panel Data, 2nd ed. (2010), MIT Press, Chapters 10-14 (fixed effects, random effects, dynamic panels, GMM)
Hsiao, Analysis of Panel Data, 3rd ed. (2014), Cambridge University Press, Chapters 2-4 and 7
Baltagi, Econometric Analysis of Panel Data, 6th ed. (2021), Springer, Chapters 2-4 and 8
Diggle, Heagerty, Liang & Zeger, Analysis of Longitudinal Data, 2nd ed. (2013), Oxford University Press (biostatistics, GEE, mixed effects)
Fitzmaurice, Laird & Ware, Applied Longitudinal Analysis, 2nd ed. (2011), Wiley
Angrist & Pischke, Mostly Harmless Econometrics (2009), Princeton University Press, Chapter 5

Staggered DiD (current):

Goodman-Bacon, "Difference-in-Differences with Variation in Treatment Timing" (2021), Journal of Econometrics 225(2)
Callaway & Sant'Anna, "Difference-in-Differences with Multiple Time Periods" (2021), Journal of Econometrics 225(2)
Sun & Abraham, "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects" (2021), Journal of Econometrics 225(2)
Borusyak, Jaravel & Spiess, "Revisiting Event Study Designs: Robust and Efficient Estimation" (2024), Review of Economic Studies 91(6)
de Chaisemartin & D'Haultfoeuille, "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects" (2020), American Economic Review 110(9)

Synthetic control:

Abadie & Gardeazabal, "The Economic Costs of Conflict: A Case Study of the Basque Country" (2003), American Economic Review 93(1)
Abadie, Diamond & Hainmueller, "Synthetic Control Methods for Comparative Case Studies" (2010), Journal of the American Statistical Association 105(490)
Abadie, "Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects" (2021), Journal of Economic Literature 59(2)

Clustered standard errors:

Cameron & Miller, "A Practitioner's Guide to Cluster-Robust Inference" (2015), Journal of Human Resources 50(2)
Cameron, Gelbach & Miller, "Bootstrap-Based Improvements for Inference with Clustered Errors" (2008), Review of Economics and Statistics 90(3)

Next Topics

Small area estimation: borrowing strength across subpopulations
Nonresponse and missing data: handling attrition formally

Last reviewed: April 26, 2026
