Statistical Foundations
Longitudinal Surveys and Panel Data
Analysis of data where the same units are measured repeatedly over time: fixed effects, random effects, difference-in-differences, and the problems of attrition and time-varying confounding.
Why This Matters
Cross-sectional data gives you a snapshot: differences between people at one point in time. Longitudinal data gives you a movie: changes within the same person over time. This distinction is critical for causal inference because cross-sectional differences confound within-person changes with between-person differences.
If you observe that people who exercise more earn more, is it because exercise increases earnings, or because healthier people (who exercise more) also tend to be better educated? Cross-sectional data cannot separate these explanations. Longitudinal data can, by tracking the same person over time and asking: when this person starts exercising more, do their earnings change?
Mental Model
You observe $N$ units (people, firms, countries) at $T$ time points. The data is $(y_{it}, x_{it})$ for unit $i$ and time $t$. Each unit has unobserved characteristics $\alpha_i$ (ability, motivation, genetics) that are constant over time but vary across units. The question is how to handle these unobserved unit-specific effects.
Core Definitions
Panel Data
Panel data (also called longitudinal data) consists of observations on the same set of units across multiple time periods. A balanced panel has observations for all $N$ units at all $T$ time periods ($N \times T$ observations total). An unbalanced panel has some missing observations due to attrition, late entry, or intermittent nonresponse.
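The balanced/unbalanced distinction can be checked mechanically. The following is a minimal sketch, assuming the panel is stored as (unit, period) pairs; the function name `panel_balance` and the toy rows are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

def panel_balance(records):
    """Summarize a panel given (unit, period) pairs, one per observed row."""
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for unit, period in records:
        periods_by_unit[unit].add(period)
        all_periods.add(period)
    n_units, n_periods = len(periods_by_unit), len(all_periods)
    n_obs = sum(len(p) for p in periods_by_unit.values())
    return {
        "N": n_units,
        "T": n_periods,
        "observations": n_obs,
        # Balanced means every unit is observed in every period: N * T rows.
        "balanced": n_obs == n_units * n_periods,
    }

# Hypothetical panel: unit "c" drops out after the first wave (attrition).
rows = [("a", 1), ("a", 2), ("a", 3),
        ("b", 1), ("b", 2), ("b", 3),
        ("c", 1)]
summary = panel_balance(rows)
print(summary)
```

Here the panel has 7 rows rather than the 9 a balanced panel with three units and three periods would have, so it is flagged as unbalanced.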
Cross-Sectional vs. Longitudinal Design
A cross-sectional design samples different units at each time point. It can track population-level changes but cannot identify individual-level changes. A longitudinal design follows the same units over time. It can separate within-unit change from between-unit differences.
Repeated cross-sections (like the Current Population Survey) sample different people each month. Panel surveys (like the PSID or NLSY) follow the same people for years or decades.
The Panel Data Model
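The standard linear panel data model is:
$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}$$
where $y_{it}$ is the outcome for unit $i$ at time $t$, $x_{it}$ are observed time-varying covariates, $\alpha_i$ is the unobserved unit-specific effect, and $\varepsilon_{it}$ is the idiosyncratic error with $E[\varepsilon_{it} \mid x_{it}, \alpha_i] = 0$.
The central question: is $\alpha_i$ correlated with $x_{it}$?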
Fixed Effects
Fixed Effects Model
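The fixed effects (FE) model treats $\alpha_i$ as an arbitrary unit-specific constant that may be correlated with $x_{it}$. Estimation proceeds by removing $\alpha_i$ through the within transformation: subtract the unit mean from each variable.
$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, and $\bar{x}_i$ and $\bar{\varepsilon}_i$ are defined analogously. This "demeans" the data, eliminating $\alpha_i$. OLS on the demeaned data gives the within estimator $\hat{\beta}_{FE}$.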
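The effect of demeaning can be seen in a small simulation. This is a sketch under stated assumptions, not a production estimator: a single covariate, a true slope of 1.0, and a unit effect (called `alpha` here) that is deliberately correlated with the covariate. All parameter values are hypothetical.

```python
import random

random.seed(0)
N, T, beta = 500, 5, 1.0  # hypothetical panel dimensions and true slope

def slope(xs, ys):
    """OLS slope of y on x with an intercept (single regressor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

x_all, y_all, x_dm, y_dm = [], [], [], []
for i in range(N):
    alpha = random.gauss(0, 1)  # unobserved unit effect
    # The covariate depends on alpha, so pooled OLS is confounded.
    x = [alpha + random.gauss(0, 1) for _ in range(T)]
    y = [beta * xt + alpha + random.gauss(0, 1) for xt in x]
    xbar, ybar = sum(x) / T, sum(y) / T
    x_all += x
    y_all += y
    x_dm += [xt - xbar for xt in x]  # within transformation
    y_dm += [yt - ybar for yt in y]

beta_pooled = slope(x_all, y_all)
beta_fe = slope(x_dm, y_dm)
print(f"pooled OLS: {beta_pooled:.3f}, within (FE): {beta_fe:.3f}")
```

In this design the pooled slope should sit well above 1.0 because it absorbs the unit effect, while the within slope should be close to the true value.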
Random Effects
Random Effects Model
The random effects (RE) model treats $\alpha_i$ as a random variable that is exogenous with respect to the entire covariate history: $E[\alpha_i \mid x_{i1}, \dots, x_{iT}] = E[\alpha_i]$ (often paired with $\mathrm{Var}(\alpha_i \mid x_{i1}, \dots, x_{iT}) = \sigma_\alpha^2$). The shorthand $\mathrm{Cov}(\alpha_i, x_{it}) = 0$ is too weak as a stand-alone condition: zero contemporaneous covariance does not rule out dependence between $\alpha_i$ and the history of covariates, nonlinear dependence, or correlation with future or past covariates, any of which can break RE consistency. Under the conditional-mean exogeneity condition, the model is a linear model with a compound error $u_{it} = \alpha_i + \varepsilon_{it}$, and GLS exploits the error structure to produce an estimator more efficient than FE (using both within and between variation).
The RE estimator is a matrix-weighted average of the within (FE) and between estimators. It is more efficient than FE when the RE assumption holds, but inconsistent when it does not.
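One way to see this weighting is the quasi-demeaning form of GLS. The display below is a sketch for a balanced panel with $T$ periods, assuming the unit effect $\alpha_i$ and the idiosyncratic error $\varepsilon_{it}$ are homoskedastic with variances $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ (notation introduced here for illustration). RE is then equivalent to OLS on

$$y_{it} - \theta\bar{y}_i = (x_{it} - \theta\bar{x}_i)'\beta + (1-\theta)\alpha_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i), \qquad \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}$$

When $\theta = 1$ the transformation is full demeaning and RE coincides with FE; when $\theta = 0$ (no unit-level variance) it reduces to pooled OLS. Intermediate values subtract only part of the unit mean, which is why RE retains some between-unit variation.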
Main Theorems
Consistency of the Fixed Effects Estimator
Statement
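Under the panel model with strict exogeneity $E[\varepsilon_{it} \mid x_{i1}, \dots, x_{iT}, \alpha_i] = 0$ and the rank condition $\operatorname{rank}\left(E\left[\sum_{t} \ddot{x}_{it}\ddot{x}_{it}'\right]\right) = K$ (where $\ddot{x}_{it} = x_{it} - \bar{x}_i$ and $K$ is the number of covariates), the within estimator is consistent for $\beta$ as $N \to \infty$ with $T$ fixed:
$$\hat{\beta}_{FE} \xrightarrow{p} \beta$$
This holds regardless of whether $\alpha_i$ is correlated with $x_{it}$.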
Intuition
By subtracting unit means, the within transformation removes all time-invariant confounders (observed or unobserved). What remains is purely within-unit variation: how changes in $x$ for a given unit relate to changes in $y$ for that same unit. This eliminates selection bias due to time-invariant unobservables.
Proof Sketch
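After the within transformation, $\ddot{y}_{it} = \ddot{x}_{it}'\beta + \ddot{\varepsilon}_{it}$, where a double dot denotes deviation from the unit mean. Since $\alpha_i$ has been differenced out, OLS on the demeaned equation identifies $\beta$. By the law of large numbers (applied as $N \to \infty$), $\hat{\beta}_{FE} \xrightarrow{p} \beta$ because $E[\ddot{x}_{it}\ddot{\varepsilon}_{it}] = 0$ follows from strict exogeneity.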
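The step from the demeaned equation to consistency can be written out explicitly. Writing $\ddot{x}_{it}$, $\ddot{y}_{it}$, and $\ddot{\varepsilon}_{it}$ for deviations from unit means (for example $\ddot{x}_{it} = x_{it} - \bar{x}_i$), a sketch of the algebra is:

$$\hat{\beta}_{FE} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} \ddot{x}_{it}\ddot{x}_{it}'\Big)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} \ddot{x}_{it}\ddot{y}_{it} = \beta + \Big(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \ddot{x}_{it}\ddot{x}_{it}'\Big)^{-1} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \ddot{x}_{it}\ddot{\varepsilon}_{it}$$

The first factor converges to an invertible matrix under the rank condition, and the second converges to $E\big[\sum_t \ddot{x}_{it}\ddot{\varepsilon}_{it}\big] = 0$ under strict exogeneity, so the second term vanishes in probability. Strict exogeneity is needed rather than a contemporaneous condition because $\ddot{x}_{it}$ contains the covariates from every period through the unit mean.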
Why It Matters
Fixed effects identification is a workhorse of applied economics and social science. It controls for all time-invariant confounders without needing to observe or measure them. This is why panel data is so valuable for causal inference: if the confounders are fixed characteristics of units, FE eliminates them.
Failure Mode
FE cannot identify the effect of time-invariant variables (gender, race, country of birth) because these are absorbed into $\alpha_i$. FE requires strict exogeneity, which fails with lagged dependent variables ($x_{it}$ includes $y_{i,t-1}$) or feedback effects. With small $T$, the incidental parameters problem biases many nonlinear FE models (for example, unconditional logit and probit; the Poisson FE estimator is a well-known exception). FE is also inefficient if $\alpha_i$ is actually uncorrelated with $x_{it}$, in which case RE is better.
Difference-in-Differences
Difference-in-Differences (DiD)
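Difference-in-differences is a method for estimating causal effects from panel data with a treatment that affects some units but not others at a specific time. With two periods ($t = 0, 1$) and two groups (treated, control):
$$\hat{\tau}_{DiD} = (\bar{y}_{\text{treated},1} - \bar{y}_{\text{treated},0}) - (\bar{y}_{\text{control},1} - \bar{y}_{\text{control},0})$$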
The first difference removes unit-specific time-invariant confounders. The second difference removes common time trends. The identifying assumption is parallel trends: absent treatment, the treated and control groups would have had the same time trend.
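The two differences can be computed directly from group outcomes. The sketch below uses hypothetical numbers and a hypothetical helper named `did_estimate`; it computes the point estimate only and says nothing about whether parallel trends holds.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-period, two-group difference-in-differences on raw outcomes."""
    change_treated = mean(treated_post) - mean(treated_pre)   # first difference
    change_control = mean(control_post) - mean(control_pre)   # first difference
    return change_treated - change_control                    # second difference

# Hypothetical outcomes for two treated and two control units.
estimate = did_estimate(
    treated_pre=[19.0, 21.0], treated_post=[26.0, 28.0],
    control_pre=[14.0, 16.0], control_post=[17.0, 19.0],
)
print(estimate)  # 4.0
```

The treated group changes by 7 and the control group by 3, so the estimate attributes the remaining 4 to treatment. The level gap between groups before treatment (20 versus 15) plays no role.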
DiD is equivalent to fixed effects with a treatment dummy in a two-period, two-group setting. It generalizes to multiple periods and staggered treatment adoption, though recent research shows the generalization requires care (see de Chaisemartin & D'Haultfoeuille, 2020).
Staggered Difference-in-Differences
When treatment is adopted at different times across units, the classic two-way fixed effects (TWFE) estimator with a treatment dummy is biased under heterogeneous treatment effects. Goodman-Bacon (2021) decomposed the TWFE estimand into a weighted average of all possible DiD comparisons, including comparisons that use already-treated units as "controls" for later-treated units. When treatment effects change over time or across cohorts, these forbidden comparisons contaminate the estimate and can produce the wrong sign.
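A deterministic toy example shows how a comparison that uses an already-treated unit as the control can flip the sign. All numbers are hypothetical: an early cohort adopts in period 1 with an effect that grows from 2 to 6, and a late cohort adopts in period 2 with a true effect of 2.

```python
def outcome(unit_effect, time_effect, effect):
    """Additive outcome: unit effect + common time effect + treatment effect."""
    return unit_effect + time_effect + effect

time_effect = {1: 0.0, 2: 1.0}  # common trend shared by both cohorts

# Early cohort: already treated in period 1; its effect grows over time.
early = {1: outcome(5.0, time_effect[1], 2.0),
         2: outcome(5.0, time_effect[2], 6.0)}
# Late cohort: untreated in period 1, treated in period 2.
late = {1: outcome(3.0, time_effect[1], 0.0),
        2: outcome(3.0, time_effect[2], 2.0)}

true_effect_late = 2.0
# "Forbidden" comparison: the early cohort serves as control for the late one.
forbidden_did = (late[2] - late[1]) - (early[2] - early[1])
print(forbidden_did)  # -2.0
```

The late cohort's outcome rises by 3, but the "control" rises by 5 because its own treatment effect grew by 4. The comparison subtracts that growth and returns -2 even though the true effect is +2.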
The recent literature provides robust estimators that avoid this problem:
- Callaway & Sant'Anna (2021) estimate group-time average treatment effects $ATT(g, t)$ for each cohort $g$ and period $t$, then aggregate with transparent weights.
- Sun & Abraham (2021) correct event-study specifications by using an interaction-weighted estimator that isolates cohort-specific dynamic effects.
- de Chaisemartin & D'Haultfoeuille (2020) propose the $\mathrm{DID}_M$ estimator based on instantaneous treatment switchers.
- Borusyak, Jaravel & Spiess (2024) develop an imputation estimator: fit unit and time effects using untreated observations only, impute counterfactuals, and average treatment-minus-counterfactual differences. The estimator is efficient under homoskedasticity.
All four estimators target the same parameter as TWFE under homogeneous effects but diverge from it when treatment effects vary by cohort or over time since adoption.
Synthetic Control Methods
When only a single unit (or a few units) is treated and pre-treatment periods are long, synthetic control constructs a weighted combination of control units whose pre-treatment trajectory matches the treated unit. The treatment effect is the gap between the treated unit and its synthetic counterpart in the post-treatment period.
Abadie & Gardeazabal (2003) introduced the method to study the economic cost of terrorism in the Basque Country. Abadie, Diamond & Hainmueller (2010) formalized it and studied California's tobacco control program. Weights $w_j \geq 0$ with $\sum_j w_j = 1$ are chosen to minimize pre-treatment outcome discrepancy, optionally also matching pre-treatment covariates. Inference uses placebo tests: apply the method to each control unit and compare the treated gap to the distribution of placebo gaps. Abadie (2021) gives a comprehensive survey including identifying assumptions, extensions to multiple treated units, and the relationship to DiD and matrix completion.
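The weight-selection step can be illustrated in the simplest possible case. This is a sketch assuming only two donor units and outcome matching alone (no covariates), so the constrained least-squares problem has a closed form; real applications use many donors and a numerical optimizer. The function name and all data are hypothetical.

```python
def two_donor_weight(treated_pre, donor_a_pre, donor_b_pre):
    """Weight w on donor A (1 - w on donor B) minimizing squared pre-period gaps.

    The objective is a convex quadratic in w, so clipping the unconstrained
    minimizer to [0, 1] gives the constrained optimum.
    """
    num = sum((y - b) * (a - b)
              for y, a, b in zip(treated_pre, donor_a_pre, donor_b_pre))
    den = sum((a - b) ** 2 for a, b in zip(donor_a_pre, donor_b_pre))
    return min(1.0, max(0.0, num / den))

# Hypothetical pre-treatment trajectories.
treated_pre = [10.0, 11.0, 12.0]
donor_a_pre = [12.0, 13.0, 14.0]
donor_b_pre = [4.0, 5.0, 6.0]
w = two_donor_weight(treated_pre, donor_a_pre, donor_b_pre)

# Hypothetical post-treatment outcomes; the gap is the estimated effect.
treated_post, donor_a_post, donor_b_post = [15.0], [15.0], [7.0]
synthetic_post = [w * a + (1 - w) * b for a, b in zip(donor_a_post, donor_b_post)]
gap = [y - s for y, s in zip(treated_post, synthetic_post)]
print(w, gap)  # 0.75 [2.0]
```

With these numbers the synthetic unit reproduces the treated unit's pre-treatment path exactly, and the post-treatment gap of 2 is the estimated effect. A placebo test would repeat the same procedure treating each donor in turn as if it were the treated unit.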
Clustered Standard Errors
Panel data violates iid sampling: observations within the same unit are correlated over time, and treatment often varies only at the unit level. Standard errors must cluster at the level of treatment assignment (typically the panel unit, or state/firm when policy varies at that level). Failure to cluster can understate standard errors by factors of 2 to 10.
Cameron & Miller (2015) is the canonical practitioner reference. The key diagnostic is the number of clusters $G$. The standard cluster-robust variance estimator is consistent as $G \to \infty$, but in finite samples it performs poorly when $G$ is small. For $G$ below roughly 30 to 50, use the wild cluster bootstrap (Cameron, Gelbach & Miller 2008) or subcluster bootstraps. With two-way clustering (by unit and by time), $G$ is effectively the minimum of the two dimensions.
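The gap between conventional and cluster-robust standard errors can be seen in a small simulation. This is a sketch under stated assumptions: one regressor that varies only at the cluster level, a cluster-level error component, and the basic sandwich formula with no small-sample correction. The simulation settings are hypothetical.

```python
import math
import random

random.seed(1)
G, n_per = 50, 20  # hypothetical: 50 clusters with 20 observations each

x, y, cluster = [], [], []
for g in range(G):
    x_g = random.gauss(0, 1)      # regressor varies only across clusters
    shock_g = random.gauss(0, 1)  # error component shared within the cluster
    for _ in range(n_per):
        x.append(x_g)
        y.append(0.5 * x_g + shock_g + random.gauss(0, 1))
        cluster.append(g)

n = len(x)
mx, my = sum(x) / n, sum(y) / n
xd = [xi - mx for xi in x]
sxx = sum(v * v for v in xd)
beta_hat = sum(v * (yi - my) for v, yi in zip(xd, y)) / sxx
resid = [(yi - my) - beta_hat * v for v, yi in zip(xd, y)]

# Conventional standard error, valid only under iid errors.
se_iid = math.sqrt(sum(u * u for u in resid) / (n - 2) / sxx)

# Cluster-robust: sum the scores x * u within each cluster, then square.
score = [0.0] * G
for v, u, g in zip(xd, resid, cluster):
    score[g] += v * u
se_cluster = math.sqrt(sum(s * s for s in score)) / sxx

print(f"iid SE: {se_iid:.3f}, cluster-robust SE: {se_cluster:.3f}")
```

Because both the regressor and part of the error are constant within clusters, the conventional formula treats 1,000 correlated rows as if they were independent and should report a standard error noticeably smaller than the clustered one.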
Attrition
Attrition is the defining practical problem of longitudinal studies. People move, die, refuse to participate, or become unreachable. If attrition is related to the outcome, the remaining sample is not representative of the original sample.
Diagnosing attrition: compare baseline characteristics of stayers vs. leavers. If they differ, attrition is selective on those observed characteristics — but this is a diagnostic, not a proof. Equality of baseline means does not establish that attrition is ignorable (attrition can still depend on unobserved outcomes, future shocks, latent health, motivation, or income instability), and unequal baseline means do not by themselves prove bias in the target estimator after appropriate adjustment. The substantive question is whether missingness is ignorable conditional on the observed information used by the estimator. Common corrections include inverse probability weighting (weight remaining observations by the inverse of their estimated probability of staying), multiple imputation, and selection models (Heckman 1979) that explicitly model the missingness mechanism.
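Inverse probability weighting can be illustrated with a deterministic toy example. The sketch assumes attrition depends only on an observed baseline stratum and that the outcome is constant within each stratum; both assumptions, and all numbers, are hypothetical simplifications.

```python
# Hypothetical baseline strata with different retention rates.
strata = [
    {"n_baseline": 100, "n_stayers": 50, "outcome": 10.0},   # half drop out
    {"n_baseline": 100, "n_stayers": 100, "outcome": 20.0},  # nobody drops out
]

# Target: the mean outcome in the full baseline sample.
full_mean = (sum(s["n_baseline"] * s["outcome"] for s in strata)
             / sum(s["n_baseline"] for s in strata))

# Unweighted mean among stayers over-represents the high-retention stratum.
naive_mean = (sum(s["n_stayers"] * s["outcome"] for s in strata)
              / sum(s["n_stayers"] for s in strata))

# IPW: each stayer is weighted by 1 / P(stay | stratum).
num = den = 0.0
for s in strata:
    weight = s["n_baseline"] / s["n_stayers"]  # inverse retention probability
    num += s["n_stayers"] * weight * s["outcome"]
    den += s["n_stayers"] * weight
ipw_mean = num / den

print(full_mean, round(naive_mean, 2), ipw_mean)  # 15.0 16.67 15.0
```

Weighting restores the baseline composition here only because retention depends on nothing beyond the observed stratum. If dropout also depended on unobserved factors, the weighted mean would still be biased.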
Major Panel Surveys
- PSID (Panel Study of Income Dynamics): U.S. families, since 1968. It is the longest-running household panel survey in the world.
- NLSY (National Longitudinal Survey of Youth): two cohorts (1979, 1997) of U.S. youth tracked into adulthood.
- LISS (Longitudinal Internet Studies for the Social Sciences): Dutch probability-based internet panel.
- BHPS/Understanding Society: UK households, now part of the UK Household Longitudinal Study.
- SOEP (German Socio-Economic Panel): German households since 1984.
Common Confusions
Fixed effects does not mean the effects are fixed
The name is confusing. "Fixed effects" means the unit-specific intercepts $\alpha_i$ are treated as fixed (non-random) parameters. It does not mean the regression coefficients are fixed or non-varying. The alternative, "random effects," treats $\alpha_i$ as draws from a distribution.
The Hausman test is not a test of whether to use FE or RE
The Hausman test checks whether the RE and FE estimates are statistically different. If they are, this suggests $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$ and RE is inconsistent. But a non-significant Hausman test does not prove $\mathrm{Cov}(\alpha_i, x_{it}) = 0$. It may just lack power. In practice, if you have reason to believe there are unobserved confounders correlated with regressors, use FE regardless of the Hausman test.
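For reference, the classical form of the test statistic, written for the $K$ coefficients on time-varying covariates that both estimators identify, is:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

Under the null hypothesis that RE is consistent and efficient, $H$ is asymptotically $\chi^2_K$. The variance difference in the middle relies on RE being fully efficient under the null, so this form is not valid with heteroskedastic or serially correlated errors; that is a further reason not to treat the test as a mechanical decision rule.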
Panel data does not automatically solve endogeneity
FE controls for time-invariant confounders. It does not control for time-varying confounders. If an omitted variable changes over time and is correlated with $x_{it}$, FE does not eliminate the bias. Panel data helps, but it is not a cure-all for endogeneity.
Summary
- Panel data tracks the same units over time, enabling within-unit comparisons
- Fixed effects removes all time-invariant confounders by demeaning
- Random effects is more efficient but requires $\alpha_i$ uncorrelated with $x_{it}$
- Difference-in-differences uses parallel trends to identify causal effects
- Attrition is the major practical threat: dropouts are rarely random
- FE cannot identify effects of time-invariant variables
Exercises
Problem
You have a panel of 500 workers observed over 5 years. You regress log wages on years of education using OLS, FE, and RE. The OLS coefficient is 0.10, the RE coefficient is 0.08, and the FE coefficient is 0.04. Interpret the differences. Why is the FE estimate smallest?
Problem
A policy is implemented in state A in 2020 but not in state B. Average outcomes are: State A pre-2020: 50, State A post-2020: 58, State B pre-2020: 45, State B post-2020: 48. Compute the DiD estimate. State the parallel trends assumption in plain English. Give one reason it might fail.
References
Canonical (panel data textbooks):
- Wooldridge, Econometric Analysis of Cross Section and Panel Data, 2nd ed. (2010), MIT Press, Chapters 10-14 (fixed effects, random effects, dynamic panels, GMM)
- Hsiao, Analysis of Panel Data, 3rd ed. (2014), Cambridge University Press, Chapters 2-4 and 7
- Baltagi, Econometric Analysis of Panel Data, 6th ed. (2021), Springer, Chapters 2-4 and 8
- Diggle, Heagerty, Liang & Zeger, Analysis of Longitudinal Data, 2nd ed. (2013), Oxford University Press (biostatistics, GEE, mixed effects)
- Fitzmaurice, Laird & Ware, Applied Longitudinal Analysis, 2nd ed. (2011), Wiley
- Angrist & Pischke, Mostly Harmless Econometrics (2009), Princeton University Press, Chapter 5
Staggered DiD (current):
- Goodman-Bacon, "Difference-in-Differences with Variation in Treatment Timing" (2021), Journal of Econometrics 225(2)
- Callaway & Sant'Anna, "Difference-in-Differences with Multiple Time Periods" (2021), Journal of Econometrics 225(2)
- Sun & Abraham, "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects" (2021), Journal of Econometrics 225(2)
- Borusyak, Jaravel & Spiess, "Revisiting Event Study Designs: Robust and Efficient Estimation" (2024), Review of Economic Studies 91(6)
- de Chaisemartin & D'Haultfoeuille, "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects" (2020), American Economic Review 110(9)
Synthetic control:
- Abadie & Gardeazabal, "The Economic Costs of Conflict: A Case Study of the Basque Country" (2003), American Economic Review 93(1)
- Abadie, Diamond & Hainmueller, "Synthetic Control Methods for Comparative Case Studies" (2010), Journal of the American Statistical Association 105(490)
- Abadie, "Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects" (2021), Journal of Economic Literature 59(2)
Clustered standard errors:
- Cameron & Miller, "A Practitioner's Guide to Cluster-Robust Inference" (2015), Journal of Human Resources 50(2)
- Cameron, Gelbach & Miller, "Bootstrap-Based Improvements for Inference with Clustered Errors" (2008), Review of Economics and Statistics 90(3)
Next Topics
- Small area estimation: borrowing strength across subpopulations
- Nonresponse and missing data: handling attrition formally
Last reviewed: April 26, 2026