Principal Component Analysis (PCA)

→ Linear dimensionality reduction. Finds orthogonal axes that successively capture the most variance in the data.

Definition

PCA rotates the data into a new orthonormal basis where the first axis (PC1) points along the direction of maximum variance, the second axis along the next-largest variance direction perpendicular to PC1, and so on. The components are the eigenvectors of the covariance matrix; the variance along each component is its eigenvalue.

Formally, for a centred data matrix X ∈ ℝ^(n × d):

  • Covariance: C = (1/(n-1)) Xᵀ X
  • Eigendecomposition: C = V Λ Vᵀ
  • Columns of V = principal components; diagonal of Λ = variance explained.
  • Projection: Z = X V (rows of Z are the data in PC coordinates)
  • Equivalent via SVD: X = U Σ Vᵀ, with PCs = columns of V, scores = U Σ, and eigenvalues λᵢ = σᵢ² / (n−1).
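The check below is a minimal NumPy sketch on hypothetical random data; the sizes, seed and mixing matrix are arbitrary assumptions. It centres X and computes the PCs twice: once from the eigendecomposition of C, once from the SVD of X. It then confirms two things. The eigenvalues equal σᵢ²/(n−1), and the scores match up to a per-component sign, which is arbitrary in both methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n = 500 samples, d = 4 correlated features.
n, d = 500, 4
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)                      # centre: the formulas assume a centred X

# Route 1: eigendecomposition of the covariance C = XᵀX / (n - 1).
C = X.T @ X / (n - 1)
evals, V = np.linalg.eigh(C)                # eigh returns eigenvalues in ascending order
order = np.argsort(evals)[::-1]             # reorder so PC1 comes first
evals, V = evals[order], V[:, order]
Z_eig = X @ V                               # scores Z = X V

# Route 2: thin SVD of X itself, X = U Σ Vᵀ.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z_svd = U * s                               # scores U Σ (s broadcasts over columns)

# Eigenvalues should equal squared singular values / (n - 1).
print(np.allclose(evals, s**2 / (n - 1)))
# PCs agree up to an arbitrary sign per component; align signs, then compare scores.
signs = np.sign(np.sum(V * Vt.T, axis=0))
print(np.allclose(Z_eig, Z_svd * signs))
```

In practice the SVD route is usually preferred. It never forms XᵀX explicitly, which avoids squaring the condition number.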

When to use it

Use case | Why PCA helps
Visualising high-dimensional data | Project to 2–3 PCs to plot
Denoising | Drop low-variance PCs (assumed noise)
Decorrelation | PC scores are uncorrelated by construction
Compression | Keep top-k PCs that capture, e.g., 95% variance
Neural population analysis | Reveal low-dimensional manifolds in M1, PFC, hippocampus
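The compression and decorrelation rows can be sketched in NumPy. The data are hypothetical: 3 latent factors mixed into 20 features, plus small noise, with all sizes chosen for illustration only. The sketch picks the smallest k whose cumulative variance ratio reaches 95%. It then checks that the covariance of the k retained PC scores is diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 20 observed features driven by 3 latent factors + small noise.
n, d, n_latent = 1000, 20, 3
latent = rng.normal(size=(n, n_latent)) * np.array([3.0, 2.0, 1.0])
X = latent @ rng.normal(size=(n_latent, d)) + 0.1 * rng.normal(size=(n, d))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / (n - 1)                        # variance along each PC
ratio = var / var.sum()                     # fraction of total variance per PC
cumulative = np.cumsum(ratio)

# Compression: smallest k whose top-k PCs reach 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)
Z = Xc @ Vt[:k].T                           # n × k compressed representation
print(k, np.round(cumulative[:5], 3))

# Decorrelation: the covariance of the PC scores is diagonal (off-diagonals ≈ 0).
cov_Z = np.cov(Z, rowvar=False)
print(np.allclose(cov_Z, np.diag(np.diag(cov_Z))))
```

The 95% threshold is a convention, not a rule. For denoising, the cut-off is often chosen instead where the eigenspectrum flattens into a noise floor.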

In neuroscience — why it matters for the motor system

PCA is the workhorse of modern population-coding analysis. Instead of asking “what does this single neuron code?” (Georgopoulos tuning curves), modern work asks “what trajectories does the whole population draw through state space?” (Churchland, Cunningham, Shenoy).

  • Churchland 2012: PCA of M1 population activity during reaching, followed by jPCA within the top 6 PCs, reveals rotational dynamics: the population state rotates in a consistent direction across all reach conditions. This is hard to explain with directional tuning alone and supports the dynamical systems view of motor cortex.
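  • Mante 2013 (PFC): PCA plus targeted dimensionality reduction shows context-dependent selection. The same sensory input drives a different PFC trajectory depending on which feature (motion or colour) the animal must attend to. Both features remain represented in PFC, so the selection is attributed to recurrent dynamics rather than to early gating of the irrelevant input.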
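  • Stringer, Pachitariu et al. 2019: cross-validated PCA of 10,000+ V1 neurons responding to natural images shows a power-law eigenspectrum, with the variance of the n-th component falling off roughly as 1/n. The neural code is therefore higher-dimensional than previously assumed (see High-performing neural network models of visual cortex benefit from high latent dimensionality).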

→ Where Georgopoulos asked “what direction does this neuron prefer?”, PCA asks “what does the population do as a whole, and in how few dimensions?”
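The power-law point in the Stringer bullet can be made concrete with a log-log fit of the eigenspectrum. This sketch only illustrates the idea and is not the paper's method. It uses plain PCA on noise-free synthetic data whose true spectrum is λₙ ∝ n⁻¹ (an assumption). The study, by contrast, used cross-validated PCA on real recordings. The fitted exponent should come out close to 1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical synthetic "population": d = 200 neurons whose true covariance
# eigenvalues follow a power law λ_n ∝ n^(-alpha) with alpha = 1.
d, n_samples, alpha_true = 200, 20000, 1.0
true_var = np.arange(1, d + 1) ** (-alpha_true)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))      # random orthonormal basis
X = (rng.normal(size=(n_samples, d)) * np.sqrt(true_var)) @ Q.T
X -= X.mean(axis=0)

# Eigenspectrum of the sample covariance, largest first.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Fit log λ_n = -alpha * log n + c over the top 100 ranks (tail estimates are noisier).
ranks = np.arange(1, 101)
slope, intercept = np.polyfit(np.log(ranks), np.log(eigvals[:100]), 1)
alpha_hat = -slope
print(f"estimated alpha ≈ {alpha_hat:.3f}")
```

On real recordings, trial-to-trial noise inflates the tail of a plain PCA spectrum. That is why a cross-validated estimate is needed there, whereas this noise-free toy gets away without one.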

Mathematical intuition (2 minutes)

  1. Centre your data — subtract the mean.
  2. Find the direction in which the cloud is most stretched. That’s PC1.
  3. Find the next direction perpendicular to PC1 in which the cloud is still stretched. That’s PC2.
  4. Repeat until you’ve used d directions. The eigenvalues tell you how stretched each direction is.
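The four steps can be followed almost literally with power iteration plus deflation. The function below, pca_by_deflation, is a hypothetical teaching helper and not a library function. It repeatedly finds the most stretched direction of the covariance matrix and records its variance. It then subtracts that direction before searching again, so each new PC is perpendicular to the previous ones. Real code should call np.linalg.svd or np.linalg.eigh; the example also uses eigh as a cross-check.

```python
import numpy as np


def pca_by_deflation(X, n_components, n_iter=1000, tol=1e-12, seed=0):
    """Hypothetical teaching helper: PCs via power iteration + deflation."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # step 1: centre the data
    C = Xc.T @ Xc / (Xc.shape[0] - 1)            # covariance matrix
    components, variances = [], []
    for _ in range(n_components):
        v = rng.normal(size=C.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):                  # step 2: most stretched direction
            w = C @ v
            w /= np.linalg.norm(w)
            if np.linalg.norm(w - v) < tol:
                break
            v = w
        lam = v @ C @ v                          # variance along v (Rayleigh quotient)
        components.append(v)
        variances.append(lam)
        C = C - lam * np.outer(v, v)             # steps 3-4: remove it, search again
    return np.array(components), np.array(variances)


# Hypothetical data: 5 features with well-separated spreads, randomly rotated.
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = (rng.normal(size=(2000, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])) @ Q.T

pcs, variances = pca_by_deflation(X, n_components=3)

# Cross-check against a library eigendecomposition (sorted largest first).
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
evals, evecs = evals[::-1], evecs[:, ::-1]
cosines = np.abs(np.sum(pcs * evecs[:, :3].T, axis=1))  # |cos| between matching PCs
print(np.round(variances, 3), np.round(evals[:3], 3), np.round(cosines, 6))
```

Power iteration converges at a rate set by the ratio of consecutive eigenvalues. It therefore becomes slow when two directions are almost equally stretched, which is one reason library routines use other algorithms.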

Why eigenvectors of the covariance matrix? Because the covariance matrix encodes the shape of the data cloud, and its eigenvectors are exactly the axes of the data ellipsoid.
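The step that links "most stretched" to eigenvectors is an optimisation: maximise the variance of the projection Xw over unit vectors w. A short derivation, using the centred X and the C = (1/(n−1)) Xᵀ X defined above:

```latex
% Variance of the centred data projected onto a unit vector w:
\operatorname{Var}(Xw) = \frac{1}{n-1}\,(Xw)^\top (Xw) = w^\top C\, w,
\qquad \text{subject to } w^\top w = 1.
% Lagrangian and stationarity condition:
\mathcal{L}(w,\lambda) = w^\top C\, w - \lambda\,(w^\top w - 1),
\qquad \nabla_w \mathcal{L} = 2Cw - 2\lambda w = 0
\;\Longrightarrow\; Cw = \lambda w.
% At any stationary point the captured variance equals the eigenvalue:
w^\top C\, w = \lambda\, w^\top w = \lambda,
% so the maximum is attained at the eigenvector with the largest eigenvalue (PC1).
```

Adding the constraint w ⟂ PC1 and repeating the argument selects the eigenvector with the second-largest eigenvalue, and so on down the spectrum.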

Limitations

  • Linear only. Curved manifolds need t-SNE, UMAP, Isomap, or autoencoders.
  • Variance ≠ task-relevance. The largest PC may capture nuisance variance (drift, breathing artifacts) rather than the signal you care about. Hence targeted dimensionality reduction (TDR), demixed PCA, and similar task-aware methods.
  • Scale-sensitive. Always z-score features that live on different scales; otherwise the feature with the largest numerical range dominates PC1 purely because of the choice of units.
  • Components can be uninterpretable. A PC is a linear combination of original features; it may not map onto any concept a domain expert would name.
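The scale-sensitivity point can be shown with NumPy and three hypothetical features. x1 and x2 are strongly correlated. x3 is independent, but expressed in units 1000× smaller, so its numbers are 1000× larger. Without z-scoring, PC1 simply points at x3; after z-scoring, PC1 recovers the shared x1/x2 axis.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Hypothetical features: x1 and x2 are strongly correlated; x3 is independent.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)


def first_pc(X):
    """Return the PC1 direction (a unit vector) of the columns of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]


# Same information, but x3 recorded in units 1000x smaller (e.g. mV instead of V).
X_raw = np.column_stack([x1, x2, 1000 * x3])
X_z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)   # z-score each feature

pc1_raw, pc1_z = first_pc(X_raw), first_pc(X_z)
print("PC1 on raw units:", np.round(np.abs(pc1_raw), 3))      # dominated by x3
print("PC1 after z-scoring:", np.round(np.abs(pc1_z), 3))     # shared x1/x2 axis
```

Z-scoring is equivalent to running PCA on the correlation matrix instead of the covariance matrix.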

🐍 PCA for explainability — Python example

This is what I mean by “explainability with PCA”: after running PCA, inspect why each component exists by looking at variance explained, loadings (which features drive each PC), and reconstruction error. Below is a full Pyodide-compatible script using a synthetic M1 population analogue — the same situation that motivates PCA in motor cortex work.
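A minimal sketch of such a script is shown here as an illustration under stated assumptions. It uses only NumPy, which Pyodide ships. The "M1 population" is a hypothetical toy: 50 neurons linearly read out a 2-D rotational latent state across 8 conditions, plus Gaussian noise. The sizes, noise level and rate ranges are arbitrary choices. The script walks through the four inspections: variance explained, loadings, trajectories and reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "M1-like" population: 8 reach conditions x 100 time bins, 50 neurons.
n_conditions, n_time, n_neurons = 8, 100, 50
t = np.linspace(0, 2 * np.pi, n_time)

# Assumed latent dynamics: a 2-D rotation whose phase and amplitude vary by condition.
phases = np.linspace(0, 2 * np.pi, n_conditions, endpoint=False)
amps = 1.0 + 0.5 * rng.random(n_conditions)
latents = np.concatenate(
    [np.column_stack([a * np.cos(t + p), a * np.sin(t + p)]) for a, p in zip(amps, phases)]
)  # shape (n_conditions * n_time, 2)

# Each neuron reads out the latent state linearly, plus a baseline rate and noise.
W = rng.normal(size=(2, n_neurons))
baseline = rng.uniform(5, 20, size=n_neurons)
rates = latents @ W + baseline + 0.3 * rng.normal(size=(latents.shape[0], n_neurons))

# Centre each neuron (rates share units here, so no z-scoring), then PCA via SVD.
Xc = rates - rates.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / (Xc.shape[0] - 1)
explained = var / var.sum()

# 1. Variance explained -> intrinsic dimensionality (about 2 dominant PCs by design).
print("Variance explained, PC1-PC5:", np.round(explained[:5], 3))

# 2. Loadings -> which neurons drive each of the top two PCs.
for k in range(2):
    top = np.argsort(np.abs(Vt[k]))[::-1][:5]
    print(f"PC{k + 1} strongest-loading neurons:", top, np.round(Vt[k, top], 2))

# 3. Trajectories -> each condition's path through the PC1/PC2 plane.
trajectories = (Xc @ Vt[:2].T).reshape(n_conditions, n_time, 2)

# 4. Reconstruction error -> where signal ends and the noise floor begins.
recon_err = []
for k in range(1, 11):
    X_k = (U[:, :k] * s[:k]) @ Vt[:k]
    recon_err.append(np.linalg.norm(Xc - X_k) / np.linalg.norm(Xc))
print("Relative reconstruction error, k = 1..10:", np.round(recon_err, 3))
```

To visualise the trajectories, each trajectories[c] (a 100 × 2 array) can be passed to matplotlib's plot, if matplotlib is available in the environment. In this toy, the reconstruction error should drop sharply at k = 2 and then flatten, which marks the noise floor.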

What “PCA for explainability” really means

The phrase has two flavours, both useful and worth distinguishing:

  1. Using PCA to understand the data itself (this note). After fitting, inspect:

    • Variance explained → intrinsic dimensionality
    • Loadings → which original features drive each component
    • Trajectories in PC space → geometric structure of the population’s behaviour
    • Reconstruction error → signal vs. noise floor
  2. Using PCA to explain a model. PCA can also be a stepping stone for explaining a black-box classifier, e.g. by projecting feature attributions or gradients into a low-dimensional space to visualise which directions in input space matter most. That’s a separate use case (more common in computer vision / interpretability research).

For neural data, flavour (1) is the relevant one and is what Churchland, Mante, Stringer, and most systems-neuro papers mean.


See Also

Tags: methods machinelearning population-coding
Superlinks: 050 🧠Neuroscience · 611 📠Machine Learning

Related methods to know:

  • t-SNE, UMAP — non-linear alternatives for visualisation
  • ICA — finds independent (not just uncorrelated) components; better for source separation
  • Factor analysis — assumes a generative noise model, PCA does not
  • demixed PCA, TDR — task-aware variants used in systems neuroscience