→ Linear dimensionality reduction. Finds orthogonal axes that successively capture the most variance in the data.
Definition
PCA rotates the data into a new orthonormal basis where the first axis (PC1) points along the direction of maximum variance, the second axis along the next-largest variance direction perpendicular to PC1, and so on. The components are the eigenvectors of the covariance matrix; the variance along each component is its eigenvalue.
Formally, for a centred data matrix X ∈ ℝ^(n × d):
Covariance: C = (1/(n-1)) Xᵀ X
Eigendecomposition: C = V Λ Vᵀ
Columns of V = principal components; diagonal of Λ = variance explained.
Projection: Z = X V (rows of Z are the data in PC coordinates)
Equivalent via SVD: X = U Σ Vᵀ, with PCs = columns of V, scores = U Σ, and Λ = Σ² / (n-1).
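The two routes above can be checked against each other numerically. The following is a minimal NumPy sketch on made-up random data (the sizes `n`, `d` and the mixing matrix are illustrative assumptions, not part of this note's dataset). It shows that the eigenvalues equal the squared singular values divided by n-1, and that components and scores agree up to an arbitrary sign per component.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
# Illustrative data: random features mixed so that they are correlated
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)                      # centre each column

# Route 1: eigendecomposition of the covariance matrix
C = X.T @ X / (n - 1)
eigvals, V_eig = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # reorder to descending variance
eigvals, V_eig = eigvals[order], V_eig[:, order]

# Route 2: SVD of the centred data matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V_svd = Vt.T

# Eigenvalues are squared singular values divided by (n - 1)
assert np.allclose(eigvals, S**2 / (n - 1))

# Each component is only defined up to sign, so align signs before comparing
signs = np.sign(np.sum(V_eig * V_svd, axis=0))
assert np.allclose(V_eig, V_svd * signs)

# Scores: X V from route 1 equals U Σ from route 2 (after sign alignment)
Z_eig = X @ V_eig
Z_svd = U * S
assert np.allclose(Z_eig, Z_svd * signs)
```

In practice the SVD route is usually preferred because it avoids forming Xᵀ X explicitly.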
When to use it
| Use case | Why PCA helps |
| --- | --- |
| Visualising high-dimensional data | Project to 2–3 PCs to plot |
| Denoising | Drop low-variance PCs (assumed noise) |
| Decorrelation | PC scores are uncorrelated by construction |
| Compression | Keep top-k PCs that capture, e.g., 95% variance |
| Neural population analysis | Reveal low-dimensional manifolds in M1, PFC, hippocampus |
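Two of the use cases above, decorrelation and compression, are easy to verify directly. This is a small NumPy sketch on hypothetical data (three latent factors mixed into ten features; all sizes and scales are assumptions chosen for illustration). It checks that PC scores have a diagonal covariance and picks the smallest k whose cumulative variance reaches 95%.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
# Hypothetical data: 3 strong latent factors mixed into 10 features, plus weak noise
latents = rng.normal(size=(n, 3)) * np.array([5.0, 3.0, 2.0])
X = latents @ rng.normal(size=(3, d)) + 0.1 * rng.normal(size=(n, d))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
var = S**2 / (n - 1)                 # variance along each PC
evr = var / var.sum()                # explained-variance ratio
cum = np.cumsum(evr)

# Decorrelation: the covariance of the full score matrix is diagonal
Z_all = X @ Vt.T
score_cov = np.cov(Z_all, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))
print("largest off-diagonal covariance:", np.abs(off_diag).max())

# Compression: smallest k whose cumulative variance reaches 95%
k95 = int(np.searchsorted(cum, 0.95) + 1)
Z = X @ Vt[:k95].T                   # compressed representation, shape (n, k95)
X_hat = Z @ Vt[:k95]                 # reconstruction back in feature space
print("k for 95% variance:", k95)
print("relative reconstruction error:",
      np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

The 95% figure is a convention rather than a rule; the appropriate cutoff depends on how much of the discarded variance is actually noise.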
In neuroscience — why it matters for the motor system
PCA is the workhorse of modern population-coding analysis. Instead of asking “what does this single neuron code?” (Georgopoulos tuning curves), modern work asks “what trajectories does the whole population draw through state space?” (Churchland, Cunningham, Shenoy).
Churchland 2012: PCA of M1 population activity during reaching reveals rotational dynamics: the population state rotates in a consistent direction across reach conditions, in planes found by jPCA within the top 6 PCs. This is hard to explain with directional tuning alone and supports the dynamical systems view of motor cortex.
Mante 2013 (PFC): PCA + targeted dimensionality reduction shows context-dependent gating: same sensory input, different PFC trajectory depending on which feature the animal must attend to.
→ Where Georgopoulos asked “what direction does this neuron prefer?”, PCA asks “what does the population do as a whole, and in how few dimensions?”
Mathematical intuition (2 minutes)
1. Centre your data: subtract the mean.
2. Find the direction in which the cloud is most stretched. That’s PC1.
3. Find the next direction perpendicular to PC1 in which the cloud is still stretched. That’s PC2.
4. Repeat until you’ve used d directions. The eigenvalues tell you how stretched each direction is.
Why eigenvectors of the covariance matrix? Because the covariance matrix encodes the shape of the data cloud, and its eigenvectors are exactly the axes of the data ellipsoid.
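The "most stretched direction" intuition can be tested by brute force. The sketch below assumes a hypothetical 2-D cloud stretched along a 30° axis (the angle, spreads and sample size are illustrative). It compares the variance along many random directions with the top eigenvalue of the covariance matrix: no direction beats PC1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-D cloud: spread 3.0 along one axis, 0.5 along the other,
# then rotated so the long axis points at 30 degrees
angle = np.deg2rad(30)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
X = (rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])) @ R.T
X = X - X.mean(axis=0)

# PC1 = eigenvector of the covariance matrix with the largest eigenvalue
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)        # ascending order
pc1 = eigvecs[:, -1]
pc1_angle = np.degrees(np.arctan2(pc1[1], pc1[0])) % 180

# Variance of the data projected onto many random unit directions
thetas = rng.uniform(0, np.pi, size=2000)
directions = np.column_stack([np.cos(thetas), np.sin(thetas)])
proj_var = np.var(X @ directions.T, axis=0, ddof=1)

print(f"PC1 angle: {pc1_angle:.1f} degrees")
print(f"variance along PC1 (top eigenvalue): {eigvals[-1]:.3f}")
print(f"best variance among random directions: {proj_var.max():.3f}")
```

The variance along a unit vector u is uᵀ C u, which is bounded above by the largest eigenvalue of C; that bound is what the comparison illustrates.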
Limitations
Linear only. Curved manifolds need t-SNE, UMAP, Isomap, or autoencoders.
Variance ≠ task-relevance. The largest PC may capture nuisance variance (drift, breathing artifacts) rather than the signal you care about. Hence targeted dimensionality reduction (TDR), demixed PCA, and similar.
Scale-sensitive. Always z-score features that live on different scales — otherwise the unit choice dominates.
Components can be uninterpretable. A PC is a linear combination of original features; it may not map onto any concept a domain expert would name.
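The scale-sensitivity point is worth seeing once in numbers. This minimal sketch uses two hypothetical correlated features (the names, the correlation and the factor of 1000 are assumptions for illustration). Changing the unit of one feature swings PC1 almost entirely onto that feature, and z-scoring restores a balanced component.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Hypothetical features: two correlated measurements with similar spread
a = rng.normal(size=n)
b = 0.8 * a + 0.6 * rng.normal(size=n)
X = np.column_stack([a, b])

def first_pc(M):
    """Return the first principal component (unit vector) of M."""
    M = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

pc1_raw = first_pc(X)

# Express the first feature in a unit 1000x smaller (e.g. s -> ms)
X_rescaled = X * np.array([1000.0, 1.0])
pc1_rescaled = first_pc(X_rescaled)

# z-score each column, then recompute
X_z = (X_rescaled - X_rescaled.mean(axis=0)) / X_rescaled.std(axis=0)
pc1_z = first_pc(X_z)

print("PC1, original units: ", np.round(pc1_raw, 3))
print("PC1, rescaled units: ", np.round(pc1_rescaled, 3))
print("PC1, after z-scoring:", np.round(pc1_z, 3))
```

After z-scoring, PCA operates on the correlation matrix, so with two positively correlated features PC1 weights both equally regardless of their original units.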
🐍 PCA for explainability — Python example
This is what I mean by “explainability with PCA”: after running PCA, inspect why each component exists by looking at variance explained, loadings (which features drive each PC), and reconstruction error. Below is a full Pyodide-compatible script using a synthetic M1 population analogue — the same situation that motivates PCA in motor cortex work.
🐍 PCA explainability — synthetic motor cortex population
```python
import micropip
await micropip.install(["matplotlib", "scikit-learn"])

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# ────────────────────────────────────────────────────────────────
# 1. Build a synthetic M1 population
#    50 neurons, cosine-tuned to random preferred directions,
#    recorded across 8 reach directions, 100 time bins per reach.
# ────────────────────────────────────────────────────────────────
N_NEURONS = 50
N_DIRS = 8
N_TIME = 100

dirs = np.linspace(0, 2*np.pi, N_DIRS, endpoint=False)
pref = rng.uniform(0, 2*np.pi, size=N_NEURONS)
t = np.linspace(0, 1, N_TIME)

# Gaussian "movement window" envelope, peaks at t=0.5
envelope = np.exp(-((t - 0.5)**2) / (2 * 0.15**2))

# data[i, j, n] = firing rate of neuron n at time j during reach i
data = np.zeros((N_DIRS, N_TIME, N_NEURONS))
for i, theta in enumerate(dirs):
    tuning = 10 + 15 * np.cos(theta - pref)   # cosine tuning
    data[i] = envelope[:, None] * tuning[None, :] \
              + rng.normal(0, 1.5, size=(N_TIME, N_NEURONS))

# Flatten to (samples, features) for PCA: each row = one time bin
X = data.reshape(-1, N_NEURONS)

# ────────────────────────────────────────────────────────────────
# 2. Standardise + fit PCA
# ────────────────────────────────────────────────────────────────
X_z = StandardScaler().fit_transform(X)
pca = PCA().fit(X_z)
scores = pca.transform(X_z)   # computed once, reused in 3c and 3d

# ────────────────────────────────────────────────────────────────
# 3. EXPLAINABILITY PLOTS
# ────────────────────────────────────────────────────────────────
fig, axes = plt.subplots(2, 2, figsize=(11, 8), dpi=80)

# 3a: Scree plot: how many PCs do we actually need?
evr = pca.explained_variance_ratio_
cum = np.cumsum(evr)
axes[0, 0].bar(range(1, len(evr)+1), evr * 100, color='steelblue', alpha=0.7)
axes[0, 0].plot(range(1, len(evr)+1), cum * 100, 'o-', color='crimson', label='cumulative')
axes[0, 0].axhline(95, color='gray', ls='--', alpha=0.5)
# Label placed at x=8 so it falls inside the visible x-range (0.5 to 15.5)
axes[0, 0].text(8, 96, '95% threshold', color='gray', fontsize=9)
axes[0, 0].set_xlim(0.5, 15.5)
axes[0, 0].set_xlabel("Principal Component")
axes[0, 0].set_ylabel("Variance explained (%)")
axes[0, 0].set_title("Scree plot — intrinsic dimensionality")
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# 3b: Loadings: which neurons drive PC1 and PC2?
loadings = pca.components_[:2].T   # (N_NEURONS, 2)
order = np.argsort(pref)           # sort neurons by preferred direction
im = axes[0, 1].imshow(loadings[order].T, aspect='auto', cmap='RdBu_r', vmin=-0.3, vmax=0.3)
axes[0, 1].set_yticks([0, 1])
axes[0, 1].set_yticklabels(['PC1', 'PC2'])
axes[0, 1].set_xlabel("Neuron (sorted by preferred direction)")
axes[0, 1].set_title("Loadings — which neurons each PC weights")
plt.colorbar(im, ax=axes[0, 1], shrink=0.6)

# 3c: Project trajectories into the PC1–PC2 plane
#     Each reach direction = one out-and-back trajectory
Z = scores.reshape(N_DIRS, N_TIME, -1)
# endpoint=False: hsv is cyclic, so 0 and 1 would both map to red
cmap = plt.cm.hsv(np.linspace(0, 1, N_DIRS, endpoint=False))
for i in range(N_DIRS):
    axes[1, 0].plot(Z[i, :, 0], Z[i, :, 1], color=cmap[i], lw=1.5,
                    label=f'{np.degrees(dirs[i]):.0f}°')
    axes[1, 0].scatter(Z[i, 0, 0], Z[i, 0, 1], color=cmap[i], s=30, marker='o')    # start
    axes[1, 0].scatter(Z[i, -1, 0], Z[i, -1, 1], color=cmap[i], s=60, marker='*')  # end
axes[1, 0].set_xlabel(f"PC1 ({evr[0]*100:.1f}% var)")
axes[1, 0].set_ylabel(f"PC2 ({evr[1]*100:.1f}% var)")
axes[1, 0].set_title("Population trajectories in PC space\n(○ = start, ★ = end)")
axes[1, 0].legend(fontsize=7, ncol=2, loc='upper right')
axes[1, 0].grid(alpha=0.3)
axes[1, 0].axhline(0, color='gray', lw=0.5)
axes[1, 0].axvline(0, color='gray', lw=0.5)

# 3d: Reconstruction error vs. k: how many PCs to keep the signal?
ks = np.arange(1, 21)
recon_err = []
for k in ks:
    Zk = scores.copy()
    Zk[:, k:] = 0                      # keep only the first k PC scores
    Xk = pca.inverse_transform(Zk)     # map back to (z-scored) neuron space
    recon_err.append(np.mean((X_z - Xk)**2))
axes[1, 1].plot(ks, recon_err, 'o-', color='darkblue')
axes[1, 1].set_xlabel("k (PCs retained)")
axes[1, 1].set_ylabel("Reconstruction MSE")
axes[1, 1].set_title("How many PCs are enough?")
axes[1, 1].grid(alpha=0.3)

fig.suptitle("PCA explainability dashboard — synthetic M1 population",
             fontsize=13, fontweight='bold')
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()

# ────────────────────────────────────────────────────────────────
# 4. Numeric summary printed under the figure
# ────────────────────────────────────────────────────────────────
print(f"Total dimensions in data: {N_NEURONS}")
print(f"PCs needed for 80% variance: {np.searchsorted(cum, 0.80) + 1}")
print(f"PCs needed for 95% variance: {np.searchsorted(cum, 0.95) + 1}")
print(f"PC1 explains {evr[0]*100:.1f}% | PC2 explains {evr[1]*100:.1f}%")
print(f"Top-2 PCs capture {(evr[0]+evr[1])*100:.1f}% of total variance")
```
🔍 How to read each panel
Scree plot (top-left): the elbow tells you the intrinsic dimensionality. For this synthetic dataset, ~3 PCs capture most of the variance. The baseline term (the 10 in 10 + 15·cos), scaled by the envelope, adds one condition-independent dimension in which all neurons rise and fall together, and the cosine tuning adds two more (cos and sin of preferred direction), so the signal is rank-3 plus noise. The 95% threshold marks a common, if arbitrary, cutoff for “useful signal vs. noise.”
Loadings (top-right): each row of the heatmap is a PC; each column is a neuron (sorted by preferred direction). If PC1 is essentially cos(θ_pref) and PC2 is sin(θ_pref), you’ll see one full sinusoid in each row, which indicates that PC1/PC2 jointly encode movement direction in (x, y) Cartesian form. Because the three signal dimensions carry similar variance, the top PCs can also mix in the uniform condition-independent pattern, which shows up as an offset on the sinusoid or as a row of same-signed loadings. This is the explainability win: the PCs recover the latent geometry without being told what to look for.
Trajectories (bottom-left): every reach direction is one coloured trajectory in PC1–PC2 space. All trajectories start from nearly the same point, move outward as the envelope rises, and retrace their path as it falls, so each reach looks like a noisy out-and-back spoke rather than an open loop. The direction of each spoke tells you which direction was reached, although part of that directional signal may sit in PC3.
Reconstruction error (bottom-right): confirms the elbow. Beyond k ≈ 3, adding PCs barely improves reconstruction. Anything past that is noise.
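It is worth checking the rank of the synthetic signal directly rather than assuming it. The sketch below rebuilds the same population as the script above but without the noise term (same constants and seed; the variable names `signal` and `S_mat` are introduced here for illustration) and counts the non-negligible singular values of the centred matrix. Under this construction the baseline term contributes a condition-independent dimension on top of the cos and sin dimensions.

```python
import numpy as np

N_NEURONS, N_DIRS, N_TIME = 50, 8, 100
rng = np.random.default_rng(0)

dirs = np.linspace(0, 2*np.pi, N_DIRS, endpoint=False)
pref = rng.uniform(0, 2*np.pi, size=N_NEURONS)
t = np.linspace(0, 1, N_TIME)
envelope = np.exp(-((t - 0.5)**2) / (2 * 0.15**2))

# Noise-free version of the data: envelope(t) * (10 + 15 cos(theta - pref))
tuning = 10 + 15 * np.cos(dirs[:, None] - pref[None, :])   # (N_DIRS, N_NEURONS)
signal = envelope[None, :, None] * tuning[:, None, :]      # (N_DIRS, N_TIME, N_NEURONS)

# Flatten to (samples, neurons) and centre each neuron, as PCA would
S_mat = signal.reshape(-1, N_NEURONS)
S_mat = S_mat - S_mat.mean(axis=0)

# Numerical rank: singular values above a small relative tolerance
sv = np.linalg.svd(S_mat, compute_uv=False)
rank = int(np.sum(sv > 1e-8 * sv[0]))
var_share = sv**2 / np.sum(sv**2)

print("rank of centred noise-free signal:", rank)
print("variance share of first 4 components:", np.round(var_share[:4], 3))
```

Removing the baseline (replacing `10 + 15 * ...` with `15 * ...`) would leave only the two directional dimensions, which is the purely 2-D case.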
⚠️ Pyodide memory note
Same Pyodide caveat as L4 Figure 6 — if you run multiple matplotlib cells in one session the WASM heap fills up. If you get MemoryError, restart Obsidian (Cmd-R) and run this cell alone.
What “PCA for explainability” really means
The phrase has two flavours — both useful, both worth distinguishing:
1. Using PCA to understand the data itself (this note). After fitting, inspect:
   - Variance explained → intrinsic dimensionality
   - Loadings → which original features drive each component
   - Trajectories in PC space → geometric structure of the population’s behaviour
   - Reconstruction error → signal vs. noise floor
2. Using PCA to explain a model. PCA can also be a stepping stone for explaining a black-box classifier, e.g. projecting feature attributions or gradients into low-dim space to visualise which directions in input space matter most. That’s a separate use case (more common in computer vision / interpretability research).
For neural data, flavour (1) is the relevant one and is what Churchland, Mante, Stringer, and most systems-neuro papers mean.
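Two of the diagnostics in flavour (1), variance explained and reconstruction error, are the same quantity seen from two sides. For centred data with n samples and d features, the mean squared error of a rank-k PCA reconstruction equals (n-1)/(n·d) times the sum of the discarded eigenvalues. The sketch below checks this on made-up random data (sizes and mixing matrix are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 8
# Illustrative data: correlated random features, centred
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = S**2 / (n - 1)             # variance along each PC

mse_list, pred_list = [], []
for k in range(1, d + 1):
    # Rank-k reconstruction: project onto the top-k PCs and map back
    X_k = (X @ Vt[:k].T) @ Vt[:k]
    mse_list.append(np.mean((X - X_k)**2))
    # Prediction from the eigenvalues that were thrown away
    pred_list.append((n - 1) / (n * d) * eigvals[k:].sum())

assert np.allclose(mse_list, pred_list)
for k, (m, p) in enumerate(zip(mse_list, pred_list), start=1):
    print(f"k={k}: MSE={m:.4f}  from eigenvalues={p:.4f}")
```

This is why the reconstruction-error curve and the scree plot always show their elbow at the same k.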