Principal Component Analysis (PCA)

→ Linear dimensionality reduction. Finds orthogonal axes that successively capture the most variance in the data.

Definition

PCA rotates the data into a new orthonormal basis where the first axis (PC1) points along the direction of maximum variance, the second axis along the next-largest variance direction perpendicular to PC1, and so on. The components are the eigenvectors of the covariance matrix; the variance along each component is its eigenvalue.

Formally, for a centred data matrix X ∈ ℝ^(n × d):

Covariance: C = (1/(n-1)) Xᵀ X
Eigendecomposition: C = V Λ Vᵀ
Columns of V = principal components; diagonal of Λ = variance explained.
Projection: Z = X V (rows of Z are the data in PC coordinates)
Equivalent via SVD: X = U Σ Vᵀ, with PCs = columns of V, scores = U Σ.
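
The two routes above can be checked against each other numerically. The following is a minimal NumPy sketch on made-up random data (the matrix sizes and the seed are arbitrary assumptions, not part of the definition). It illustrates that the covariance eigenvalues equal the squared singular values divided by n − 1, and that the components agree up to a sign flip.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5                                            # arbitrary illustrative sizes
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))    # correlated toy features
X = X - X.mean(axis=0)                                   # centre each column

# Route 1: eigendecomposition of the covariance matrix
C = X.T @ X / (n - 1)
eigvals, V_eig = np.linalg.eigh(C)                       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]                        # sort descending
eigvals, V_eig = eigvals[order], V_eig[:, order]

# Route 2: thin SVD of the centred data
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V_svd = Vt.T

# Eigenvalues are squared singular values divided by (n - 1)
eigvals_match = np.allclose(eigvals, s**2 / (n - 1))

# Eigenvectors are only defined up to sign, so align signs before comparing
signs = np.sign(np.sum(V_eig * V_svd, axis=0))
components_match = np.allclose(V_eig, V_svd * signs, atol=1e-6)

# Scores: projecting onto V gives the same result as U scaled by the singular values
Z = X @ V_svd
scores_match = np.allclose(Z, U * s)

print(eigvals_match, components_match, scores_match)
```

In practice the SVD route is usually preferred because it avoids forming Xᵀ X explicitly, which can lose precision when d is large or the features are nearly collinear.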

When to use it

Use case	Why PCA helps
Visualising high-dimensional data	Project to 2–3 PCs to plot
Denoising	Drop low-variance PCs (assumed noise)
Decorrelation	PC scores are uncorrelated by construction
Compression	Keep top-k PCs that capture, e.g., 95% variance
Neural population analysis	Reveal low-dimensional manifolds in M1, PFC, hippocampus
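
As a rough illustration of the compression and decorrelation rows, here is a sketch that picks the smallest k reaching 95% cumulative variance and checks that the PC scores are uncorrelated. The data are hypothetical (three latent factors mixed into ten features), and the 95% target is just the example threshold from the table.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
# Hypothetical data: 3 strong latent factors mixed into 10 features, plus small noise
latents = rng.normal(size=(n, 3)) * np.array([5.0, 3.0, 2.0])
X = latents @ rng.normal(size=(3, d)) + 0.1 * rng.normal(size=(n, d))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)            # variance fraction per PC
cum = np.cumsum(explained)
k = int(np.searchsorted(cum, 0.95) + 1)    # smallest k with cumulative variance >= 95%

# Compression: keep k scores per sample, then map back to feature space
Z_k = X @ Vt[:k].T
X_hat = Z_k @ Vt[:k]
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Decorrelation: the covariance of the full score matrix is diagonal
C_z = np.cov(X @ Vt.T, rowvar=False)
max_off_diag = np.abs(C_z - np.diag(np.diag(C_z))).max()

print(f"k = {k}, relative reconstruction error = {rel_err:.3f}")
print(f"largest off-diagonal score covariance = {max_off_diag:.2e}")
```

Keeping at least 95% of the variance bounds the relative reconstruction error (in Frobenius norm) by √0.05 ≈ 0.22.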

In neuroscience — why it matters for the motor system

PCA is the workhorse of modern population-coding analysis. Instead of asking “what does this single neuron code?” (Georgopoulos tuning curves), modern work asks “what trajectories does the whole population draw through state space?” (Churchland, Cunningham, Shenoy).

Churchland 2012: PCA of M1 population activity during reaching reveals rotational dynamics — the population vector traces consistent ellipses in the top 6 PCs, regardless of reach direction. This is hard to explain with directional tuning alone and supports the dynamical systems view of motor cortex.
Note that the rotational structure in Churchland 2012 is extracted with jPCA, a method applied within the space of the top 6 PCs, rather than read directly off the PCs themselves.
Mante 2013 (PFC): PCA + targeted dimensionality reduction shows context-dependent gating: the same sensory input drives a different PFC trajectory depending on which feature the animal must attend to.
Stringer & Pachitariu 2019: PCA of 10,000+ V1 neurons shows a power-law eigenspectrum — neural codes are higher-dimensional than previously assumed (see High-performing neural network models of visual cortex benefit from high latent dimensionality).

→ Where Georgopoulos asked “what direction does this neuron prefer?”, PCA asks “what does the population do as a whole, and in how few dimensions?”

Mathematical intuition (2 minutes)

Centre your data — subtract the mean.
Find the direction in which the cloud is most stretched. That’s PC1.
Find the next direction perpendicular to PC1 in which the cloud is still stretched. That’s PC2.
Repeat until you’ve used d directions. The eigenvalues tell you how stretched each direction is.

Why eigenvectors of the covariance matrix? Because the covariance matrix encodes the shape of the data cloud, and its eigenvectors are exactly the axes of the data ellipsoid.
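
The "axes of the data ellipsoid" claim can be tried on a 2D toy cloud. This sketch assumes a Gaussian cloud stretched 3:1 and rotated by 30° (both numbers are arbitrary choices), recovers the long axis from the top eigenvector, and checks that no random direction has more variance than PC1.

```python
import numpy as np

rng = np.random.default_rng(2)
angle = np.deg2rad(30)                       # assumed orientation of the long axis
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

# Axis-aligned cloud (std 3 along x, std 1 along y), then rotated by 30 degrees
X = (rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])) @ R.T
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)         # ascending eigenvalues
pc1 = eigvecs[:, -1]                         # eigenvector with the largest eigenvalue

# Orientation of PC1, folded into [0, 180) because the sign of a PC is arbitrary
pc1_angle = np.rad2deg(np.arctan2(pc1[1], pc1[0])) % 180

# Variance of the data projected onto 1000 random unit directions
thetas = rng.uniform(0, np.pi, size=1000)
directions = np.column_stack([np.cos(thetas), np.sin(thetas)])
var_along = np.var(X @ directions.T, axis=0, ddof=1)

print(f"PC1 orientation: {pc1_angle:.1f} degrees")
print(f"PC1 variance: {eigvals[-1]:.2f}, best random direction: {var_along.max():.2f}")
```

The variance along any unit direction u is uᵀ C u, which is maximised by the top eigenvector. That is why the random directions can approach the PC1 variance but not exceed it.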

Limitations

Linear only. Curved manifolds need t-SNE, UMAP, Isomap, or autoencoders.
Variance ≠ task-relevance. The largest PC may capture nuisance variance (drift, breathing artifacts) rather than the signal you care about. Hence targeted dimensionality reduction (TDR), demixed PCA, and similar.
Scale-sensitive. Always z-score features that live on different scales — otherwise the unit choice dominates.
Components can be uninterpretable. A PC is a linear combination of original features; it may not map onto any concept a domain expert would name.
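
To make the scale-sensitivity point concrete, here is a small sketch with two hypothetical correlated features recorded in very different units (a firing rate in Hz and a length in metres; both are invented for illustration). Without z-scoring, PC1 is almost entirely the large-unit feature. After z-scoring, both features contribute equally.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
shared = rng.normal(size=n)                                # common latent factor

# Hypothetical features: same underlying signal, very different units
rate_hz = 20 + 5 * (shared + 0.5 * rng.normal(size=n))     # spread of a few Hz
length_m = 0.01 * (shared + 0.5 * rng.normal(size=n))      # spread of about a centimetre
X = np.column_stack([rate_hz, length_m])

def first_pc(data):
    """Return the first principal component (unit vector) of `data`."""
    centred = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt[0]

raw_pc1 = first_pc(X)                                      # dominated by rate_hz
X_z = (X - X.mean(axis=0)) / X.std(axis=0)                 # z-score each feature
z_pc1 = first_pc(X_z)                                      # equal weight on both

print("PC1 loadings, raw units:", np.round(np.abs(raw_pc1), 3))
print("PC1 loadings, z-scored: ", np.round(np.abs(z_pc1), 3))
```

Rescaling `length_m` to millimetres would change the raw-unit result again, which is the sense in which the unit choice, not the data, decides the answer.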

🐍 PCA for explainability — Python example

This is what I mean by “explainability with PCA”: after running PCA, inspect why each component exists by looking at variance explained, loadings (which features drive each PC), and reconstruction error. Below is a full Pyodide-compatible script using a synthetic M1 population analogue — the same situation that motivates PCA in motor cortex work.

🐍 PCA explainability — synthetic motor cortex population

import micropip
await micropip.install(["matplotlib", "scikit-learn"])
 
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
 
rng = np.random.default_rng(0)
 
# ────────────────────────────────────────────────────────────────
# 1. Build a synthetic M1 population
#    50 neurons, cosine-tuned to random preferred directions,
#    recorded across 8 reach directions, 100 time bins per reach.
# ────────────────────────────────────────────────────────────────
N_NEURONS = 50
N_DIRS    = 8
N_TIME    = 100
dirs = np.linspace(0, 2*np.pi, N_DIRS, endpoint=False)
pref = rng.uniform(0, 2*np.pi, size=N_NEURONS)
t    = np.linspace(0, 1, N_TIME)
 
# Gaussian "movement window" envelope, peaks at t=0.5
envelope = np.exp(-((t - 0.5)**2) / (2 * 0.15**2))
 
# data[i, j, n] = firing rate of neuron n at time j during reach i
data = np.zeros((N_DIRS, N_TIME, N_NEURONS))
for i, theta in enumerate(dirs):
    tuning = 10 + 15 * np.cos(theta - pref)             # cosine tuning
    data[i] = envelope[:, None] * tuning[None, :] \
            + rng.normal(0, 1.5, size=(N_TIME, N_NEURONS))
 
# Flatten to (samples, features) for PCA: each row = one time bin
X = data.reshape(-1, N_NEURONS)
 
# ────────────────────────────────────────────────────────────────
# 2. Standardise + fit PCA
# ────────────────────────────────────────────────────────────────
X_z = StandardScaler().fit_transform(X)
pca = PCA().fit(X_z)
 
# ────────────────────────────────────────────────────────────────
# 3. EXPLAINABILITY PLOTS
# ────────────────────────────────────────────────────────────────
fig, axes = plt.subplots(2, 2, figsize=(11, 8), dpi=80)
 
# 3a — Scree plot: how many PCs do we actually need?
evr = pca.explained_variance_ratio_
cum = np.cumsum(evr)
axes[0, 0].bar(range(1, len(evr)+1), evr * 100, color='steelblue', alpha=0.7)
axes[0, 0].plot(range(1, len(evr)+1), cum * 100, 'o-', color='crimson', label='cumulative')
axes[0, 0].axhline(95, color='gray', ls='--', alpha=0.5)
axes[0, 0].text(8, 96, '95% threshold', color='gray', fontsize=9)  # x=8 keeps the label inside xlim (0.5 to 15.5)
axes[0, 0].set_xlim(0.5, 15.5)
axes[0, 0].set_xlabel("Principal Component"); axes[0, 0].set_ylabel("Variance explained (%)")
axes[0, 0].set_title("Scree plot — intrinsic dimensionality")
axes[0, 0].legend(); axes[0, 0].grid(alpha=0.3)
 
# 3b — Loadings: which neurons drive PC1 and PC2?
loadings = pca.components_[:2].T  # (N_NEURONS, 2)
order    = np.argsort(pref)        # sort neurons by preferred direction
im = axes[0, 1].imshow(loadings[order].T, aspect='auto', cmap='RdBu_r',
                        vmin=-0.3, vmax=0.3)
axes[0, 1].set_yticks([0, 1]); axes[0, 1].set_yticklabels(['PC1', 'PC2'])
axes[0, 1].set_xlabel("Neuron (sorted by preferred direction)")
axes[0, 1].set_title("Loadings — which neurons each PC weights")
plt.colorbar(im, ax=axes[0, 1], shrink=0.6)
 
# 3c — Project trajectories into PC1–PC2 plane
#     Each reach direction = one out-and-back trajectory (the envelope rises, then falls)
Z = pca.transform(X_z).reshape(N_DIRS, N_TIME, -1)
cmap = plt.cm.hsv(np.linspace(0, 1, N_DIRS))
for i in range(N_DIRS):
    axes[1, 0].plot(Z[i, :, 0], Z[i, :, 1], color=cmap[i], lw=1.5,
                    label=f'{np.degrees(dirs[i]):.0f}°')
    axes[1, 0].scatter(Z[i, 0, 0], Z[i, 0, 1], color=cmap[i], s=30, marker='o')   # start
    axes[1, 0].scatter(Z[i, -1, 0], Z[i, -1, 1], color=cmap[i], s=60, marker='*')  # end
axes[1, 0].set_xlabel(f"PC1 ({evr[0]*100:.1f}% var)")
axes[1, 0].set_ylabel(f"PC2 ({evr[1]*100:.1f}% var)")
axes[1, 0].set_title("Population trajectories in PC space\n(○ = start, ★ = end)")
axes[1, 0].legend(fontsize=7, ncol=2, loc='upper right'); axes[1, 0].grid(alpha=0.3)
axes[1, 0].axhline(0, color='gray', lw=0.5); axes[1, 0].axvline(0, color='gray', lw=0.5)
 
# 3d — Reconstruction error vs. k: how many PCs to keep the signal?
ks = np.arange(1, 21)
scores = pca.transform(X_z)            # full score matrix, computed once
recon_err = []
for k in ks:
    truncated = scores.copy()
    truncated[:, k:] = 0               # keep only the first k PCs
    Xk = pca.inverse_transform(truncated)
    recon_err.append(np.mean((X_z - Xk)**2))
axes[1, 1].plot(ks, recon_err, 'o-', color='darkblue')
axes[1, 1].set_xlabel("k (PCs retained)"); axes[1, 1].set_ylabel("Reconstruction MSE")
axes[1, 1].set_title("How many PCs are enough?")
axes[1, 1].grid(alpha=0.3)
 
fig.suptitle("PCA explainability dashboard — synthetic M1 population",
             fontsize=13, fontweight='bold')
plt.tight_layout(rect=[0, 0, 1, 0.95]); plt.show()
 
# ────────────────────────────────────────────────────────────────
# 4. Numeric summary printed under the figure
# ────────────────────────────────────────────────────────────────
print(f"Total dimensions in data: {N_NEURONS}")
print(f"PCs needed for  80% variance: {np.searchsorted(cum, 0.80) + 1}")
print(f"PCs needed for  95% variance: {np.searchsorted(cum, 0.95) + 1}")
print(f"PC1 explains {evr[0]*100:.1f}%  |  PC2 explains {evr[1]*100:.1f}%")
print(f"Top-2 PCs capture {(evr[0]+evr[1])*100:.1f}% of total variance")

🔍 How to read each panel

Scree plot (top-left): the elbow indicates the intrinsic dimensionality. For this synthetic dataset, about 3 PCs carry nearly all of the structured variance, because the noise-free signal is rank-3: two dimensions from cosine tuning to reach direction in a 2D plane, plus one direction-independent dimension from the baseline rate of 10, which the envelope scales over time. The 95% line marks a common, if arbitrary, cutoff for “useful signal vs. noise.”

Loadings (top-right): each row of the heatmap is a PC; each column is a neuron (sorted by preferred direction). If PC1 and PC2 are essentially cos(θ_pref) and sin(θ_pref), up to sign and an arbitrary phase shift, you’ll see one full sinusoid in each row, which indicates that PC1/PC2 jointly encode movement direction in (x, y) Cartesian form. A row with roughly the same sign for every neuron instead reflects the shared, direction-independent component. This is the explainability win: the PCs recover the latent geometry without being told what to look for.

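Trajectories (bottom-left): every reach direction is one coloured trajectory in PC1–PC2 space. Because a single envelope scales every neuron’s rate, each trajectory moves out along a roughly straight line as the envelope rises and returns along the same line as it falls; it does not enclose a loop, and the wiggles come from noise. The orientation of each line tells you which direction was reached, i.e. the direction information sits in a two-dimensional subspace of the top PCs.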

Reconstruction error (bottom-right): confirms the elbow. Beyond k ≈ 3, adding PCs barely improves reconstruction, and the remaining components mostly carry the added Gaussian noise.
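
One way to check what the scree and reconstruction panels should show is to rebuild the synthetic population without noise and count its dimensions directly. The sketch below reuses the tuning model and parameters of the script above but drops the noise term and the z-scoring (both simplifications are assumptions made here for clarity). It reports the numerical rank of the centred signal and verifies that the reconstruction MSE after keeping k PCs equals the sum of the discarded squared singular values divided by the number of matrix entries.

```python
import numpy as np

N_NEURONS, N_DIRS, N_TIME = 50, 8, 100
rng = np.random.default_rng(0)
dirs = np.linspace(0, 2 * np.pi, N_DIRS, endpoint=False)
pref = rng.uniform(0, 2 * np.pi, size=N_NEURONS)
t = np.linspace(0, 1, N_TIME)
envelope = np.exp(-((t - 0.5) ** 2) / (2 * 0.15 ** 2))

# Noise-free firing rates, shape (N_DIRS, N_TIME, N_NEURONS)
tuning = 10 + 15 * np.cos(dirs[:, None] - pref[None, :])   # (directions, neurons)
clean = envelope[None, :, None] * tuning[:, None, :]

X = clean.reshape(-1, N_NEURONS)
Xc = X - X.mean(axis=0)                                     # centre, as PCA does

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
rank = int(np.sum(s > 1e-8 * s[0]))                         # numerical rank
print("numerical rank of the noise-free signal:", rank)

# Reconstruction from the first k PCs versus the discarded part of the spectrum
k = 2
Xk = (U[:, :k] * s[:k]) @ Vt[:k]
mse_direct = np.mean((Xc - Xk) ** 2)
mse_from_spectrum = np.sum(s[k:] ** 2) / Xc.size
print(f"MSE with k={k}: direct {mse_direct:.4f}, from spectrum {mse_from_spectrum:.4f}")
```

The signal spans three directions in neuron space: the constant vector (from the baseline of 10), cos(θ_pref), and sin(θ_pref). With k = 2 the error is therefore still clearly non-zero, and it drops to numerical zero at k = 3.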

⚠️ Pyodide memory note

Same Pyodide caveat as L4 Figure 6 — if you run multiple matplotlib cells in one session the WASM heap fills up. If you get MemoryError, restart Obsidian (Cmd-R) and run this cell alone.

What “PCA for explainability” really means

The phrase has two flavours — both useful, both worth distinguishing:

1. Using PCA to understand the data itself (this note). After fitting, inspect:
   - Variance explained → intrinsic dimensionality
   - Loadings → which original features drive each component
   - Trajectories in PC space → geometric structure of the population’s behaviour
   - Reconstruction error → signal vs. noise floor
2. Using PCA to explain a model. PCA can also be a stepping stone for explaining a black-box classifier, e.g. projecting feature attributions or gradients into low-dim space to visualise which directions in input space matter most. That’s a separate use case (more common in computer vision / interpretability research).

For neural data, flavour (1) is the relevant one and is what Churchland, Mante, Stringer, and most systems-neuro papers mean.
