BrainEncoding26 – Research Ideas & Implementation
Ideas grounded in recent literature (2022–2025) for improving MEG encoding models in the BrainEncoding26 Challenge. All implementable in PyTorch. Ordered by effort.
Neural Encoding with Deep Neural Networks
Priority Overview
| Idea | Effort | Expected gain | Status |
|---|---|---|---|
| CLIP / DINOv2 backbone | 2h | High | ⬜ todo |
| Multi-layer feature concatenation | 3h | Medium-high | ⬜ todo |
| Dual-stream foveal + parafoveal | 4h | Medium | ⬜ todo |
| MLP regression head | 4h | Medium | ⬜ todo |
| Subject-specific adapter layers | 1d | High (subject 60) | ⬜ todo |
| Contrastive CLIP alignment | 1d | Research-grade | ⬜ todo |
Idea 1 — CLIP / DINOv2 as Feature Backbone
Replace ResNet-18 with a frozen CLIP ViT-L/14 or DINOv2-L backbone.
📄 Recent paper: Benchetrit, Banville & King (ICLR 2024) — Brain Decoding: Toward Real-Time Reconstruction of Visual Perception (arXiv:2310.19812). Trained MEG decoding models on pretrained image embeddings such as CLIP and DINOv2, achieving a 7× improvement in image retrieval over classic linear decoders. Late MEG responses (~150–500 ms) align best with DINOv2; early responses with lower-level features.
Why: CLIP generalises better out-of-distribution, which matters for predicting subject 60, who was never seen in training.
```python
import open_clip
import torch

# Frozen CLIP ViT-L/14 image tower (OpenAI weights)
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)
model.eval()

def extract_clip_features(pil_images):
    """Return CLIP image embeddings for a list of PIL images as a NumPy array."""
    x = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():
        return model.encode_image(x).cpu().numpy()  # (N, 768)
```

Feed the output into the existing FeatureEncoder ridge pipeline; no other changes are needed.
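The internals of FeatureEncoder are not shown here, so as a stand-in, the ridge step on such features can be sketched in closed form with NumPy. Everything below is an assumption for illustration: the synthetic data, the single fixed `alpha`, and the per-sensor Pearson correlation used as the score.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge, W = (X'X + alpha*I)^-1 X'Y, on mean-centred data."""
    x_mean, y_mean = X.mean(0), Y.mean(0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, y_mean - x_mean @ W  # weights (D, S) and intercept (S,)

def pearson_per_sensor(Y_true, Y_pred):
    """Pearson r between true and predicted responses, computed separately per sensor."""
    a = Y_true - Y_true.mean(0)
    b = Y_pred - Y_pred.mean(0)
    return (a * b).sum(0) / np.sqrt((a ** 2).sum(0) * (b ** 2).sum(0))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))  # stand-in for (N, 768) CLIP features
Y = X @ rng.standard_normal((64, 10)) + 0.5 * rng.standard_normal((500, 10))  # fake sensors

# Fit on the first 400 samples, score on the remaining 100
W, b = fit_ridge(X[:400], Y[:400], alpha=10.0)
r = pearson_per_sensor(Y[400:], X[400:] @ W + b)
print(r.shape, float(r.mean()))
```

In a real run, `alpha` would be chosen by cross-validation rather than fixed, and the train/test split would respect subject or image boundaries.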
Idea 2 — Multi-Layer Feature Concatenation
Concatenate features from early, mid, and late layers of the backbone (e.g. layers 4, 8, 12 of a 12-block ViT-B; the 24-block ViT-L/14 from Idea 1 needs correspondingly deeper picks, e.g. 8, 16, 24).
📄 Recent paper: Elmoznino et al. (PLOS Comp. Bio. 2024) — High-Performing Neural Network Models of Visual Cortex Benefit from High Latent Dimensionality. Higher-dimensional internal representations generalise better to held-out stimuli across both monkey IT and human fMRI, explaining why large foundation models outperform compact supervised CNNs.
Why: MEG at 110 ms reflects a blend of low-level (spatial frequency, edges) and mid-level (object parts) processing. A single final layer misses the early components.
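Which block indices count as early, mid, and late depends on the depth of the backbone. A minimal sketch of one way to choose them, assuming evenly spaced picks are acceptable; `evenly_spaced_blocks` is a hypothetical helper, not part of open_clip.

```python
def evenly_spaced_blocks(n_blocks: int, k: int = 3) -> list[int]:
    """Return k zero-based block indices spread evenly over n_blocks, ending at the last block."""
    if not 1 <= k <= n_blocks:
        raise ValueError("k must be between 1 and n_blocks")
    return [((j + 1) * n_blocks) // k - 1 for j in range(k)]

print(evenly_spaced_blocks(12))  # 12-block ViT-B -> [3, 7, 11]
print(evenly_spaced_blocks(24))  # 24-block ViT-L -> [7, 15, 23]
```

The returned indices are zero-based, so `[3, 7, 11]` corresponds to layers 4, 8, and 12.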
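```python
layers = [3, 7, 11]  # zero-based block indices = layers 4, 8, 12
features, handles = {}, []

def make_hook(idx):
    def fn(module, inputs, output):
        features[idx] = output.detach()
    return fn

# Register on ViT transformer blocks; keep the handles so hooks can be removed later
for i, block in enumerate(model.visual.transformer.resblocks):
    if i in layers:
        handles.append(block.register_forward_hook(make_hook(i)))

# After a forward pass: concatenate the CLS token from each layer, in depth order.
# Assumes batch-first block outputs (N, tokens, D); older open_clip releases use
# (tokens, N, D), in which case index features[i][0] instead.
feat = torch.cat([features[i][:, 0, :] for i in layers], dim=-1)
```

Idea 3 — Dual-Stream Foveal + Parafoveal Encoding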
For each fixation, extract two crops: the foveal crop (current fixation) and a parafoveal crop (next saccade target). Feed both through the backbone and concatenate.
📄 Recent paper: Fakche, Hickey & Jensen (J. Neuroscience 2024) — Fast Feature- and Category-Related Parafoveal Previewing Support Free Visual Exploration. In free-viewing MEG, fixation-locked responses at ~110 ms encode both foveal AND parafoveal (saccade-goal) content simultaneously. Both streams contribute independently.
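Why: The AVS metadata.csv contains fixation sequences per subject, so next-fixation coordinates are available. Adding the parafoveal crop gives the model access to the saccade-target content that the fixation-locked response also encodes, at the cost of doubling the feature dimension.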
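Pairing each fixation with its successor is the main bookkeeping this idea needs. A minimal sketch, assuming the metadata has been loaded as a list of rows in viewing order; the column names (`subject`, `trial`, `x`, `y`) are hypothetical and should be replaced by the actual headers in metadata.csv. The last fixation of a trial has no saccade target, so it is dropped here; reusing the foveal crop would be another option.

```python
from itertools import groupby
from operator import itemgetter

def fixation_pairs(rows):
    """Yield (row, next_row) for consecutive fixations within the same subject and trial.

    `rows` must already be sorted by subject, trial, and fixation order.
    """
    key = itemgetter("subject", "trial")
    for _, group in groupby(rows, key=key):
        group = list(group)
        # zip stops one short of the end, so the final fixation never appears as "current"
        yield from zip(group, group[1:])

rows = [
    {"subject": 1, "trial": 0, "x": 100, "y": 200},
    {"subject": 1, "trial": 0, "x": 340, "y": 180},
    {"subject": 1, "trial": 0, "x": 310, "y": 400},
    {"subject": 1, "trial": 1, "x": 50, "y": 60},  # single-fixation trial: yields no pair
]
pairs = [((a["x"], a["y"]), (b["x"], b["y"])) for a, b in fixation_pairs(rows)]
print(pairs)  # [((100, 200), (340, 180)), ((340, 180), (310, 400))]
```

Each pair supplies the `fix_xy` and `next_fix_xy` arguments for the two crops.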
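```python
import torch
from tbdencoder.data import extract_dva_crop

def dual_crop_features(img, fix_xy, next_fix_xy, dva=2.0):
    """Concatenate backbone features of the foveal and parafoveal crops."""
    foveal = extract_dva_crop(img, fix_xy, dva)
    parafoveal = extract_dva_crop(img, next_fix_xy, dva)
    # `backbone` is the frozen feature extractor (e.g. model.encode_image from Idea 1)
    with torch.no_grad():
        f1 = backbone(preprocess(foveal).unsqueeze(0))
        f2 = backbone(preprocess(parafoveal).unsqueeze(0))
    return torch.cat([f1, f2], dim=-1)  # 2× feature dim
```

Idea 4 — MLP Regression Head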
Replace ridge regression with a 2-layer MLP trained end-to-end on (DNN features → MEG sensors).
📄 Recent paper: Ahmadi, Bellec & Glatard (arXiv 2403.19421, ICLR 2024) — Scaling Up Ridge Regression for Brain Encoding. Ridge remains the strongest generalisation baseline, but MLP heads gain when feature dimensionality is very high (ViT patch tokens without pooling). Always run ridge first as the baseline to beat.
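```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPEncoder(nn.Module):
    """2-layer MLP mapping DNN features to MEG sensor amplitudes."""
    def __init__(self, d_feat, d_meg=204):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_feat, 512), nn.GELU(), nn.Dropout(0.2),
            nn.Linear(512, d_meg)
        )

    def forward(self, x):
        return self.net(x)

encoder = MLPEncoder(d_feat=768)  # 768 = CLIP ViT-L/14 embedding size
criterion = nn.MSELoss()
# Optimise the MLP head only, not the frozen backbone
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-3, weight_decay=1e-2)
```

Use early stopping on a held-out subject split.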
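Early stopping on held-out subjects could be wired up roughly as follows. This is a sketch on synthetic tensors, not the challenge pipeline: the linear stand-in model, the `patience` value, and the train/validation tensors are all assumptions. In practice the validation set would hold every trial from one or more subjects that are excluded from training.

```python
import copy
import torch
import torch.nn as nn

def fit_with_early_stopping(model, train, val, max_epochs=200, patience=10, lr=1e-3):
    """Full-batch training that restores the weights with the lowest validation MSE."""
    (x_tr, y_tr), (x_va, y_va) = train, val
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-2)
    loss_fn = nn.MSELoss()
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_va), y_va).item()
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # no improvement for `patience` epochs
                break
    model.load_state_dict(best_state)
    return best_loss

# Synthetic stand-in: 200 trials from training subjects, 50 from a held-out subject
torch.manual_seed(0)
true_w = torch.randn(32, 8)
x_train, x_val = torch.randn(200, 32), torch.randn(50, 32)
y_train, y_val = x_train @ true_w, x_val @ true_w
head = nn.Linear(32, 8)
best = fit_with_early_stopping(head, (x_train, y_train), (x_val, y_val))
print(f"best held-out MSE: {best:.4f}")
```

Restoring the best checkpoint rather than the last one keeps the reported validation loss and the returned weights consistent.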
Idea 5 — Subject-Specific Adapter Layers
Train a shared encoder on subjects 1–5 and add a small linear adapter per subject. For subject 60, initialise the adapter as the mean of the trained adapters.
📄 Recent paper: Li et al. (NeurIPS 2024) — Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion (arXiv:2403.07721). Contrastive CLIP alignment with per-subject adapters generalises cross-subject without re-training from scratch. Subject-specific layers only need to capture individual sensor topology offsets.
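Because each adapter is a single linear layer, averaging the weights and biases gives exactly the average of the per-subject predictions for the same shared activation. A small check of that equivalence, using randomly initialised stand-ins for the trained adapters (the layer sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
adapters = nn.ModuleList([nn.Linear(16, 4) for _ in range(5)])  # stand-ins for trained adapters

# Build the new-subject adapter from the element-wise mean of weights and biases
new_adapter = nn.Linear(16, 4)
with torch.no_grad():
    new_adapter.weight.copy_(torch.stack([a.weight for a in adapters]).mean(0))
    new_adapter.bias.copy_(torch.stack([a.bias for a in adapters]).mean(0))

h = torch.randn(3, 16)  # hidden activations from the shared layer
with torch.no_grad():
    mean_of_predictions = torch.stack([a(h) for a in adapters]).mean(0)
    prediction_of_mean = new_adapter(h)

same = torch.allclose(mean_of_predictions, prediction_of_mean, atol=1e-5)
print(same)
```

The averaged adapter therefore behaves like an equal-weight ensemble of the training subjects, which is a reasonable starting point when no data from the new subject is available.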
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubjectAdapter(nn.Module):
    """Shared hidden layer followed by one linear read-out per subject."""
    def __init__(self, d_in, d_out, n_subjects):
        super().__init__()
        self.shared = nn.Linear(d_in, 512)
        self.adapters = nn.ModuleList([nn.Linear(512, d_out) for _ in range(n_subjects)])

    def forward(self, x, subject_id):
        # subject_id is a 0-based index into the adapter list
        h = F.relu(self.shared(x))
        return self.adapters[subject_id](h)

# Initialise subject-60 adapter as mean of trained adapters
# (`model` is a trained SubjectAdapter instance)
s60_w = torch.stack([a.weight.data for a in model.adapters]).mean(0)
s60_b = torch.stack([a.bias.data for a in model.adapters]).mean(0)
```

Idea 6 — Contrastive CLIP Alignment
Train the encoding model to map MEG → CLIP image embedding space using contrastive loss, rather than predicting raw sensor amplitudes with MSE.
📄 Recent paper: Ferrante et al. (ICLR 2024 Workshop / 2025 journal) — Towards Neural Foundation Models for Vision (arXiv:2411.09723). CLIP-anchored contrastive alignment of EEG/MEG into shared semantic space enables encoding, decoding, and cross-modal conversion with lightweight modality-specific heads. Freeze CLIP; train only the neural projection.
📄 Recent paper: Piskovskyi et al. (ECCV 2024 Workshops, arXiv:2410.04497) — CLIP-based models generalise better to novel semantic categories in Algonauts 2023 fMRI challenge than task-supervised CNNs.
Why: Predicting relative CLIP similarity is easier than predicting absolute MEG amplitudes, and the shared embedding space is the natural domain for subject generalisation.
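A contrastive objective in the CLIP sense treats the other items in a batch as negatives. A minimal sketch of a symmetric InfoNCE loss, assuming one matched image embedding per MEG sample; the function name `clip_style_loss` and the fixed `temperature=0.07` are illustrative choices, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(meg_proj, clip_feats, temperature=0.07):
    """Symmetric InfoNCE: matched (MEG, image) pairs are positives, the rest of the batch negatives."""
    meg = F.normalize(meg_proj, dim=-1)
    img = F.normalize(clip_feats, dim=-1)
    logits = meg @ img.T / temperature  # (N, N) scaled cosine similarities
    targets = torch.arange(meg.size(0), device=meg.device)  # positives lie on the diagonal
    # Average the MEG-to-image and image-to-MEG classification losses
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

torch.manual_seed(0)
clip_feats = torch.randn(8, 768)
aligned = clip_style_loss(clip_feats + 0.01 * torch.randn(8, 768), clip_feats)
shuffled = clip_style_loss(clip_feats.roll(1, dims=0), clip_feats)
print(aligned.item() < shuffled.item())  # True: correct pairing gives the lower loss
```

Repeated presentations of the same image within a batch would act as false negatives here, so batches may need to be sampled with unique images.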
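```python
import torch
import torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss()
# meg_proj:   (N, D) MEG features projected into CLIP space
# clip_feats: (N, D) frozen CLIP image embeddings
# D = 768 for ViT-L/14 (512 for ViT-B variants)
# With all-ones targets this only pulls matched pairs together; it uses no negatives.
loss = loss_fn(meg_proj, clip_feats, torch.ones(meg_proj.size(0), device=meg_proj.device))
```

Notes on OOD Generalisation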
📄 Madan et al. (NeurIPS 2024, arXiv:2406.16935) — Benchmarking OOD Generalisation Capabilities of DNN Encoding Models for the Ventral Visual Cortex. Even low-level distribution shifts (hue, contrast) cause models to retain as little as 20% of in-distribution performance. For fixation crops from natural scenes, preprocessing consistency matters.
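Practical fix: Normalise fixation crops with the same statistics the backbone was trained with (ImageNet mean/std for DINOv2; CLIP uses its own mean/std, which open_clip's preprocess already applies), and add mild augmentation (colour jitter, contrast normalisation) during training only.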
References
- Benchetrit, Banville & King (2024). Brain decoding: Toward real-time reconstruction of visual perception. ICLR 2024. arXiv:2310.19812
- Ferrante et al. (2024/2025). Towards neural foundation models for vision. arXiv:2411.09723
- Elmoznino et al. (2024). High-performing neural network models benefit from high latent dimensionality. PLOS Comp. Bio. doi:10.1371/journal.pcbi.1011792
- Madan et al. (2024). Benchmarking OOD generalisation of DNN encoding models. NeurIPS 2024. arXiv:2406.16935
- Fakche, Hickey & Jensen (2024). Fast feature- and category-related parafoveal previewing. J. Neuroscience.
- Li et al. (2024). Visual decoding and reconstruction via EEG embeddings with guided diffusion. NeurIPS 2024. arXiv:2403.07721
- Ahmadi, Bellec & Glatard (2024). Scaling up ridge regression for brain encoding. arXiv:2403.19421
- Piskovskyi et al. (2024). Generalizability analysis of DL predictions of brain responses. ECCV 2024 Workshops. arXiv:2410.04497
Tags: neuroscience neural-encoding meg research pytorch