BrainEncoding26 – Research Ideas & Implementation

Ideas grounded in recent literature (2022–2025) for improving MEG encoding models in the BrainEncoding26 Challenge. All implementable in PyTorch. Ordered by effort.

Neural Encoding with Deep Neural Networks

Priority Overview

Idea	Effort	Expected gain	Status
CLIP / DINOv2 backbone	2h	High	⬜ todo
Multi-layer feature concatenation	3h	Medium-high	⬜ todo
Dual-stream foveal + parafoveal	4h	Medium	⬜ todo
MLP regression head	4h	Medium	⬜ todo
Subject-specific adapter layers	1d	High (subject 60)	⬜ todo
Contrastive CLIP alignment	1d	Research-grade	⬜ todo

Idea 1 — CLIP / DINOv2 as Feature Backbone

Replace ResNet-18 with a frozen CLIP ViT-L/14 or DINOv2-L backbone.

📄 Recent paper: Benchetrit, Banville & King (ICLR 2024) — Brain Decoding: Toward Real-Time Reconstruction of Visual Perception (arXiv:2310.19812). Trained MEG encoding/decoding models using DINOv2/CLIP features, achieving 7× improvement over classic linear decoders. Late MEG responses (~150–500 ms) align best with DINOv2; early responses with lower-level features.

Why: CLIP features tend to generalise better out-of-distribution than the current ResNet-18 features, which matters for predicting subject 60, who was never in the training set.

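import open_clip
import torch

# Load the frozen CLIP ViT-L/14 model with the OpenAI weights
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)
model.eval()

def extract_clip_features(pil_images):
    # preprocess resizes, crops and normalises each PIL image for CLIP
    x = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():
        # .cpu() keeps this working if the model is later moved to a GPU
        return model.encode_image(x).cpu().numpy()  # (N, 768)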

Feed the output into the existing FeatureEncoder ridge pipeline; no other changes are needed.
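
The FeatureEncoder pipeline itself is not shown in this note. As a stand-in, the ridge step it presumably performs can be sketched in closed form. The names `fit_ridge`, `predict_ridge` and `alpha` are hypothetical, and the data below are synthetic, so treat this as an illustration of the technique rather than the project's implementation.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge on centred data: W = (X^T X + alpha I)^-1 X^T Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    d = Xc.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ Yc)
    b = y_mean - x_mean @ W  # intercept that restores the removed means
    return W, b

def predict_ridge(X, W, b):
    """Predict sensor amplitudes for new feature rows."""
    return X @ W + b

# Synthetic check: 200 fixations, 32-dim features, 204 sensors
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))
W_true = rng.standard_normal((32, 204))
Y = X @ W_true + 0.01 * rng.standard_normal((200, 204))

W, b = fit_ridge(X, Y, alpha=1e-3)
Y_hat = predict_ridge(X, W, b)
```

With 768-dimensional CLIP features, `alpha` would normally be chosen by cross-validation rather than fixed by hand.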

Idea 2 — Multi-Layer Feature Concatenation

Concatenate features from early, mid, and late layers of the backbone (e.g. layers 4, 8, 12 of a 12-block ViT, or layers 8, 16, 24 of the 24-block ViT-L/14).

📄 Recent paper: Elmoznino et al. (PLOS Comp. Bio. 2024) — High-Performing Neural Network Models of Visual Cortex Benefit from High Latent Dimensionality. Higher-dimensional internal representations generalise better to held-out stimuli across both monkey IT and human fMRI, explaining why large foundation models outperform compact supervised CNNs.

Why: MEG at 110 ms reflects a blend of low-level (spatial frequency, edges) and mid-level (object parts) processing. A single final layer misses the early components.

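features = {}

def make_hook(name):
    # Store the block output under `name` on every forward pass
    def fn(module, inputs, output):
        features[name] = output.detach()
    return fn

# Register on three evenly spaced ViT transformer blocks (early, mid, late).
# ViT-L/14 has 24 blocks, so this selects indices 7, 15 and 23.
blocks = model.visual.transformer.resblocks
layer_ids = [len(blocks) // 3 - 1, 2 * len(blocks) // 3 - 1, len(blocks) - 1]
for i in layer_ids:
    blocks[i].register_forward_hook(make_hook(i))

# After a forward pass: concatenate the CLS token from each layer, in depth order.
# Assumes block outputs are batch-first (N, tokens, dim). Some open_clip versions
# use (tokens, N, dim) inside the transformer; in that case index [0, :, :] instead.
feat = torch.cat([features[i][:, 0, :] for i in sorted(features)], dim=-1)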
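
To check the hook logic without downloading CLIP weights, the same pattern can be run on a toy stack of transformer blocks. This is a minimal sketch: the toy `blocks`, the helper name `collect_cls_features` and the layer indices are all illustrative assumptions, not part of open_clip.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a ViT: 6 batch-first transformer blocks over (N, tokens, dim)
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=32, nhead=4, dim_feedforward=64, batch_first=True)
     for _ in range(6)]
).eval()

def collect_cls_features(blocks, tokens, layer_ids):
    """Run tokens through all blocks and concatenate token 0 from the chosen layers."""
    captured = {}

    def make_hook(idx):
        def fn(module, inputs, output):
            captured[idx] = output.detach()
        return fn

    handles = [blocks[i].register_forward_hook(make_hook(i)) for i in layer_ids]
    try:
        with torch.no_grad():
            x = tokens
            for block in blocks:
                x = block(x)
    finally:
        # Remove hooks so repeated calls do not stack duplicate hooks
        for h in handles:
            h.remove()
    return torch.cat([captured[i][:, 0, :] for i in sorted(captured)], dim=-1)

tokens = torch.randn(5, 17, 32)  # 5 images, 1 CLS + 16 patch tokens, width 32
feat = collect_cls_features(blocks, tokens, layer_ids=[1, 3, 5])
```

Keying the captured outputs by integer index keeps the concatenation in depth order, which string keys such as `layer_11` and `layer_3` would not when sorted.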

Idea 3 — Dual-Stream Foveal + Parafoveal Encoding

For each fixation, extract two crops: the foveal crop (current fixation) and a parafoveal crop (next saccade target). Feed both through the backbone and concatenate.

📄 Recent paper: Fakche, Hickey & Jensen (J. Neuroscience 2024) — Fast Feature- and Category-Related Parafoveal Previewing Support Free Visual Exploration. In free-viewing MEG, fixation-locked responses at ~110 ms encode both foveal AND parafoveal (saccade-goal) content simultaneously. Both streams contribute independently.

Why: The AVS metadata.csv contains fixation sequences per subject, so next-fixation coordinates are available. Adding the parafoveal stream doubles the feature dimension and lets the model account for saccade-target content that the foveal crop alone cannot explain.

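import torch
from tbdencoder.data import extract_dva_crop

# `backbone` is any frozen image feature extractor, e.g. model.encode_image from Idea 1.
# extract_dva_crop is assumed to return an image that `preprocess` accepts.
def dual_crop_features(img, fix_xy, next_fix_xy, dva=2.0):
    foveal     = extract_dva_crop(img, fix_xy, dva)       # current fixation
    parafoveal = extract_dva_crop(img, next_fix_xy, dva)  # next saccade target
    with torch.no_grad():
        f1 = backbone(preprocess(foveal).unsqueeze(0))
        f2 = backbone(preprocess(parafoveal).unsqueeze(0))
    return torch.cat([f1, f2], dim=-1)  # 2× feature dim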
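
The project helper `extract_dva_crop` is not shown here. A minimal sketch of what a degrees-of-visual-angle crop might look like follows; the function name `crop_around_fixation`, the `px_per_deg` parameter and its default are assumptions that depend on the actual screen geometry and viewing distance.

```python
import numpy as np

def crop_around_fixation(img, fix_xy, dva=2.0, px_per_deg=40.0):
    """Square crop of side `dva` degrees centred on fix_xy = (x, y) in pixels.

    Regions outside the image are zero-padded so every crop has the same size.
    """
    side = int(round(dva * px_per_deg))
    half = side // 2
    height, width = img.shape[:2]
    # Clamp the fixation to the image so the slice below is always full-sized
    x = min(max(int(round(fix_xy[0])), 0), width - 1)
    y = min(max(int(round(fix_xy[1])), 0), height - 1)
    pad = ((half, side - half), (half, side - half)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad, mode="constant")
    # Original pixel (y, x) now sits at (y + half, x + half) in `padded`,
    # so the crop starting at (y, x) in padded coordinates is centred on it
    return padded[y:y + side, x:x + side]

img = np.arange(100 * 120 * 3).reshape(100, 120, 3)
centre_crop = crop_around_fixation(img, (60, 50), dva=2.0, px_per_deg=10.0)
corner_crop = crop_around_fixation(img, (0, 0), dva=2.0, px_per_deg=10.0)
```

Zero-padding keeps crop sizes constant for fixations near the image border; whether padding or shifting the crop inward is preferable is a design choice this note does not settle.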

Idea 4 — MLP Regression Head

Replace ridge regression with a 2-layer MLP trained end-to-end on (DNN features → MEG sensors).

📄 Recent paper: Ahmadi, Bellec & Glatard (arXiv 2403.19421, ICLR 2024) — Scaling Up Ridge Regression for Brain Encoding. Ridge remains the strongest generalisation baseline, but MLP heads gain when feature dimensionality is very high (ViT patch tokens without pooling). Always run ridge first as the baseline to beat.

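import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPEncoder(nn.Module):
    # Two-layer MLP mapping backbone features to MEG sensor amplitudes
    def __init__(self, d_feat, d_meg=204):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_feat, 512), nn.GELU(), nn.Dropout(0.2),
            nn.Linear(512, d_meg)
        )
    def forward(self, x):
        return self.net(x)

encoder = MLPEncoder(d_feat=768)  # 768 = CLIP ViT-L/14 embedding size
criterion = nn.MSELoss()
# Optimise the MLP head only; the CLIP backbone stays frozen
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-3, weight_decay=1e-2)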

Use early stopping on a held-out subject split.
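
One way to implement early stopping is sketched below: train full-batch, track validation MSE, and restore the best weights once validation stops improving. The function name `train_with_early_stopping`, the `patience` value and the synthetic data are assumptions; in practice the validation tensors would come from the held-out subject.

```python
import copy
import torch
import torch.nn as nn

def train_with_early_stopping(model, train, val, max_epochs=200, patience=10, lr=1e-3):
    """Full-batch training that keeps the weights with the lowest validation MSE."""
    (x_tr, y_tr), (x_va, y_va) = train, val
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-2)
    loss_fn = nn.MSELoss()
    best_val, best_state, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_va), y_va).item()
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # no improvement for `patience` epochs
    model.load_state_dict(best_state)
    return best_val

# Synthetic stand-in for (features, MEG) pairs from training and held-out subjects
torch.manual_seed(0)
w = torch.randn(16, 8)
x_tr, x_va = torch.randn(256, 16), torch.randn(64, 16)
train, val = (x_tr, x_tr @ w), (x_va, x_va @ w)

net = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Dropout(0.2), nn.Linear(32, 8))
net.eval()
with torch.no_grad():
    initial_val = nn.MSELoss()(net(x_va), val[1]).item()

best_val = train_with_early_stopping(net, train, val)

net.eval()
with torch.no_grad():
    final_val = nn.MSELoss()(net(x_va), val[1]).item()
```

Splitting by subject rather than by trial keeps the stopping criterion aligned with the cross-subject generalisation that matters for subject 60.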

Idea 5 — Subject-Specific Adapter Layers

Train a shared encoder on subjects 1–5, add a small linear adapter per subject. For subject 60, initialise as the mean of trained adapters.

📄 Recent paper: Li et al. (NeurIPS 2024) — Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion (arXiv:2403.07721). Contrastive CLIP alignment with per-subject adapters generalises cross-subject without re-training from scratch. Subject-specific layers only need to capture individual sensor topology offsets.

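class SubjectAdapter(nn.Module):
    # Shared hidden layer followed by one linear read-out per training subject
    def __init__(self, d_in, d_out, n_subjects):
        super().__init__()
        self.shared  = nn.Linear(d_in, 512)
        self.adapters = nn.ModuleList([nn.Linear(512, d_out) for _ in range(n_subjects)])

    def forward(self, x, subject_id):
        # subject_id is a 0-based index into the adapter list
        h = F.relu(self.shared(x))
        return self.adapters[subject_id](h)

# `adapter_model` is a trained SubjectAdapter instance.
# Initialise the subject-60 adapter as the mean of the trained adapters
s60_w = torch.stack([a.weight.data for a in adapter_model.adapters]).mean(0)
s60_b = torch.stack([a.bias.data   for a in adapter_model.adapters]).mean(0)
s60_adapter = nn.Linear(s60_w.size(1), s60_w.size(0))
s60_adapter.weight.data.copy_(s60_w)
s60_adapter.bias.data.copy_(s60_b)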

Idea 6 — Contrastive CLIP Alignment

Train the encoding model to map MEG → CLIP image embedding space using contrastive loss, rather than predicting raw sensor amplitudes with MSE.

📄 Recent paper: Ferrante et al. (ICLR 2024 Workshop / 2025 journal) — Towards Neural Foundation Models for Vision (arXiv:2411.09723). CLIP-anchored contrastive alignment of EEG/MEG into shared semantic space enables encoding, decoding, and cross-modal conversion with lightweight modality-specific heads. Freeze CLIP; train only the neural projection.

📄 Recent paper: Piskovskyi et al. (ECCV 2024 Workshops, arXiv:2410.04497) — CLIP-based models generalise better to novel semantic categories in Algonauts 2023 fMRI challenge than task-supervised CNNs.

Why: Predicting relative CLIP similarity is easier than predicting absolute MEG amplitudes, and the shared embedding space is the natural domain for subject generalisation.

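loss_fn = nn.CosineEmbeddingLoss()

# meg_proj:   (N, D) MEG features projected into CLIP space
# clip_feats: (N, D) frozen CLIP image embeddings (D = 768 for ViT-L/14, 512 for ViT-B/32)
# A target of +1 only pulls matched pairs together and uses no negatives,
# so this is the alignment term of a contrastive objective, not a full InfoNCE loss.
target = torch.ones(meg_proj.size(0), device=meg_proj.device)
loss = loss_fn(meg_proj, clip_feats, target)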
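
For a loss that also pushes mismatched pairs apart, a CLIP-style symmetric InfoNCE objective is a common choice. The sketch below is one such formulation, not necessarily the one used in the cited papers; the function name `clip_style_loss` and the `temperature` default are assumptions.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(meg_proj, clip_feats, temperature=0.07):
    """Symmetric InfoNCE: each MEG embedding should match its own image within the batch."""
    meg = F.normalize(meg_proj, dim=-1)
    img = F.normalize(clip_feats, dim=-1)
    logits = meg @ img.t() / temperature  # (N, N) scaled cosine similarities
    targets = torch.arange(meg.size(0), device=meg.device)  # matched pairs lie on the diagonal
    loss_meg_to_img = F.cross_entropy(logits, targets)
    loss_img_to_meg = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_meg_to_img + loss_img_to_meg)

torch.manual_seed(0)
clip_feats = torch.randn(16, 768)
loss_aligned = clip_style_loss(clip_feats.clone(), clip_feats)    # perfectly matched pairs
loss_random = clip_style_loss(torch.randn(16, 768), clip_feats)  # unrelated embeddings
```

The other items in the batch act as negatives, so larger batches give a harder and usually more informative objective.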

Notes on OOD Generalisation

📄 Madan et al. (NeurIPS 2024, arXiv:2406.16935) — Benchmarking OOD Generalisation Capabilities of DNN Encoding Models for the Ventral Visual Cortex. Even low-level distribution shifts (hue, contrast) cause models to retain as little as 20% of in-distribution performance. For fixation crops from natural scenes, preprocessing consistency matters.

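Practical fix: Normalise fixation crops identically at training and test time, using the statistics the backbone was trained with (ImageNet mean/std for DINOv2 and torchvision ResNets; open_clip's preprocess already applies CLIP's own statistics), and add mild augmentation (colour jitter, contrast normalisation) during training.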
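
A minimal sketch of such a preprocessing path in plain PyTorch follows. The helper names `normalise` and `mild_jitter` and the `strength` value are assumptions; the mean/std defaults are the standard ImageNet statistics and should be replaced by the backbone's own values where they differ.

```python
import torch

# Standard ImageNet channel statistics; swap in the backbone's own values if they differ
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalise(crops, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Channel-wise normalisation of float crops shaped (N, 3, H, W) in [0, 1]."""
    mean = torch.tensor(mean, dtype=crops.dtype).view(1, 3, 1, 1)
    std = torch.tensor(std, dtype=crops.dtype).view(1, 3, 1, 1)
    return (crops - mean) / std

def mild_jitter(crops, strength=0.1, generator=None):
    """Random per-image brightness and contrast scaling within +/- strength."""
    n = crops.size(0)
    scale = 1 + (torch.rand(n, 2, 1, 1, 1, generator=generator) * 2 - 1) * strength
    brightness, contrast = scale[:, 0], scale[:, 1]  # each (N, 1, 1, 1)
    out = crops * brightness
    mean = out.mean(dim=(1, 2, 3), keepdim=True)
    out = (out - mean) * contrast + mean  # stretch or shrink around the image mean
    return out.clamp(0.0, 1.0)

gen = torch.Generator().manual_seed(0)
crops = torch.rand(4, 3, 32, 32)
jittered = mild_jitter(crops, generator=gen)
train_batch = normalise(jittered)  # training: jitter, then normalise
test_batch = normalise(crops)      # evaluation: normalise only
```

Applying the jitter before normalisation keeps the augmentation in the original [0, 1] pixel range, where clamping is meaningful.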

References

Benchetrit, Banville & King (2024). Brain decoding: Toward real-time reconstruction of visual perception. ICLR 2024. arXiv:2310.19812
Ferrante et al. (2024/2025). Towards neural foundation models for vision. arXiv:2411.09723
Elmoznino et al. (2024). High-performing neural network models benefit from high latent dimensionality. PLOS Comp. Bio. doi:10.1371/journal.pcbi.1011792
Madan et al. (2024). Benchmarking OOD generalisation of DNN encoding models. NeurIPS 2024. arXiv:2406.16935
Fakche, Hickey & Jensen (2024). Fast feature- and category-related parafoveal previewing. J. Neuroscience.
Li et al. (2024). Visual decoding and reconstruction via EEG embeddings with guided diffusion. NeurIPS 2024. arXiv:2403.07721
Ahmadi, Bellec & Glatard (2024). Scaling up ridge regression for brain encoding. arXiv:2403.19421
Piskovskyi et al. (2024). Generalizability analysis of DL predictions of brain responses. ECCV 2024 Workshops. arXiv:2410.04497
