ML4CCN VL10 auditory cortex and semantics
Where does attention in transformers come from?
In a semantic space, it is best to push distinct objects as far apart from each other as possible.


25 verbs:
“see,” “hear,” “listen,” “taste,” “smell,” “eat,” “touch,” “rub,” “lift,” “manipulate,” “run,” “push,” “fill,” “move,” “ride,” “say,” “fear,” “open,” “approach,” “near,” “enter,” “drive,” “wear,” “break,” and “clean.”
Mitchell et al. 2008
Huth et al. 2016
celery, airplane and apple
- record brain data while people read or listen to language
- train linguistic models on large-scale text corpora
- use transcripts of the speech presented during fMRI to derive predictions from the language model
- evaluate the predictions using multivariate variance explained
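
The steps above can be sketched as a linear encoding model. This is a minimal sketch on synthetic data, not the pipeline of any of the cited papers: ridge regression is an assumed choice of regression, the arrays `X` (language-model features per time point) and `Y` (voxel responses) are random stand-ins, and "variance explained" is taken here to mean R² per voxel on held-out data. The helper names `fit_ridge` and `variance_explained` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def variance_explained(Y_true, Y_pred):
    """R^2 for each voxel (column), computed on held-out data."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (assumption): features derived from the transcript
# and simulated voxel responses that depend linearly on them plus noise.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 10, 5
X = rng.standard_normal((n_train + n_test, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.1 * rng.standard_normal((n_train + n_test, n_voxels))

# Fit on the training portion, evaluate on the held-out portion.
X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]
W = fit_ridge(X_train, Y_train, alpha=1.0)
r2 = variance_explained(Y_test, X_test @ W)
print("variance explained per voxel:", np.round(r2, 3))
```

With real data, the regularization strength `alpha` would normally be chosen by cross-validation rather than fixed.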


“the animal didn’t cross the street because it was too tired.”
GPT: predict next word (unidirectional attention)
BERT: predict missing word from surrounding context (bidirectional)
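
The difference between unidirectional and bidirectional attention comes down to a mask on the attention scores. A minimal sketch, assuming single-head scaled dot-product self-attention with random embeddings and no learned projections (queries, keys and values are all the raw embeddings); the function name `self_attention` is hypothetical and the example sentence is the one from these notes.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, causal):
    """Scaled dot-product self-attention with Q = K = V = X (a simplification)."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    if causal:
        # GPT-style: position i must not see positions j > i.
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ X, weights

tokens = ["the", "animal", "didn't", "cross", "the", "street",
          "because", "it", "was", "too", "tired"]
rng = np.random.default_rng(0)
X = rng.standard_normal((len(tokens), 8))  # random stand-in embeddings

_, w_causal = self_attention(X, causal=True)   # unidirectional (GPT-like)
_, w_bidir = self_attention(X, causal=False)   # bidirectional (BERT-like)

i = tokens.index("it")
print("causal weights from 'it' to later tokens:", w_causal[i, i + 1:])
print("bidirectional weights from 'it' to later tokens:", w_bidir[i, i + 1:])
```

In the causal case "it" can only attend to "the animal didn't cross the street because it"; in the bidirectional case it can also use "was too tired", which is the context that disambiguates the referent.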

You can do this with images as well.
At every time step, attention supplies a context vector as input to that step's computation.
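
A minimal sketch of how such a context vector can be computed for one time step, assuming soft (weighted-average) attention over image region features with an additive scoring function. The shapes, the random weights and the name `soft_attention_context` are illustrative assumptions, not the exact architecture of any cited paper.

```python
import numpy as np

def soft_attention_context(regions, hidden, W_a, W_h, v):
    """Context vector for one decoding step.

    regions: (L, D) feature vectors, one per image location
    hidden:  (H,)   previous decoder hidden state
    W_a: (D, K), W_h: (H, K), v: (K,)  scoring parameters (random here)
    """
    # One relevance score per image location, conditioned on the decoder state.
    scores = np.tanh(regions @ W_a + hidden @ W_h) @ v   # (L,)
    scores = scores - scores.max()                        # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()         # attention weights
    context = alpha @ regions                             # weighted average, (D,)
    return context, alpha

rng = np.random.default_rng(0)
L, D, H, K = 6, 4, 3, 5
regions = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W_a = rng.standard_normal((D, K))
W_h = rng.standard_normal((H, K))
v = rng.standard_normal(K)

context, alpha = soft_attention_context(regions, hidden, W_a, W_h, v)
print("attention weights over locations:", np.round(alpha, 3))
print("context vector:", np.round(context, 3))
```

In a captioning decoder this would be recomputed at every step with the updated hidden state, so the weights over image locations shift as the caption is generated.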


Xu et al. (2015), “Show, Attend and Tell”, ICML

Schrimpf et al. 2021
Only models that predict the next word were good at predicting brain activity.
see also
Type:
Tags:
Status:
Location:
Created: 27-01-25 15:16