ML for NLP

Outline

Research Question

How well does sentence-level surprisal—computed from a pretrained GPT-2 Small model—predict human self-paced reading times?

suggested structures ML NLP

1.1 Introduction

Motivation: Understanding real-time language processing in humans is a core goal of psycholinguistics and cognitive modeling.
Surprisal Theory: Word surprisal (–log P(word | context)) has been shown to correlate with reading times.
Proposal: Use GPT-2 Small to compute average sentence surprisal and test its correlation with human reading times.
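As a small worked example of the surprisal definition above, the sketch below converts a conditional probability into surprisal measured in bits. The function name surprisal_bits and the two probabilities are illustrative assumptions, not values taken from any corpus.

```python
import math

def surprisal_bits(prob: float) -> float:
    """Surprisal of an event with conditional probability `prob`, in bits."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)

# Illustrative (made-up) probabilities for a predictable and an unpredictable word.
for p in (0.5, 0.001):
    print(f"P={p}: {surprisal_bits(p):.2f} bits")
```

Using the natural logarithm instead gives surprisal in nats, which is the unit of the loss returned by the GPT-2 code in Section 3.3.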

1.2 Dataset

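Source: A subset of the Dundee Corpus (50 sentences) with published word-by-word reading times. (The Dundee Corpus provides eye-tracking measures; a self-paced reading corpus would be needed to match the research question exactly.)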
Preparation:
- Aggregate to sentence-level reading time (mean of word-by-word times).
- Clean punctuation; ensure sentences ≤50 tokens.
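The aggregation step could look roughly like the following sketch. It assumes word-level records of the form (sentence_id, word, reading_time_ms); that layout and the sample values are hypothetical, since the actual file format is not specified here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical word-level records: (sentence_id, word, reading_time_ms).
word_rts = [
    (0, "The", 310.0), (0, "dog", 290.0), (0, "barked.", 360.0),
    (1, "Rain", 330.0), (1, "fell.", 350.0),
]

def sentence_mean_rt(records):
    """Map each sentence_id to the mean of its word-by-word reading times."""
    by_sentence = defaultdict(list)
    for sentence_id, _word, rt_ms in records:
        by_sentence[sentence_id].append(rt_ms)
    return {sid: mean(rts) for sid, rts in by_sentence.items()}

print(sentence_mean_rt(word_rts))
```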

1.3 Model & Metric

Model: Hugging Face gpt2 (GPT-2 Small, 124M parameters)
Surprisal Computation:
1. Tokenize sentence → input IDs.
2. Compute model’s per-token negative log-likelihood (NLL).
3. Average over tokens → mean surprisal.
Evaluation Metric: Spearman’s rank correlation (ρ) between surprisal and reading time.
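Steps 2 and 3 of the surprisal computation can be illustrated without loading a model. The sketch below applies a log-softmax to one logit vector per position and averages the negative log-probabilities of the target tokens. The helper names and the toy logits are assumptions for illustration; in the real pipeline the logits would come from GPT-2.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def mean_surprisal(step_logits, target_ids):
    """Average NLL (nats) of `target_ids`, given one logit vector per step."""
    nlls = [-log_softmax(logits)[t] for logits, t in zip(step_logits, target_ids)]
    return sum(nlls) / len(nlls)

# Toy 3-word vocabulary with uniform logits: every token has probability 1/3.
toy_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(mean_surprisal(toy_logits, [2, 0]))  # close to math.log(3)
```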

1.4 Experimental Protocol

Compute surprisal for each of the 50 sentences.
Compute ρ and p-value via scipy.stats.spearmanr.
Baseline: Compare against a simple sentence-length predictor (number of tokens per sentence).
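To make the metric concrete, here is a dependency-free sketch of Spearman’s ρ: rank both variables (ties share an average rank), then take the Pearson correlation of the ranks. The function names and the five data points are made up for illustration; the actual analysis uses scipy.stats.spearmanr, which also returns a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values for five sentences.
surprisal = [3.1, 4.0, 2.5, 5.2, 4.4]
reading_time = [310.0, 350.0, 300.0, 420.0, 340.0]
print(round(spearman_rho(surprisal, reading_time), 3))
```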

1.5 Expected Contributions

Test whether out-of-the-box GPT-2 surprisal correlates with human reading times.
Quantify whether language-model surprisal offers a clear gain over sentence length alone.
Provide a fully reproducible pipeline (code + data splits).

2) Paper Review:

Smith, N. J., & Levy, R. (2013). The effect of word predictability on reading time is logarithmic. Cognition, 128(3), 302–319. https://doi.org/10.1016/j.cognition.2013.02.013

2.1 Summary

Core idea: Human sentence comprehension can be modeled as Bayesian inference under a noisy-channel framework, where readers recover intended sentences from possibly garbled input.
Key contribution: Formalizes surprisal calculation within a probabilistic model that accounts for both comprehension noise and prior expectations.
Results: Shows predicted reading times from surprisal estimates align with psycholinguistic data.

2.2 Strengths & Weaknesses

Strengths:
- First to ground surprisal in a full Bayesian noisy-channel framework.
- Clear derivation linking word probabilities to processing cost.
- Strong empirical fit to multiple reading-time datasets.
Weaknesses:
- Assumes input noise distributions that are hard to estimate in practice.
- Uses simple n-gram probabilities; modern LMs could offer more accurate surprisals.
- No code release; reproducibility relies on reimplementing the model.

2.3 Clarity

Well-structured: background, model, experiments, discussion.
Mathematical notation is concise but may challenge readers without Bayesian background.

2.4 Significance & Impact

Provided key empirical support for surprisal theory in psycholinguistics.
Influenced subsequent work linking probabilistic models to human reading behavior.

2.5 Originality

Novel application of noisy-channel inference to real-time language processing.

2.6 Soundness

Rational derivation; model fits data across multiple corpora.
Lacks contemporary statistical rigor (no held-out test).

2.7 Replicability

Full model specification is in the paper, but no open code.
Requires gathering the same reading-time datasets and training n-gram LMs.

2.8 Open Questions

Would GPT-2 surprisals improve on n-gram surprisal in predicting reading times?
How does noise modeling (e.g. OCR errors vs. ideal input) affect predictions?
Can surprisal theory extend to eye-tracking measures (e.g. regression probabilities)?

3) Hands-On Analysis: GPT-2 Surprisal vs. Sentence Length Baseline

3.1 Environment Setup

Dependencies: transformers, torch, pandas, scipy, matplotlib
Time investment: ~1 hour to install & run.

3.2 Data & Preprocessing

Load CSV: columns sentence, reading_time (mean ms).
Filter out sentences >50 tokens.
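Strip whitespace. Lowercasing is optional: GPT-2’s vocabulary is case-sensitive, so lowercased input changes the surprisal estimates.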
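A minimal loading sketch, using only the standard library, is shown below. The in-memory CSV stands in for the real file (whose name is not given here), the two rows are invented, and whitespace splitting is only a rough proxy for the GPT-2 token count. The sketch strips whitespace and filters by length; casing is left untouched.

```python
import csv
import io

# Stand-in for the real file; the two rows are invented for illustration.
raw_csv = io.StringIO(
    "sentence,reading_time\n"
    "  The dog barked. ,320.0\n"
    "Rain fell.,340.0\n"
)

MAX_TOKENS = 50

def load_sentences(handle, max_tokens=MAX_TOKENS):
    """Read rows, strip whitespace, and drop over-long sentences."""
    rows = []
    for row in csv.DictReader(handle):
        sentence = row["sentence"].strip()
        # Whitespace splitting is a rough proxy for the GPT-2 token count.
        if len(sentence.split()) <= max_tokens:
            rows.append(
                {"sentence": sentence, "reading_time": float(row["reading_time"])}
            )
    return rows

rows = load_sentences(raw_csv)
print(rows)
```

With pandas, the same rows can be turned into the df used in the following sections via pd.DataFrame(rows).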

3.3 Surprisal Computation

from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

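def surprisal(text):
    # Mean per-token NLL in nats. The first token is not scored because it has no left context.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()

# Store one surprisal value per sentence for the correlation analysis in 3.5.
df["surprisal"] = df["sentence"].apply(surprisal)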

3.4 Baseline Computation

df["length"] = df["sentence"].apply(lambda s: len(tokenizer.tokenize(s)))

3.5 Correlation Analysis

from scipy.stats import spearmanr

rho_surp, p_surp = spearmanr(df["surprisal"], df["reading_time"])
rho_len,  p_len  = spearmanr(df["length"],    df["reading_time"])
print(f"GPT-2 Surprisal ρ={rho_surp:.2f}, p={p_surp:.3f}")
print(f"Sentence Length ρ={rho_len:.2f}, p={p_len:.3f}")

3.6 (Optional) Visualization

import matplotlib.pyplot as plt

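# Separate panels: surprisal (nats) and length (tokens) are on different scales.
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
axes[0].scatter(df["surprisal"], df["reading_time"])
axes[0].set_xlabel("Mean surprisal (nats)")
axes[0].set_ylabel("Reading Time (ms)")
axes[1].scatter(df["length"], df["reading_time"])
axes[1].set_xlabel("Length (tokens)")
plt.tight_layout()
plt.show()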

3.7 Results & Analysis (3–4 pp)

Report: Spearman correlations & p-values for surprisal vs. length.
Interpretation: Is the correlation with reading time reliably stronger for GPT-2 surprisal than for sentence length?
Link back: Connect performance gap to the theoretical motivation in Section 1 and the noisy-channel framework in Section 2.
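One possible way to address the interpretation question is a bootstrap over sentences, sketched below with the standard library only. This is an assumption about how the comparison could be run, not part of the protocol above; the function names, the number of resamples, and the eight data points are all made up for illustration.

```python
import random

def ranks(values):
    """1-based ranks; ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman_rho(x, y):
    """Pearson correlation of the ranks; 0.0 if a resample has no variance."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / var ** 0.5 if var > 0 else 0.0

def bootstrap_rho_diff(surp, length, rt, n_boot=2000, seed=0):
    """95% percentile interval for rho(surp, rt) - rho(length, rt)."""
    rng = random.Random(seed)
    n = len(rt)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample sentences
        s_b = [surp[i] for i in idx]
        l_b = [length[i] for i in idx]
        r_b = [rt[i] for i in idx]
        diffs.append(spearman_rho(s_b, r_b) - spearman_rho(l_b, r_b))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Made-up values for eight sentences.
surp = [3.1, 4.0, 2.5, 5.2, 4.4, 3.6, 2.9, 4.8]
length = [12, 20, 9, 25, 14, 18, 11, 16]
rt = [310.0, 350.0, 300.0, 420.0, 340.0, 335.0, 305.0, 390.0]
low, high = bootstrap_rho_diff(surp, length, rt)
print(f"95% interval for the rho difference: [{low:.2f}, {high:.2f}]")
```

If the interval excludes zero, the two correlations differ on this sample by more than resampling noise would suggest. Both predictors are resampled with the same indices because they are measured on the same sentences.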

By following this outline, you’ll produce a coherent, interlinked essay that (1) proposes and justifies your experiment, (2) critically situates it within foundational theory, and (3) delivers real data analysis—all within five days.

Presentation

Tips for presentation

Avoid using the bare word "performance"; specify what is being measured.

See also

paper review deep learning

Sources

Created: 12-03-25 13:50
