ML for NLP

Outline

Research Question

How well does sentence-level surprisal—computed from a pretrained GPT-2 Small model—predict human self-paced reading times?

suggested structures ML NLP

1.1 Introduction

  • Motivation: Understanding real-time language processing in humans is a core goal of psycholinguistics and cognitive modeling.

  • Surprisal Theory: Word surprisal (–log P(word | context)) has been shown to correlate with reading times.

  • Proposal: Use GPT-2 Small to compute average sentence surprisal and test its correlation with human reading times.
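
For reference, the two quantities in the bullets above can be written out explicitly. The notation is an assumption of this sketch (w_i for the i-th token, n for the number of scored tokens); the base of the logarithm only sets the unit (nats for the natural log, bits for base 2).

```latex
s(w_i) = -\log P(w_i \mid w_1, \dots, w_{i-1}),
\qquad
\bar{s}(w_1, \dots, w_n) = \frac{1}{n} \sum_{i=1}^{n} s(w_i)
```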

1.2 Dataset

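  • Source: A subset of the Dundee Corpus (50 sentences) with published word-by-word reading times. The Dundee Corpus records eye-tracking measures rather than self-paced reading, so either the research question should refer to eye-tracking reading times or a self-paced reading corpus (e.g., Natural Stories) should be used.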

  • Preparation:

    • Aggregate to sentence-level reading time (mean of word-by-word times).

    • Clean punctuation; ensure sentences ≤50 tokens.
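
The aggregation step above can be sketched in a few lines. This is a minimal illustration, assuming word-level records of the form (sentence_id, word, reading_time_ms); the record layout, the toy values, and the name `sentence_level_rt` are hypothetical and not taken from any corpus. Length is counted in words here, not in model tokens.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical word-level records: (sentence_id, word, reading_time_ms).
# The values are invented for illustration only.
word_rows = [
    (1, "The", 310.0),
    (1, "dog", 290.0),
    (1, "barked", 360.0),
    (2, "Birds", 300.0),
    (2, "sing", 350.0),
]

def sentence_level_rt(rows, max_tokens=50):
    """Return {sentence_id: mean word reading time}, skipping long sentences."""
    times_by_sentence = defaultdict(list)
    for sentence_id, _word, rt_ms in rows:
        times_by_sentence[sentence_id].append(rt_ms)
    return {
        sid: mean(times)
        for sid, times in times_by_sentence.items()
        if len(times) <= max_tokens  # length counted in words in this sketch
    }

sentence_rt = sentence_level_rt(word_rows)
print(sentence_rt)
```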

1.3 Model & Metric

  • Model: HuggingFace gpt2 (124 M parameters)

  • Surprisal Computation:

    1. Tokenize sentence → input IDs.

    2. Compute model’s per-token negative log-likelihood (NLL).

    3. Average over tokens → mean surprisal.

  • Evaluation Metric: Spearman’s rank correlation (ρ) between surprisal and reading time.
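
Once the model's conditional token probabilities are available, steps 2 and 3 reduce to simple arithmetic. The sketch below assumes made-up probabilities in place of GPT-2 output; `token_surprisals` and `mean_surprisal` are illustrative names, not library functions.

```python
import math

def token_surprisals(probs):
    """Surprisal -ln(p), in nats, for each conditional token probability."""
    return [-math.log(p) for p in probs]

def mean_surprisal(probs):
    """Average per-token surprisal (nats) over the scored tokens."""
    values = token_surprisals(probs)
    return sum(values) / len(values)

# Hypothetical values of P(token | preceding tokens); in the real pipeline
# these would come from GPT-2's softmax output.
toy_probs = [0.5, 0.25, 0.125]

nats = mean_surprisal(toy_probs)
bits = nats / math.log(2)  # same quantity expressed in bits
print(f"{nats:.3f} nats/token, {bits:.3f} bits/token")
```

Reporting the unit matters when comparing against published surprisal values, which are often given in bits.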

1.4 Experimental Protocol

  1. Compute surprisal for each of the 50 sentences.

  2. Compute ρ and p-value via scipy.stats.spearmanr.

  3. Baseline: Compare against a simple sentence-length predictor (number of tokens per sentence).
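
To make the metric in this protocol concrete, the sketch below computes Spearman's ρ by hand as the Pearson correlation of average ranks. It is only an illustration of the statistic: the actual analysis would call scipy.stats.spearmanr, which also returns a p-value. The five data points and all function names are invented for this example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented values for five sentences (not real measurements).
toy_surprisal = [3.1, 4.4, 3.6, 5.2, 4.0]
toy_length = [12, 30, 12, 25, 18]
toy_reading_time = [310.0, 355.0, 330.0, 400.0, 370.0]

rho_surprisal = spearman_rho(toy_surprisal, toy_reading_time)
rho_length = spearman_rho(toy_length, toy_reading_time)
print(rho_surprisal, rho_length)
```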

1.5 Expected Contributions

  • Test whether out-of-the-box GPT-2 surprisal correlates with human reading times.

  • Quantify whether language-model surprisal offers a clear gain over length alone.

  • Provide a fully reproducible pipeline (code + data).


2) Paper Review:

Smith, N. J., & Levy, R. (2013). The effect of word predictability on reading time is logarithmic. Cognition, 128(3), 302–319. https://doi.org/10.1016/j.cognition.2013.02.013

2.1 Summary

  • Core idea: Human sentence comprehension can be modeled as Bayesian inference under a noisy-channel framework, where readers recover intended sentences from possibly garbled input.

  • Key contribution: Formalizes surprisal calculation within a probabilistic model that accounts for both comprehension noise and prior expectations.

  • Results: Shows predicted reading times from surprisal estimates align with psycholinguistic data.

2.2 Strengths & Weaknesses

  • Strengths:

    • First to ground surprisal in a full Bayesian noisy-channel model.

    • Clear derivation linking word probabilities to processing cost.

    • Strong empirical fit to multiple reading-time datasets.

  • Weaknesses:

    • Assumes input noise distributions that are hard to estimate in practice.

    • Uses simple n-gram probabilities; modern LMs could offer more accurate surprisals.

    • No code release—reproducibility relies on reimplementing model.

2.3 Clarity

  • Well-structured: background, model, experiments, discussion.

  • Mathematical notation is concise but may challenge readers without Bayesian background.

2.4 Significance & Impact

  • Provided key empirical support for surprisal theory in psycholinguistics: as the title states, the effect of word predictability on reading time is logarithmic.

  • Influenced more than a decade of work linking probabilistic models to human reading behavior.

2.5 Originality

  • Novel application of noisy-channel inference to real-time language processing.

2.6 Soundness

  • Rational derivation; model fits data across multiple corpora.

  • Lacks contemporary statistical rigor (no held-out test).

2.7 Replicability

  • Full model specification is in the paper, but no open code.

  • Requires gathering the same reading-time datasets and training n-gram LMs.

2.8 Open Questions

  1. Would GPT-2 surprisals improve on n-gram surprisal in predicting reading times?

  2. How does noise modeling (e.g. OCR errors vs. ideal input) affect predictions?

  3. Can surprisal theory extend to eye-tracking measures (e.g. regression probabilities)?


3) Hands-On Analysis: GPT-2 Surprisal vs. Sentence Length Baseline

3.1 Environment Setup

  • Dependencies: transformers, torch, pandas, scipy, matplotlib

  • Time investment: ~1 hour to install & run.
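
Before running anything, it can help to confirm that the listed dependencies are importable. A minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the check says nothing about package versions.

```python
from importlib.util import find_spec

# Package names as listed under Dependencies above.
REQUIRED = ["transformers", "torch", "pandas", "scipy", "matplotlib"]

def missing_packages(names=REQUIRED):
    """Return the packages from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages()
print("Missing:", missing or "none")
```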

3.2 Data & Preprocessing

  • Load CSV: columns sentence, reading_time (mean ms).

  • Filter out sentences >50 tokens.

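  • Strip whitespace. Keep the original casing: GPT-2's vocabulary is case-sensitive, so lowercasing would change the surprisal estimates.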
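
The loading and filtering steps above might look like the sketch below. It assumes the two column names given in the first bullet; the inline CSV text, the whitespace-based token count, and the name `load_sentences` are stand-ins for illustration (the real pipeline would read the actual file and count GPT-2 tokens). Lowercasing is left out of the sketch because GPT-2 distinguishes upper and lower case.

```python
import io
import pandas as pd

# Stand-in for the real CSV file; both rows are invented for illustration.
csv_text = io.StringIO(
    "sentence,reading_time\n"
    '"  The dog barked at the mail carrier.  ",342.5\n'
    '"Birds sing.",318.0\n'
)

def load_sentences(source, max_tokens=50, count_tokens=lambda s: len(s.split())):
    """Read the CSV, trim whitespace, and drop sentences above max_tokens."""
    df = pd.read_csv(source)
    df["sentence"] = df["sentence"].str.strip()
    n_tokens = df["sentence"].apply(count_tokens)
    return df[n_tokens <= max_tokens].reset_index(drop=True)

df = load_sentences(csv_text)
print(df)
```

Passing a different `count_tokens` callable (for example one based on the GPT-2 tokenizer) changes how the 50-token limit is applied without touching the rest of the function.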

3.3 Surprisal Computation

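from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()  # disable dropout so scores are deterministic

def surprisal(text):
    # Prepend the BOS token so the first real token is also scored; without it
    # the loss covers only tokens 2..n and is NaN for a one-token input.
    inputs = tokenizer(tokenizer.bos_token + text, return_tensors="pt")
    with torch.no_grad():
        # With labels given, the model shifts them internally and returns
        # the mean cross-entropy over the predicted tokens.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()  # mean NLL per token, in nats

# Score every sentence; the correlation analysis in 3.5 reads this column.
df["surprisal"] = df["sentence"].apply(surprisal)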

3.4 Baseline Computation

# Length baseline: number of GPT-2 BPE tokens per sentence (subword units, not words).
df["length"] = df["sentence"].apply(lambda s: len(tokenizer.tokenize(s)))

3.5 Correlation Analysis

from scipy.stats import spearmanr

rho_surp, p_surp = spearmanr(df["surprisal"], df["reading_time"])
rho_len,  p_len  = spearmanr(df["length"],    df["reading_time"])
print(f"GPT-2 Surprisal ρ={rho_surp:.2f}, p={p_surp:.3f}")
print(f"Sentence Length ρ={rho_len:.2f}, p={p_len:.3f}")

3.6 (Optional) Visualization

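import matplotlib.pyplot as plt

# Surprisal (nats/token) and length (tokens) are on different scales,
# so each predictor gets its own panel with a shared y-axis.
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
axes[0].scatter(df["surprisal"], df["reading_time"])
axes[0].set_xlabel("Mean surprisal (nats/token)")
axes[0].set_ylabel("Reading Time (ms)")
axes[1].scatter(df["length"], df["reading_time"])
axes[1].set_xlabel("Sentence length (tokens)")
plt.tight_layout()
plt.show()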

3.7 Results & Analysis (3–4 pp)

  • Report: Spearman correlations & p-values for surprisal vs. length.

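  • Interpretation: Is the correlation with reading time reliably stronger for GPT-2 surprisal than for sentence length? With only 50 sentences, compare the two ρ values together with their uncertainty, not the p-values alone.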

  • Link back: Connect performance gap to the theoretical motivation in Section 1 and the noisy-channel framework in Section 2.
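
One way to approach the interpretation question is a paired bootstrap over sentences, which gives an interval for the difference between the two correlations. This is a sketch of one possible approach, not a prescribed method: the function name `bootstrap_rho_difference`, the number of resamples, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def bootstrap_rho_difference(surprisal, length, reading_time, n_boot=1000, seed=0):
    """Percentile 95% interval for rho(surprisal, RT) - rho(length, RT)."""
    surprisal = np.asarray(surprisal, dtype=float)
    length = np.asarray(length, dtype=float)
    reading_time = np.asarray(reading_time, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(reading_time)
    diffs = []
    for _ in range(n_boot):
        # Resample sentences with replacement, keeping each sentence's
        # three values together (paired bootstrap).
        idx = rng.integers(0, n, size=n)
        rho_s, _ = spearmanr(surprisal[idx], reading_time[idx])
        rho_l, _ = spearmanr(length[idx], reading_time[idx])
        diffs.append(rho_s - rho_l)
    low, high = np.nanpercentile(diffs, [2.5, 97.5])
    return float(low), float(high)

# Synthetic stand-in data (not real measurements): 50 "sentences" whose
# reading time depends on both predictors plus noise.
rng = np.random.default_rng(1)
toy_length = rng.integers(5, 50, size=50)
toy_surprisal = rng.normal(4.0, 0.5, size=50)
toy_rt = 300 + 20 * toy_surprisal + 1.0 * toy_length + rng.normal(0, 15, size=50)

ci_low, ci_high = bootstrap_rho_difference(toy_surprisal, toy_length, toy_rt, n_boot=500)
print(f"95% interval for the rho difference: [{ci_low:.2f}, {ci_high:.2f}]")
```

If the interval excludes zero, the two correlations differ under this resampling scheme; with 50 sentences the interval is likely to be wide, which is worth stating in the write-up.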


By following this outline, you’ll produce a coherent, interlinked essay that (1) proposes and justifies your experiment, (2) critically situates it within foundational theory, and (3) delivers real data analysis—all within five days.

Presentation

Tips for presentation

Don’t simply use the word “performance”; say what is being measured.

paper review deep learning

See also

Status:
Tags: science
Superlink: 611 📠Machine Learning
610 🤖Artificial Intelligence, Künstliche Intelligenz

Sources

Created: 12-03-25 13:50