Data Quality, Archival Failure, and the Making of a Parliamentary Record

What building a parliamentary database revealed about error in Hansard

Alfie Chadwick

Monash University

Libby Lester

Monash University

Simon D. Angus

Monash University

2026-05-20

The Claim

“Hansard is a substantially verbatim and complete transcript of proceedings in the Australian Parliament.”

  • Complete and official record since Federation, 1901
  • Used by journalists, historians, political scientists
  • Increasingly: computational social scientists doing large-scale text analysis

Three kinds of error in the record

  • Transcription error
    What gets recorded, edited, omitted, standardised

  • Digitisation error
    What gets malformed, misattributed, miscoded, or lost in machine-readable form

  • Interpretive error
    What researchers falsely assume the record can tell us

Transcription is not neutral capture

  • Hansard is produced by staff, editors, and institutional rules
  • speech is cleaned, regularised, and sometimes omitted
  • interjections are selectively recorded

Dimensional Collapse

‘The Treasurer did start the inflation fire. The inflation is burning while the Treasurer is squirming. The Treasurer did start inflation fire. Yes, he poured debt petrol on it, and he cashed organised crime to fuel it.’

House of Representatives Hansard, 4 March 2026, p. 45

Findable errors in the archive

  • malformed XML
  • incorrect speaker IDs
  • speeches attributed to people not serving that day
  • date errors
  • unpaired questions and answers
  • inconsistent service records across sources
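
Several of the findable errors listed above can be checked mechanically. The following is a minimal sketch, not the pipeline used for this project: the element and attribute names (`day`, `speech`, `speaker`, `question`, `answer`, `qid`) and the `SERVICE` table are hypothetical stand-ins, and the real Hansard XML schema may differ.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical service records: speaker ID -> (first day, last day) served.
SERVICE = {
    "A01": (date(2007, 11, 24), date(2013, 8, 5)),
    "B02": (date(1998, 10, 3), date(2010, 7, 19)),
}

def check_sitting_day(xml_text, service=SERVICE):
    """Return a list of (problem, detail) pairs for one sitting day's XML."""
    # Malformed XML: the parser refuses the document outright.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [("malformed XML", str(exc))]

    # Date errors: the sitting date is missing or not a valid ISO date.
    try:
        day = date.fromisoformat(root.get("date", ""))
    except ValueError:
        return [("date error", root.get("date"))]

    problems = []
    for speech in root.iter("speech"):
        speaker = speech.get("speaker")
        if speaker not in service:
            problems.append(("unknown speaker ID", speaker))
            continue
        start, end = service[speaker]
        if not (start <= day <= end):
            problems.append(("speaker not serving that day", speaker))

    # Assumption: a question and its answer share a "qid" attribute.
    questions = {q.get("qid", "") for q in root.iter("question")}
    answers = {a.get("qid", "") for a in root.iter("answer")}
    for qid in sorted(questions ^ answers):
        problems.append(("unpaired question or answer", qid))
    return problems

sample = """<day date="2012-03-01">
  <speech speaker="A01">...</speech>
  <speech speaker="B02">...</speech>
  <question qid="q1"/>
</day>"""
print(check_sitting_day(sample))
```

On this sample the check flags speaker B02, whose assumed service ended before the sitting date, and question q1, which has no matching answer. Checks like these only catch attributions that are impossible, not those that are plausible but wrong.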

Unknown errors in the archive

Some errors are detectable. Others are not.

We can’t reliably see:

  • plausible but wrong speaker attributions
  • omitted interjections
  • shifting editorial standards over time
  • inconsistencies that were silently corrected, leaving no trace

What Hansard cannot capture

  • the backstage of parliament
  • party discipline and pre-scripted performance
  • media strategy
  • public circulation and afterlife
  • embodiment, atmosphere, consequence

Why this matters for longitudinal analysis

If the record changes over time, then apparent historical trends may reflect:

  • changes in parliament
  • changes in transcription practices
  • changes in digitisation quality
  • or changes in editorial judgement
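
A toy calculation makes the confounding concrete. All numbers here are invented for illustration: the true rate of interjections is held constant, and only the share that editors record is assumed to fall over time.

```python
# Hypothetical: interjections per hour actually made in the chamber (constant).
TRUE_RATE = 10.0

# Hypothetical: fraction of interjections that make it into the record.
CAPTURE_SHARE = {1980: 0.8, 2000: 0.6, 2020: 0.4}

def recorded_rates(true_rate, capture_share):
    """Rate visible in the record = true rate x share that gets recorded."""
    return {year: true_rate * share for year, share in capture_share.items()}

recorded = recorded_rates(TRUE_RATE, CAPTURE_SHARE)
for year, rate in recorded.items():
    print(year, round(rate, 1))
```

The record shows interjections halving between 1980 and 2020 even though behaviour in the chamber never changed. From the record alone, this is indistinguishable from a real decline.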

Example: are women interrupted less?

  • more respect?
  • less engagement?
  • less important speeches?
  • less important roles?
  • selective editorial capture?
  • differences in speaking context?

Rudd vs Gillard