Resilient Data Futures
Claim C-0005 (draft)

A representative R1 carries ~$1.1B/year in unverifiable research output as latent liability

Exec Summary, §5.3.2, §5.6 · 2026-05-03 · 3 out · 13 in

Applying the Four-Term Liability Formula (M-0003) to a Carnegie R1 university with approximately $200M in annual R&D and ~3,000 peer-reviewed publications per year, of which ~80% (the C-0002 baseline, or ~2,400 papers) carry underlying data that cannot be produced on request, yields four terms:

  • Term A (sunk grant value): ~$574M/year, applying the Stern et al. peer-reviewed median of $239,381 per retracted paper to the 2,400 unretrievable papers (rising to ~$942M/year at the mean of $392,582) (S-0050).
  • Term B (replacement cost): ~$360–420M/year for the reconstructible fraction; the irreplaceable remainder (human-subjects studies, decommissioned cohorts, one-time-event datasets) is unbounded above.
  • Term C (downstream value lost): ~$172M/year in foregone reuse value, applying Piwowar & Vision's rate of 150 reuse papers per 100 deposited datasets at the per-paper grant attribution from Term A (S-0055).
  • Term D (FCA exposure): institutional tail risk of $10M–$112.5M per major surfaced event under the precedent stack (Duke $112.5M, Dana-Farber $15M, Harvard-Anversa $10M).
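
The Term A figures can be re-derived from the inputs stated above. The following is a minimal sketch rather than the report's own tooling. The function and constant names are illustrative assumptions; only the numeric inputs come from the text.

```python
# Inputs as stated in the text; all names here are illustrative, not from the report.
PUBLICATIONS_PER_YEAR = 3_000
UNRETRIEVABLE_FRACTION = 0.80      # C-0002 baseline
COST_PER_PAPER_MEDIAN = 239_381    # USD, Stern et al. median (S-0050)
COST_PER_PAPER_MEAN = 392_582      # USD, Stern et al. mean (S-0050)


def term_a(publications: int, fraction: float, cost_per_paper: int) -> int:
    """Sunk grant value in USD: unretrievable papers times per-paper cost."""
    unretrievable = round(publications * fraction)  # 2,400 papers for the stated inputs
    return unretrievable * cost_per_paper


term_a_median = term_a(PUBLICATIONS_PER_YEAR, UNRETRIEVABLE_FRACTION, COST_PER_PAPER_MEDIAN)
term_a_mean = term_a(PUBLICATIONS_PER_YEAR, UNRETRIEVABLE_FRACTION, COST_PER_PAPER_MEAN)

print(f"Term A at median: ${term_a_median / 1e6:.1f}M/year")  # 574.5
print(f"Term A at mean:   ${term_a_mean / 1e6:.1f}M/year")    # 942.2
```

Both results round to the ~$574M and ~$942M figures quoted for Term A.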

Maximum institutional exposure under a full-enforcement scenario: approximately $1.1 billion per year (~$574M A + ~$360–420M B + ~$172M C; Term D is a per-event exposure and is not included in this annual sum). This is the tail: the loss the institution would realize if every unverifiable paper were surfaced within a single year. The expected loss in any given year is substantially lower and depends on the surfacing probability, which §5.4 documents as rising across three independent vectors.
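
As a quick check on the headline, the annual terms can be summed at both ends of the Term B range. This sketch only restates the rounded figures quoted in the text, with illustrative variable names.

```python
# Rounded figures from the text, in millions of USD per year.
# Variable names are illustrative, not from the report.
TERM_A = 574           # sunk grant value at the Stern et al. median
TERM_B = (360, 420)    # replacement cost, low and high (reconstructible fraction only)
TERM_C = 172           # foregone reuse value

total_low = TERM_A + TERM_B[0] + TERM_C
total_high = TERM_A + TERM_B[1] + TERM_C

print(f"A + B + C: ${total_low}M to ${total_high}M per year")  # 1106 to 1166
```

The sum spans roughly $1.11B to $1.17B, consistent with an order-of-magnitude headline of ~$1.1B. The irreplaceable remainder of Term B, which the text describes as unbounded, is outside this sum.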

The number is an order-of-magnitude estimate. The methodology (A + B + C + D applied to the unretrievable fraction) is the contestable substance. The ~$1.1B headline is the visible consequence of that methodology, not an independent estimate.