What does research data loss cost institutions and science?
If 73 to 93 percent of published research rests on data that cannot be produced on request, that condition carries costs along at least three dimensions: the institutional liability carried on each non-verifiable dataset, the cost to scientific progress when reproducibility and reuse are foreclosed, and the compounding loss of downstream research the destroyed data would have enabled.
This question asks how to quantify each of those dimensions, what mechanisms convert latent liability into realized cost, and what an honest accounting of the carrying cost looks like at institutional and sectoral scale.
Subsidiary questions:
- What is the institutional liability carried on a single dataset that cannot be produced on request?
- What is that liability summed across an institution's annual publication output?
- Through what mechanisms does latent liability convert to realized loss: audit, retraction, FCA action, funder verification, or compliance check?
- How does the cost to science as an enterprise (reproducibility, reference rot, foregone reuse) compound across decades of single-copy architecture?
The Four-Term Liability Formula (M-0003) is the analytical instrument used to answer the institutional half of the question. Reproducibility and structural-decay measurements answer the scientific half.
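The summation posed in the second subsidiary question can be sketched as simple arithmetic. The sketch below is not the Four-Term Liability Formula, whose terms are not stated here. It only scales an assumed per-dataset liability by the 73 to 93 percent range cited above. The function name, the input figures, and the simplification that each publication rests on exactly one dataset are all illustrative assumptions.

```python
def unverifiable_liability_range(annual_publications, liability_per_dataset,
                                 low_share=0.73, high_share=0.93):
    """Return (low, high) bounds on aggregate carried liability.

    Assumptions (not from the source): one dataset per publication, and a
    single flat liability figure per non-producible dataset. The default
    shares are the 73 and 93 percent bounds quoted in the text.
    """
    if annual_publications < 0 or liability_per_dataset < 0:
        raise ValueError("inputs must be non-negative")
    if not 0.0 <= low_share <= high_share <= 1.0:
        raise ValueError("shares must satisfy 0 <= low <= high <= 1")

    # Expected number of publications whose data cannot be produced,
    # multiplied by the assumed liability carried on each one.
    low = annual_publications * low_share * liability_per_dataset
    high = annual_publications * high_share * liability_per_dataset
    return low, high


# Hypothetical inputs, chosen only to show the shape of the calculation.
low, high = unverifiable_liability_range(
    annual_publications=1_000,
    liability_per_dataset=10_000.0,
)
print(f"Aggregate carried liability: {low:,.0f} to {high:,.0f}")
```

A per-dataset figure produced by the actual formula could be substituted for the flat `liability_per_dataset` placeholder. The range structure stays the same.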