The Failure-Mode Taxonomy
A four-category classification of the mechanisms by which research data is lost, organized by the operational origin of the triggering event. The taxonomy is the analytical frame of §3 of the paper and the empirical input to the architectural argument.
3.1 — Personnel turnover and institutional memory. Departing graduate students, postdocs, and PIs take with them the operational knowledge of where data is stored, how it is formatted, and where it came from. The median time from starting graduate school to completing a PhD is 7.3 years; the median postdoc lasts ~4.5 years; and only ~15-23% of postdocs secure a tenure-track position. The person who understands a dataset is therefore typically within a few years of leaving the institution that holds it. Truck-factor (bus-factor) analyses show that 65% of popular GitHub projects have a truck factor ≤ 2 (S-0027); HLRS in Stuttgart found that 57 of 262 user accounts had been de-registered, leaving ~619 TB of orphaned data (S-0028).
3.2 — Physical and technical loss. Hardware fails, buildings burn, laptops are stolen, and software updates collide with backup scripts. Each event is routine until it touches single-copy data. Examples: at Kyoto University in December 2021, a backup-script failure deleted 77 TB across 14 research groups, and 4 of those groups lost their only copies (S-0029); the 2018 fire at Brazil's National Museum destroyed ~18.4M of its 20M items (S-0030).
3.3 — Funding termination. Grants keep data alive; when a grant ends, maintenance of its data ends with it. Between February and August 2025, NIH terminated 2,291 active grants ($2.45B) and NSF terminated 1,752 grants (~$1.4B). Proposed FY2026 cuts are ~56% for NSF, ~24% for NOAA, and ~57% for ARPA-E (S-0031, S-0032, S-0033, S-0034).
3.4 — Platform discontinuation and access restriction. 191 research data repositories shut down between 2012 and 2023 at a median age of 12 years, 47% of them with no migration (S-0005). Access can also be restricted without closure: Twitter API (S-0038), GISAID (S-0039), CNKI cross-border (S-0040), CERN Russia (S-0041), UK Biobank pricing (S-0042). Ownership transitions have the same effect: Bepress acquisition (S-0043), Mendeley shutdown (S-0044), Academia.edu paywall (S-0045).
3.5 — Shared structural property. A failure in any of the four categories is absorbed without permanent loss when independent copies exist across independent failure domains. The single point of failure varies in form (hardware, organization, funding, jurisdiction); the structural property is constant.
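The structural property in 3.5 can be sketched as a simple check: for each of the four forms named above (hardware, organization, funding, jurisdiction), do all copies of a dataset share a single value? The following is a minimal illustration, not a method from the paper. The `Copy` record, the `single_points_of_failure` function, and all example values are hypothetical names introduced here.

```python
from dataclasses import dataclass

# The four forms a single point of failure can take, as listed in 3.5.
DIMENSIONS = ("hardware", "organization", "funding", "jurisdiction")


@dataclass(frozen=True)
class Copy:
    """One stored copy of a dataset (hypothetical record for illustration)."""
    hardware: str
    organization: str
    funding: str
    jurisdiction: str


def single_points_of_failure(copies):
    """Return {dimension: shared value} for every dimension in which
    all copies depend on the same value. An empty dict means no
    dimension is shared by all copies."""
    if not copies:
        raise ValueError("a dataset with no copies is already lost")
    shared = {}
    for dim in DIMENSIONS:
        values = {getattr(c, dim) for c in copies}
        if len(values) == 1:
            shared[dim] = next(iter(values))
    return shared


# Hypothetical example data (assumed values, not from the sources cited).
lab_only = [Copy("lab-nas", "univ-a", "grant-1", "DE")]
mirrored = lab_only + [Copy("cloud-bucket", "univ-a", "grant-1", "DE")]
spread = lab_only + [Copy("tape-archive", "consortium-b", "endowment-2", "CH")]

for name, copies in [("lab_only", lab_only), ("mirrored", mirrored), ("spread", spread)]:
    print(name, sorted(single_points_of_failure(copies)))
```

Under these assumptions, a second copy on different hardware removes only the hardware dependency: the `mirrored` dataset still shares one organization, one funding source, and one jurisdiction, so a funding termination or a platform discontinuation can still take out both copies. Only the `spread` dataset has no shared dimension.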
M-0005 is used by every Claim in §3 that reduces a documented loss event to single-copy architecture.