The Four-Tier Architectural Taxonomy
A classification of preservation architectures by how many copies of a dataset exist, across how many independent failure domains, under what coordination model, and with what verification capability.
Tier 0 — Local storage. A single copy on a single system in a single location. Laboratory server, departmental drive, PI laptop, external hard drive. No replication, no geographic redundancy, no verification beyond the storage medium itself. Default architecture of most research data.
Tier 1 — Hosted storage. A single copy held by an external hosting provider (institutional repository, Zenodo, Dryad, AWS, Google Cloud). Professional management and provider-internal redundancy, but the data exists in one organizational context, subject to one provider's business decisions, funding continuity, terms of service, and jurisdiction.
Tier 2 — Coordinated institutional preservation. Multiple copies across multiple locations, coordinated by institutional agreements, funded by membership fees, libraries, and philanthropic support. Examples: INSDC (3 sites), wwPDB (4 sites), CLOCKSS (12 mirror nodes), WLCG. Resilience depends on the continued coordination of three to four organizations.
Tier 3 — Protocol-level distribution. Data is distributed as a structural byproduct of use across a protocol that requires no organization to operate. Redundancy arises at every point of participation. Examples: DNS, email, BitTorrent, Git, IPFS, ATProto, Matrix, Tor. Persistence is independent of any single organization's continuity.
The taxonomy is the analytical instrument under which every Claim about architecture-induced loss, prevention, cost, and verification in the paper is read. A Claim that a particular failure mode reduces to single-copy concentration uses M-0001 to identify the reduction.
The taxonomy is also a practice property, not just a technology property: a Tier 3 protocol deployed in a centralized configuration delivers Tier 1 resilience. The tier at which a dataset effectively exists is determined by the deployment, not by the software.