Evidence · E-0112 · draft

Piwowar & Vision 2013 — 9% open-data citation advantage; ~150 reuse papers per 100 deposited datasets

§5.2, §5.3.1, §5.3.2 · 2026-05-03 · 3 out · 0 in

Piwowar and Vision's 2013 PeerJ analysis of a gene-expression microarray corpus documented a 9% citation advantage (95% CI 5–13%) for papers with publicly available data, after controlling for journal, author, and institutional history. The same study estimated that every 100 deposited datasets generate over 150 reuse papers within five years (S-0055).

The 9% figure is the conservative anchor of the open-data citation-advantage range; Colavizza et al. 2020 (E-0110) measured the differential at up to ~25% on the PLOS+BMC corpus when data was deposited in a repository rather than offered on request. The two studies bracket the range used in the §5 Term-C calculations.

The 150-per-100 reuse rate is the directional baseline for downstream-value-lost calculations. The §5.3.1 Agh 2009 case (C-0024) applies it to a single paper, with judgment-call adjustments for the pre-collapse-baseline character of the dataset, yielding a conservative estimate of 2–4 reuse papers. The §5.3.2 representative-R1 application (C-0005) applies it directly to ~2,400 unretrievable papers per year, yielding approximately 3,600 reuse papers foregone over each five-year window, or roughly 720 papers per year of downstream research productivity not generated. Monetized at the Stern per-paper grant attribution (E-0036), this comes to ~$172M per year.
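
The §5.3.2 arithmetic above can be retraced with a short sketch. It assumes one deposited dataset per unretrievable paper, which this record does not state explicitly, and all variable and function names are illustrative. The per-paper dollar value at the end is only back-calculated from the ~$172M and 720 figures quoted here; it is not taken from E-0036 and may differ from the Stern attribution actually used.

```python
# Figures quoted in this record (all approximate).
REUSE_PAPERS_PER_100_DATASETS = 150   # Piwowar & Vision 2013, five-year horizon
UNRETRIEVABLE_PAPERS = 2_400          # representative-R1 input (C-0005)
WINDOW_YEARS = 5                      # reuse window used in the record
ANNUAL_VALUE_USD = 172_000_000        # monetized figure quoted in the record


def reuse_papers_foregone(papers, rate_per_100=REUSE_PAPERS_PER_100_DATASETS):
    """Scale the 150-per-100 reuse rate to a number of papers.

    Assumption (not from the source): each paper corresponds to one dataset.
    """
    return papers * rate_per_100 / 100


# Reuse papers foregone over one five-year window.
foregone_window = reuse_papers_foregone(UNRETRIEVABLE_PAPERS)

# Annualized by dividing the window total evenly across its five years.
foregone_per_year = foregone_window / WINDOW_YEARS

# Back-calculated per-paper value implied by the quoted totals (illustrative only).
implied_value_per_paper = ANNUAL_VALUE_USD / foregone_per_year

print(f"Foregone per window: {foregone_window:.0f}")
print(f"Foregone per year:   {foregone_per_year:.0f}")
print(f"Implied USD/paper:   {implied_value_per_paper:,.0f}")
```

Under these assumptions the sketch reproduces the 3,600 and 720 figures, and the quoted totals imply a per-paper value on the order of $239,000.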

The original measurement was specific to gene-expression microarray data; the §5 application uses it as a directional baseline rather than a precise estimator across all research types. The figures also inform Term C of the four-term liability formula (M-0003).
