Resilient Data Futures
Evidence · E-0114 · draft

Piwowar & Vision 2013 — 9% open-data citation advantage; 150 reuse papers per 100 deposited datasets in 5 years

§7.5, §9 · 2026-05-03 · 4 out · 0 in

Piwowar & Vision's 2013 PeerJ analysis ("Data reuse and the open data citation advantage") established two of the most cited empirical anchors for the open-data return argument (S-0055):

  • Papers with publicly available data receive 9% more citations, after controlling for journal impact factor, author publication history, and institutional citation history.
  • Every 100 deposited datasets generate over 150 reuse papers within five years.

The two findings are typically cited together; both are direct measurements of the downstream value generated by data deposit relative to non-deposit. The 9% figure sits at the conservative end of the open-data citation advantage range: Colavizza et al. 2020 (E-0110) measured up to 25.36% on a different corpus and with a different methodology. The 150-reuse-papers figure, equivalent to 1.5 reuse papers per deposited dataset over five years, quantifies the multiplicative scientific output that deposit produces beyond the original paper's citation tail.

These figures are the empirical input to M-0003's Term C ("downstream value lost"). They are used in C-0005's representative-R1 application and in C-0024's Agh humanities case to estimate per-dataset C as a function of the institutional publication-and-deposit profile. In §7.5 the same figures support C-0031 (positive ROI on research data infrastructure across all documented studies): even the conservative open-data citation advantage produces measurable institutional return on top of the program-level ROI documented at EMBL-EBI, NCRIS, XSEDE, and similar programs.
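To make the arithmetic concrete, the sketch below scales the two headline figures to an institutional profile. It is an illustration only, not M-0003's actual Term C formula, which is not reproduced on this card. The `DepositProfile` fields, the function names, and the example inputs are all hypothetical; only the three constants come from the figures quoted above.

```python
from dataclasses import dataclass

# Headline figures quoted on this card.
CITATION_ADVANTAGE_LOW = 0.09         # Piwowar & Vision 2013
CITATION_ADVANTAGE_HIGH = 0.2536      # Colavizza et al. 2020 (E-0110), upper bound
REUSE_PAPERS_PER_DATASET = 150 / 100  # reuse papers per deposited dataset, five years


@dataclass
class DepositProfile:
    """Hypothetical institutional profile; every field is an illustrative input."""
    datasets_deposited: int
    papers_with_open_data: int
    # Assumed citations per paper had the data NOT been made available.
    baseline_citations_per_paper: float


def expected_reuse_papers(profile: DepositProfile) -> float:
    """Reuse papers expected within five years of deposit."""
    return profile.datasets_deposited * REUSE_PAPERS_PER_DATASET


def extra_citations(profile: DepositProfile, advantage: float) -> float:
    """Citations attributable to the open-data advantage over the baseline."""
    return (
        profile.papers_with_open_data
        * profile.baseline_citations_per_paper
        * advantage
    )


# Hypothetical example: 200 deposits, one paper each, 20 baseline citations per paper.
profile = DepositProfile(
    datasets_deposited=200,
    papers_with_open_data=200,
    baseline_citations_per_paper=20.0,
)

print("reuse papers (5 yr):", expected_reuse_papers(profile))
print("extra citations, low: ", round(extra_citations(profile, CITATION_ADVANTAGE_LOW), 1))
print("extra citations, high:", round(extra_citations(profile, CITATION_ADVANTAGE_HIGH), 1))
```

Under these assumed inputs the low and high citation advantages bracket the estimate, which is one way to carry the 9% to 25.36% range through to a per-institution figure without committing to a single point value.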