Resilient Data Futures
Evidence · E-0037 · draft

74% of published R analysis files fail to execute without error

§4.1 · 2026-05-03 · 2 out · 0 in

A large-scale analysis of published research code found that 74% of R files fail to complete without error, and 56% still fail after automated cleaning (S-0051).

The figures measure code reproducibility, which is a tighter standard than data reproducibility: code can in principle be re-executed, whereas data must be re-collected. Because 74% of published R analysis files do not execute without error, the analysis in those files cannot be re-run from the published artifact, even when the underlying data is available.

The 56%-after-cleaning figure shows that the failures are not entirely attributable to environmental drift (library versions, R versions, OS-specific quirks). Even after automated cleaning, a majority of files still fail, which means that for those files the code as published cannot produce the published result.
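The gap between the two figures can be made explicit with a little arithmetic. The sketch below assumes that both percentages are measured against the same set of files, which this note does not confirm; the variable names are illustrative only.

```python
# Failure rates quoted from S-0051, as percentages of R files.
# Assumption: both rates share the same denominator.
fail_before_cleaning = 74
fail_after_cleaning = 56

# Percentage points of failure removed by automated cleaning.
resolved_points = fail_before_cleaning - fail_after_cleaning  # 18

# Fraction of the original failures that cleaning resolves,
# and the fraction that persists afterwards.
resolved_share = resolved_points / fail_before_cleaning
persisting_share = fail_after_cleaning / fail_before_cleaning

print(f"resolved by cleaning: {resolved_points} points ({resolved_share:.1%} of failures)")
print(f"persisting after cleaning: {persisting_share:.1%} of failures")
```

Under that assumption, roughly a quarter of the original failures are resolved by cleaning and about three quarters persist.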

The case is empirical input for C-0022 at the code layer of the reproducibility crisis. The architectural fix at this layer is content-addressed code (Git, Software Heritage), which the paper develops in §2.5 and §11.6 (R6).
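As a minimal sketch of what content addressing means here: Git names a file's content by the SHA-1 of a short header plus the bytes, and Software Heritage content identifiers (SWHIDs of type `cnt`) reuse that same hash. The function names and the sample R script below are hypothetical, chosen only for illustration; this is not the paper's own implementation.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the Git blob object ID (hex SHA-1) for the given bytes."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def swhid_for_content(content: bytes) -> str:
    """Return a Software Heritage content identifier for the given bytes."""
    return "swh:1:cnt:" + git_blob_id(content)


# Hypothetical analysis script, used only to show the behaviour.
script_v1 = b"fit <- lm(y ~ x, data = df)\n"
script_v2 = b"fit <- lm(y ~ x + z, data = df)\n"

print(swhid_for_content(script_v1))
print(swhid_for_content(script_v2))

# The identifier is derived from the bytes alone, so identical content
# always maps to the same ID and any edit produces a different one.
assert git_blob_id(script_v1) == git_blob_id(bytes(script_v1))
assert git_blob_id(script_v1) != git_blob_id(script_v2)
```

Because the identifier depends only on the content, a citation that carries it pins the exact code that was run. It does not by itself guarantee that the code executes; it only guarantees that the artifact being re-run is the one that was published.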