Benchmark track

Reproducibility

Are results stable across batches and labs? This track reports the within-lab and cross-lab coefficient of variation (CV) of the headline metric.

Systems evaluated: 0
Benchmark runs: 0
Metrics: 3
Tasks: 1

Why this track exists

Per-track scores reported without reproducibility data hide cross-batch and cross-lab variance. This track therefore treats reproducibility as a first-class result, not a footnote.

Leaderboard

Only rows with run status published, provisional, or scored appear.

No verified entries on this track yet.
Reviewed benchmark runs will appear here.

Metric definitions

Metric                    Description                                                      Direction       Unit
Within-lab CV             Coefficient of variation across batches within a single lab.     lower_better    cv
Cross-lab CV              Coefficient of variation across labs running the same protocol.  lower_better    cv
Independent replications  Number of independent labs that have replicated the result.      higher_better   count
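
As a rough illustration of the three metrics above, the sketch below computes them from per-batch values grouped by lab. It is not the track's reference implementation. It assumes that CV means sample standard deviation divided by the mean, that cross-lab CV is taken over per-lab means, and that every lab in the input counts as one independent replication. The names `coefficient_of_variation`, `reproducibility_metrics`, and `runs_by_lab` are hypothetical, and the example values are made up.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (unitless)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate variation")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m)


def reproducibility_metrics(runs_by_lab):
    """Compute the track metrics from {lab name: [headline metric per batch]}.

    Assumptions (not confirmed by the track description):
    - within-lab CV is reported per lab, across that lab's batches;
    - cross-lab CV is computed over the per-lab means;
    - each lab in the input counts as one independent replication.
    """
    within_lab_cv = {
        lab: coefficient_of_variation(batches)
        for lab, batches in runs_by_lab.items()
    }
    lab_means = [mean(batches) for batches in runs_by_lab.values()]
    return {
        "within_lab_cv": within_lab_cv,
        "cross_lab_cv": coefficient_of_variation(lab_means),
        "independent_replications": len(runs_by_lab),
    }


# Made-up illustrative data: two labs, three batches each.
example = {
    "lab_a": [0.80, 0.82, 0.78],
    "lab_b": [0.75, 0.77, 0.73],
}
metrics = reproducibility_metrics(example)
print(metrics)
```

Computing cross-lab CV over per-lab means keeps batch-level noise out of the cross-lab figure. Pooling all batches instead would mix the two sources of variation.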

Scoring formula

The score is 1 minus the normalized coefficient of variation, computed both across batches (within-lab) and across labs (cross-lab). Both must be reported for a full score.
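
The page does not say how the CV is normalized or how the within-lab and cross-lab components are combined, so the following is only a sketch under stated assumptions. Normalization is taken to mean dividing by a cap (`cv_cap`, a hypothetical parameter) and clipping to 1. The two components are averaged with equal weight. A missing component yields no score at all. All function names are illustrative.

```python
def normalized_cv(cv, cv_cap=1.0):
    """Map a raw CV to [0, 1]. ASSUMPTION: divide by a cap and clip at 1."""
    if cv < 0:
        raise ValueError("CV cannot be negative")
    if cv_cap <= 0:
        raise ValueError("cv_cap must be positive")
    return min(cv / cv_cap, 1.0)


def reproducibility_score(within_lab_cv, cross_lab_cv, cv_cap=1.0):
    """Return a score in [0, 1], or None if either CV is unreported.

    ASSUMPTION: the two '1 minus normalized CV' components are averaged
    with equal weight; the actual weighting is not given on the page.
    """
    if within_lab_cv is None or cross_lab_cv is None:
        return None  # both CVs must be reported for a full score
    within_component = 1.0 - normalized_cv(within_lab_cv, cv_cap)
    cross_component = 1.0 - normalized_cv(cross_lab_cv, cv_cap)
    return (within_component + cross_component) / 2


print(reproducibility_score(0.1, 0.3))   # both components reported
print(reproducibility_score(0.1, None))  # cross-lab CV missing
```

Under these assumptions a lower CV on either axis raises the score, which matches the lower_better direction listed for both CV metrics.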