Benchmark track
Reproducibility
Are results stable across batches and labs? This track reports the within-lab and cross-lab coefficient of variation (CV) of the headline metric.
Systems evaluated: 0
Benchmark runs: 0
Metrics: 3
Tasks: 1
Why this track exists
A per-track score reported without reproducibility data hides cross-batch and cross-lab variance. This track therefore treats reproducibility as a first-class result, not a footnote.
Leaderboard
Only rows with run status published, provisional, or scored appear.
Metric definitions
| Metric | Description | Direction | Unit |
|---|---|---|---|
| Within-lab CV | Coefficient of variation across batches within a single lab. | lower_better | cv |
| Cross-lab CV | Coefficient of variation across labs running the same protocol. | lower_better | cv |
| Independent replications | Number of independent labs that have replicated the result. | higher_better | count |
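As a rough illustration of the two CV metrics in the table, the sketch below computes them from per-batch values of the headline metric. It assumes that CV means the sample standard deviation divided by the mean, and that cross-lab CV is taken over per-lab means; the track does not spell out either choice here. The `runs` data, lab names, and function name are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean.

    Assumption: the track may use a different estimator (for example,
    population standard deviation); this is only one common definition.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a CV")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m)


# Hypothetical headline-metric values: lab name -> one value per batch.
runs = {
    "lab_a": [0.80, 0.82, 0.78],
    "lab_b": [0.75, 0.77, 0.79],
    "lab_c": [0.84, 0.86, 0.85],
}

# Within-lab CV: variation across batches inside each lab.
within_lab_cv = {
    lab: coefficient_of_variation(batches) for lab, batches in runs.items()
}

# Cross-lab CV: variation across labs, here taken over per-lab means (assumption).
lab_means = [mean(batches) for batches in runs.values()]
cross_lab_cv = coefficient_of_variation(lab_means)
```

Both values are dimensionless ratios, which fits the `lower_better` direction in the table: a smaller CV means the headline metric moves less from batch to batch or from lab to lab.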
Scoring formula
The score is 1 minus the normalized coefficient of variation, computed both across batches (within-lab) and across labs (cross-lab). Both values must be reported to receive a full score.
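The formula leaves the normalization and the way the two components are combined unspecified, so the following is only a sketch under stated assumptions. It normalizes each CV by dividing by a hypothetical cap (`CV_CAP`) and clipping to 1, averages the two components, and gives a missing component zero credit so that a full score requires both. None of these choices is confirmed by the track definition.

```python
from typing import Optional

# Hypothetical cap: a CV at or above this value normalizes to 1.0.
CV_CAP = 0.5


def normalized_cv(cv: float, cap: float = CV_CAP) -> float:
    """Map a raw CV into [0, 1] by dividing by a cap and clipping (assumption)."""
    if cv < 0:
        raise ValueError("a CV cannot be negative")
    return min(cv / cap, 1.0)


def reproducibility_score(
    within_lab_cv: Optional[float],
    cross_lab_cv: Optional[float],
    cap: float = CV_CAP,
) -> float:
    """Average of (1 - normalized CV) over the two components.

    Assumption: an unreported component (None) contributes 0, so a run
    that reports only one CV can never reach the full score of 1.0.
    """
    components = []
    for cv in (within_lab_cv, cross_lab_cv):
        if cv is None:
            components.append(0.0)
        else:
            components.append(1.0 - normalized_cv(cv, cap))
    return sum(components) / len(components)


full = reproducibility_score(0.0, 0.0)      # both reported, no variation
partial = reproducibility_score(0.0, None)  # cross-lab CV not reported
```

Under these assumptions `full` is 1.0 and `partial` is 0.5, which shows how a missing cross-lab CV limits the achievable score even when within-lab results are perfectly stable.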
Current methodology