What is a Benchmark?

A scientific benchmark validates the scientific performance of software, ideally in an automated and continuous manner.

A benchmark can be a dataset: a benchmark set is a defined, curated input used to test a method.
A benchmark can also be a protocol: a benchmarking protocol is a defined and optimized procedure by which different datasets can be tested for variety, coverage, and “quality” according to whichever metric has been defined.
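The distinction between a benchmark set and a benchmarking protocol can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BenchmarkCase`, `run_protocol`, `SQRT_SET`, and `halve` are hypothetical names, the square-root task is a stand-in for a real scientific method, and the pass-rate metric with a fixed tolerance is one possible choice of "quality" metric among many.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchmarkCase:
    """One curated entry of a benchmark set (hypothetical structure)."""
    name: str
    value: float     # curated input handed to the method under test
    expected: float  # reference answer the method should reproduce


def run_protocol(method: Callable[[float], float],
                 cases: Sequence[BenchmarkCase],
                 tolerance: float = 1e-6) -> float:
    """Benchmarking protocol: run `method` on every case and return the
    fraction of cases whose result is within `tolerance` of the reference."""
    if not cases:
        raise ValueError("benchmark set is empty")
    passed = sum(
        1 for case in cases
        if abs(method(case.value) - case.expected) <= tolerance
    )
    return passed / len(cases)


# Benchmark set: a small, curated collection of inputs with known answers.
SQRT_SET = (
    BenchmarkCase("four", 4.0, 2.0),
    BenchmarkCase("nine", 9.0, 3.0),
    BenchmarkCase("two", 2.0, 1.41421356),
)


def halve(x: float) -> float:
    """Deliberately wrong 'square root' used to show a failing method."""
    return x / 2.0


if __name__ == "__main__":
    print("math.sqrt pass rate:", run_protocol(math.sqrt, SQRT_SET))
    print("halve pass rate:", run_protocol(halve, SQRT_SET))
```

The same protocol can be pointed at a different benchmark set without changes, which is what makes the two ideas separable.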

Performance benchmarks gauge speed, efficiency, and memory usage.
Scientific benchmarks gauge the scientific accuracy of a method.
Usability benchmarks ensure that software is easy to install and use and is well documented.
Portability benchmarks gauge portability across different hardware platforms.
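As a rough illustration of the performance category, the sketch below measures wall-clock time and peak memory for a single call using only the Python standard library. The helper `measure` and the workload `build_squares` are hypothetical names chosen for this example. Taking the best of several repeats is one common convention rather than the only valid one, and `tracemalloc` only sees allocations made through Python's memory allocator.

```python
import time
import tracemalloc


def measure(func, *args, repeats=5):
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    # Speed: keep the fastest of several runs to reduce noise from the system.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)

    # Memory: trace a separate run so tracing overhead does not skew timing.
    tracemalloc.start()
    func(*args)
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


def build_squares(n):
    """Hypothetical workload: allocate a list of n squared integers."""
    return [i * i for i in range(n)]


if __name__ == "__main__":
    seconds, peak_bytes = measure(build_squares, 100_000)
    print(f"best time: {seconds:.6f} s, peak memory: {peak_bytes} bytes")
```

Running the same measurement on different machines gives a crude starting point for a portability comparison as well.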

Considerations: What aspect of your software do you want to improve? What is the bottleneck?
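One way to locate a bottleneck before deciding what to benchmark is to profile a representative run. The sketch below uses Python's standard `cProfile` and `pstats` modules. The functions `slow_step`, `fast_step`, and `pipeline` are hypothetical stand-ins for real application code, and `profile_report` is an illustrative helper, not part of any library.

```python
import cProfile
import io
import pstats


def slow_step(n):
    """Hypothetical expensive stage: an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_step(n):
    """Hypothetical cheap stage."""
    return n + 1


def pipeline(n=200_000):
    """Hypothetical workflow combining both stages."""
    return slow_step(n) + fast_step(n)


def profile_report(func, *args, top=3):
    """Profile func(*args) and return a text report of the `top` functions
    ranked by time spent inside the function itself (tottime)."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(pipeline))
```

The function at the top of the report is the first candidate for a targeted performance benchmark.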

Public benchmarks are benchmarks that anyone can access.
- Requirements: They must be publicly accessible (no paywall) and well documented, have an easy-to-understand user interface, and be available in a widely used format that can be opened with widely used software.
Private benchmarks are benchmarks that are not publicly available.
- Requirements: The requirements depend on the stakeholders and constituents.
- Example: Different companies or partners might have different requirements for what they make publicly accessible and what they keep closed source.