What is a Benchmark?

A scientific benchmark is a validation of the scientific performance of software, ideally run in an automated and continuous manner.

Datasets vs. protocols:

  • A benchmark can be a dataset: a benchmark set is a defined, curated input used to test a method.
  • A benchmark can also be a protocol: a benchmarking protocol is a defined, optimized procedure by which different datasets can be tested for variety, coverage, and “quality” against a defined metric.
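The distinction between a benchmark set and a benchmarking protocol can be sketched in a few lines of Python. This is a minimal illustration and not a prescribed implementation. The names `BenchmarkCase`, `SQRT_SET`, `run_protocol`, and `newton_sqrt` are hypothetical, and mean absolute error is an assumed metric chosen only for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchmarkCase:
    """One curated input together with its trusted reference value."""
    name: str
    input_value: float
    reference: float


# Hypothetical benchmark set: defined, curated inputs for a square-root method.
SQRT_SET = (
    BenchmarkCase("four", 4.0, 2.0),
    BenchmarkCase("nine", 9.0, 3.0),
    BenchmarkCase("two", 2.0, math.sqrt(2.0)),
)


def run_protocol(method: Callable[[float], float],
                 cases: Sequence[BenchmarkCase]) -> float:
    """Hypothetical protocol: run the method on every case of a set and
    return the mean absolute error against the reference values."""
    if not cases:
        raise ValueError("benchmark set is empty")
    errors = [abs(method(c.input_value) - c.reference) for c in cases]
    return sum(errors) / len(errors)


def newton_sqrt(x: float, iterations: int = 20) -> float:
    """Method under test: square root by Newton's iteration."""
    guess = x if x > 1.0 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


# The same protocol can be reused with any other benchmark set.
score = run_protocol(newton_sqrt, SQRT_SET)
print(f"mean absolute error: {score:.3e}")
```

Keeping the set (data) separate from the protocol (procedure and metric) means either one can be swapped out without touching the other.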

Types of benchmarks

  • Performance benchmarks gauge speed, efficiency, and memory usage.
  • Scientific benchmarks gauge the scientific accuracy of a method.
  • Usability benchmarks ensure that software is easy to install and use, and that it is well documented.
  • Portability benchmarks gauge portability across different hardware platforms.
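As an illustration of the first type, a performance benchmark can be as small as a helper that records wall-clock time and peak memory for one call. The sketch below uses only the Python standard library (`time.perf_counter` and `tracemalloc`). The helper `measure` and the workload `build_squares` are hypothetical names introduced for this example, and taking the best of several repeats is one assumed convention among several.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def measure(func: Callable[..., Any], *args: Any,
            repeats: int = 5) -> Tuple[float, int]:
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)

    # Memory is measured in a separate run so that tracing overhead
    # does not distort the timing results.
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


def build_squares(n: int) -> list:
    """Hypothetical workload: build a list of n squares."""
    return [i * i for i in range(n)]


seconds, peak_bytes = measure(build_squares, 100_000)
print(f"best time: {seconds:.6f} s, peak memory: {peak_bytes} bytes")
```

Note that `tracemalloc` only tracks allocations made through Python's memory allocator, so memory used by native extensions may not show up in the peak value.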

Considerations: Which aspect of your software do you want to improve? Also consider where the bottleneck is.

Public and private benchmarks

  • Public benchmarks are benchmarks that are publicly available.
    • Requirements: They must be publicly accessible (no paywall), well documented, provided with an easy-to-understand user interface, and accessible in a widely used format using widely used software.
  • Private benchmarks
    • Requirements: The requirements depend on the stakeholders and constituents.
    • Example: Different companies or partners might have different requirements for what they allow to be publicly accessible and what must remain closed source.