What is a Benchmark?
A scientific benchmark validates the scientific performance of software, ideally in an automated and continuous manner.
Datasets vs. protocols:
- A benchmark can be a dataset: a benchmark set is a defined, curated input used to test a method.
- A benchmark can also be a protocol: a benchmarking protocol is a defined, optimized procedure by which different datasets can be tested for variety, coverage, and “quality” according to a chosen metric.
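The distinction between a benchmark set and a benchmarking protocol can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: the names `BenchmarkCase`, `SQUARE_SET`, `run_protocol`, and `method_under_test` are hypothetical, and the metric (fraction of cases within a tolerance) is only one possible choice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchmarkCase:
    """One curated entry of a benchmark set: an input and its reference answer."""
    name: str
    input_value: float
    expected: float


# Hypothetical benchmark set: a small, curated input with known answers.
SQUARE_SET = [
    BenchmarkCase("zero", 0.0, 0.0),
    BenchmarkCase("two", 2.0, 4.0),
    BenchmarkCase("negative three", -3.0, 9.0),
]


def run_protocol(
    method: Callable[[float], float],
    benchmark_set: Sequence[BenchmarkCase],
    tolerance: float = 1e-6,
) -> float:
    """Hypothetical protocol: return the fraction of cases the method
    reproduces within the given absolute tolerance."""
    if not benchmark_set:
        raise ValueError("benchmark set is empty")
    passed = sum(
        1
        for case in benchmark_set
        if abs(method(case.input_value) - case.expected) <= tolerance
    )
    return passed / len(benchmark_set)


def method_under_test(x: float) -> float:
    """Stand-in for the software being benchmarked."""
    return x * x


score = run_protocol(method_under_test, SQUARE_SET)
```

The same `run_protocol` function can be applied to any other list of `BenchmarkCase` entries, which is the sense in which the protocol is independent of a particular dataset.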
Types of benchmarks
- Performance benchmarks to gauge speed, efficiency, memory usage
- Scientific benchmarks to gauge scientific accuracy of a method
- Usability benchmarks to ensure software is easy to install and use, and well documented
- Portability benchmarks to gauge portability across different hardware platforms
Considerations: Decide which aspect of your software you want to improve, and identify where the bottleneck is.
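As a starting point for finding a bottleneck, a performance benchmark of the kind listed above can be sketched with the Python standard library alone. The helper name `measure` and its `repeats` parameter are assumptions made for illustration. Note that `tracemalloc` only tracks memory allocated by Python and slows execution, so the timings here are indicative rather than precise.

```python
import time
import tracemalloc


def measure(func, *args, repeats=5):
    """Hypothetical helper: run func(*args) several times and report the
    best wall-clock time and the highest peak of traced Python memory."""
    timings = []
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        timings.append(elapsed)
        peak_bytes = max(peak_bytes, peak)
    return {"best_seconds": min(timings), "peak_bytes": peak_bytes}


# Example: benchmark sorting a reversed list of 1000 integers.
result = measure(sorted, list(range(1000, 0, -1)))
```

Taking the minimum over several repeats reduces the influence of unrelated system activity. For timing alone, without the overhead of memory tracing, the standard `timeit` module is a common alternative.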
Public and private benchmarks
- Public benchmarks are available to anyone.
  - Requirements: They must be publicly accessible (no paywall), well documented, easy to use through an understandable interface, and provided in a widely used format that widely used software can read.
- Private benchmarks
  - Requirements: These depend on the stakeholders and constituents.
  - Example: Different companies or partners may have different requirements for what can be publicly accessible and what must remain closed source.