Header‑only benchmarking and validation for C++20 code.
ComPPare lets you time and cross‑check any number of host‑side implementations (CPU, OpenMP, CUDA, SYCL, TBB, etc.) that share the same inputs and outputs. It is intended for developers who are porting functions to a new framework or hardware, and it supports standalone development and testing.
Purpose
- Performance comparison: Measure the total call time, an inner region‑of‑interest (ROI) that you define, and the residual overhead (setup, transfers).
- Validate results: Report maximum, mean and total absolute error versus a designated reference implementation and flag discrepancies.
- Streamline porting: Run the same data set through multiple versions of a function.
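As a rough illustration of the validation step, the three error figures can be computed from a reference output and a candidate output as sketched below. `ErrorStats`, `compute_error_stats`, and `within_tolerance` are hypothetical names chosen for this sketch, and the tolerance check is an assumption about how a discrepancy might be flagged; none of this is ComPPare's own code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper types for illustration only (not part of ComPPare).
struct ErrorStats {
    double max_abs = 0.0;    // largest |candidate - reference|
    double mean_abs = 0.0;   // total_abs divided by element count
    double total_abs = 0.0;  // sum of |candidate - reference|
};

ErrorStats compute_error_stats(const std::vector<float>& reference,
                               const std::vector<float>& candidate)
{
    if (reference.size() != candidate.size())
        throw std::invalid_argument("outputs differ in length");

    ErrorStats s;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double err = std::abs(static_cast<double>(candidate[i]) -
                                    static_cast<double>(reference[i]));
        s.max_abs = std::max(s.max_abs, err);
        s.total_abs += err;
    }
    if (!reference.empty())
        s.mean_abs = s.total_abs / static_cast<double>(reference.size());
    return s;
}

// Assumed pass/fail rule: the largest deviation must not exceed `tol`.
bool within_tolerance(const ErrorStats& s, double tol)
{
    return s.max_abs <= tol;
}
```

Accumulating in `double` keeps the total from losing precision when the outputs are `float` and the vectors are long.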
Key capabilities
| Capability | Description |
|---|---|
| Header‑only | Copy the headers into your project or add them to your include path, then compile with C++20 or newer. |
| Any host backend | Accepts any function pointer that runs on the host. |
| Detailed timing | Separates overall call time, your ROI time, and setup/transfer overhead. |
| Built-in error comparison | For common data types, automatically chooses the appropriate comparison method and compares each output against the reference function. |
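The split between call time, ROI time, and overhead can be pictured with a small `std::chrono` sketch. `Timing` and `time_call` are hypothetical names, and passing the setup and ROI as two separate callables is a simplification made for this example; it is not a description of how ComPPare measures time internally.

```cpp
#include <chrono>

// Hypothetical result type for illustration only (not part of ComPPare).
struct Timing {
    double total_us;     // whole call: setup + region of interest
    double roi_us;       // region of interest only
    double overhead_us;  // total_us - roi_us
};

// Times `setup` and `roi` back to back and derives the overhead.
template <typename Setup, typename Roi>
Timing time_call(Setup&& setup, Roi&& roi)
{
    using clock = std::chrono::steady_clock;
    using us = std::chrono::duration<double, std::micro>;

    const auto t0 = clock::now();
    setup();                       // e.g. allocation, host-to-device copies
    const auto t1 = clock::now();
    roi();                         // the code whose speed you care about
    const auto t2 = clock::now();

    Timing t;
    t.total_us = us(t2 - t0).count();
    t.roi_us = us(t2 - t1).count();
    t.overhead_us = t.total_us - t.roi_us;
    return t;
}
```

`steady_clock` is used because it is monotonic, so the derived overhead cannot go negative because of wall-clock adjustments.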
Getting Started
See the User Guide for detailed usage instructions and Examples for real-world examples.
Contributions are welcome! Please see Code Documentation if interested in contributing to this repo.
Install
1. Clone repository
git clone https://github.com/funglf/ComPPare.git --recursive
If submodules such as Google Benchmark and nvbench are not needed:
git clone https://github.com/funglf/ComPPare.git
2. (Optional) Build Google Benchmark and nvbench
See Google Benchmark Instructions
See nvbench Instructions
3. Include ComPPare
In your C++ code, include comppare.hpp, the main include file for the ComPPare framework.
Quick Start
1. Adopt the required function signature
Each implementation must return void and take its inputs first, followed by its outputs:
void impl(const Inputs&... in,
Outputs&... out);
To benchmark a specific region of code, wrap it in the macros HOTLOOPSTART and HOTLOOPEND:
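void impl(const Inputs&... in,
          Outputs&... out)
{
    // setup before the region of interest counts as overhead
    HOTLOOPSTART;
    // region of interest to be timed
    HOTLOOPEND;
}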
SAXPY function example signatures
void saxpy_cpu(
    float a,
    const std::vector<float> &x,
    const std::vector<float> &y_in,
    std::vector<float> &y_out);
void saxpy_gpu(
    float a,
    const std::vector<float> &x,
    const std::vector<float> &y_in,
    std::vector<float> &y_out);
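Any further variant only has to follow the same parameter order. As a sketch, here is a hypothetical third implementation, `saxpy_transform`, built on `std::transform`; the name is invented for this example, and the code assumes `y_in` holds at least as many elements as `x`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical extra implementation with the same signature as saxpy_cpu.
// Assumes y_in.size() >= x.size().
void saxpy_transform(
    float a,
    const std::vector<float> &x,
    const std::vector<float> &y_in,
    std::vector<float> &y_out)
{
    y_out.resize(x.size());
    // Binary transform: combines x[i] and y_in[i] into y_out[i].
    std::transform(x.begin(), x.end(), y_in.begin(), y_out.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Because the inputs are taken by const reference and the output by non-const reference, a function like this fits the required signature without any adapter.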
2. Create a comparison object
- Specify the output types as template arguments.
- Pass the input data to the constructor. The framework object stores the inputs and reuses them for every implementation.
The helper function make_comppare(Inputs &&...ins), defined in comppare.hpp, creates a comppare object.
Note: you can use move semantics here, because all inputs are perfectly forwarded, e.g. Cmp cmp(a, std::move(x), std::move(y));
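To make the forwarding behaviour concrete, the sketch below stores a pack of inputs in a `std::tuple`, moving from rvalues and copying from lvalues. `store_inputs` is a hypothetical helper written for this illustration; it only demonstrates the general C++ technique and makes no claim about how ComPPare stores its inputs.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: stores each argument by value.
// Rvalue arguments are moved into the tuple, lvalue arguments are copied.
template <typename... Args>
std::tuple<std::decay_t<Args>...> store_inputs(Args&&... args)
{
    return std::tuple<std::decay_t<Args>...>(std::forward<Args>(args)...);
}
```

With `auto stored = store_inputs(a, std::move(x), y);` the buffer of `x` is transferred without a copy, while `y` remains usable by the caller afterwards.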
3. Register implementations
cmp.set_reference("saxpy reference", saxpy_cpu);
cmp.add("saxpy gpu", saxpy_gpu);
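The register-then-run pattern behind these two calls can be pictured with a miniature stand-in. `MiniCompare` is a hypothetical class written for this sketch: it reuses the method names `set_reference` and `add` for readability, fixes the signature to the SAXPY case, and reports only the maximum absolute error. It is not ComPPare's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the register-then-run pattern (NOT ComPPare's code).
class MiniCompare {
public:
    using Vec  = std::vector<float>;
    using Impl = std::function<void(float, const Vec&, const Vec&, Vec&)>;

    MiniCompare(float a, Vec x, Vec y)
        : a_(a), x_(std::move(x)), y_(std::move(y)) {}

    void set_reference(std::string name, Impl fn) {
        reference_ = Entry{std::move(name), std::move(fn)};
    }
    void add(std::string name, Impl fn) {
        others_.push_back(Entry{std::move(name), std::move(fn)});
    }

    // Runs every implementation on the same stored inputs and returns
    // (name, max absolute error vs. the reference) for each added entry.
    // Throws std::bad_function_call if no reference was set.
    std::vector<std::pair<std::string, double>> run() const {
        Vec ref_out;
        reference_.fn(a_, x_, y_, ref_out);

        std::vector<std::pair<std::string, double>> report;
        for (const Entry& e : others_) {
            Vec out;
            e.fn(a_, x_, y_, out);
            double max_err = 0.0;
            if (out.size() != ref_out.size()) {
                // A length mismatch is treated as an unbounded error.
                max_err = std::numeric_limits<double>::infinity();
            } else {
                for (std::size_t i = 0; i < out.size(); ++i)
                    max_err = std::max(max_err,
                        std::abs(static_cast<double>(out[i]) - ref_out[i]));
            }
            report.emplace_back(e.name, max_err);
        }
        return report;
    }

private:
    struct Entry { std::string name; Impl fn; };
    float a_;
    Vec x_, y_;
    Entry reference_;
    std::vector<Entry> others_;
};
```

Keeping the inputs inside the object is what guarantees that every registered implementation sees exactly the same data.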
4. Run and inspect
cmp.run(argc, argv);
Sample report:
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
============ ComPPare Framework ============
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Number of implementations: 4
Warmup iterations: 100
Benchmark iterations: 100
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
Implementation     ROI µs/Iter      Func µs      Ovhd µs   Max|err|[0]  Mean|err|[0]  Total|err|[0]
saxpy reference           0.28        33.67         5.63      0.00e+00      0.00e+00       0.00e+00
saxpy gpu                10.89    137828.11    136739.02      5.75e+06      2.85e+06       2.92e+09   <-- FAIL
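Reading the two rows above, the numbers appear consistent with Func µs being the total over all 100 benchmark iterations, so that the overhead is roughly Func µs minus ROI µs/Iter times the iteration count. This relation is inferred from the sample figures only and is not stated by the report itself. The hypothetical helper `implied_overhead_us` below expresses that arithmetic.

```cpp
// Hypothetical helper: overhead implied by the report columns, assuming
// "Func µs" is a total over all iterations and "ROI µs/Iter" is per iteration.
double implied_overhead_us(double func_us, double roi_us_per_iter, int iterations)
{
    return func_us - roi_us_per_iter * iterations;
}
```

For the GPU row this gives 137828.11 − 10.89 × 100 = 136739.11, which matches the printed 136739.02 to within the rounding of the ROI column. In that row nearly all of the time falls outside the region of interest.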
Complete example with SAXPY
(See SAXPY Full Example)
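#include <cstddef>
#include <vector>
// Also include the ComPPare main header (comppare.hpp) here.

// Reference CPU implementation: y_out = a * x + y_in
void saxpy_cpu(
    float a,
    const std::vector<float> &x,
    const std::vector<float> &y_in,
    std::vector<float> &y_out)
{
    std::size_t N = x.size();
    y_out.resize(N);
    for (std::size_t i = 0; i < N; ++i)
    {
        y_out[i] = a * x[i] + y_in[i];
    }
}

// GPU implementation; must use the same signature as saxpy_cpu (body elided).
void saxpy_gpu(){...};

int main(int argc, char **argv)
{
    float a = 1.1f;
    std::vector<float> x(1000, 2.2);
    std::vector<float> y(1000, 3.3);

    // Create the comparison object `cmp` from a, x and y here (see step 2).
    cmp.set_reference("saxpy reference", saxpy_cpu);
    cmp.add("saxpy gpu", saxpy_gpu);
    cmp.run(argc, argv);
    return 0;
}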