Getting Started
A practical quick-start reference — install bencher, write your first benchmark, and learn the core patterns. For a tour of all features (repeats, over-time tracking, optimization), see the Feature Guide.
Install: pip install holobench
Quick Start
import bencher as bn
class MyBenchmark(bn.ParametrizedSweep):
# Inputs — bencher sweeps the Cartesian product of these
size = bn.IntSweep(default=10, bounds=(10, 1000), doc="Problem size")
method = bn.StringSweep(["brute", "optimized"], doc="Algorithm")
# Results — what the benchmark measures
elapsed = bn.ResultFloat(units="s")
def benchmark(self):
self.elapsed = run_benchmark(self.size, self.method)
def example_benchmark(run_cfg: bn.BenchRunCfg | None = None) -> bn.Bench:
bench = bn.Bench("my_bench", MyBenchmark(), run_cfg=run_cfg)
bench.result_vars = ["elapsed"]
bench.plot_sweep("Benchmark", input_vars=["size", "method"])
return bench
if __name__ == "__main__":
bn.run(example_benchmark)
This produces an interactive HTML report with the appropriate plot type auto-selected based on the parameter and result types.
Notice the three stages in the code above:
Problem Definition — the
MyBenchmarkclass declares inputs, results, and thebenchmark()methodSweep Definition —
plot_sweep()selects which parameters to vary and which results to collectRun Definition —
bn.run()sets sampling density (subsampling_divisions),repeats, and output options
Every bencher example follows this pattern. See Architecture Overview for a diagram and deeper explanation.
Core Concept: Dimensions Are Sweep Variables
Every independent parameter that you want to vary must be its own sweep variable. Bencher computes the Cartesian product automatically. Never manually loop over combinations.
class Good(bn.ParametrizedSweep):
width = bn.IntSweep(default=64, bounds=(32, 256))
use_cache = bn.BoolSweep(default=False)
backend = bn.StringSweep(["cpu", "gpu"])
# 3 independent dimensions → bencher sweeps all combinations
Sweep Types
Choose the type that matches the parameter’s nature:
Type |
Use for |
Example |
|---|---|---|
|
Integer ranges |
|
|
Float ranges |
|
|
On/off toggles |
|
|
Categorical choices |
|
|
Python enums |
|
Critical rule: If two things vary independently, they must be separate variables.
Wrong — one variable encoding combinations:
config = bn.StringSweep(["no_cache_cpu", "no_cache_gpu", "cache_cpu", "cache_gpu"])
Right — two independent dimensions:
use_cache = bn.BoolSweep(default=False)
backend = bn.StringSweep(["cpu", "gpu"])
Use IntSweep(bounds=(0, N)) when 0 means “feature absent” and 1+ controls magnitude
(e.g., number of retries, repeat count, number of threads). See the
Sampling Strategies gallery for examples of how different
sweep types produce different sample distributions.
The Subsampling Divisions System
Instead of specifying samples on each sweep variable, you can use the subsampling_divisions
parameter to control sampling density globally with a single knob:
Subsampling Divisions |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|
Samples per dimension |
1 |
2 |
3 |
5 |
9 |
17 |
33 |
Higher subsampling_divisions values reuse all lower samples (binary subdivision), so cached results carry over automatically. Start low for quick iteration, increase for publication quality:
# Quick check — 2 samples per dimension
bn.run(example_benchmark, subsampling_divisions=2)
# Publication quality — 9 samples per dimension
bn.run(example_benchmark, subsampling_divisions=5)
See Concepts: The Subsampling Divisions System for the full formula and theory, and the Subsampling Divisions System gallery for an interactive demo.
Result Types
Type |
Use for |
Set to |
|---|---|---|
|
Continuous scalar metrics (time, distance, score) |
|
|
Success/failure, pass/fail, any binary outcome |
|
|
Text outputs, labels, error messages |
|
|
Images, GIFs |
|
|
Videos |
|
|
Downloadable file outputs |
|
|
Embeddable HTML/panel content |
|
|
Fixed-size vector results (x, y, z) |
|
Choosing between ResultFloat and ResultBool: If a result is binary (success/failure,
reachable/unreachable, pass/fail), always use ResultBool — it locks bounds to [0, 1]
and produces correct boolean-style plots. Only use ResultFloat for continuous metrics.
See the Result Types gallery for examples of each type.
For images: use bn.gen_image_path("name") to generate unique paths.
For videos: use bn.VideoWriter() to collect frames and .write() to save.
See the ResultImage gallery and
ResultVideo gallery for working examples.
Running a Sweep
def example_foo(run_cfg: bn.BenchRunCfg | None = None) -> bn.Bench:
bench = bn.Bench("name", MyBenchmark(), run_cfg=run_cfg)
bench.result_vars = ["elapsed", "accuracy"]
# Single sweep over all dimensions — produces a complete grid
bench.plot_sweep(
"Full Sweep",
input_vars=["size", "method", "backend"],
)
return bench
if __name__ == "__main__":
bn.run(example_foo)
Prefer one plot_sweep with all input vars to get a complete grid.
Controlling Which Values Are Swept
Use bn.sweep() inside input_vars to control the range without changing the
variable definition:
bench.plot_sweep(
"Sweep",
input_vars=[
"size", # full range from bounds
bn.sweep("method", ["fast", "accurate"]), # explicit subset
bn.sweep("workers", max_subsampling_divisions=3), # auto-pick up to 3 values
],
)
Fixing Dimensions with const_vars
To hold some parameters constant while sweeping others:
bench.plot_sweep(
"CPU only",
input_vars=["size", "method"],
const_vars=dict(backend="cpu"),
)
See the Constant Variables gallery for examples of slicing, comparing, and pinning parameters.
Run Configuration
BenchRunCfg has many options, but you rarely need more than a few:
Parameter |
Default |
What it does |
|---|---|---|
|
0 |
Sampling density per dimension (see Subsampling Divisions System above) |
|
1 |
How many times to evaluate each combination |
|
False |
Cache individual results across runs (resume interrupted sweeps) |
|
False |
Cache the entire sweep result (skip re-runs with same inputs) |
|
False |
Track results across multiple runs for time-series analysis |
|
False |
Skip opening a browser to display results |
|
False |
Log the sweep grid summary without executing the benchmark |
All other parameters have sensible defaults. See BenchRunCfg’s docstring for the
full reference.
def example_foo(run_cfg: bn.BenchRunCfg | None = None) -> bn.Bench:
run_cfg.cache_results = False # disable for file-based / non-deterministic results
bench = bn.Bench("name", MyBenchmark(), run_cfg=run_cfg)
...
return bench
if __name__ == "__main__":
bn.run(example_foo, subsampling_divisions=4) # subsampling_divisions controls sweep detail depth
The benchmark() Method
Every benchmark class inherits from bn.ParametrizedSweep and implements benchmark():
class MyBench(bn.ParametrizedSweep):
x = bn.FloatSweep(bounds=(0, 1))
result = bn.ResultFloat()
def benchmark(self):
self.result = compute(self.x)
When benchmark() is called, all sweep parameters (self.x, etc.) are already set.
Just set result variables directly on self. No boilerplate required.
Migration from
__call__: The old pattern of overriding__call__()withself.update_params_from_kwargs(**kwargs)andreturn super().__call__()is deprecated. Simply rename__call__tobenchmark, remove the two boilerplate lines, and remove**kwargsfrom the signature.
File-Based Results (Images, Videos)
When producing files:
Write to a unique path per combination (use parameter values in the path)
Set
run_cfg.cache_results = FalseUse
bn.ResultImage()/bn.ResultVideo()and set to the path string
class ImageBench(bn.ParametrizedSweep):
width = bn.IntSweep(bounds=(100, 500))
output = bn.ResultImage()
def benchmark(self):
path = bn.gen_image_path(f"output_{self.width}")
render_image(self.width, path)
self.output = str(path)
Entry Point Convention
Function name must start with
example_(used for discovery by tests and docs)Accept
run_cfg: bn.BenchRunCfg | None = NoneReturn the
bn.BenchinstanceUse
bn.run(example_func)in__main__
See the Workflows gallery for complete examples showing this convention in action, including multi-sweep and BenchRunner patterns.
Aggregating Dimensions
When sweeping many dimensions, the visualizations can become unwieldy. Use the
aggregate parameter on plot_sweep() to collapse dimensions into summary
statistics (mean, std, etc.):
bench.plot_sweep(
"Aggregated view",
input_vars=["x", "y", "method"],
result_vars=["elapsed"],
aggregate=True, # collapse all dimensions except the first
# aggregate=2, # collapse the last 2 dimensions
# aggregate=["method"], # collapse only the "method" dimension
agg_fn="mean", # aggregation function: mean, sum, max, min, median
)
aggregate=True— collapse all dimensions except the first into a single aggregated statisticaggregate=N(int) — collapse the last N dimensionsaggregate=["var1", "var2"]— collapse only the named dimensions
See the Aggregation gallery for examples of each mode.
Machine-Readable Results (Agents & CI)
Bencher already computes per-metric verdicts, optimal values, and regression deltas during collection. To consume them programmatically — from an agent, a CI gate, or another script — export them as JSON instead of scraping the HTML report or logs.
import bencher as bn
res = bench.collect(input_vars=[...], result_vars=[...], run_cfg=run_cfg)
# A single run -> result.json
bn.result_to_dict(res) # dict: schema_version, metrics, regressions, provenance
bn.result_to_json(res, "result.json")
# A/B between two independently collected results -> comparison.json
cmp = bn.compare_results(baseline_res, candidate_res) # per-metric verdict + summary counts
bn.comparison_to_json(baseline_res, candidate_res, "comparison.json")
compare_results runs the same regression detector used by the over-time path (a percentage
comparison by default), so each metric’s verdict is one of improved / regressed /
unchanged using identical direction/threshold semantics. Pass run_cfg= to choose a
different regression_method.
The same artifacts are available from the CLI on a saved result (see the collect/render split):
# render HTML and also emit result.json
python -m bencher.render result.pkl out_dir --json result.json
# diff two saved results
python -m bencher.render compare baseline.pkl candidate.pkl --json comparison.json
BenchReport.save(..., emit_json=True) writes result.json next to the HTML for every
contained result (opt-in; default off). All JSON output is strict — non-finite values (e.g. a
zero-baseline percent change) are emitted as null.
Common Mistakes
Mistake |
Fix |
|---|---|
Manually looping over parameter combinations |
Use |
One StringSweep encoding multiple independent toggles |
Use separate BoolSweep / IntSweep per toggle |
Many small plot_sweep calls for different combos |
One plot_sweep with all input_vars |
Building panel/HTML layouts manually |
Use bencher’s report system |
Using the old |
Override |
Caching file-path results |
Set |
Using |
Use |