Getting Started

A practical quick-start reference — install bencher, write your first benchmark, and learn the core patterns. For a tour of all features (repeats, over-time tracking, optimization), see the Feature Guide.

Install: pip install holobench

Quick Start

import bencher as bn

class MyBenchmark(bn.ParametrizedSweep):
    # Inputs — bencher sweeps the Cartesian product of these
    size = bn.IntSweep(default=10, bounds=(10, 1000), doc="Problem size")
    method = bn.StringSweep(["brute", "optimized"], doc="Algorithm")

    # Results — what the benchmark measures
    elapsed = bn.ResultFloat(units="s")

    def benchmark(self):
        self.elapsed = run_benchmark(self.size, self.method)

def example_benchmark(run_cfg: bn.BenchRunCfg | None = None) -> bn.Bench:
    bench = bn.Bench("my_bench", MyBenchmark(), run_cfg=run_cfg)
    bench.result_vars = ["elapsed"]
    bench.plot_sweep("Benchmark", input_vars=["size", "method"])
    return bench

if __name__ == "__main__":
    bn.run(example_benchmark)

This produces an interactive HTML report with the appropriate plot type auto-selected based on the parameter and result types.

Notice the three stages in the code above:

Problem Definition — the MyBenchmark class declares inputs, results, and the benchmark() method
Sweep Definition — plot_sweep() selects which parameters to vary and which results to collect
Run Definition — bn.run() sets sampling density (subsampling_divisions), repeats, and output options

Every bencher example follows this pattern. See Architecture Overview for a diagram and deeper explanation.

Core Concept: Dimensions Are Sweep Variables

Every independent parameter that you want to vary must be its own sweep variable. Bencher computes the Cartesian product automatically. Never manually loop over combinations.

class Good(bn.ParametrizedSweep):
    width = bn.IntSweep(default=64, bounds=(32, 256))
    use_cache = bn.BoolSweep(default=False)
    backend = bn.StringSweep(["cpu", "gpu"])
    # 3 independent dimensions → bencher sweeps all combinations

Sweep Types

Choose the type that matches the parameter’s nature:

Type	Use for	Example
`bn.IntSweep(bounds=(lo, hi))`	Integer ranges	`n_workers = bn.IntSweep(bounds=(1, 8))`
`bn.FloatSweep(bounds=(lo, hi))`	Float ranges	`learning_rate = bn.FloatSweep(bounds=(0.001, 0.1))`
`bn.BoolSweep()`	On/off toggles	`use_jit = bn.BoolSweep(default=False)`
`bn.StringSweep([...])`	Categorical choices	`optimizer = bn.StringSweep(["adam", "sgd"])`
`bn.EnumSweep(MyEnum)`	Python enums	`mode = bn.EnumSweep(CompressionMode)`

Critical rule: If two things vary independently, they must be separate variables.

Wrong — one variable encoding combinations:

config = bn.StringSweep(["no_cache_cpu", "no_cache_gpu", "cache_cpu", "cache_gpu"])

Right — two independent dimensions:

use_cache = bn.BoolSweep(default=False)
backend = bn.StringSweep(["cpu", "gpu"])

Use IntSweep(bounds=(0, N)) when 0 means “feature absent” and 1+ controls magnitude (e.g., number of retries, repeat count, number of threads). See the Sampling Strategies gallery for examples of how different sweep types produce different sample distributions.

The Subsampling Divisions System

Instead of specifying samples on each sweep variable, you can use the subsampling_divisions parameter to control sampling density globally with a single knob:

Subsampling Divisions	1	2	3	4	5	6	7
Samples per dimension	1	2	3	5	9	17	33

Higher subsampling_divisions values reuse all lower samples (binary subdivision), so cached results carry over automatically. Start low for quick iteration, increase for publication quality:

# Quick check — 2 samples per dimension
bn.run(example_benchmark, subsampling_divisions=2)

# Publication quality — 9 samples per dimension
bn.run(example_benchmark, subsampling_divisions=5)

See Concepts: The Subsampling Divisions System for the full formula and theory, and the Subsampling Divisions System gallery for an interactive demo.

Result Types

Type	Use for	Set to
`bn.ResultFloat(units="s")`	Continuous scalar metrics (time, distance, score)	`self.elapsed = 0.42`
`bn.ResultBool()`	Success/failure, pass/fail, any binary outcome	`self.success = True`
`bn.ResultString()`	Text outputs, labels, error messages	`self.error_msg = "timeout"`
`bn.ResultImage()`	Images, GIFs	`self.img = "/path/to/output.png"`
`bn.ResultVideo()`	Videos	`self.vid = video_writer.write()`
`bn.ResultPath()`	Downloadable file outputs	`self.artifact = "/path/to/file"`
`bn.ResultContainer()`	Embeddable HTML/panel content	`self.widget = pane`
`bn.ResultVec(size=3)`	Fixed-size vector results (x, y, z)	`self.position = [1.0, 2.0, 3.0]`

Choosing between ResultFloat and ResultBool: If a result is binary (success/failure, reachable/unreachable, pass/fail), always use ResultBool — it locks bounds to [0, 1] and produces correct boolean-style plots. Only use ResultFloat for continuous metrics. See the Result Types gallery for examples of each type.

For images: use bn.gen_image_path("name") to generate unique paths. For videos: use bn.VideoWriter() to collect frames and .write() to save. See the ResultImage gallery and ResultVideo gallery for working examples.

Running a Sweep

def example_foo(run_cfg: bn.BenchRunCfg | None = None) -> bn.Bench:
    bench = bn.Bench("name", MyBenchmark(), run_cfg=run_cfg)
    bench.result_vars = ["elapsed", "accuracy"]

    # Single sweep over all dimensions — produces a complete grid
    bench.plot_sweep(
        "Full Sweep",
        input_vars=["size", "method", "backend"],
    )

    return bench

if __name__ == "__main__":
    bn.run(example_foo)

Prefer one plot_sweep with all input vars to get a complete grid.

Controlling Which Values Are Swept

Use bn.sweep() inside input_vars to control the range without changing the variable definition:

bench.plot_sweep(
    "Sweep",
    input_vars=[
        "size",                                    # full range from bounds
        bn.sweep("method", ["fast", "accurate"]),  # explicit subset
        bn.sweep("workers", max_subsampling_divisions=3),           # auto-pick up to 3 values
    ],
)

Fixing Dimensions with const_vars

To hold some parameters constant while sweeping others:

bench.plot_sweep(
    "CPU only",
    input_vars=["size", "method"],
    const_vars=dict(backend="cpu"),
)

See the Constant Variables gallery for examples of slicing, comparing, and pinning parameters.

Run Configuration

BenchRunCfg has many options, but you rarely need more than a few:

Parameter	Default	What it does
`subsampling_divisions`	0	Sampling density per dimension (see Subsampling Divisions System above)
`repeats`	1	How many times to evaluate each combination
`cache_samples`	False	Cache individual results across runs (resume interrupted sweeps)
`cache_results`	False	Cache the entire sweep result (skip re-runs with same inputs)
`over_time`	False	Track results across multiple runs for time-series analysis
`headless`	False	Skip opening a browser to display results
`dry_run`	False	Log the sweep grid summary without executing the benchmark

All other parameters have sensible defaults. See BenchRunCfg’s docstring for the full reference.

def example_foo(run_cfg: bn.BenchRunCfg | None = None) -> bn.Bench:
    run_cfg.cache_results = False   # disable for file-based / non-deterministic results
    bench = bn.Bench("name", MyBenchmark(), run_cfg=run_cfg)
    ...
    return bench

if __name__ == "__main__":
    bn.run(example_foo, subsampling_divisions=4)    # subsampling_divisions controls sweep detail depth

The benchmark() Method

Every benchmark class inherits from bn.ParametrizedSweep and implements benchmark():

class MyBench(bn.ParametrizedSweep):
    x = bn.FloatSweep(bounds=(0, 1))
    result = bn.ResultFloat()

    def benchmark(self):
        self.result = compute(self.x)

When benchmark() is called, all sweep parameters (self.x, etc.) are already set. Just set result variables directly on self. No boilerplate required.

Migration from __call__: The old pattern of overriding __call__() with self.update_params_from_kwargs(**kwargs) and return super().__call__() is deprecated. Simply rename __call__ to benchmark, remove the two boilerplate lines, and remove **kwargs from the signature.

File-Based Results (Images, Videos)

When producing files:

Write to a unique path per combination (use parameter values in the path)
Set run_cfg.cache_results = False
Use bn.ResultImage() / bn.ResultVideo() and set to the path string

class ImageBench(bn.ParametrizedSweep):
    width = bn.IntSweep(bounds=(100, 500))
    output = bn.ResultImage()

    def benchmark(self):
        path = bn.gen_image_path(f"output_{self.width}")
        render_image(self.width, path)
        self.output = str(path)

Entry Point Convention

Function name must start with example_ (used for discovery by tests and docs)
Accept run_cfg: bn.BenchRunCfg | None = None
Return the bn.Bench instance
Use bn.run(example_func) in __main__

See the Workflows gallery for complete examples showing this convention in action, including multi-sweep and BenchRunner patterns.

Aggregating Dimensions

When sweeping many dimensions, the visualizations can become unwieldy. Use the aggregate parameter on plot_sweep() to collapse dimensions into summary statistics (mean, std, etc.):

bench.plot_sweep(
    "Aggregated view",
    input_vars=["x", "y", "method"],
    result_vars=["elapsed"],
    aggregate=True,          # collapse all dimensions except the first
    # aggregate=2,           # collapse the last 2 dimensions
    # aggregate=["method"],  # collapse only the "method" dimension
    agg_fn="mean",           # aggregation function: mean, sum, max, min, median
)

aggregate=True — collapse all dimensions except the first into a single aggregated statistic
aggregate=N (int) — collapse the last N dimensions
aggregate=["var1", "var2"] — collapse only the named dimensions

See the Aggregation gallery for examples of each mode.

Machine-Readable Results (Agents & CI)

Bencher already computes per-metric verdicts, optimal values, and regression deltas during collection. To consume them programmatically — from an agent, a CI gate, or another script — export them as JSON instead of scraping the HTML report or logs.

import bencher as bn

res = bench.collect(input_vars=[...], result_vars=[...], run_cfg=run_cfg)

# A single run -> result.json
bn.result_to_dict(res)             # dict: schema_version, metrics, regressions, provenance
bn.result_to_json(res, "result.json")

# A/B between two independently collected results -> comparison.json
cmp = bn.compare_results(baseline_res, candidate_res)   # per-metric verdict + summary counts
bn.comparison_to_json(baseline_res, candidate_res, "comparison.json")

compare_results runs the same regression detector used by the over-time path (a percentage comparison by default), so each metric’s verdict is one of improved / regressed / unchanged using identical direction/threshold semantics. Pass run_cfg= to choose a different regression_method.

The same artifacts are available from the CLI on a saved result (see the collect/render split):

# render HTML and also emit result.json
python -m bencher.render result.pkl out_dir --json result.json

# diff two saved results
python -m bencher.render compare baseline.pkl candidate.pkl --json comparison.json

BenchReport.save(..., emit_json=True) writes result.json next to the HTML for every contained result (opt-in; default off). All JSON output is strict — non-finite values (e.g. a zero-baseline percent change) are emitted as null.

Common Mistakes

Mistake	Fix
Manually looping over parameter combinations	Use `plot_sweep(input_vars=[...])`
One StringSweep encoding multiple independent toggles	Use separate BoolSweep / IntSweep per toggle
Many small plot_sweep calls for different combos	One plot_sweep with all input_vars
Building panel/HTML layouts manually	Use bencher’s report system
Using the old `__call__` pattern with boilerplate	Override `benchmark()` instead
Caching file-path results	Set `run_cfg.cache_results = False`
Using `ResultFloat` for success/failure booleans	Use `ResultBool()` — bounds are [0, 1], plots render correctly