bencher.report_export
Machine-readable export of benchmark results for agents and CI.
Bencher already computes per-metric verdicts, optimal values, and regression deltas during collection — but historically only emitted them as HTML, pickle, or human-prose markdown. This module turns those already-computed values into a stable JSON contract so an automated workflow can read ground truth instead of scraping logs or parsing rendered reports.
Two artifacts:
- result_to_dict() / result_to_json(): a single run's metrics, regression verdicts, and provenance (result.json).
- compare_results(): an A/B diff between two independently collected results (comparison.json). It reuses the over-time detect_regressions() path verbatim by stacking the two results on a synthetic 2-point over_time axis, so the A/B verdict shares identical direction/threshold semantics with the normal pipeline.
Both contracts carry a schema_version field so downstream consumers can pin to a known shape.
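A consumer might pin to the contract shape along the following lines. This is a minimal sketch, not part of bencher: load_contract and SUPPORTED_SCHEMA_VERSION are hypothetical names, and the sample payload is made up apart from the schema_version and bench_name keys.

```python
import json

# Assumption: this consumer only understands schema_version 1, matching the
# SCHEMA_VERSION value listed for this module.
SUPPORTED_SCHEMA_VERSION = 1


def load_contract(text: str) -> dict:
    """Parse an exported contract and refuse shapes this consumer does not know."""
    contract = json.loads(text)
    version = contract.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return contract


# Hypothetical payload standing in for the contents of result.json.
sample = json.dumps({"schema_version": 1, "bench_name": "example_bench"})
contract = load_contract(sample)
print(contract["bench_name"])
```

Failing fast on an unknown version keeps an agent or CI job from silently misreading a contract whose shape has changed.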
Attributes
- SCHEMA_VERSION
Functions
- _provenance(): Best-effort provenance for a result (time-event label if recorded).
- _metric_entry(): Per-metric summary: identity + optimal value/inputs when computable.
- _coord_scalar(): Coerce an optimal-input coordinate to a JSON-safe scalar.
- result_to_dict(): Build the stable, JSON-serializable contract for a single result.
- result_to_json(): Write result_to_dict() for bench_res to path as JSON.
- _snapshot_ds(): Return a single-snapshot dataset (collapse a pre-existing over_time axis).
- _verdict(): Classify a metric movement as improved / regressed / unchanged.
- compare_results(): Diff two independently-collected results into an A/B comparison contract.
- comparison_to_json(): Write compare_results() for the two results to path as JSON.
Module Contents
- bencher.report_export.SCHEMA_VERSION = 1
- bencher.report_export._provenance(bench_res: bencher.results.bench_result.BenchResult) dict
Best-effort provenance for a result (time-event label if recorded).
- bencher.report_export._metric_entry(bench_res: bencher.results.bench_result.BenchResult, rv) dict
Per-metric summary: identity + optimal value/inputs when computable.
- bencher.report_export._coord_scalar(values)
Coerce an optimal-input coordinate to a JSON-safe scalar.
- bencher.report_export.result_to_dict(bench_res: bencher.results.bench_result.BenchResult) dict
Build the stable, JSON-serializable contract for a single result.
- Parameters:
bench_res – A collected BenchResult (e.g. from plot_sweep(auto_plot=False) / Bench.collect()).
- Returns:
A dict with schema_version, bench_name, provenance, input_vars, over_time, metrics, and regressions.
- bencher.report_export.result_to_json(bench_res: bencher.results.bench_result.BenchResult, path: str | pathlib.Path, *, indent: int = 2) pathlib.Path
Write result_to_dict() for bench_res to path as JSON.
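The write step can be pictured with a stand-in that has the same signature shape (a path given as str or Path, a keyword-only indent defaulting to 2, and a Path return value). This is a sketch under those assumptions, not bencher's implementation: write_contract is a hypothetical helper that takes an already-built dict instead of a BenchResult.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def write_contract(contract: dict, path: Union[str, Path], *, indent: int = 2) -> Path:
    """Hypothetical stand-in: serialize a contract dict to path and return the path."""
    out = Path(path)
    # Assumption: creating missing parent directories is convenient for CI use.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(contract, indent=indent), encoding="utf-8")
    return out


# Round-trip a tiny made-up contract through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    written = write_contract({"schema_version": 1}, Path(tmp) / "result.json")
    round_tripped = json.loads(written.read_text(encoding="utf-8"))
```

Returning the resolved Path lets a caller hand the artifact location straight to the next CI step.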
- bencher.report_export._snapshot_ds(bench_res: bencher.results.bench_result.BenchResult) xarray.Dataset
Return a single-snapshot dataset (collapse a pre-existing over_time axis).
- bencher.report_export._verdict(change_percent: float | None, direction: str, regressed: bool, threshold: float) str
Classify a metric movement as improved / regressed / unchanged.
regressed comes straight from the detector (direction- and threshold-aware). An improvement is the mirror image: a beneficial-direction move whose magnitude clears the same threshold.
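The classification rule described above could look roughly like the sketch below. It is not bencher's code: classify_movement is a hypothetical name, the direction strings "minimize" and "maximize" are assumed, the threshold is assumed to be in the same percent units as change_percent, and treating a move exactly at the threshold as an improvement is also an assumption.

```python
from typing import Optional


def classify_movement(
    change_percent: Optional[float],
    direction: str,
    regressed: bool,
    threshold: float,
) -> str:
    """Sketch of an improved / regressed / unchanged classifier."""
    # The detector's flag wins: it is already direction- and threshold-aware.
    if regressed:
        return "regressed"
    # Without a computable change there is nothing to call an improvement.
    if change_percent is None:
        return "unchanged"
    # Assumption: any direction other than "minimize" or "maximize" has no
    # beneficial side, so it can never count as an improvement.
    if direction == "minimize":
        beneficial = change_percent < 0
    elif direction == "maximize":
        beneficial = change_percent > 0
    else:
        beneficial = False
    # Mirror image of a regression: a beneficial move that clears the threshold.
    if beneficial and abs(change_percent) >= threshold:
        return "improved"
    return "unchanged"


print(classify_movement(-12.0, "minimize", False, 5.0))
```

Because the regressed flag is passed in and never recomputed, the sketch cannot disagree with the detector about what counts as a regression.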
- bencher.report_export.compare_results(baseline: bencher.results.bench_result.BenchResult, candidate: bencher.results.bench_result.BenchResult, *, run_cfg=None) dict
Diff two independently-collected results into an A/B comparison contract.
Stacks baseline and candidate on a synthetic 2-point over_time axis (baseline first, candidate last) and runs the regular detect_regressions() over it, so the A/B verdict uses identical direction/threshold logic to the over-time path.
- Parameters:
baseline – The reference result.
candidate – The result being compared against the baseline.
run_cfg – Optional BenchRunCfg controlling the detector. When omitted, a percentage comparison (regression_method='percentage') is used, which is the natural choice for a two-point A/B.
- Returns:
A dict with schema_version, baseline/candidate provenance, per-metric metrics (with a verdict), and a summary count.
- Raises:
ValueError – when the two results share no comparable scalar metric.
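To make the two-point idea concrete, the sketch below computes a relative change between the first and last points of a two-element series. It only illustrates a percentage comparison in general; the formula, the handling of a zero baseline, and the name percent_change are assumptions and are not taken from bencher's detector.

```python
from typing import Optional


def percent_change(baseline: float, candidate: float) -> Optional[float]:
    """Relative change from baseline to candidate, in percent (None if undefined)."""
    if baseline == 0:
        # Assumption: a percentage against a zero baseline is not meaningful.
        return None
    return (candidate - baseline) / abs(baseline) * 100.0


# Synthetic two-point axis with made-up values: baseline first, candidate last.
two_point_series = [100.0, 125.0]
change = percent_change(two_point_series[0], two_point_series[-1])
print(change)
```

Whether such a change counts as a regression still depends on the metric's direction and the configured threshold, which is why the real comparison defers to the detector.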
- bencher.report_export.comparison_to_json(baseline: bencher.results.bench_result.BenchResult, candidate: bencher.results.bench_result.BenchResult, path: str | pathlib.Path, *, run_cfg=None, indent: int = 2) pathlib.Path
Write compare_results() for the two results to path as JSON.
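A CI job could gate on the comparison contract roughly as follows. This is a sketch under assumptions: the docs only state that each metric carries a verdict, so the list-of-dicts layout, the "name" key, the metric names, and the helper regressed_metrics are all hypothetical.

```python
def regressed_metrics(comparison: dict) -> list:
    """Return the names of metrics whose verdict is 'regressed'."""
    # Assumption: "metrics" is a list of dicts with "name" and "verdict" keys.
    return [m["name"] for m in comparison["metrics"] if m["verdict"] == "regressed"]


# Hypothetical payload standing in for the contents of comparison.json.
comparison = {
    "schema_version": 1,
    "metrics": [
        {"name": "latency_ms", "verdict": "regressed"},
        {"name": "throughput", "verdict": "improved"},
        {"name": "memory_mb", "verdict": "unchanged"},
    ],
}

failures = regressed_metrics(comparison)
# A CI wrapper would typically exit non-zero when any metric regressed.
exit_code = 1 if failures else 0
print(failures, exit_code)
```

Reading the verdict field directly is the point of the export: the gate does not need to re-derive thresholds or parse a rendered report.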