bencher.report_export
=====================

.. py:module:: bencher.report_export

.. autoapi-nested-parse::

   Machine-readable export of benchmark results for agents and CI.

   Bencher already *computes* per-metric verdicts, optimal values, and regression
   deltas during collection, but historically only emitted them as HTML, pickle,
   or human-prose markdown. This module turns those already-computed values into
   a stable JSON contract so an automated workflow can read ground truth instead
   of scraping logs or parsing rendered reports.

   Two artifacts:

   * :func:`result_to_dict` / :func:`result_to_json`: a single run's metrics +
     regression verdicts + provenance (``result.json``).
   * :func:`compare_results`: an A/B diff between two independently-collected
     results (``comparison.json``). It reuses the over-time
     :func:`~bencher.regression.detect_regressions` path verbatim by stacking the
     two results on a synthetic 2-point ``over_time`` axis, so the A/B verdict
     shares identical direction/threshold semantics with the normal pipeline.

   The contracts carry ``schema_version`` so downstream consumers can pin to a
   shape.


Attributes
----------

.. autoapisummary::

   bencher.report_export.SCHEMA_VERSION


Functions
---------

.. autoapisummary::

   bencher.report_export._provenance
   bencher.report_export._metric_entry
   bencher.report_export._coord_scalar
   bencher.report_export.result_to_dict
   bencher.report_export.result_to_json
   bencher.report_export._snapshot_ds
   bencher.report_export._verdict
   bencher.report_export.compare_results
   bencher.report_export.comparison_to_json


Module Contents
---------------

.. py:data:: SCHEMA_VERSION
   :value: 1


.. py:function:: _provenance(bench_res: bencher.results.bench_result.BenchResult) -> dict

   Best-effort provenance for a result (time-event label if recorded).


.. py:function:: _metric_entry(bench_res: bencher.results.bench_result.BenchResult, rv) -> dict

   Per-metric summary: identity + optimal value/inputs when computable.


.. py:function:: _coord_scalar(values)

   Coerce an optimal-input coordinate to a JSON-safe scalar.


.. py:function:: result_to_dict(bench_res: bencher.results.bench_result.BenchResult) -> dict

   Build the stable, JSON-serializable contract for a single result.

   :param bench_res: A collected :class:`BenchResult` (e.g. from
                     ``plot_sweep(auto_plot=False)`` / :meth:`Bench.collect`).

   :returns: A dict with ``schema_version``, ``bench_name``, ``provenance``,
             ``input_vars``, ``over_time``, ``metrics``, and ``regressions``.


.. py:function:: result_to_json(bench_res: bencher.results.bench_result.BenchResult, path: str | pathlib.Path, *, indent: int = 2) -> pathlib.Path

   Write :func:`result_to_dict` for *bench_res* to *path* as JSON.


.. py:function:: _snapshot_ds(bench_res: bencher.results.bench_result.BenchResult) -> xarray.Dataset

   Return a single-snapshot dataset (collapse a pre-existing over_time axis).


.. py:function:: _verdict(change_percent: float | None, direction: str, regressed: bool, threshold: float) -> str

   Classify a metric movement as improved / regressed / unchanged.

   ``regressed`` comes straight from the detector (direction- and
   threshold-aware). An improvement is the mirror image: a beneficial-direction
   move whose magnitude clears the same threshold.


.. py:function:: compare_results(baseline: bencher.results.bench_result.BenchResult, candidate: bencher.results.bench_result.BenchResult, *, run_cfg=None) -> dict

   Diff two independently-collected results into an A/B comparison contract.

   Stacks *baseline* and *candidate* on a synthetic 2-point ``over_time`` axis
   (baseline first, candidate last) and runs the regular
   :func:`~bencher.regression.detect_regressions` over it, so the A/B verdict
   uses identical direction/threshold logic to the over-time path.

   :param baseline: The reference result.
   :param candidate: The result being compared against the baseline.
   :param run_cfg: Optional :class:`BenchRunCfg` controlling the detector. When
                   omitted, a percentage comparison
                   (``regression_method='percentage'``) is used, the natural
                   choice for a two-point A/B.

   :returns: A dict with ``schema_version``, ``baseline``/``candidate``
             provenance, per-metric ``metrics`` (with a ``verdict``), and a
             ``summary`` count.

   :raises ValueError: when the two results share no comparable scalar metric.


.. py:function:: comparison_to_json(baseline: bencher.results.bench_result.BenchResult, candidate: bencher.results.bench_result.BenchResult, path: str | pathlib.Path, *, run_cfg=None, indent: int = 2) -> pathlib.Path

   Write :func:`compare_results` for the two results to *path* as JSON.
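
The improved / regressed / unchanged rule described for ``_verdict`` can be illustrated with a small standalone sketch. This is not bencher's implementation: the function name ``classify_verdict`` is hypothetical, the direction labels ``"minimize"`` and ``"maximize"`` are assumed, ``threshold`` is assumed to be in the same percent units as ``change_percent``, and "clears the threshold" is assumed to mean greater than or equal to it.

```python
from __future__ import annotations


def classify_verdict(
    change_percent: float | None,
    direction: str,
    regressed: bool,
    threshold: float,
) -> str:
    """Hypothetical stand-in for the verdict rule; not bencher's own code."""
    # The detector's flag is taken as authoritative for regressions.
    if regressed:
        return "regressed"
    # With no measurable change there is nothing to classify as improved.
    if change_percent is None:
        return "unchanged"
    # Assumed labels: "minimize" means lower is better,
    # "maximize" means higher is better; anything else has no good direction.
    if direction == "minimize":
        beneficial = change_percent < 0
    elif direction == "maximize":
        beneficial = change_percent > 0
    else:
        beneficial = False
    # Mirror image of a regression: a beneficial move that clears the threshold.
    if beneficial and abs(change_percent) >= threshold:
        return "improved"
    return "unchanged"


if __name__ == "__main__":
    # A latency-style metric (lower is better) with a 5% threshold.
    print(classify_verdict(-12.0, "minimize", False, 5.0))
    print(classify_verdict(-2.0, "minimize", False, 5.0))
    print(classify_verdict(12.0, "minimize", True, 5.0))
```

Because the regression flag is passed in rather than recomputed, the sketch keeps the detector as the single source of truth for regressions and only adds the symmetric improvement check, which matches the division of labour the ``_verdict`` description suggests.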