bencher.report_export
=====================

.. py:module:: bencher.report_export

.. autoapi-nested-parse::

   Machine-readable export of benchmark results for agents and CI.

   Bencher already *computes* per-metric verdicts, optimal values, and regression
   deltas during collection, but historically emitted them only as HTML, pickle,
   or human-prose markdown. This module turns those already-computed values into a
   stable JSON contract so an automated workflow can read ground truth instead of
   scraping logs or parsing rendered reports.

   Two artifacts:

   * :func:`result_to_dict` / :func:`result_to_json`: a single run's metrics,
     regression verdicts, and provenance (``result.json``).
   * :func:`compare_results`: an A/B diff between two independently collected
     results (``comparison.json``). It reuses the over-time
     :func:`~bencher.regression.detect_regressions` path verbatim by stacking the
     two results on a synthetic 2-point ``over_time`` axis, so the A/B verdict
     shares identical direction/threshold semantics with the normal pipeline.

   The contracts carry ``schema_version`` so downstream consumers can pin to a
   shape.
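
A consumer can use ``schema_version`` to refuse payloads it does not understand. The following is a minimal sketch, not part of this module: ``load_result``, ``SUPPORTED_SCHEMA_VERSIONS``, and the sample values are hypothetical, and only the top-level key names are taken from the :func:`result_to_dict` description.

```python
import json

# Hypothetical payload: the top-level keys follow the result_to_dict
# description; every value here is made up for illustration.
SAMPLE_RESULT_JSON = json.dumps(
    {
        "schema_version": 1,
        "bench_name": "example_bench",
        "provenance": {},
        "input_vars": [],
        "over_time": False,
        "metrics": [],
        "regressions": [],
    }
)

# Schema versions this (hypothetical) consumer knows how to read.
SUPPORTED_SCHEMA_VERSIONS = {1}


def load_result(text: str) -> dict:
    """Parse a result payload and reject unknown schema versions."""
    payload = json.loads(text)
    version = payload.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return payload


result = load_result(SAMPLE_RESULT_JSON)
print(result["bench_name"])
```

Failing fast on an unknown version keeps a CI job from silently misreading a contract whose shape has changed.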


Attributes
----------

.. autoapisummary::

   bencher.report_export.SCHEMA_VERSION


Functions
---------

.. autoapisummary::

   bencher.report_export._provenance
   bencher.report_export._metric_entry
   bencher.report_export._coord_scalar
   bencher.report_export.result_to_dict
   bencher.report_export.result_to_json
   bencher.report_export._snapshot_ds
   bencher.report_export._verdict
   bencher.report_export.compare_results
   bencher.report_export.comparison_to_json


Module Contents
---------------

.. py:data:: SCHEMA_VERSION
   :value: 1


.. py:function:: _provenance(bench_res: bencher.results.bench_result.BenchResult) -> dict

   Best-effort provenance for a result (time-event label if recorded).


.. py:function:: _metric_entry(bench_res: bencher.results.bench_result.BenchResult, rv) -> dict

   Per-metric summary: identity + optimal value/inputs when computable.


.. py:function:: _coord_scalar(values)

   Coerce an optimal-input coordinate to a JSON-safe scalar.
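
To illustrate what "JSON-safe scalar" can mean in practice, here is a sketch of one possible coercion. It is an assumption about the general technique, not the body of ``_coord_scalar``: the name ``to_json_scalar`` and the choice to map non-finite floats to ``None`` are hypothetical.

```python
import json
import math


def to_json_scalar(value):
    """Hypothetical coercion of a coordinate value to a JSON-safe scalar."""
    # NumPy-style scalars and 0-d arrays expose .item(); unwrap them first.
    item = getattr(value, "item", None)
    if callable(item):
        value = item()
    # These types serialize to JSON unchanged.
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # NaN and infinity are not valid JSON, so map them to null here.
        return value if math.isfinite(value) else None
    # Anything else (datetimes, enums, ...) falls back to its string form.
    return str(value)


# The coerced values serialize with strict JSON settings.
encoded = json.dumps(
    [to_json_scalar(v) for v in (3, 0.5, "fast", float("nan"))],
    allow_nan=False,
)
```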


.. py:function:: result_to_dict(bench_res: bencher.results.bench_result.BenchResult) -> dict

   Build the stable, JSON-serializable contract for a single result.

   :param bench_res: A collected :class:`BenchResult` (e.g. from
                     ``plot_sweep(auto_plot=False)`` / :meth:`Bench.collect`).

   :returns: A dict with ``schema_version``, ``bench_name``, ``provenance``,
             ``input_vars``, ``over_time``, ``metrics``, and ``regressions``.


.. py:function:: result_to_json(bench_res: bencher.results.bench_result.BenchResult, path: str | pathlib.Path, *, indent: int = 2) -> pathlib.Path

   Write :func:`result_to_dict` for *bench_res* to *path* as JSON.
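
The write step follows a common pattern: serialize the dict, write it to *path*, and return the path. The sketch below shows that pattern with the standard library only. ``write_json`` is a hypothetical helper, and creating missing parent directories is an assumption, not documented behavior of :func:`result_to_json`.

```python
import json
import tempfile
from pathlib import Path


def write_json(payload: dict, path, *, indent: int = 2) -> Path:
    """Hypothetical helper: dump *payload* to *path* and return the path."""
    out = Path(path)
    # Assumption: create missing parent directories rather than fail.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(payload, indent=indent), encoding="utf-8")
    return out


with tempfile.TemporaryDirectory() as tmp:
    written = write_json({"schema_version": 1}, Path(tmp) / "out" / "result.json")
    # Read the file back to confirm the payload survives a round trip.
    round_tripped = json.loads(written.read_text(encoding="utf-8"))
```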


.. py:function:: _snapshot_ds(bench_res: bencher.results.bench_result.BenchResult) -> xarray.Dataset

   Return a single-snapshot dataset (collapse a pre-existing ``over_time`` axis).


.. py:function:: _verdict(change_percent: float | None, direction: str, regressed: bool, threshold: float) -> str

   Classify a metric movement as improved / regressed / unchanged.

   ``regressed`` comes straight from the detector (direction- and
   threshold-aware). An improvement is the mirror image: a beneficial-direction
   move whose magnitude clears the same threshold.
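
The classification rule described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of ``_verdict``: the function name ``classify``, the direction strings ``"minimize"`` and ``"maximize"``, the exact return labels, and the inclusive threshold comparison are all hypothetical.

```python
def classify(change_percent, direction, regressed, threshold):
    """Hypothetical sketch of an improved / regressed / unchanged rule."""
    # The detector's flag is authoritative for regressions.
    if regressed:
        return "regressed"
    # Without a measurable change there is nothing to classify.
    if change_percent is None:
        return "unchanged"
    # Assumption: "minimize" means lower is better, anything else means
    # higher is better.
    if direction == "minimize":
        beneficial = change_percent < 0
    else:
        beneficial = change_percent > 0
    # An improvement mirrors a regression: a beneficial move whose
    # magnitude clears the same threshold.
    if beneficial and abs(change_percent) >= threshold:
        return "improved"
    return "unchanged"


print(classify(-12.0, "minimize", False, 5.0))
```

Using one threshold for both outcomes keeps the rule symmetric, so a small beneficial move is reported as unchanged and not as an improvement.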


.. py:function:: compare_results(baseline: bencher.results.bench_result.BenchResult, candidate: bencher.results.bench_result.BenchResult, *, run_cfg=None) -> dict

   Diff two independently-collected results into an A/B comparison contract.

   Stacks *baseline* and *candidate* on a synthetic 2-point ``over_time`` axis
   (baseline first, candidate last) and runs the regular
   :func:`~bencher.regression.detect_regressions` over it, so the A/B verdict
   uses identical direction/threshold logic to the over-time path.

   :param baseline: The reference result.
   :param candidate: The result being compared against the baseline.
   :param run_cfg: Optional :class:`BenchRunCfg` controlling the detector. When
                   omitted, a percentage comparison (``regression_method='percentage'``)
                   is used, which is the natural choice for a two-point A/B.

   :returns: A dict with ``schema_version``, ``baseline``/``candidate`` provenance,
             per-metric ``metrics`` (with a ``verdict``), and a ``summary`` count.

   :raises ValueError: If the two results share no comparable scalar metric.
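
A CI job could gate on the comparison contract roughly as sketched below. The shape of ``metrics`` is an assumption: it is modeled here as a list of entries with hypothetical ``name`` and ``verdict`` keys, and the sample data is invented. Check the real payload before relying on this layout.

```python
from collections import Counter

# Hypothetical comparison payload; only "schema_version", "metrics", and a
# per-metric "verdict" are named in the documentation above.
comparison = {
    "schema_version": 1,
    "metrics": [
        {"name": "latency_ms", "verdict": "regressed"},
        {"name": "throughput", "verdict": "improved"},
        {"name": "memory_mb", "verdict": "unchanged"},
    ],
}


def summarize(payload: dict) -> Counter:
    """Count how many metrics landed on each verdict."""
    return Counter(entry["verdict"] for entry in payload["metrics"])


def regressed_names(payload: dict) -> list:
    """Return the names of the metrics whose verdict is 'regressed'."""
    return [
        entry["name"]
        for entry in payload["metrics"]
        if entry["verdict"] == "regressed"
    ]


summary = summarize(comparison)
failed = regressed_names(comparison)
# A non-zero exit code would fail the CI step when anything regressed.
exit_code = 1 if failed else 0
```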


.. py:function:: comparison_to_json(baseline: bencher.results.bench_result.BenchResult, candidate: bencher.results.bench_result.BenchResult, path: str | pathlib.Path, *, run_cfg=None, indent: int = 2) -> pathlib.Path

   Write :func:`compare_results` for the two results to *path* as JSON.