bencher.regression
==================

.. py:module:: bencher.regression

.. autoapi-nested-parse::

   Benchmark regression detection for over-time benchmarks.

   Provides statistical methods to detect whether benchmark values have
   changed significantly between runs. Supports a percentage threshold, an
   absolute-unit delta threshold, a fixed absolute limit, and an adaptive
   MAD-based detector with an optional percent floor for dual-band
   suppression.


Attributes
----------

.. autoapisummary::

   bencher.regression._METHOD_DEFAULTS
   bencher.regression._MAD_TO_SIGMA
   bencher.regression._DRIFT_FRAC
   bencher.regression._HAMPEL_K


Exceptions
----------

.. autoapisummary::

   bencher.regression.RegressionError


Classes
-------

.. autoapisummary::

   bencher.regression.RegressionResult
   bencher.regression.RegressionReport
   bencher.regression.MethodCells


Functions
---------

.. autoapisummary::

   bencher.regression.method_cells
   bencher.regression._format_summary_line
   bencher.regression._format_markdown_row
   bencher.regression._regression_plot_spec
   bencher.regression._ensure_matplotlib_backend_loaded
   bencher.regression.build_regression_overlay
   bencher.regression.render_regression_png
   bencher.regression._clean_1d
   bencher.regression._safe_change_percent
   bencher.regression._is_regression
   bencher.regression._exceeds_directional_threshold
   bencher.regression.detect_percentage
   bencher.regression._robust_scale
   bencher.regression._residual_sigma
   bencher.regression.detect_adaptive
   bencher.regression.detect_delta
   bencher.regression.detect_absolute
   bencher.regression._compute_history_arrays
   bencher.regression._attach_plot_metadata
   bencher.regression.detect_regressions


Module Contents
---------------

.. py:data:: _METHOD_DEFAULTS

.. py:data:: _MAD_TO_SIGMA
   :value: 1.4826


.. py:data:: _DRIFT_FRAC
   :value: 0.85


.. py:data:: _HAMPEL_K
   :value: 5.0


.. py:exception:: RegressionError

   Bases: :py:obj:`Exception`


   Raised when regression detection finds regressions and ``regression_fail`` is True.


.. py:class:: RegressionResult

   Result of regression detection for a single variable.


   .. py:attribute:: variable
      :type:  str


   .. py:attribute:: method
      :type:  str


   .. py:attribute:: regressed
      :type:  bool


   .. py:attribute:: current_value
      :type:  float


   .. py:attribute:: baseline_value
      :type:  float


   .. py:attribute:: change_percent
      :type:  float


   .. py:attribute:: threshold
      :type:  float


   .. py:attribute:: direction
      :type:  str


   .. py:attribute:: details
      :type:  str


   .. py:attribute:: band_lower
      :type:  float | None
      :value: None


   .. py:attribute:: band_upper
      :type:  float | None
      :value: None


   .. py:attribute:: percent_band_lower
      :type:  float | None
      :value: None


   .. py:attribute:: percent_band_upper
      :type:  float | None
      :value: None


   .. py:attribute:: historical
      :type:  numpy.ndarray | None
      :value: None


   .. py:attribute:: current_samples
      :type:  numpy.ndarray | None
      :value: None


   .. py:attribute:: historical_all
      :type:  numpy.ndarray | None
      :value: None


   .. py:attribute:: historical_all_x
      :type:  numpy.ndarray | None
      :value: None


   .. py:attribute:: historical_x
      :type:  numpy.ndarray | None
      :value: None


   .. py:attribute:: current_x
      :type:  numpy.ndarray | None
      :value: None


   .. py:method:: render_png(historical: numpy.ndarray | None = None, current: numpy.ndarray | float | None = None, path: str | pathlib.Path | None = None, figsize: tuple[float, float] = (8.0, 5.0), dpi: int = 100) -> str

      Render this result as a diagnostic PNG (see :func:`render_regression_png`).


   .. py:method:: render_overlay(historical: numpy.ndarray | None = None, current: numpy.ndarray | float | None = None)

      Build a :class:`holoviews.Overlay` of this result (see :func:`build_regression_overlay`).


.. py:class:: RegressionReport

   Aggregates regression results for all variables in a benchmark.


   .. py:attribute:: results
      :type:  list[RegressionResult]
      :value: []


   .. py:property:: has_regressions
      :type: bool


   .. py:property:: regressed_variables
      :type: list[RegressionResult]


   .. py:method:: summary() -> str


   .. py:method:: to_markdown() -> str

      Return a nicely formatted Markdown summary of all regression results.


   .. py:method:: append_to_report(report) -> None

      Append a formatted regression summary to a :class:`BenchReport`.


   .. py:method:: prepend_to_result(report, bench_res) -> None

      Insert a formatted regression summary at the top of *bench_res*'s tab.


.. py:class:: MethodCells

   Per-method rendering of a single regression result.

   Each detector has a different gate (percent ratio, MAD-sigma, absolute
   delta, hard limit), so the report cells must describe it in its own
   units. This bundle is the single source of truth consumed by both the
   built-in text summary and the markdown table. It is exposed as public
   API so downstream report builders can produce their own layouts
   (custom columns, non-markdown output, templated HTML, GitHub PR
   comments with status decoration, etc.) without reimplementing method
   dispatch, which would drift as new detection methods are added.

   Example — building a minimal custom row from a RegressionResult::

       from bencher import method_cells
       cells = method_cells(result)
       row = f"{result.variable}: {cells.change} (gate {cells.threshold})"

   .. attribute:: change

      Change column (markdown) — gated quantity in its own units.

   .. attribute:: baseline

      Baseline column (markdown) — em-dash for absolute (no
      historical baseline exists).

   .. attribute:: threshold

      Threshold column (markdown) — carries the gate's native
      units (``±T%``, ``Tσ``, ``±T``, or a direction-aware inequality).

   .. attribute:: summary_lead

      First clause of the summary line, before the details
      parenthesis. Captures the gated quantity in sentence form.

   .. attribute:: summary_standalone

      When True, the summary line skips the
      ``(baseline=…, current=…, threshold=…)`` tail because
      ``summary_lead`` already contains the relevant values. Used by
      the absolute method (no baseline, limit is in the lead).


   .. py:attribute:: change
      :type:  str


   .. py:attribute:: baseline
      :type:  str


   .. py:attribute:: threshold
      :type:  str


   .. py:attribute:: summary_lead
      :type:  str


   .. py:attribute:: summary_standalone
      :type:  bool
      :value: False


.. py:function:: method_cells(r: RegressionResult) -> MethodCells

   Build the per-method cell bundle for a :class:`RegressionResult`.

   Returns a :class:`MethodCells` with pre-rendered display strings for
   the result's change, baseline, and threshold, plus the summary lead
   clause. Dispatches on ``r.method`` so each gate describes itself in
   its native units. Safe to call on any ``RegressionResult`` — unknown
   methods fall back to the percentage-style rendering.

   Intended for consumers that want to embed regression results in a
   custom layout while staying consistent with how the built-in
   :meth:`RegressionReport.summary` and :meth:`RegressionReport.to_markdown`
   present each method.

   Notes on the ``absolute`` branch: ``baseline_value`` and ``threshold``
   both hold the limit for this detector (see :func:`detect_absolute`);
   the code reads from ``threshold`` to make the intent ("this is the
   gate value") explicit.


.. py:function:: _format_summary_line(r: RegressionResult) -> str

.. py:function:: _format_markdown_row(r: RegressionResult) -> str

.. py:function:: _regression_plot_spec(result: RegressionResult, historical: numpy.ndarray | None, current: numpy.ndarray | float | None) -> dict

   Prepare the data + styling used by both the matplotlib and holoviews renderers.

   Resolves the history and current arrays from the arguments first, falling
   back to anything stored on *result*. Returns a dict of primitives the
   backend-specific renderers consume. Keeping this shared guarantees the PNG
   and in-report plots stay in sync as the diagnostic evolves.


.. py:function:: _ensure_matplotlib_backend_loaded() -> None

   Register the holoviews matplotlib backend without changing the default.

   render_regression_png needs matplotlib to export a PNG, but the report path
   uses bokeh — calling hv.extension('matplotlib') naively would flip the
   global default mid-run. This loads the renderer if missing, then restores
   the prior default. Selects the non-interactive Agg backend when no
   matplotlib backend has been configured yet (``force=False``), so holoviews
   doesn't pick up Tk/Qt on a fresh process (which leaks ``main thread is not
   in main loop`` tracebacks at interpreter shutdown). If the caller has
   already configured a backend (e.g., Jupyter's inline backend), that choice
   is left alone.


.. py:function:: build_regression_overlay(result: RegressionResult, historical: numpy.ndarray | None = None, current: numpy.ndarray | float | None = None, width: int = 700, height: int = 350, fig_inches: tuple[float, float] = (7.0, 3.5))

   Build a :class:`holoviews.Overlay` diagnostic of a regression result.

   Opts are applied per-backend so the same overlay renders correctly under
   both bokeh (for embedded HTML reports) and matplotlib (for PNG export via
   :func:`render_regression_png`). History always shows as mean line + raw
   alpha scatter; regression-specific layers (acceptance band, baseline,
   verdict-coloured current marker) are conditional on the data in *result*.

   :param result: The :class:`RegressionResult` to visualise.
   :param historical: Optional 1-D array of historical per-time-point means.
                      Falls back to ``result.historical`` if omitted.
   :param current: Optional current-run sample array (or scalar). Falls back to
                   ``result.current_samples`` / ``result.current_value``.
   :param width: Plot width in pixels for the bokeh backend.
   :param height: Plot height in pixels for the bokeh backend.
   :param fig_inches: Figure size in inches for the matplotlib backend.


.. py:function:: render_regression_png(result: RegressionResult, historical: numpy.ndarray | None = None, current: numpy.ndarray | float | None = None, path: str | pathlib.Path | None = None, figsize: tuple[float, float] = (8.0, 5.0), dpi: int = 100) -> str

   Render a diagnostic PNG by saving the shared holoviews overlay via matplotlib.

   Produces the same plot as the in-report bokeh overlay — it calls
   :func:`build_regression_overlay` and hands the result to holoviews'
   matplotlib renderer, so there's a single source of truth for the
   diagnostic visual.

   :param result: The :class:`RegressionResult` produced by a ``detect_*`` call.
   :param historical: 1-D array of historical per-time-point means. Falls back
                      to ``result.historical``.
   :param current: Current-run sample(s). Falls back to ``result.current_samples``
                   / ``result.current_value``.
   :param path: Output PNG path. If ``None``, a path is generated via
                :func:`bencher.utils.gen_image_path` so the file lives under the
                bencher cache directory.
   :param figsize: Figure size in inches (matplotlib ``fig_inches``).
   :param dpi: Output DPI. A 500x320 pixel image (``figsize=(5.0, 3.2)`` at
               ``dpi=100``) works well for GitHub comments.

   :returns: Absolute path to the saved PNG as a string.


.. py:function:: _clean_1d(a: numpy.ndarray) -> numpy.ndarray

   Flatten to 1-D float and remove NaNs.


.. py:function:: _safe_change_percent(current: float, baseline: float) -> float

   Calculate percentage change, handling zero baseline gracefully.


.. py:function:: _is_regression(change_percent: float, direction: bencher.variables.results.OptDir) -> bool

   Determine if a change constitutes a regression given the optimization direction.


.. py:function:: _exceeds_directional_threshold(change_percent: float, threshold_percent: float, direction: bencher.variables.results.OptDir) -> bool

   Check if change exceeds threshold in the direction-appropriate sense.


.. py:function:: detect_percentage(variable: str, historical: numpy.ndarray, current: numpy.ndarray, threshold_percent: float = 5.0, direction: bencher.variables.results.OptDir = OptDir.minimize) -> RegressionResult

   Compare current mean vs historical mean by percentage threshold.

   Simple escape hatch: one directional rule comparing the current mean
   against the historical mean. Same shape as :func:`detect_delta` and
   :func:`detect_absolute`; contrast with :func:`detect_adaptive` which
   layers noise modelling, drift test, and a dual-band AND gate.
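
   The directional rule can be sketched in a few lines of plain Python.
   This is an illustration only: ``percentage_regressed`` is a hypothetical
   helper, directions are passed as strings rather than ``OptDir`` members,
   and dividing by ``abs(baseline)`` with a strict comparison is an
   assumption about the real implementation. A zero baseline and other
   directions are not handled here.

   .. code-block:: python

      from statistics import fmean


      def percentage_regressed(historical, current, threshold_percent=5.0,
                               direction="minimize"):
          """Compare the current mean with the historical mean, in percent."""
          baseline = fmean(historical)
          change = 100.0 * (fmean(current) - baseline) / abs(baseline)
          if direction == "minimize":
              # Lower is better, so only an increase counts as a regression.
              return change > threshold_percent
          if direction == "maximize":
              # Higher is better, so only a decrease counts as a regression.
              return -change > threshold_percent
          raise ValueError(f"direction not covered by this sketch: {direction}")


      slower = percentage_regressed([10.0, 10.0], [11.0])   # +10 % on a minimize metric
      faster = percentage_regressed([10.0, 10.0], [9.0])    # -10 % on a minimize metric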


.. py:function:: _robust_scale(values: numpy.ndarray) -> tuple[float, float]

   Return (median, MAD-based sigma) for a 1-D numeric array.

   The MAD is scaled by 1.4826 so it matches the standard deviation for
   Gaussian data.
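
   As a rough illustration, the median/MAD estimate could be sketched in
   pure Python as below. ``robust_scale_sketch`` and ``MAD_TO_SIGMA`` are
   hypothetical names for this example, not the module's implementation,
   and the sketch assumes a non-empty input with NaNs already removed.

   .. code-block:: python

      from statistics import median

      MAD_TO_SIGMA = 1.4826  # same value the module lists for _MAD_TO_SIGMA


      def robust_scale_sketch(values):
          """Return (median, scaled MAD) for a non-empty sequence of floats."""
          center = median(values)
          mad = median(abs(v - center) for v in values)
          return center, MAD_TO_SIGMA * mad


      # The single outlier (50.0) barely affects either statistic.
      center, sigma = robust_scale_sketch([10.0, 10.2, 9.9, 10.1, 50.0])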


.. py:function:: _residual_sigma(values: numpy.ndarray) -> float

   Estimate step-to-step noise via MAD of first differences.

   For data ``y[i] = trend[i] + eps[i]`` the diff ``y[i+1] - y[i]`` has variance
   ``2 * sigma^2``, so ``MAD(diff) * 1.4826 / sqrt(2)`` recovers sigma even
   when ``trend`` is non-stationary. This prevents a gradual drift from
   inflating its own noise estimate and masking itself.
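
   The effect of differencing can be seen in a small sketch.
   ``residual_sigma_sketch`` is a hypothetical stand-in written for this
   example; centring the MAD on the median of the differences is an
   assumption about the real implementation.

   .. code-block:: python

      import math
      from statistics import median

      MAD_TO_SIGMA = 1.4826  # Gaussian consistency constant for the MAD


      def residual_sigma_sketch(values):
          """Estimate per-point noise from the MAD of first differences."""
          diffs = [b - a for a, b in zip(values, values[1:])]
          center = median(diffs)
          mad = median(abs(d - center) for d in diffs)
          # Each difference carries two independent noise terms, hence sqrt(2).
          return MAD_TO_SIGMA * mad / math.sqrt(2.0)


      noise = [0.0, 0.2, -0.1, 0.3, -0.2, 0.1, 0.0, -0.3]
      drifting = [0.5 * i + e for i, e in enumerate(noise)]

      sigma_flat = residual_sigma_sketch(noise)
      # A linear trend shifts every difference by the same 0.5, so the MAD
      # of the differences, and therefore the estimate, stays the same.
      sigma_drift = residual_sigma_sketch(drifting)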


.. py:function:: detect_adaptive(variable: str, historical_time_means: numpy.ndarray, current: numpy.ndarray, regression_mad: float = 3.5, drift_threshold: float | None = None, mk_alpha: float = 0.1, direction: bencher.variables.results.OptDir = OptDir.minimize, historical_samples: numpy.ndarray | None = None, regression_percentage: float | None = None) -> RegressionResult

   Robust regression detection combining step and drift tests.

   The method estimates the metric's inherent noise from history using a
   median + MAD (median absolute deviation) scale and expresses the current
   run's deviation in those noise units. Two orthogonal tests run in parallel:

   * **Short-term step** — flags if ``(current_mean - baseline) / noise_floor``
     exceeds ``regression_mad`` in the regression direction.
   * **Long-term drift** — fits a Theil–Sen slope on the historical time-point
     means (after a Hampel filter removes isolated outliers) and flags if the
     total projected drift, scaled by ``noise_floor``, exceeds
     ``drift_threshold`` and a Mann–Kendall test confirms monotonic trend
     with ``p < mk_alpha``.
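
   To make the drift estimate concrete, a Theil–Sen slope is the median of
   all pairwise slopes. The sketch below is illustrative only: it uses the
   run index as the x coordinate, omits the Hampel filter and the
   Mann–Kendall guard, and treats "total projected drift" as slope times
   the number of steps in the history, which is an assumption.

   .. code-block:: python

      from itertools import combinations
      from statistics import median


      def theil_sen_slope(y):
          """Median of all pairwise slopes, with run index as x."""
          pairs = combinations(range(len(y)), 2)
          return median((y[j] - y[i]) / (j - i) for i, j in pairs)


      history = [1.0, 2.0, 3.0, 40.0, 5.0]  # one isolated outlier at index 3
      slope = theil_sen_slope(history)       # the outlier does not move the median
      projected_drift = slope * (len(history) - 1)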

   :param variable: Name of the result variable being checked.
   :param historical_time_means: 1-D array of per-time-point mean values from
                                 history (one entry per prior run).
   :param current: Current run values (will be averaged).
   :param regression_mad: Step-test threshold in MAD-sigma units.
   :param drift_threshold: Drift-test threshold in MAD-sigma units. If ``None``,
                           defaults to ``_DRIFT_FRAC * regression_mad`` so users need to tune
                           only one knob.
   :param mk_alpha: Significance level for the Mann–Kendall trend guard.
   :param direction: Optimization direction from the result variable.
   :param historical_samples: Optional flat array of all historical samples
                              (not per-time means). Used for the sparse-history fallback so the
                              delegated ``percentage`` detector sees the same input it would
                              have received from ``detect_regressions`` directly. Falls back to
                              ``historical_time_means`` when not provided.
   :param regression_percentage: Optional minimum percent change required to
                                 flag a regression (directional, i.e. interpreted against
                                 ``direction``). When set, acts as a second acceptance band: a
                                 regression fires only when BOTH the MAD test and the percent
                                 change exceed their thresholds. Suppresses noise-floor false
                                 positives on metrics with few repeats or very tight history.
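
   The dual-band behaviour of ``regression_percentage`` can be sketched for
   the step test alone. ``step_regressed`` is a hypothetical helper for a
   ``minimize`` metric; it takes ``baseline`` and ``noise_floor`` as given
   (the real detector estimates them from history), assumes both are
   non-zero, and ignores the drift test.

   .. code-block:: python

      def step_regressed(baseline, noise_floor, current_mean,
                         regression_mad=3.5, regression_percentage=None):
          """Step test for a minimize metric, optionally AND-ed with a percent band."""
          z = (current_mean - baseline) / noise_floor
          mad_hit = z > regression_mad
          if regression_percentage is None:
              return mad_hit
          change_percent = 100.0 * (current_mean - baseline) / abs(baseline)
          # Both bands must be exceeded for the regression to fire.
          return mad_hit and change_percent > regression_percentage


      # Very tight history: a 1 % change is 10 noise units away from baseline.
      mad_only = step_regressed(100.0, 0.1, 101.0)
      with_floor = step_regressed(100.0, 0.1, 101.0, regression_percentage=5.0)
      large_step = step_regressed(100.0, 0.1, 110.0, regression_percentage=5.0)

   In this example the MAD test alone flags the 1 % change, the 5 % floor
   suppresses it, and a 10 % change passes both bands.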


.. py:function:: detect_delta(variable: str, historical_time_means: numpy.ndarray, current: numpy.ndarray, max_delta: float, direction: bencher.variables.results.OptDir = OptDir.minimize) -> RegressionResult

   Fail when the current mean's delta from history exceeds ``max_delta``.

   Simple escape hatch: one directional rule on the absolute-unit delta
   between the current mean and the mean of all historical per-time means.
   ``minimize`` fails when ``curr - hist_mean > max_delta``; ``maximize``
   fails when ``hist_mean - curr > max_delta``; ``none`` uses ``|delta|``.
   Same shape as :func:`detect_percentage` and :func:`detect_absolute`;
   contrast with :func:`detect_adaptive` which layers noise modelling and
   drift testing. Selected via ``regression_method='delta'``.
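
   The three directional cases above translate directly into code. In this
   sketch ``delta_fails`` is a hypothetical name and directions are plain
   strings standing in for ``OptDir`` members.

   .. code-block:: python

      def delta_fails(current_mean, hist_mean, max_delta, direction="minimize"):
          """Apply the absolute-unit delta rule for one direction."""
          delta = current_mean - hist_mean
          if direction == "minimize":
              return delta > max_delta
          if direction == "maximize":
              return -delta > max_delta
          # "none": a move in either direction counts.
          return abs(delta) > max_delta


      got_slower = delta_fails(12.0, 10.0, max_delta=1.5)
      got_faster = delta_fails(8.0, 10.0, max_delta=1.5)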


.. py:function:: detect_absolute(variable: str, current: numpy.ndarray, limit: float, direction: bencher.variables.results.OptDir = OptDir.minimize) -> RegressionResult

   Fail when the current mean violates an absolute limit in the direction given by ``direction``.

   Simple escape hatch: one directional rule against a fixed limit — no
   historical data required. For ``OptDir.minimize`` ``limit`` is a ceiling;
   for ``OptDir.maximize`` it's a floor; ``OptDir.none`` records a
   non-regressed result and leaves it to the caller to log. Same shape as
   :func:`detect_percentage` and :func:`detect_delta`; contrast with
   :func:`detect_adaptive` which needs history to estimate noise.
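
   A minimal sketch of the ceiling/floor rule follows. ``absolute_regressed``
   is a hypothetical helper, directions are plain strings standing in for
   ``OptDir`` members, and the strict inequalities are an assumption (the
   real detector may treat a value equal to the limit differently).

   .. code-block:: python

      def absolute_regressed(current_mean, limit, direction="minimize"):
          """Check a current mean against a fixed limit, with no history."""
          if direction == "minimize":
              return current_mean > limit  # limit acts as a ceiling
          if direction == "maximize":
              return current_mean < limit  # limit acts as a floor
          return False                     # "none": never flagged


      over_ceiling = absolute_regressed(120.0, limit=100.0)
      under_floor = absolute_regressed(80.0, limit=100.0, direction="maximize")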


.. py:function:: _compute_history_arrays(da: xarray.DataArray) -> tuple[numpy.ndarray | None, numpy.ndarray | None, numpy.ndarray | None]

   Aggregate history into per-time means + per-sample scatter arrays.

   Returns ``(time_means, hist_samples_flat, hist_x_flat)`` or all-``None``
   when there is no history to summarise. Per-time means collapse every
   non-time dim into one scalar per run so detection and plotting both see
   a 1-D series; the scatter arrays preserve per-repeat spread broadcast
   against the historical over_time coords.


.. py:function:: _attach_plot_metadata(result: RegressionResult, *, time_coord: numpy.ndarray, current_samples: numpy.ndarray, time_means: numpy.ndarray | None, hist_samples_flat: numpy.ndarray | None, hist_x_flat: numpy.ndarray | None) -> None

   Attach the history/current arrays a RegressionResult needs for replay plotting.


.. py:function:: detect_regressions(dataset: xarray.Dataset, bench_cfg, run_cfg) -> RegressionReport

   Run regression detection on a dataset with an ``over_time`` dimension.

   For each numeric result variable, dispatches to the detector chosen by
   ``run_cfg.regression_method`` (``percentage``, ``adaptive``, ``delta``, or
   ``absolute``). ``absolute`` runs even with a single over_time point since
   it needs no baseline; every other method requires history.

   :param dataset: xarray Dataset with an over_time dimension.
   :param bench_cfg: BenchCfg with ``result_vars`` list.
   :param run_cfg: BenchRunCfg. Reads ``regression_method`` and its
                   method-specific threshold: ``regression_percentage`` for
                   ``percentage``; ``regression_mad`` (plus ``regression_percentage``
                   as a dual-band gate) for ``adaptive``; ``regression_delta`` for
                   ``delta``; ``regression_absolute`` for ``absolute``.

   :returns: RegressionReport with one result per variable per fired detector/guard.
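
   The mapping from method name to configuration attribute described above
   can be written out as a lookup table. This is a sketch, not the module's
   dispatch code: ``THRESHOLD_ATTRS``, ``thresholds_for`` and
   ``needs_history`` are hypothetical names, and a ``SimpleNamespace``
   stands in for a real ``BenchRunCfg``.

   .. code-block:: python

      from types import SimpleNamespace

      # Threshold attributes read from run_cfg for each regression_method.
      THRESHOLD_ATTRS = {
          "percentage": ("regression_percentage",),
          "adaptive": ("regression_mad", "regression_percentage"),
          "delta": ("regression_delta",),
          "absolute": ("regression_absolute",),
      }


      def thresholds_for(run_cfg):
          """Collect the thresholds relevant to the configured method."""
          names = THRESHOLD_ATTRS[run_cfg.regression_method]
          return {name: getattr(run_cfg, name, None) for name in names}


      def needs_history(method):
          """Only the absolute detector can run on a single over_time point."""
          return method != "absolute"


      cfg = SimpleNamespace(regression_method="adaptive",
                            regression_mad=3.5,
                            regression_percentage=None)
      selected = thresholds_for(cfg)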