bencher.result_collector
========================

.. py:module:: bencher.result_collector

.. autoapi-nested-parse::

   Result collection and storage for benchmarking.

   This module provides the ResultCollector class for managing benchmark results,
   including xarray dataset operations, caching, and metadata management.


Attributes
----------

.. autoapisummary::

   bencher.result_collector.logger
   bencher.result_collector._MEDIA_RESULT_TYPES


Classes
-------

.. autoapisummary::

   bencher.result_collector.ResultCollector


Functions
---------

.. autoapisummary::

   bencher.result_collector._sentinel_for_result_var
   bencher.result_collector._null_old_entries
   bencher.result_collector.set_xarray_multidim
   bencher.result_collector._set_result_value


Module Contents
---------------

.. py:data:: logger

.. py:data:: _MEDIA_RESULT_TYPES

.. py:function:: _sentinel_for_result_var(rv)

   Return the sentinel value used for 'missing' entries of this result type.

   ResultVolume falls through to the default np.nan: it is numeric, not
   file-backed, so no media cleanup is needed even when max_time_events is set.
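
   A minimal sketch of this fall-through dispatch follows. It makes several assumptions. ``ResultScalar``, ``ResultImage`` and ``ResultVideo`` are hypothetical stand-in classes, and only ``ResultVolume`` is named above. The empty-string sentinel for file-backed types is also assumed; the real sentinels and class hierarchy in bencher may differ. ``math.nan`` is used so the sketch needs only the standard library. Like ``np.nan``, it is a float NaN:

   .. code-block:: python

      import math
      from typing import Any


      # Hypothetical stand-ins for bencher's result-variable classes.
      class ResultScalar:
          pass


      class ResultImage:
          pass


      class ResultVideo:
          pass


      class ResultVolume:
          pass


      # Assumed tuple of file-backed types, playing the role of _MEDIA_RESULT_TYPES.
      MEDIA_RESULT_TYPES = (ResultImage, ResultVideo)

      # Assumed sentinel for file-backed entries (illustrative only).
      MEDIA_SENTINEL = ""


      def sentinel_for_result_var(rv: Any) -> Any:
          """Return the 'missing' marker for one result variable (sketch)."""
          if isinstance(rv, MEDIA_RESULT_TYPES):
              return MEDIA_SENTINEL
          # Everything else, ResultVolume included, falls through to NaN.
          return math.nan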


.. py:function:: _null_old_entries(dataset, rv, var_limit)

   Null out over_time entries older than *var_limit* for a single result variable.

   **Mutates *dataset* in-place** by writing sentinel values directly into
   the backing numpy arrays of the affected data variables.

   For media types (images, videos, .rrd files), the referenced files are
   collected for deferred deletion.  Returns a list of file paths to delete;
   the caller is responsible for removing them *after* the dataset is cached
   so that a cache-write failure does not leave orphaned sentinel values.
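
   The in-place nulling and deferred deletion can be sketched with plain numpy. The sketch makes three assumptions. First, ``over_time`` is the last axis. Second, media entries are stored as file-path strings. Third, an empty string is the media sentinel. The function name ``null_old_entries`` and its ``is_media`` flag are hypothetical and are not this module's API:

   .. code-block:: python

      import os
      import tempfile

      import numpy as np


      def null_old_entries(values: np.ndarray, var_limit: int, sentinel, is_media: bool) -> list:
          """Overwrite all but the newest *var_limit* over_time slices in place (sketch).

          Assumes over_time is the LAST axis of *values* and that media entries are
          file-path strings. Returns the paths the caller should delete later.
          """
          n_old = max(values.shape[-1] - var_limit, 0)
          if n_old == 0:
              return []
          old = values[..., :n_old]  # basic slicing gives a view into *values*
          to_delete = []
          if is_media:
              # Record the referenced files before the sentinel overwrites them.
              to_delete = [p for p in old.ravel().tolist() if isinstance(p, str) and p]
          old[...] = sentinel
          return to_delete


      # Numeric variable: 4 over_time events, keep the newest 2.
      scores = np.arange(8, dtype=float).reshape(2, 4)
      null_old_entries(scores, var_limit=2, sentinel=np.nan, is_media=False)

      # Media variable: 3 events referencing real files, keep the newest 1.
      tmpdir = tempfile.mkdtemp()
      paths = []
      for i in range(3):
          path = os.path.join(tmpdir, f"frame_{i}.png")
          open(path, "w").close()
          paths.append(path)
      images = np.array([paths], dtype=object)  # shape (1, 3)
      pending = null_old_entries(images, var_limit=1, sentinel="", is_media=True)

      # Deferred deletion: in real use this runs only after the dataset is cached.
      for path in pending:
          os.remove(path)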


.. py:function:: set_xarray_multidim(data_array: xarray.DataArray, index_tuple: tuple[int, Ellipsis], value: Any) -> xarray.DataArray

   Set a value in a multi-dimensional xarray DataArray at the specified index position.

   The index is applied dynamically, so the same call works for a DataArray with
   any number of dimensions.

   :param data_array: The data array to modify
   :type data_array: xr.DataArray
   :param index_tuple: The index coordinates as a tuple
   :type index_tuple: tuple[int, ...]
   :param value: The value to set at the specified position
   :type value: Any

   :returns: The modified data array
   :rtype: xr.DataArray
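
   The following sketches one way to get dimension-independent indexing. It assumes the index is purely positional, with one integer per dimension. ``set_multidim`` is a hypothetical name, and this is not necessarily how ``set_xarray_multidim`` is implemented:

   .. code-block:: python

      import numpy as np
      import xarray as xr


      def set_multidim(data_array: xr.DataArray, index_tuple: tuple, value) -> xr.DataArray:
          """Write *value* at a positional index of any length (sketch)."""
          # DataArray.__setitem__ treats a tuple of ints as one positional index,
          # so the same line serves 1-D, 2-D, ... N-D arrays.
          data_array[index_tuple] = value
          return data_array


      grid = xr.DataArray(np.zeros((2, 3, 4)), dims=("x", "y", "repeat"))
      set_multidim(grid, (1, 2, 3), 5.0)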


.. py:function:: _set_result_value(bench_res: bencher.results.bench_result.BenchResult, rv_arrays: dict[str, numpy.ndarray] | None, name: str, idx: tuple, value: Any) -> None

   Write a single result value, using pre-cached numpy arrays when available.
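
   Below is a sketch of the fast-path/fallback split. For simplicity it takes the Dataset directly instead of a ``BenchResult``. ``set_result_value`` is a hypothetical name:

   .. code-block:: python

      from __future__ import annotations

      import numpy as np
      import xarray as xr


      def set_result_value(
          ds: xr.Dataset, rv_arrays: dict[str, np.ndarray] | None, name: str, idx: tuple, value
      ) -> None:
          """Write one result, preferring a pre-cached numpy array (sketch)."""
          if rv_arrays is not None:
              # Fast path: plain numpy indexing into a cached view of ds[name].
              rv_arrays[name][idx] = value
          else:
              # Fallback: ds[name] builds a new DataArray on each call, but it wraps
              # the same underlying Variable, so the write still lands in ds.
              ds[name][idx] = value


      ds = xr.Dataset({"time_s": (("x", "y"), np.zeros((2, 2)))})
      set_result_value(ds, None, "time_s", (0, 1), 1.5)  # fallback path
      cached = {"time_s": ds["time_s"].values}
      set_result_value(ds, cached, "time_s", (1, 0), 2.5)  # fast path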


.. py:class:: ResultCollector(cache_size: int = DEFAULT_CACHE_SIZE_BYTES)

   Manages benchmark result collection, storage, and caching.

   This class initializes the xarray datasets that hold benchmark results, stores
   results from worker jobs, manages caches, and adds metadata.

   .. attribute:: cache_size

      Maximum size of the cache in bytes

      :type: int

   .. attribute:: ds_dynamic

      Dictionary for storing unstructured vector datasets

      :type: dict


   .. py:attribute:: cache_size
      :value: 0


   .. py:attribute:: ds_dynamic
      :type:  dict


   .. py:attribute:: _benchmark_cache
      :type:  diskcache.Cache | None
      :value: None


   .. py:attribute:: _history_cache
      :type:  diskcache.Cache | None
      :value: None


   .. py:method:: get_benchmark_cache() -> diskcache.Cache

      Return the persistent benchmark_inputs Cache, creating it on first access.


   .. py:method:: get_history_cache() -> diskcache.Cache

      Return the persistent history Cache, creating it on first access.


   .. py:method:: close_caches() -> None

      Close any open cache instances. Safe to call multiple times.
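
      The following sketches the lazy-creation and idempotent-close pattern. A hypothetical ``FakeCache`` replaces ``diskcache.Cache`` so the sketch runs without a cache directory. The class name ``LazyCaches`` is illustrative. Having ``__exit__`` delegate to ``close_caches()`` is an assumption about how the context-manager methods are wired:

      .. code-block:: python

         from __future__ import annotations


         class FakeCache:
             """Hypothetical stand-in for diskcache.Cache; only close() is modelled."""

             def __init__(self, name: str, size_limit: int) -> None:
                 self.name = name
                 self.size_limit = size_limit
                 self.closed = False

             def close(self) -> None:
                 self.closed = True


         class LazyCaches:
             def __init__(self, cache_size: int) -> None:
                 self.cache_size = cache_size
                 self._benchmark_cache: FakeCache | None = None
                 self._history_cache: FakeCache | None = None

             def get_benchmark_cache(self) -> FakeCache:
                 # Created on first access, then reused on every later call.
                 if self._benchmark_cache is None:
                     self._benchmark_cache = FakeCache("benchmark_inputs", self.cache_size)
                 return self._benchmark_cache

             def get_history_cache(self) -> FakeCache:
                 if self._history_cache is None:
                     self._history_cache = FakeCache("history", self.cache_size)
                 return self._history_cache

             def close_caches(self) -> None:
                 # Resetting each attribute to None makes repeated calls no-ops.
                 for attr in ("_benchmark_cache", "_history_cache"):
                     cache = getattr(self, attr)
                     if cache is not None:
                         cache.close()
                         setattr(self, attr, None)

             def __enter__(self) -> LazyCaches:
                 return self

             def __exit__(self, *exc_info) -> None:
                 self.close_caches()


         with LazyCaches(cache_size=1024) as collector:
             first = collector.get_benchmark_cache()
             second = collector.get_benchmark_cache()
         collector.close_caches()  # closing again after __exit__ is harmless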


   .. py:method:: __enter__() -> ResultCollector


   .. py:method:: __exit__(*exc_info) -> None


   .. py:method:: setup_dataset(bench_cfg: bencher.bench_cfg.BenchCfg, time_src: datetime.datetime | str) -> tuple[bencher.results.bench_result.BenchResult, zip, list[str], int]

      Initialize an n-dimensional xarray dataset from benchmark configuration parameters.

      This method creates the data structures needed to store benchmark results based on
      the provided configuration, setting up the xarray dimensions, coordinates, and
      variables from the input and result variables.

      :param bench_cfg: Configuration defining the benchmark parameters, inputs, and
                        results
      :type bench_cfg: BenchCfg
      :param time_src: Timestamp or event name for the benchmark run
      :type time_src: datetime | str

      :returns: A tuple containing:

                - A BenchResult object with the initialized dataset
                - A lazy iterator of function input tuples (index, value pairs)
                - A list of dimension names for the dataset
                - The total number of jobs (Cartesian product size)
      :rtype: tuple[BenchResult, zip, list[str], int]
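
      The lazy ``zip`` of (index, value) pairs and the job count can be illustrated with ``itertools``. This is a sketch that assumes each input is a plain list of sample values. It is not the method's actual code:

      .. code-block:: python

         import itertools
         import math

         # Hypothetical input sweeps: variable name -> sampled values.
         inputs = {"x": [0.0, 0.5, 1.0], "method": ["a", "b"]}

         dims = list(inputs)
         value_lists = list(inputs.values())

         # One (index tuple, value tuple) pair per point of the Cartesian product.
         index_iter = itertools.product(*(range(len(v)) for v in value_lists))
         value_iter = itertools.product(*value_lists)
         function_inputs = zip(index_iter, value_iter)  # lazy: nothing is generated yet

         n_jobs = math.prod(len(v) for v in value_lists)

      Because ``zip`` and ``itertools.product`` are lazy, a large sweep is not materialized up front. The job count comes from the list lengths alone.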


   .. py:method:: define_extra_vars(bench_cfg: bencher.bench_cfg.BenchCfg, repeats: int, time_src: datetime.datetime | str) -> list[bencher.variables.inputs.IntSweep]

      Define extra meta variables for tracking benchmark execution details.

      This method creates variables that are not passed to the worker function but are
      stored in the n-dimensional array to provide context about the benchmark, such as
      the number of repeat measurements and timestamps.

      :param bench_cfg: The benchmark configuration to add variables to
      :type bench_cfg: BenchCfg
      :param repeats: The number of times each sample point should be measured
      :type repeats: int
      :param time_src: Either a timestamp or a string event name for temporal
                       tracking
      :type time_src: datetime | str

      :returns: A list of additional parameter variables to include in the benchmark
      :rtype: list[IntSweep]
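
      The sketch below shows how a repeat axis and a single-entry time axis extend the sweep. Several details are assumptions for illustration. These are the ``repeat`` dimension name, the 1-based repeat numbering, and placing the meta axes after the user inputs:

      .. code-block:: python

         import itertools
         from datetime import datetime

         # Hypothetical user sweep plus the extra meta dimensions.
         user_inputs = {"x": [0.0, 1.0], "method": ["a", "b", "c"]}
         repeats = 3
         time_src = datetime(2024, 1, 1, 12, 0)

         # Assumed layout: a 1-based repeat axis and a single-entry over_time axis.
         meta_inputs = {"repeat": list(range(1, repeats + 1)), "over_time": [time_src]}
         all_inputs = {**user_inputs, **meta_inputs}

         n_jobs = sum(1 for _ in itertools.product(*all_inputs.values()))

      Each repeat multiplies the job count. Each run contributes a single ``over_time`` slice, which load_history_cache can later combine with earlier runs.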


   .. py:method:: precompute_result_arrays(bench_res: bencher.results.bench_result.BenchResult) -> dict[str, numpy.ndarray]
      :staticmethod:


      Pre-fetch the underlying numpy arrays for all result variables.

      This avoids repeated xarray Dataset.__getitem__ lookups (which trigger
      _construct_dataarray) during the per-job store loop.  The returned arrays
      are views into the dataset, so writes go directly into bench_res.ds.
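
      Here is a small sketch of the same idea using a plain in-memory Dataset. The variable names are hypothetical:

      .. code-block:: python

         import numpy as np
         import xarray as xr

         ds = xr.Dataset(
             {
                 "latency": (("x", "repeat"), np.full((3, 2), np.nan)),
                 "throughput": (("x", "repeat"), np.full((3, 2), np.nan)),
             }
         )

         # Look each variable up once. For an in-memory, numpy-backed variable,
         # .values returns the stored array itself rather than a copy.
         rv_arrays = {name: ds[name].values for name in ds.data_vars}

         # Per-job loop: plain numpy indexing, no DataArray built per write.
         for job_idx, idx in enumerate(np.ndindex(3, 2)):
             rv_arrays["latency"][idx] = float(job_idx)

      The view guarantee holds for in-memory numpy data. A dask-backed or lazily loaded variable may return a fresh array from ``.values``, in which case writes would not reach the dataset.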


   .. py:method:: store_results(job_result: bencher.job.JobFuture, bench_res: bencher.results.bench_result.BenchResult, worker_job: bencher.worker_job.WorkerJob, bench_run_cfg: bencher.bench_cfg.BenchRunCfg, rv_arrays: dict[str, numpy.ndarray] | None = None) -> None

      Store the results from a benchmark worker job into the benchmark result dataset.

      This method handles unpacking the results from worker jobs and placing them
      in the correct locations in the n-dimensional result dataset. It supports different
      types of result variables including scalars, vectors, references, and media.

      :param job_result: The future containing the worker function result
      :type job_result: JobFuture
      :param bench_res: The benchmark result object to store results in
      :type bench_res: BenchResult
      :param worker_job: The job metadata needed to index the result
      :type worker_job: WorkerJob
      :param bench_run_cfg: Configuration for how results should be handled
      :type bench_run_cfg: BenchRunCfg
      :param rv_arrays: Pre-computed numpy arrays from
                        precompute_result_arrays(). Falls back to dataset lookup if None.
      :type rv_arrays: dict, optional

      :raises RuntimeError: If an unsupported result variable type is encountered
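
      Below is a reduced sketch of the unpack-and-dispatch step. Hypothetical ``ResultScalar``, ``ResultPath`` and ``ResultOther`` classes stand in for bencher's result types. Only two storage rules are shown; vectors and the other cases the method handles are omitted:

      .. code-block:: python

         from dataclasses import dataclass

         import numpy as np


         # Hypothetical result-variable descriptors; bencher's real classes differ.
         @dataclass
         class ResultScalar:
             name: str


         @dataclass
         class ResultPath:
             name: str  # a media or reference result stored as a file path


         @dataclass
         class ResultOther:
             name: str  # a type this sketch does not know how to store


         def store_job_result(result: dict, result_vars: list, idx: tuple, rv_arrays: dict) -> None:
             """Unpack one worker's result dict into per-variable arrays (sketch)."""
             for rv in result_vars:
                 value = result[rv.name]
                 if isinstance(rv, ResultScalar):
                     rv_arrays[rv.name][idx] = float(value)
                 elif isinstance(rv, ResultPath):
                     rv_arrays[rv.name][idx] = str(value)
                 else:
                     raise RuntimeError(f"Unsupported result variable type: {type(rv).__name__}")


         arrays = {
             "time_s": np.full((2, 2), np.nan),
             "plot": np.full((2, 2), "", dtype=object),
         }
         store_job_result({"time_s": 0.25, "plot": "out/plot_1_0.png"},
                          [ResultScalar("time_s"), ResultPath("plot")], (1, 0), arrays)

         try:
             store_job_result({"x": 1}, [ResultOther("x")], (0, 0), arrays)
         except RuntimeError as err:
             error_message = str(err)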


   .. py:method:: cache_results(bench_res: bencher.results.bench_result.BenchResult, bench_cfg_hash: str, bench_cfg_hashes: list[str]) -> None

      Cache benchmark results for future retrieval.

      This method stores benchmark results in the disk cache using the benchmark
      configuration hash as the key. It temporarily removes non-pickleable objects
      from the benchmark result before caching.

      :param bench_res: The benchmark result to cache
      :type bench_res: BenchResult
      :param bench_cfg_hash: The hash value to use as the cache key
      :type bench_cfg_hash: str
      :param bench_cfg_hashes: List to append the hash to (modified in place)
      :type bench_cfg_hashes: list[str]
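
      The detach-cache-restore pattern can be sketched with ``pickle``, using a plain dict in place of the disk cache. The ``unpicklable`` attribute list and the ``lock`` attribute are hypothetical, because the exact objects bencher strips are not listed here:

      .. code-block:: python

         import pickle
         import threading
         from types import SimpleNamespace


         def cache_results(bench_res, cache: dict, bench_cfg_hash: str, bench_cfg_hashes: list,
                           unpicklable: tuple = ("lock",)) -> None:
             """Pickle *bench_res* into *cache* without its unpicklable attributes (sketch)."""
             saved = {name: getattr(bench_res, name) for name in unpicklable}
             try:
                 # Detach the handles that pickle cannot serialize.
                 for name in unpicklable:
                     setattr(bench_res, name, None)
                 cache[bench_cfg_hash] = pickle.dumps(bench_res)
             finally:
                 # Restore them even if pickling or the cache write fails.
                 for name, value in saved.items():
                     setattr(bench_res, name, value)
             bench_cfg_hashes.append(bench_cfg_hash)


         # A stand-in result: picklable data plus a lock, which pickle rejects.
         res = SimpleNamespace(data={"time_s": [0.1, 0.2]}, lock=threading.Lock())
         store = {}
         hashes = []
         cache_results(res, store, "abc123", hashes)
         restored = pickle.loads(store["abc123"])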


   .. py:method:: load_history_cache(dataset: xarray.Dataset, bench_cfg_hash: str, clear_history: bool, max_time_events: int | None = None, result_vars: list | None = None) -> xarray.Dataset

      Load historical data from a cache if over_time is enabled.

      This method retrieves historical benchmark data from the cache and concatenates it
      with the current run when tracking performance over time. If clear_history is True,
      it clears any existing historical data instead of loading it.

      :param dataset: Freshly calculated benchmark data for the current run
      :type dataset: xr.Dataset
      :param bench_cfg_hash: Hash of the input variables used to identify cached data
      :type bench_cfg_hash: str
      :param clear_history: If True, clears historical data instead of loading it
      :type clear_history: bool
      :param max_time_events: Maximum number of over_time events to retain.
                              Oldest events are trimmed. None means unlimited.
      :type max_time_events: int | None
      :param result_vars: Result variable instances. When a variable has a
                          per-variable ``max_time_events`` smaller than the dataset's
                          over_time size, its older entries are set to the sentinel value
                          and its media files are deleted.
      :type result_vars: list | None

      :returns: Combined dataset with both historical and current benchmark data,
                or just the current data if no history exists or history is cleared
      :rtype: xr.Dataset
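
      Below is a sketch of the concatenate-then-trim logic. It uses a dict in place of the history cache and integer ``over_time`` labels. The function name is hypothetical, and here the caller, not the function, writes the result back to the cache. The per-variable sentinel handling described for ``result_vars`` is omitted:

      .. code-block:: python

         from __future__ import annotations

         import numpy as np
         import xarray as xr


         def load_history(dataset: xr.Dataset, cache: dict, key: str, clear_history: bool,
                          max_time_events: int | None = None) -> xr.Dataset:
             """Concatenate cached history with *dataset* along over_time (sketch)."""
             if clear_history or key not in cache:
                 combined = dataset
             else:
                 combined = xr.concat([cache[key], dataset], dim="over_time")
             if max_time_events is not None:
                 # Oldest events sit at the start of the axis, so keep the tail.
                 start = max(combined.sizes["over_time"] - max_time_events, 0)
                 combined = combined.isel(over_time=slice(start, None))
             return combined


         def run(t: int) -> xr.Dataset:
             """One hypothetical benchmark run contributing a single over_time slice."""
             return xr.Dataset(
                 {"time_s": (("x", "over_time"), np.full((2, 1), float(t)))},
                 coords={"x": [0, 1], "over_time": [t]},
             )


         history = {}
         for t in range(4):
             # The caller writes the combined result back so the next run can extend it.
             history["cfg"] = load_history(run(t), history, "cfg", clear_history=False, max_time_events=3)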


   .. py:method:: add_metadata_to_dataset(bench_res: bencher.results.bench_result.BenchResult, input_var: Any) -> None

      Add variable metadata to the xarray dataset for improved visualization.

      This method adds metadata like units, long names, and descriptions to the xarray dataset
      attributes, which helps visualization tools properly label axes and tooltips.

      :param bench_res: The benchmark result object containing the dataset to annotate
      :type bench_res: BenchResult
      :param input_var: The variable to extract metadata from
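
      The following sketches the kind of annotation described above. It uses xarray's conventional ``units`` and ``long_name`` attribute keys, which xarray's own plotting reads for axis labels. The coordinate name and metadata values are hypothetical:

      .. code-block:: python

         import numpy as np
         import xarray as xr

         ds = xr.Dataset(
             {"latency": (("speed",), np.array([1.2, 0.8, 0.5]))},
             coords={"speed": [1.0, 2.0, 4.0]},
         )

         # Hypothetical metadata taken from an input variable's definition.
         meta = {"units": "m/s", "long_name": "Robot speed", "description": "Commanded linear speed"}

         # ds["speed"] shares its attrs dict with the dataset's coordinate,
         # so updating it in place annotates the dataset itself.
         ds["speed"].attrs.update(meta)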


   .. py:method:: report_results(bench_res: bencher.results.bench_result.BenchResult, print_xarray: bool, print_pandas: bool) -> None

      Log the calculated benchmark data in one or more formats.

      This method can log the benchmark results as the raw xarray Dataset, as a pandas
      DataFrame, or both, for debugging and inspection.

      :param bench_res: The benchmark result containing the dataset to display
      :type bench_res: BenchResult
      :param print_xarray: If True, log the raw xarray Dataset structure
      :type print_xarray: bool
      :param print_pandas: If True, log the dataset converted to a pandas DataFrame
      :type print_pandas: bool
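
      Here is a sketch of the two output modes, assuming a standard ``logging`` logger. The logger name and the dataset are hypothetical:

      .. code-block:: python

         import logging

         import numpy as np
         import xarray as xr

         logging.basicConfig(level=logging.INFO)
         logger = logging.getLogger("report_sketch")


         def report(ds: xr.Dataset, print_xarray: bool, print_pandas: bool) -> None:
             """Log a dataset as xarray and/or pandas (sketch)."""
             if print_xarray:
                 logger.info("%s", ds)
             if print_pandas:
                 # to_dataframe() flattens the N-D grid into one row per coordinate combination.
                 logger.info("%s", ds.to_dataframe())


         ds = xr.Dataset(
             {"time_s": (("x", "repeat"), np.arange(6, dtype=float).reshape(3, 2))},
             coords={"x": [0.0, 0.5, 1.0], "repeat": [1, 2]},
         )
         report(ds, print_xarray=True, print_pandas=True)
         df = ds.to_dataframe()

      For small sweeps the pandas form is often easier to scan, because each row pairs one combination of input values with its results.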