bencher.scorecard

Generic benchmark health scorecard: summaries -> one grouped HTML page.

Reads the machine-readable *.summary.json written by result_to_json() (with include_series=True) for a set of benchmarks and renders a single page where every scalar metric shows, at a glance, a regression verdict and a noise sparkline. Project specifics — the tag registry, metric aliases, and report layout — are supplied via ScorecardConfig.

Attributes

DEFAULT_OTHER_CATEGORY

Classes

`Chrome`	Optional page header content (title, provenance, and CI nav links).
`ReportLayout`	Where per-benchmark artifacts live under the reports directory.
`ScorecardConfig`	Project-specific inputs to the scorecard renderer.

Functions

`discover_report_links`(→ list[dict])	Benchmarks with an HTML report but no scalar metrics, grouped by category.
`discover_summaries`(→ list[dict])	Parse every `*.summary.json` under the reports root.
`tag_to_name`(→ str)	Fallback display name for an unregistered tag (strip prefix, title-case).
`build_cell`(→ dict | None)	Build one table cell for (benchmark, metric), or None when absent.
`cell_verdict`(→ str)	4-way display verdict for a cell.
`fmt_change`(→ str)	Signed percent label for a Δ (empty when not computable).
`fmt_value`(→ str)	Compact human label for a scalar value (`—` when missing).
`metric_columns`(→ list[str])	Union of metric names, ordered by (shared-by-most, first-seen).
`unify_metric_names`(→ tuple[dict[str, dict], dict[str, ...)	Apply aliases to one benchmark's metrics + regressions.
`generate_scorecard`(→ pathlib.Path)	Render the scorecard for all summaries under reports_dir.

Package Contents

bencher.scorecard.DEFAULT_OTHER_CATEGORY = 'Other'

class bencher.scorecard.Chrome

Optional page header content (title, provenance, and CI nav links).

Every field is optional, and each nav link renders only when supplied, so the default template can carry CI-specific links without affecting callers that leave them blank.

title: str = 'Benchmark Health Scorecard'

commit_sha: str = ''

branch: str = ''

pr_number: str = ''

run_url: str = ''

repo_url: str = ''

nightly_url: str = ''

main_url: str = ''

stable_url: str = ''
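The "renders only when supplied" rule amounts to filtering out blank link fields before rendering. The sketch below mirrors the documented fields in a plain dataclass; it is not bencher's template code, and the `nav_links` helper and its derived labels are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Chrome:
    # Mirror of the documented fields; every one has a harmless default.
    title: str = "Benchmark Health Scorecard"
    commit_sha: str = ""
    branch: str = ""
    pr_number: str = ""
    run_url: str = ""
    repo_url: str = ""
    nightly_url: str = ""
    main_url: str = ""
    stable_url: str = ""


def nav_links(chrome: Chrome) -> list[tuple[str, str]]:
    """Hypothetical helper: (label, url) pairs for the *_url fields that are set."""
    links = []
    for f in fields(chrome):
        value = getattr(chrome, f.name)
        # Only URL fields become links, and blank ones are skipped entirely.
        if f.name.endswith("_url") and value:
            links.append((f.name.removesuffix("_url"), value))
    return links
```

A caller that sets nothing gets an empty nav bar rather than dead links.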

class bencher.scorecard.ReportLayout

Where per-benchmark artifacts live under the reports directory.

root is the subdirectory holding one folder per benchmark tag ("" means the reports directory itself). link_pattern builds the relative href to a benchmark’s HTML report; {root}, {tag} and {bench_name} are substituted.

root: str = ''

link_pattern: str = '{root}/{tag}/{bench_name}.html'

link(tag: str, bench_name: str) → str
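A minimal sketch of how link() can substitute the pattern, assuming plain str.format. How the real method handles the empty root is not documented; stripping the leading "/" so the href stays relative is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportLayout:
    root: str = ""
    link_pattern: str = "{root}/{tag}/{bench_name}.html"

    def link(self, tag: str, bench_name: str) -> str:
        href = self.link_pattern.format(root=self.root, tag=tag, bench_name=bench_name)
        # Assumption: with root == "" the result starts with "/", which a browser
        # would resolve against the site root; strip it to keep the href relative.
        return href.lstrip("/")
```

For example, `ReportLayout(root="reports").link("bench_io", "io")` yields `reports/bench_io/io.html`.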

class bencher.scorecard.ScorecardConfig

Project-specific inputs to the scorecard renderer.

Parameters:

registry – tag -> (category, display_name, description) for known benchmarks. Unregistered tags fall back to an auto-generated name in other_category.
aliases – raw_metric_name -> canonical_name so equivalent metrics from different benchmarks share one column.
percent_metrics – metric names whose value is a 0..1 fraction to be rendered as a percentage rather than a bare number.
layout – on-disk report layout (see ReportLayout).
other_category – fallback category for unregistered tags.

registry: Mapping[str, tuple[str, str, str]]

aliases: Mapping[str, str]

percent_metrics: frozenset[str]

layout: ReportLayout

other_category: str = 'Other'

category_order() → list[str]: Category display order: first-appearance in the registry, Other last.

bencher.scorecard.discover_report_links(reports_dir: pathlib.Path, config: bencher.scorecard.config.ScorecardConfig, exclude_tags: set[str]) → list[dict]

Benchmarks with an HTML report but no scalar metrics, grouped by category.

The scorecard charts only benchmarks that emit scalar metrics; image-only reports and any report whose summary is missing would otherwise be unreachable. Drops any tag already shown as a metric row. Returns [{category, links: [{name, link}]}] in category display order.
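The grouping step can be sketched as a pure function over already-discovered (tag, bench_name) pairs. This is not bencher's implementation: the input shape, the `group_report_links` name, and using the raw tag (instead of tag_to_name) as the fallback display name are simplifications for illustration.

```python
from collections.abc import Iterable, Mapping


def group_report_links(
    reports: Iterable[tuple[str, str]],  # (tag, bench_name) pairs that have an HTML report
    registry: Mapping[str, tuple[str, str, str]],
    category_order: list[str],
    exclude_tags: set[str],
    other_category: str = "Other",
) -> list[dict]:
    groups: dict[str, list[dict]] = {}
    for tag, bench_name in reports:
        if tag in exclude_tags:
            continue  # already rendered as a metric row
        category, name, _ = registry.get(tag, (other_category, tag, ""))
        groups.setdefault(category, []).append(
            {"name": name, "link": f"{tag}/{bench_name}.html"}
        )

    def rank(category: str) -> int:
        # Unknown categories sort after every known one; sorted() is stable.
        if category in category_order:
            return category_order.index(category)
        return len(category_order)

    return [{"category": c, "links": groups[c]} for c in sorted(groups, key=rank)]
```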

bencher.scorecard.discover_summaries(reports_dir: pathlib.Path, config: bencher.scorecard.config.ScorecardConfig) → list[dict]

Parse every *.summary.json under the reports root.

Returns one record per summary file with registry metadata attached, in deterministic order (category display order, then display name). Summary files that contain malformed JSON or no scalar metrics are skipped.
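A minimal sketch of the discovery loop, assuming layout.root is "" (so summaries sit at `<reports_dir>/<tag>/*.summary.json`) and that each summary stores its scalar metrics under a top-level "metrics" object. Both the JSON shape and the record keys are assumptions, not bencher's documented schema.

```python
import json
from pathlib import Path


def discover_summaries(reports_dir: Path, registry: dict, category_order: list[str],
                       other_category: str = "Other") -> list[dict]:
    records = []
    for path in sorted(reports_dir.glob("*/*.summary.json")):
        try:
            summary = json.loads(path.read_text())
        except (OSError, ValueError):  # JSONDecodeError is a ValueError
            continue  # malformed or unreadable: skip rather than fail the page
        metrics = summary.get("metrics") if isinstance(summary, dict) else None
        if not metrics:
            continue  # no scalar metrics: nothing to chart
        tag = path.parent.name
        category, name, description = registry.get(tag, (other_category, tag, ""))
        records.append({"tag": tag, "category": category, "name": name,
                        "description": description, "metrics": metrics})

    def rank(rec: dict) -> tuple[int, str]:
        c = rec["category"]
        idx = category_order.index(c) if c in category_order else len(category_order)
        return (idx, rec["name"])

    return sorted(records, key=rank)
```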

bencher.scorecard.tag_to_name(tag: str) → str: Fallback display name for an unregistered tag (strip prefix, title-case).

bencher.scorecard.build_cell(rec: dict, var: str, config: bencher.scorecard.config.ScorecardConfig) → dict | None: Build one table cell for (benchmark, metric), or None when absent.

bencher.scorecard.cell_verdict(reg: dict | None) → str

4-way display verdict for a cell.

None (no regression gate on this metric, or too little history) maps to the uncolored "trend" fallback. A gate on a young baseline maps there too: the baseline is younger than regression_min_history, so bencher reports the regression but never blocks on it, and coloring it like a real regression would overstate a verdict that its own gate treats as advisory. Otherwise the cell defers to bencher's 3-state core verdict and renders its "unchanged" as "passed" (the gate ran and did not flag). A gate with no threshold can only produce "passed".
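The decision order above can be sketched as follows. Only "unchanged", "passed" and "trend" come from the text; the other two core verdict names ("regressed", "improved") and the boolean `young_baseline` key on the regression dict are assumptions made for illustration.

```python
from typing import Optional

# Assumed core verdict names; only "unchanged" is confirmed by the description above.
CORE_TO_DISPLAY = {"regressed": "regressed", "improved": "improved", "unchanged": "passed"}


def cell_verdict(reg: Optional[dict]) -> str:
    if reg is None:
        return "trend"  # no gate on this metric, or too little history
    if reg.get("young_baseline"):  # hypothetical key
        return "trend"  # advisory gate: don't color it like a real regression
    # Defer to the 3-state core verdict, renaming "unchanged" to "passed".
    return CORE_TO_DISPLAY[reg["verdict"]]
```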

bencher.scorecard.fmt_change(change_percent: float | None) → str: Signed percent label for a Δ (empty when not computable).
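A minimal sketch of the signed label, assuming one decimal place and treating NaN or infinite values (for example from a zero baseline) as "not computable"; the real precision is not documented.

```python
import math
from typing import Optional


def fmt_change(change_percent: Optional[float]) -> str:
    # Not computable (no baseline, or a non-finite ratio): no label at all.
    if change_percent is None or not math.isfinite(change_percent):
        return ""
    # The "+" format flag forces an explicit sign so the direction is always visible.
    return f"{change_percent:+.1f}%"
```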

bencher.scorecard.fmt_value(value: float | None, units: str | None, *, as_percent: bool = False) → str: Compact human label for a scalar value (— when missing).
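A minimal sketch of the value label. Three significant figures via the `g` format and "value units" spacing are assumptions; as_percent follows the ScorecardConfig.percent_metrics description (a 0..1 fraction shown as a percentage).

```python
import math
from typing import Optional


def fmt_value(value: Optional[float], units: Optional[str], *, as_percent: bool = False) -> str:
    if value is None or math.isnan(value):
        return "—"  # missing-value placeholder
    if as_percent:
        return f"{value * 100:.1f}%"  # 0..1 fraction rendered as a percentage
    label = f"{value:.3g}"  # 3 significant figures; "g" drops trailing zeros
    return f"{label} {units}" if units else label
```

Note that `g` switches to exponent notation for large magnitudes (1234.5 becomes `1.23e+03`), which a production formatter might prefer to avoid.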

bencher.scorecard.metric_columns(records: list[dict]) → list[str]: Union of metric names, ordered by (shared-by-most, first-seen).
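The (shared-by-most, first-seen) ordering is a two-key sort: descending count of benchmarks that report the metric, then ascending first-seen index. The sketch assumes each record keeps its metrics under a "metrics" dict.

```python
from collections import Counter


def metric_columns(records: list[dict]) -> list[str]:
    counts: Counter = Counter()
    first_seen: dict[str, int] = {}
    for rec in records:
        for name in rec["metrics"]:  # assumed record shape: {"metrics": {name: ...}}
            counts[name] += 1
            first_seen.setdefault(name, len(first_seen))
    # Most widely shared metrics first; ties keep their first-seen order.
    return sorted(first_seen, key=lambda name: (-counts[name], first_seen[name]))
```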

bencher.scorecard.unify_metric_names(metrics: dict[str, dict], regressions: dict[str, dict], aliases: dict[str, str]) → tuple[dict[str, dict], dict[str, dict]]

Apply aliases to one benchmark’s metrics + regressions.

Returns new dicts keyed by canonical column names, preserving metric order. A renamed metric records its original name under source_variable so a cell tooltip can surface it. On a collision (the canonical name already exists on this benchmark, or two of its metrics map to the same alias) the raw name is kept, so no data is dropped or shadowed.
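The collision rules can be sketched as follows. This is one reading of the description, not bencher's code: when several metrics share an alias, all of them keep their raw names, and regression keys are assumed to be a subset of metric keys.

```python
from collections import Counter


def unify_metric_names(metrics: dict[str, dict], regressions: dict[str, dict],
                       aliases: dict[str, str]) -> tuple[dict[str, dict], dict[str, dict]]:
    wanted = {raw: aliases.get(raw, raw) for raw in metrics}
    targets = Counter(wanted.values())

    def final_name(raw: str) -> str:
        canon = wanted[raw]
        if canon == raw:
            return raw
        # Collision: canonical name is already a metric here, or is claimed twice.
        if canon in metrics or targets[canon] > 1:
            return raw
        return canon

    new_metrics: dict[str, dict] = {}
    new_regressions: dict[str, dict] = {}
    for raw, data in metrics.items():  # dict order is preserved
        name = final_name(raw)
        entry = dict(data)  # copy so the caller's dict is not mutated
        if name != raw:
            entry["source_variable"] = raw  # surfaced in the cell tooltip
        new_metrics[name] = entry
        if raw in regressions:  # assumption: regressions keyed like metrics
            new_regressions[name] = regressions[raw]
    return new_metrics, new_regressions
```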

bencher.scorecard.generate_scorecard(reports_dir: pathlib.Path | str, config: bencher.scorecard.config.ScorecardConfig | None = None, *, chrome: bencher.scorecard.config.Chrome | None = None, output_name: str = 'index.html') → pathlib.Path

Render the scorecard for all summaries under reports_dir.

Parameters:

reports_dir – Directory containing <layout.root>/<tag>/*.summary.json.
config – Project specifics (registry, aliases, layout, …). Defaults to a zero-config ScorecardConfig (auto-named benchmarks).
chrome – Optional page header / CI nav content.
output_name – File written under reports_dir (the scorecard is usually published as index.html so it is the landing page).

Returns:

The path to the written HTML file.