golds-gym-0.7.0.0: Golden testing framework for performance benchmarks
Copyright: (c) Marco Zocca 2026
License: MIT
Maintainer: @ocramz
Safe Haskell: None
Language: Haskell2010

Test.Hspec.BenchGolden.Runner

Description

This module handles running benchmarks and comparing results against golden files. It includes:

  • Benchmark execution with warm-up iterations
  • Golden file IO (reading and writing JSON statistics)
  • Tolerance-based comparison with variance warnings
  • Support for updating baselines via GOLDS_GYM_ACCEPT environment variable
  • Evaluation strategies to control how values are forced (nf variants)

Evaluation Strategies

Benchmarks require explicit evaluation strategies to prevent GHC from optimizing away computations or sharing results across iterations:

  • nf - Force result to normal form (deep, full evaluation)
  • nfIO - Execute IO and force result to normal form
  • nfAppIO - Apply function, execute IO, force result to normal form
  • io - Plain IO without additional forcing

These are vendored from tasty-bench under the MIT license ((c) 2021 Andrew Lelechenko).
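As a self-contained illustration of why forcing matters (this is not the library's internals), plain `seq` only evaluates to weak head normal form, stopping at the outermost constructor, while a full traversal, which is what `nf` achieves via `rnf`, visits every element:

```haskell
-- Illustration only: WHNF stops at the outermost constructor,
-- while full evaluation (what nf does via rnf) forces every element.
whnfOnly :: [Int] -> Bool
whnfOnly xs = xs `seq` True   -- forces only the first (:) or [] cell

forceAll :: [Int] -> Int
forceAll = sum                -- summing touches (and forces) every element
```

`whnfOnly [1, undefined]` returns True because `seq` never reaches the second element, whereas `forceAll` would crash on the same list. This is why benchmarking lazy code with io alone can under-measure: the result may remain an unevaluated thunk.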

Synopsis

Running Benchmarks

runBenchGolden :: BenchGolden -> IO BenchResult Source #

Run a benchmark golden test.

This function:

  1. Runs warm-up iterations (discarded)
  2. Runs the actual benchmark
  3. Writes actual results to .actual file
  4. If no golden exists, creates it (first run)
  5. Otherwise, compares against golden with tolerance

The result includes any warnings (e.g., variance changes).
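The first-run-versus-compare decision (steps 4 and 5) can be modelled as a pure function. A hypothetical sketch, assuming a single mean timing in milliseconds and a percentage tolerance rather than the library's actual types:

```haskell
data Outcome = Created | Passed | Regressed deriving (Eq, Show)

-- Nothing means no golden file exists yet (first run).
decide :: Maybe Double -> Double -> Double -> Outcome
decide Nothing    _      _      = Created
decide (Just gld) tolPct actual
  | abs ((actual - gld) / gld * 100) <= tolPct = Passed
  | otherwise                                  = Regressed
```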

runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #

Run a benchmark and collect statistics.

Uses raw timing collection with proper inner iteration counts to ensure the SPEC trick in nf/nfIO prevents thunk sharing.

runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #

Run a benchmark with raw timing collection for robust statistics.

This function times running all iterations in a single batch, then divides to get per-iteration timing. The SPEC trick in nf/nfIO prevents sharing within the batch.

We collect multiple samples by running the full batch multiple times, ensuring accurate measurements even with GHC's -O2 optimizations.
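The batch-then-divide scheme can be sketched as a pure conversion; a minimal illustration, assuming each sample is the wall-clock time of one full batch in milliseconds:

```haskell
-- Sketch: each sample times a full batch of n inner iterations;
-- dividing yields one per-iteration time per sample.
perIterationSamples :: Int -> [Double] -> [Double]
perIterationSamples n batchTimesMs =
  [ t / fromIntegral n | t <- batchTimesMs ]
```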

Parameter Sweeps

runSweepPoint Source #

Arguments

:: Show a 
=> String

Base sweep name

-> BenchConfig 
-> Text

Parameter name

-> a

Parameter value

-> BenchAction 
-> IO (BenchResult, GoldenStats) 

Run a single point of a parameter sweep.

This is similar to runBenchGolden but returns the GoldenStats along with the BenchResult, allowing the caller to accumulate stats for CSV export.

Each point is saved to its own golden file with the parameter value included in the filename (e.g., sort-scaling_n=1000.golden).
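Based on the filename example above, the naming scheme can be sketched as follows (the helper name is hypothetical; the library may compose names differently):

```haskell
-- Hypothetical helper: builds "sort-scaling_n=1000" from the pieces.
sweepPointName :: Show a => String -> String -> a -> String
sweepPointName baseName param value =
  baseName ++ "_" ++ param ++ "=" ++ show value
```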

runSweep Source #

Arguments

:: Show a 
=> String

Sweep name

-> BenchConfig 
-> Text

Parameter name (for CSV column header)

-> [a]

Parameter values to sweep over

-> (a -> BenchAction)

Action generator

-> IO [(a, BenchResult, GoldenStats)] 

Run a full parameter sweep and write CSV output.

This runs benchmarks for all parameter values, saves individual golden files, and writes a single CSV file with all results for analysis.

The CSV file is placed at:

<outputDir>/<sweep-name>-<arch-id>.csv
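A minimal sketch of what the CSV body could look like, assuming one row per parameter value with the parameter name as the first column header (the exact column set is an assumption, not the library's documented format):

```haskell
-- Hypothetical CSV layout: header row, then one row per sweep point.
sweepCsv :: Show a => String -> [(a, Double)] -> String
sweepCsv paramName rows =
  unlines $ (paramName ++ ",mean_ms")
          : [ show p ++ "," ++ show meanMs | (p, meanMs) <- rows ]
```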

Golden File Operations

writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #

Write a golden file.

writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #

Write an actual results file.

getGoldenPath :: FilePath -> FilePath -> String -> FilePath Source #

Get the path for a golden file.

getActualPath :: FilePath -> FilePath -> String -> FilePath Source #

Get the path for an actual results file.
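A hedged sketch of how such paths could be composed with System.FilePath; the `<name>-<archId>` layout and the helper names are assumptions, not the library's documented scheme:

```haskell
import System.FilePath ((<.>), (</>))

-- Hypothetical layout: <dir>/<name>-<archId>.golden / .actual
goldenPathFor, actualPathFor :: FilePath -> String -> String -> FilePath
goldenPathFor dir archId name = dir </> (name ++ "-" ++ archId) <.> "golden"
actualPathFor dir archId name = dir </> (name ++ "-" ++ archId) <.> "actual"
```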

Comparison

compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult Source #

Compare actual stats against golden stats.

Returns a BenchResult indicating whether the benchmark passed, regressed, or improved, along with any warnings.

Hybrid Tolerance Strategy

The comparison uses BOTH percentage and absolute tolerance (when configured):

  1. Calculate the percentage difference: ((actual - golden) / golden) * 100
  2. Pass if abs(percentDiff) <= tolerancePercent (percentage check), OR
  3. Pass if abs(actual - golden) <= absoluteToleranceMs (absolute check)

This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
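The two checks can be combined in a small pure predicate. A minimal sketch, assuming times in milliseconds and an optional absolute tolerance:

```haskell
-- Passes if EITHER the percentage check or the (optional) absolute
-- check succeeds. All times are in milliseconds.
withinTolerance :: Double -> Maybe Double -> Double -> Double -> Bool
withinTolerance tolPct mAbsMs golden actual = pctOk || absOk
  where
    pctOk = abs ((actual - golden) / golden * 100) <= tolPct
    absOk = maybe False (\t -> abs (actual - golden) <= t) mAbsMs
```

For a 0.1 ms golden and a 0.3 ms actual, the percentage difference is 200%, yet with an absolute tolerance of 0.5 ms the run still passes.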

checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning] Source #

Check for variance changes and generate warnings.

Robust Statistics

calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double]) Source #

Calculate robust statistics from raw timing data.

Returns: (trimmed mean, MAD, IQR, outliers)

calculateTrimmedMean :: Double -> Vector Double -> Double Source #

Calculate trimmed mean by removing specified percentage from each tail.
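A self-contained sketch of a symmetric trimmed mean over lists (the rounding of the trim count is an assumption; the library may round differently):

```haskell
import Data.List (sort)

-- Drops `pct` percent of observations from EACH tail, then averages.
trimmedMean :: Double -> [Double] -> Double
trimmedMean pct xs = sum kept / fromIntegral (length kept)
  where
    s    = sort xs
    n    = length s
    k    = floor (fromIntegral n * pct / 100)
    kept = take (n - 2 * k) (drop k s)
```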

calculateMAD :: Vector Double -> Double -> Double Source #

Calculate Median Absolute Deviation (MAD).

MAD = median(|x_i - median(x)|)

calculateIQR :: Vector Double -> Double Source #

Calculate Interquartile Range (IQR = Q3 - Q1).
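A self-contained sketch over lists, using the "median of halves" (Tukey hinge) quartile convention; other quartile definitions give slightly different values, and the library's choice is not specified here:

```haskell
import Data.List (sort)

medianOf :: [Double] -> Double
medianOf ys = let s = sort ys; n = length s
              in if odd n
                 then s !! (n `div` 2)
                 else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- IQR = Q3 - Q1, with Q1/Q3 as medians of the lower/upper halves.
iqrOf :: [Double] -> Double
iqrOf xs = medianOf upper - medianOf lower
  where
    s     = sort xs
    n     = length s
    lower = take (n `div` 2) s
    upper = drop ((n + 1) `div` 2) s
```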

detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double] Source #

Detect outliers using MAD-based threshold.

An observation is an outlier if: |x - median| > threshold * MAD
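The MAD formula and the outlier rule above can be sketched together; a self-contained illustration over lists rather than Vector:

```haskell
import Data.List (sort)

medianL :: [Double] -> Double
medianL ys = let s = sort ys; n = length s
             in if odd n
                then s !! (n `div` 2)
                else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- MAD = median(|x_i - median(x)|)
madL :: [Double] -> Double
madL xs = medianL [abs (x - m) | x <- xs] where m = medianL xs

-- An observation is an outlier if |x - median| > threshold * MAD.
outliers :: Double -> [Double] -> [Double]
outliers threshold xs = [x | x <- xs, abs (x - m) > threshold * d]
  where
    m = medianL xs
    d = madL xs
```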

Environment

shouldUpdateGolden :: IO Bool Source #

Check if golden files should be updated.

Returns True if GOLDS_GYM_ACCEPT environment variable is set.

Usage:

GOLDS_GYM_ACCEPT=1 cabal test
GOLDS_GYM_ACCEPT=1 stack test

shouldSkipBenchmarks :: IO Bool Source #

Check if benchmarks should be skipped entirely.

Returns True if GOLDS_GYM_SKIP environment variable is set. Useful for CI environments where benchmark hardware is inconsistent.

Usage:

GOLDS_GYM_SKIP=1 cabal test
GOLDS_GYM_SKIP=1 stack test

setAcceptGoldens :: Bool -> IO () Source #

Set the accept goldens flag (called from BenchGolden Example instance).

setSkipBenchmarks :: Bool -> IO () Source #

Set the skip benchmarks flag (called from BenchGolden Example instance).

Benchmarkable Constructors

io :: IO () -> BenchAction Source #

Benchmark an IO action, discarding the result. This is for backward compatibility with code that uses IO () actions.

Example:

benchGolden "compute" (io $ do
  result <- heavyComputation
  _ <- evaluate result   -- force to WHNF so the work isn't deferred
  pure ())

nf :: NFData b => (a -> b) -> a -> BenchAction Source #

Benchmark a pure function applied to an argument, forcing the result to normal form (NF) using rnf from Control.DeepSeq. This ensures the entire result structure is evaluated.

Example:

benchGolden "fib 30" (nf fib 30)

nfIO :: NFData a => IO a -> BenchAction Source #

Benchmark an IO action, forcing the result to normal form.

Example:

benchGolden "readFile" (nfIO $ readFile "data.txt")

nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction Source #

Benchmark a function that performs IO, forcing the result to normal form.

Example:

benchGolden "lookup in map" (nfAppIO lookupInDB "key")