golds-gym-0.4.0.0: Golden testing framework for performance benchmarks
Copyright: (c) 2026
License: MIT
Maintainer: @ocramz
Safe Haskell: None
Language: Haskell2010

Test.Hspec.BenchGolden.Runner

Description

This module handles running benchmarks and comparing results against golden files. It includes:

  • Benchmark execution with warm-up iterations
  • Golden file IO (reading/writing JSON statistics)
  • Tolerance-based comparison with variance warnings
  • Support for updating baselines via the GOLDS_GYM_ACCEPT environment variable
  • Evaluation strategies to control how values are forced (nf, whnf, etc.)

Evaluation Strategies

Benchmarks require explicit evaluation strategies to prevent GHC from optimizing away computations or sharing results across iterations:

  • nf - Force result to normal form (deep, full evaluation)
  • whnf - Force result to weak head normal form (shallow evaluation)
  • nfIO - Execute IO and force result to normal form
  • whnfIO - Execute IO and force result to WHNF
  • nfAppIO - Apply function, execute IO, force result to normal form
  • whnfAppIO - Apply function, execute IO, force result to WHNF
  • io - Plain IO without additional forcing

These are vendored from tasty-bench with proper attribution (BSD-3-Clause).
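
The difference between normal form and weak head normal form can be seen without the library at all. The following is a minimal sketch using only Control.DeepSeq and Control.Exception; the names partial, whnfSucceeds and nfSucceeds are illustrative and not part of golds-gym.

```haskell
import Control.DeepSeq (force)
import Control.Exception (ErrorCall, evaluate, try)

-- A list whose first cons cell is fine but whose tail is an error value.
partial :: [Int]
partial = 1 : error "tail was forced"

-- WHNF: only the outermost constructor (:) is evaluated, so the
-- error hidden in the tail is never reached.
whnfSucceeds :: IO Bool
whnfSucceeds = do
  r <- try (evaluate partial) :: IO (Either ErrorCall [Int])
  pure (either (const False) (const True) r)

-- NF: 'force' evaluates the whole structure, so the error is reached.
nfSucceeds :: IO Bool
nfSucceeds = do
  r <- try (evaluate (force partial)) :: IO (Either ErrorCall [Int])
  pure (either (const False) (const True) r)
```

In benchmark terms, a WHNF strategy applied to a lazily built list may time only the construction of its first cell, which is why a normal-form strategy is usually the safer choice for structured results.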

Running Benchmarks

runBenchGolden :: BenchGolden -> IO BenchResult

Run a benchmark golden test.

This function:

  1. Runs warm-up iterations (discarded)
  2. Runs the actual benchmark
  3. Writes the actual results to a .actual file
  4. If no golden file exists, creates it (first run)
  5. Otherwise, compares against golden with tolerance

The result includes any warnings (e.g., variance changes).

runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats

Run a benchmark and collect statistics.

runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats

Run a benchmark with raw timing collection for robust statistics.

Golden File Operations

writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()

Write a golden file.

writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()

Write an actual results file.

getGoldenPath :: FilePath -> FilePath -> String -> FilePath

Get the path for a golden file.

getActualPath :: FilePath -> FilePath -> String -> FilePath

Get the path for an actual results file.

Comparison

compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult

Compare actual stats against golden stats.

Returns a BenchResult indicating whether the benchmark passed, regressed, or improved, along with any warnings.

Hybrid Tolerance Strategy

The comparison applies a percentage tolerance and, when configured, an absolute tolerance. A benchmark passes if either check succeeds:

  1. Calculate the percentage difference: ((actual - golden) / golden) * 100
  2. Pass if abs(percentDiff) <= tolerancePercent (percentage check)
  3. Otherwise, pass if abs(actual - golden) <= absoluteToleranceMs (absolute check)

This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
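
The either-check rule can be sketched as a small pure function. This is an illustration under stated assumptions, not the library's implementation: the Tolerance record is a hypothetical stand-in for the relevant part of BenchConfig, and modelling the absolute tolerance as a Maybe is a guess based on the phrase "when configured".

```haskell
-- Hypothetical stand-in for the tolerance settings in BenchConfig.
data Tolerance = Tolerance
  { tolerancePercent    :: Double        -- allowed relative deviation, in percent
  , absoluteToleranceMs :: Maybe Double  -- optional absolute slack, in milliseconds
  }

-- Signed percentage difference of the actual value relative to the golden value.
percentDiff :: Double -> Double -> Double
percentDiff golden actual = ((actual - golden) / golden) * 100

-- A run is within tolerance if the percentage check or the absolute check passes.
withinTolerance :: Tolerance -> Double -> Double -> Bool
withinTolerance tol golden actual = percentOk || absoluteOk
  where
    percentOk  = abs (percentDiff golden actual) <= tolerancePercent tol
    absoluteOk = maybe False
                       (\ms -> abs (actual - golden) <= ms)
                       (absoluteToleranceMs tol)
```

For instance, with a 15% tolerance and 0.01 ms of absolute slack, a golden value of 0.002 ms and an actual value of 0.004 ms differ by 100% but by only 0.002 ms, so the absolute check lets the run pass.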

checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning]

Check for variance changes and generate warnings.

Robust Statistics

calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double])

Calculate robust statistics from raw timing data.

Returns: (trimmed mean, MAD, IQR, outliers)

calculateTrimmedMean :: Double -> Vector Double -> Double

Calculate trimmed mean by removing specified percentage from each tail.
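
A trimmed mean can be sketched on plain lists as follows. The function trimmedMean is illustrative only: it uses lists instead of Vector Double, returns a Maybe for the empty case, and assumes the number of samples dropped from each tail is rounded down, which may differ from what calculateTrimmedMean does.

```haskell
import Data.List (sort)

-- Trimmed mean: drop 'trimPercent' percent of the samples from each tail
-- of the sorted data, then average what is left.
-- Returns Nothing when no samples remain (empty input or excessive trimming).
trimmedMean :: Double -> [Double] -> Maybe Double
trimmedMean trimPercent xs
  | null kept = Nothing
  | otherwise = Just (sum kept / fromIntegral (length kept))
  where
    sorted = sort xs
    n      = length sorted
    -- Number of samples to drop from each tail (rounded down; an assumption).
    k      = floor (fromIntegral n * trimPercent / 100 :: Double) :: Int
    kept   = take (n - 2 * k) (drop k sorted)
```

With ten samples and a 10% trim, one sample is dropped from each end, so a single very slow iteration no longer shifts the result.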

calculateMAD :: Vector Double -> Double -> Double

Calculate Median Absolute Deviation (MAD).

MAD = median(|x_i - median(x)|)
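
The formula translates directly into code. The sketch below works on lists and recomputes the median itself, whereas calculateMAD takes a second Double argument whose meaning is not stated here; the names median and mad are illustrative, and returning 0 for empty input is a choice made for this sketch.

```haskell
import Data.List (sort)

-- Median of a list; returns 0 for an empty list (a choice made for this sketch).
median :: [Double] -> Double
median xs
  | null xs   = 0
  | even n    = (sorted !! (h - 1) + sorted !! h) / 2
  | otherwise = sorted !! h
  where
    sorted = sort xs
    n      = length xs
    h      = n `div` 2

-- Median Absolute Deviation: the median of the distances to the median.
mad :: [Double] -> Double
mad xs = median [abs (x - m) | x <- xs]
  where
    m = median xs
```

For the samples 1, 2, 3, 4, 100 the median is 3 and the absolute deviations are 2, 1, 0, 1, 97, so the MAD is 1 even though one sample is far from the rest.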

calculateIQR :: Vector Double -> Double

Calculate Interquartile Range (IQR = Q3 - Q1).
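
Several quartile conventions exist, and this page does not say which one calculateIQR uses. The sketch below assumes linear interpolation between the closest ranks and works on lists; quantile and iqr are illustrative names.

```haskell
import Data.List (sort)

-- Quantile by linear interpolation between the closest ranks.
-- Expects 0 <= q <= 1; returns 0 for an empty list (a choice made for this sketch).
quantile :: Double -> [Double] -> Double
quantile q xs
  | null xs   = 0
  | otherwise = lo + frac * (hi - lo)
  where
    sorted = sort xs
    lastIx = length sorted - 1
    pos    = q * fromIntegral lastIx   -- fractional index into the sorted data
    i      = floor pos :: Int
    frac   = pos - fromIntegral i
    lo     = sorted !! i
    hi     = sorted !! min (i + 1) lastIx

-- Interquartile range: Q3 - Q1.
iqr :: [Double] -> Double
iqr xs = quantile 0.75 xs - quantile 0.25 xs
```

Under this convention the samples 1, 2, 3, 4, 5 have Q1 = 2 and Q3 = 4, giving an IQR of 2.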

detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double]

Detect outliers using MAD-based threshold.

An observation is an outlier if: |x - median| > threshold * MAD
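
The rule can be sketched as a filter over the samples. The function outliers is illustrative: it takes the threshold, a precomputed median and a precomputed MAD as explicit arguments, which is only a guess at how the Double arguments of detectOutliers are used, and it works on lists rather than Vector Double.

```haskell
-- Keep the samples whose distance to the median exceeds threshold * MAD.
-- Arguments: threshold, precomputed median, precomputed MAD, samples.
outliers :: Double -> Double -> Double -> [Double] -> [Double]
outliers threshold med madValue xs =
  [x | x <- xs, abs (x - med) > threshold * madValue]
```

With a threshold of 3, a median of 3 and a MAD of 1, the samples 1, 2, 3, 4, 100 yield the single outlier 100. Note that when the MAD is 0, every sample that differs from the median satisfies the rule.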

Environment

shouldUpdateGolden :: IO Bool

Check if golden files should be updated.

Returns True if the GOLDS_GYM_ACCEPT environment variable is set.

Usage:

GOLDS_GYM_ACCEPT=1 cabal test
GOLDS_GYM_ACCEPT=1 stack test
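
A check of this kind can be written with lookupEnv from System.Environment. The sketch below is an assumption about the behaviour, not the library's code: it treats any value as "set", and envIsSet is a hypothetical helper name.

```haskell
import Data.Maybe (isJust)
import System.Environment (lookupEnv)

-- True when the named environment variable is present, whatever its value.
envIsSet :: String -> IO Bool
envIsSet name = isJust <$> lookupEnv name
```

Under this reading, envIsSet "GOLDS_GYM_ACCEPT" would be True for GOLDS_GYM_ACCEPT=1 and also for GOLDS_GYM_ACCEPT=0, since only presence is tested.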

shouldSkipBenchmarks :: IO Bool

Check if benchmarks should be skipped entirely.

Returns True if the GOLDS_GYM_SKIP environment variable is set. Useful for CI environments where benchmark hardware is inconsistent.

Usage:

GOLDS_GYM_SKIP=1 cabal test
GOLDS_GYM_SKIP=1 stack test

setAcceptGoldens :: Bool -> IO ()

Set the accept-goldens flag (called from the BenchGolden Example instance).

setSkipBenchmarks :: Bool -> IO ()

Set the skip-benchmarks flag (called from the BenchGolden Example instance).

Benchmarkable Constructors

io :: IO () -> BenchAction

Benchmark an IO action, discarding the result. This is for backward compatibility with code that uses IO () actions.

Example:

    benchGolden "compute" (io $ do
      result <- heavyComputation
      evaluate result)

nf :: NFData b => (a -> b) -> a -> BenchAction

Benchmark a pure function applied to an argument, forcing the result to normal form (NF) using rnf from Control.DeepSeq. This ensures the entire result structure is evaluated.

Example: benchGolden "fib 30" (nf fib 30)

whnf :: (a -> b) -> a -> BenchAction

Benchmark a pure function applied to an argument, forcing the result to weak head normal form (WHNF) only. This evaluates just the outermost constructor.

Example: benchGolden "replicate" (whnf (replicate 1000) 42)

nfIO :: NFData a => IO a -> BenchAction

Benchmark an IO action, forcing the result to normal form.

Example: benchGolden "readFile" (nfIO $ readFile "data.txt")

whnfIO :: IO a -> BenchAction

Benchmark an IO action, forcing the result to weak head normal form.

Example: benchGolden "getLine" (whnfIO getLine)

nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction

Benchmark a function that performs IO, forcing the result to normal form.

Example: benchGolden "lookup in map" (nfAppIO lookupInDB "key")

whnfAppIO :: (a -> IO b) -> a -> BenchAction

Benchmark a function that performs IO, forcing the result to weak head normal form.

Example: benchGolden "query database" (whnfAppIO queryDB params)