golds-gym-0.7.0.0: Golden testing framework for performance benchmarks
Copyright: (c) Marco Zocca 2026
License: MIT
Maintainer: @ocramz
Safe Haskell: None
Language: Haskell2010

Test.Hspec.BenchGolden.Runner

Description

This module handles running benchmarks and comparing results against golden files. It includes:

  • Benchmark execution with warm-up iterations
  • Golden file IO (reading and writing JSON statistics)
  • Tolerance-based comparison with variance warnings
  • Support for updating baselines via GOLDS_GYM_ACCEPT environment variable
  • Evaluation strategies to control how values are forced (nf variants)

Evaluation Strategies

Benchmarks require explicit evaluation strategies to prevent GHC from optimizing away computations or sharing results across iterations:

  • nf - Force result to normal form (deep, full evaluation)
  • nfIO - Execute IO and force result to normal form
  • nfAppIO - Apply function, execute IO, force result to normal form
  • io - Plain IO without additional forcing

These are vendored from tasty-bench under the MIT license ((c) 2021 Andrew Lelechenko).
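As a self-contained illustration of why forcing matters (this is not the library's internals), plain `seq` only evaluates to weak head normal form, stopping at the outermost constructor, while a full traversal, which is what `nf` achieves via `rnf`, visits every element:

```haskell
-- Illustration only: WHNF stops at the outermost constructor,
-- while full evaluation (what nf does via rnf) forces every element.
whnfOnly :: [Int] -> Bool
whnfOnly xs = xs `seq` True   -- forces only the first (:) or [] cell

forceAll :: [Int] -> Int
forceAll = sum                -- summing touches (and forces) every element
```

`whnfOnly [1, undefined]` returns True because `seq` never reaches the second element, whereas `forceAll` would crash on the same list. This is why benchmarking lazy code with io alone can under-measure: the result may remain an unevaluated thunk.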

Synopsis

Running Benchmarks

runBenchGolden :: BenchGolden -> IO BenchResult Source #

Run a benchmark golden test.

This function:

  1. Runs warm-up iterations (discarded)
  2. Runs the actual benchmark
  3. Writes actual results to .actual file
  4. If no golden exists, creates it (first run)
  5. Otherwise, compares against golden with tolerance

The result includes any warnings (e.g., variance changes).
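The first-run-versus-compare decision (steps 4 and 5) can be modelled as a pure function. A hypothetical sketch, assuming a single mean timing in milliseconds and a percentage tolerance rather than the library's actual types:

```haskell
data Outcome = Created | Passed | Regressed deriving (Eq, Show)

-- Nothing means no golden file exists yet (first run).
decide :: Maybe Double -> Double -> Double -> Outcome
decide Nothing    _      _      = Created
decide (Just gld) tolPct actual
  | abs ((actual - gld) / gld * 100) <= tolPct = Passed
  | otherwise                                  = Regressed
```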

runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #

Run a benchmark and collect statistics.

Uses raw timing collection with proper inner iteration counts to ensure the SPEC trick in nf/nfIO prevents thunk sharing.

runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #

Run a benchmark with raw timing collection for robust statistics.

This function times running all iterations in a single batch, then divides to get per-iteration timing. The SPEC trick in nf/nfIO prevents sharing within the batch.

We collect multiple samples by running the full batch multiple times, ensuring accurate measurements even with GHC's -O2 optimizations.
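The batch-then-divide scheme can be sketched as a pure conversion; a minimal illustration, assuming each sample is the wall-clock time of one full batch in milliseconds:

```haskell
-- Sketch: each sample times a full batch of n inner iterations;
-- dividing yields one per-iteration time per sample.
perIterationSamples :: Int -> [Double] -> [Double]
perIterationSamples n batchTimesMs =
  [ t / fromIntegral n | t <- batchTimesMs ]
```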

Parameter Sweeps

runSweepPoint Source #

Arguments

:: Show a 
=> String

Base sweep name

-> BenchConfig 
-> Text

Parameter name

-> a

Parameter value

-> BenchAction 
-> IO (BenchResult, GoldenStats) 

Run a single point of a parameter sweep.

This is similar to runBenchGolden but returns the GoldenStats along with the BenchResult, allowing the caller to accumulate stats for CSV export.

Each point is saved to its own golden file with the parameter value included in the filename (e.g., sort-scaling_n=1000.golden).
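Based on the filename example above, the naming scheme can be sketched as follows (the helper name is hypothetical; the library may compose names differently):

```haskell
-- Hypothetical helper: builds "sort-scaling_n=1000" from the pieces.
sweepPointName :: Show a => String -> String -> a -> String
sweepPointName baseName param value =
  baseName ++ "_" ++ param ++ "=" ++ show value
```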

runSweep Source #

Arguments

:: Show a 
=> String

Sweep name

-> BenchConfig 
-> Text

Parameter name (for CSV column header)

-> [a]

Parameter values to sweep over

-> (a -> BenchAction)

Action generator

-> IO [(a, BenchResult, GoldenStats)] 

Run a full parameter sweep and write CSV output.

This runs benchmarks for all parameter values, saves individual golden files, and writes a single CSV file with all results for analysis.

The CSV file is placed at:

<outputDir>/<sweep-name>-<arch-id>.csv
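A minimal sketch of what the CSV body could look like, assuming one row per parameter value with the parameter name as the first column header (the exact column set is an assumption, not the library's documented format):

```haskell
-- Hypothetical CSV layout: header row, then one row per sweep point.
sweepCsv :: Show a => String -> [(a, Double)] -> String
sweepCsv paramName rows =
  unlines $ (paramName ++ ",mean_ms")
          : [ show p ++ "," ++ show meanMs | (p, meanMs) <- rows ]
```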

Golden File Operations

writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #

Write a golden file.

writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #

Write an actual results file.

getGoldenPath :: FilePath -> FilePath -> String -> FilePath Source #

Get the path for a golden file.

getActualPath :: FilePath -> FilePath -> String -> FilePath Source #

Get the path for an actual results file.
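A hedged sketch of how such paths could be composed with System.FilePath; the `<name>-<archId>` layout and the helper names are assumptions, not the library's documented scheme:

```haskell
import System.FilePath ((<.>), (</>))

-- Hypothetical layout: <dir>/<name>-<archId>.golden / .actual
goldenPathFor, actualPathFor :: FilePath -> String -> String -> FilePath
goldenPathFor dir archId name = dir </> (name ++ "-" ++ archId) <.> "golden"
actualPathFor dir archId name = dir </> (name ++ "-" ++ archId) <.> "actual"
```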

Comparison

compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult Source #

Compare actual stats against golden stats.

Returns a BenchResult indicating whether the benchmark passed, regressed, or improved, along with any warnings.

Hybrid Tolerance Strategy

The comparison uses BOTH percentage and absolute tolerance (when configured):

  1. Calculate the percentage difference: ((actual - golden) / golden) * 100
  2. Pass if abs(percentDiff) <= tolerancePercent (percentage check), OR
  3. Pass if abs(actual - golden) <= absoluteToleranceMs (absolute check)

This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
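The two checks can be combined in a small pure predicate. A minimal sketch, assuming times in milliseconds and an optional absolute tolerance:

```haskell
-- Passes if EITHER the percentage check or the (optional) absolute
-- check succeeds. All times are in milliseconds.
withinTolerance :: Double -> Maybe Double -> Double -> Double -> Bool
withinTolerance tolPct mAbsMs golden actual = pctOk || absOk
  where
    pctOk = abs ((actual - golden) / golden * 100) <= tolPct
    absOk = maybe False (\t -> abs (actual - golden) <= t) mAbsMs
```

For a 0.1 ms golden and a 0.3 ms actual, the percentage difference is 200%, yet with an absolute tolerance of 0.5 ms the run still passes.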

checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning] Source #

Check for variance changes and generate warnings.

Robust Statistics

calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double]) Source #

Calculate robust statistics from raw timing data.

Returns: (trimmed mean, MAD, IQR, outliers)

calculateTrimmedMean :: Double -> Vector Double -> Double Source #

Calculate trimmed mean by removing specified percentage from each tail.
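A self-contained sketch of a symmetric trimmed mean over lists (the rounding of the trim count is an assumption; the library may round differently):

```haskell
import Data.List (sort)

-- Drops `pct` percent of observations from EACH tail, then averages.
trimmedMean :: Double -> [Double] -> Double
trimmedMean pct xs = sum kept / fromIntegral (length kept)
  where
    s    = sort xs
    n    = length s
    k    = floor (fromIntegral n * pct / 100)
    kept = take (n - 2 * k) (drop k s)
```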

calculateMAD :: Vector Double -> Double -> Double Source #

Calculate Median Absolute Deviation (MAD).

MAD = median(|x_i - median(x)|)

calculateIQR :: Vector Double -> Double Source #

Calculate Interquartile Range (IQR = Q3 - Q1).
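A self-contained sketch over lists, using the "median of halves" (Tukey hinge) quartile convention; other quartile definitions give slightly different values, and the library's choice is not specified here:

```haskell
import Data.List (sort)

medianOf :: [Double] -> Double
medianOf ys = let s = sort ys; n = length s
              in if odd n
                 then s !! (n `div` 2)
                 else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- IQR = Q3 - Q1, with Q1/Q3 as medians of the lower/upper halves.
iqrOf :: [Double] -> Double
iqrOf xs = medianOf upper - medianOf lower
  where
    s     = sort xs
    n     = length s
    lower = take (n `div` 2) s
    upper = drop ((n + 1) `div` 2) s
```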

detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double] Source #

Detect outliers using MAD-based threshold.

An observation is an outlier if: |x - median| > threshold * MAD
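The MAD formula and the outlier rule above can be sketched together; a self-contained illustration over lists rather than Vector:

```haskell
import Data.List (sort)

medianL :: [Double] -> Double
medianL ys = let s = sort ys; n = length s
             in if odd n
                then s !! (n `div` 2)
                else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- MAD = median(|x_i - median(x)|)
madL :: [Double] -> Double
madL xs = medianL [abs (x - m) | x <- xs] where m = medianL xs

-- An observation is an outlier if |x - median| > threshold * MAD.
outliers :: Double -> [Double] -> [Double]
outliers threshold xs = [x | x <- xs, abs (x - m) > threshold * d]
  where
    m = medianL xs
    d = madL xs
```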

Environment

shouldUpdateGolden :: IO Bool Source #

Check if golden files should be updated.

Returns True if GOLDS_GYM_ACCEPT environment variable is set.

Usage:

GOLDS_GYM_ACCEPT=1 cabal test
GOLDS_GYM_ACCEPT=1 stack test

shouldSkipBenchmarks :: IO Bool Source #

Check if benchmarks should be skipped entirely.

Returns True if GOLDS_GYM_SKIP environment variable is set. Useful for CI environments where benchmark hardware is inconsistent.

Usage:

GOLDS_GYM_SKIP=1 cabal test
GOLDS_GYM_SKIP=1 stack test

setAcceptGoldens :: Bool -> IO () Source #

Set the accept goldens flag (called from BenchGolden Example instance).

setSkipBenchmarks :: Bool -> IO () Source #

Set the skip benchmarks flag (called from BenchGolden Example instance).

Benchmarkable Constructors

io :: IO () -> BenchAction Source #

Benchmark an IO action, discarding the result. This is for backward compatibility with code that uses IO () actions.

Example:

benchGolden "compute" (io $ do
  result <- heavyComputation
  _ <- evaluate result   -- force to WHNF so the work isn't deferred
  pure ())

nf :: NFData b => (a -> b) -> a -> BenchAction Source #

Benchmark a pure function applied to an argument, forcing the result to normal form (NF) using rnf from Control.DeepSeq. This ensures the entire result structure is evaluated.

Example:

benchGolden "fib 30" (nf fib 30)

nfIO :: NFData a => IO a -> BenchAction Source #

Benchmark an IO action, forcing the result to normal form.

Example:

benchGolden "readFile" (nfIO $ readFile "data.txt")

nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction Source #

Benchmark a function that performs IO, forcing the result to normal form.

Example:

benchGolden "lookup in map" (nfAppIO lookupInDB "key")