| Copyright | (c) 2026 |
|---|---|
| License | MIT |
| Maintainer | @ocramz |
| Safe Haskell | None |
| Language | Haskell2010 |
Test.Hspec.BenchGolden.Runner
Description
This module handles running benchmarks and comparing results against golden files. It includes:
- Benchmark execution with warm-up iterations
- Golden file IO (reading/writing JSON statistics)
- Tolerance-based comparison with variance warnings
- Support for updating baselines via the GOLDS_GYM_ACCEPT environment variable
- Evaluation strategies to control how values are forced (nf, whnf, etc.)
Evaluation Strategies
Benchmarks require explicit evaluation strategies to prevent GHC from optimizing away computations or sharing results across iterations:
- nf - Force result to normal form (deep, full evaluation)
- whnf - Force result to weak head normal form (shallow evaluation)
- nfIO - Execute IO and force result to normal form
- whnfIO - Execute IO and force result to WHNF
- nfAppIO - Apply function, execute IO, force result to normal form
- whnfAppIO - Apply function, execute IO, force result to WHNF
- io - Plain IO without additional forcing
These are vendored from tasty-bench with proper attribution (BSD-3-Clause).
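The difference between WHNF and normal form can be seen with a value that is only partly defined. The following is a minimal sketch using only base; the names partial, succeeds, forceWhnf and forceNf are hypothetical helpers for illustration and are not part of this module, and forceNf hand-rolls deep evaluation for a list of Int rather than using rnf.

```haskell
import Control.Exception (SomeException (..), evaluate, try)

-- A list whose first cons cell is defined but whose tail throws when forced.
partial :: [Int]
partial = 1 : error "tail was forced"

-- Run an action and report whether it finished without throwing.
succeeds :: IO a -> IO Bool
succeeds act = do
  r <- try act
  case r of
    Left (SomeException _) -> pure False
    Right _                -> pure True

-- WHNF: evaluate only the outermost constructor of the list.
forceWhnf :: [Int] -> IO Bool
forceWhnf xs = succeeds (evaluate xs)

-- Normal form for a list of Int: walk the whole spine, forcing each element.
forceNf :: [Int] -> IO Bool
forceNf xs = succeeds (evaluate (foldr seq () xs))
```

Under these assumptions, forceWhnf partial reports success because only the first cons cell is inspected, while forceNf partial reports failure because the tail is reached. A WHNF-style benchmark of a lazy structure may therefore measure far less work than an NF-style one.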
Synopsis
- runBenchGolden :: BenchGolden -> IO BenchResult
- runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- readGoldenFile :: FilePath -> IO (Either String GoldenStats)
- writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- getGoldenPath :: FilePath -> FilePath -> String -> FilePath
- getActualPath :: FilePath -> FilePath -> String -> FilePath
- compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult
- checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning]
- calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double])
- calculateTrimmedMean :: Double -> Vector Double -> Double
- calculateMAD :: Vector Double -> Double -> Double
- calculateIQR :: Vector Double -> Double
- detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double]
- shouldUpdateGolden :: IO Bool
- shouldSkipBenchmarks :: IO Bool
- setAcceptGoldens :: Bool -> IO ()
- setSkipBenchmarks :: Bool -> IO ()
- io :: IO () -> BenchAction
- nf :: NFData b => (a -> b) -> a -> BenchAction
- whnf :: (a -> b) -> a -> BenchAction
- nfIO :: NFData a => IO a -> BenchAction
- whnfIO :: IO a -> BenchAction
- nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction
- whnfAppIO :: (a -> IO b) -> a -> BenchAction
Running Benchmarks
runBenchGolden :: BenchGolden -> IO BenchResult
Run a benchmark golden test.
This function:
- Runs warm-up iterations (discarded)
- Runs the actual benchmark
- Writes actual results to a .actual file
- If no golden exists, creates it (first run)
- Otherwise, compares against golden with tolerance
The result includes any warnings (e.g., variance changes).
runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
Run a benchmark and collect statistics.
runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
Run a benchmark with raw timing collection for robust statistics.
Golden File Operations
readGoldenFile :: FilePath -> IO (Either String GoldenStats)
Read a golden file.
writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
Write a golden file.
writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
Write an actual results file.
getActualPath :: FilePath -> FilePath -> String -> FilePath
Get the path for an actual results file.
Comparison
compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult
Compare actual stats against golden stats.
Returns a BenchResult indicating whether the benchmark passed,
regressed, or improved, along with any warnings.
Hybrid Tolerance Strategy
The comparison uses BOTH percentage and absolute tolerance (when configured):
- Calculate percentage difference: ((actual - golden) / golden) * 100
- Pass if abs(percentDiff) <= tolerancePercent (percentage check)
- OR if abs(actual - golden) <= absoluteToleranceMs (absolute check)
This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
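The hybrid check described above can be sketched as a small pure function. This is an illustration only, not the library's implementation: the Tolerance record is a hypothetical stand-in for the relevant BenchConfig fields, and modelling absoluteToleranceMs as a Maybe is an assumption based on the phrase "when configured".

```haskell
-- Hypothetical stand-in for the tolerance-related fields of BenchConfig.
data Tolerance = Tolerance
  { tolerancePercent    :: Double        -- allowed relative change, in percent
  , absoluteToleranceMs :: Maybe Double  -- optional absolute slack, in ms (assumed optional)
  }

-- Percentage difference of the actual value relative to the golden value.
percentDiff :: Double -> Double -> Double
percentDiff golden actual = ((actual - golden) / golden) * 100

-- Pass if either the percentage check or the (optional) absolute check holds.
withinTolerance :: Tolerance -> Double -> Double -> Bool
withinTolerance tol golden actual = percentOk || absoluteOk
  where
    percentOk  = abs (percentDiff golden actual) <= tolerancePercent tol
    absoluteOk = maybe False (\ms -> abs (actual - golden) <= ms) (absoluteToleranceMs tol)
```

For instance, with a golden value of 0.010 ms and an actual value of 0.013 ms, the difference is roughly 30%, which fails a 15% percentage check. With an absolute tolerance of 0.01 ms the same pair passes, since the two values differ by only 0.003 ms.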
checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning]
Check for variance changes and generate warnings.
Robust Statistics
calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double])
Calculate robust statistics from raw timing data.
Returns: (trimmed mean, MAD, IQR, outliers)
calculateTrimmedMean :: Double -> Vector Double -> Double
Calculate trimmed mean by removing specified percentage from each tail.
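As a rough illustration of the idea, a trimmed mean over a plain list might look as follows. This sketch is not the library's code: it uses lists instead of Vector, assumes the first argument is a percentage between 0 and 50 (rather than a fraction), rounds the number of dropped samples down, and returns 0 when nothing is left after trimming. All of these are choices made for the example.

```haskell
import Data.List (sort)

-- trimPercent is the percentage dropped from EACH tail (assumed range: 0 to 50).
trimmedMean :: Double -> [Double] -> Double
trimmedMean trimPercent xs
  | null kept = 0  -- arbitrary fallback for empty input or full trimming
  | otherwise = sum kept / fromIntegral (length kept)
  where
    sorted = sort xs
    n      = length sorted
    -- number of samples removed from each end, rounded down
    k      = floor (fromIntegral n * trimPercent / 100) :: Int
    kept   = take (n - 2 * k) (drop k sorted)
```

With these assumptions, trimmedMean 20 [1, 2, 3, 4, 100] drops one sample from each end and averages [2, 3, 4], so a single slow iteration does not dominate the result.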
calculateMAD :: Vector Double -> Double -> Double
Calculate Median Absolute Deviation (MAD).
MAD = median(|x_i - median(x)|)
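The formula above can be written out directly. The sketch below works on lists rather than Vector and computes the median itself, whereas calculateMAD takes an extra Double argument (presumably a precomputed median, which is an assumption). The helper names median and mad are hypothetical.

```haskell
import Data.List (sort)

-- Median of a list; returns 0 for an empty list (a choice made for this sketch).
median :: [Double] -> Double
median [] = 0
median xs
  | even n    = (s !! (h - 1) + s !! h) / 2
  | otherwise = s !! h
  where
    s = sort xs
    n = length s
    h = n `div` 2

-- MAD = median(|x_i - median(x)|)
mad :: [Double] -> Double
mad xs = median [abs (x - m) | x <- xs]
  where
    m = median xs
```

For the sample [1, 2, 3, 4, 100] the median is 3, the absolute deviations are [2, 1, 0, 1, 97], and the MAD is 1. The single extreme value barely affects it, which is what makes MAD a robust measure of spread.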
detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double]
Detect outliers using MAD-based threshold.
An observation is an outlier if: |x - median| > threshold * MAD
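The outlier rule amounts to a simple filter. In this sketch the argument order (threshold, samples, median, MAD) is a guess based on the type signature and is not confirmed by the source; the function name outliers is hypothetical and lists are used in place of Vector.

```haskell
-- Keep every observation x with |x - median| > threshold * MAD.
-- Assumed argument order: threshold, samples, median, MAD.
outliers :: Double -> [Double] -> Double -> Double -> [Double]
outliers threshold xs med madValue =
  [x | x <- xs, abs (x - med) > threshold * madValue]
```

For example, outliers 3 [1, 2, 3, 4, 100] 3 1 flags only 100. Note that when the MAD is 0, this rule flags every observation that differs from the median at all.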
Environment
shouldUpdateGolden :: IO Bool
Check if golden files should be updated.
Returns True if the GOLDS_GYM_ACCEPT environment variable is set.
Usage:
GOLDS_GYM_ACCEPT=1 cabal test
GOLDS_GYM_ACCEPT=1 stack test
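A check of this kind can be built on lookupEnv from System.Environment. The sketch below is not the library's code: envFlagSet and demo are hypothetical names, EXAMPLE_ACCEPT_FLAG is a made-up variable, and treating any value (including "0") as "set" is an assumption, since the source only says the variable must be set.

```haskell
import Data.Maybe (isJust)
import System.Environment (lookupEnv, setEnv, unsetEnv)

-- True when the variable is present in the environment, whatever its value.
envFlagSet :: String -> IO Bool
envFlagSet name = isJust <$> lookupEnv name

-- Small demonstration with a hypothetical variable name:
-- the flag reads as set after setEnv and as unset after unsetEnv.
demo :: IO (Bool, Bool)
demo = do
  setEnv "EXAMPLE_ACCEPT_FLAG" "1"
  whenSet <- envFlagSet "EXAMPLE_ACCEPT_FLAG"
  unsetEnv "EXAMPLE_ACCEPT_FLAG"
  whenUnset <- envFlagSet "EXAMPLE_ACCEPT_FLAG"
  pure (whenSet, whenUnset)
```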
shouldSkipBenchmarks :: IO Bool
Check if benchmarks should be skipped entirely.
Returns True if the GOLDS_GYM_SKIP environment variable is set.
Useful for CI environments where benchmark hardware is inconsistent.
Usage:
GOLDS_GYM_SKIP=1 cabal test
GOLDS_GYM_SKIP=1 stack test
setAcceptGoldens :: Bool -> IO ()
Set the accept goldens flag (called from BenchGolden Example instance).
setSkipBenchmarks :: Bool -> IO ()
Set the skip benchmarks flag (called from BenchGolden Example instance).
Benchmarkable Constructors
io :: IO () -> BenchAction
Benchmark an IO action, discarding the result.
This is for backward compatibility with code that uses IO () actions.
Example:
benchGolden "compute" (io $ do
  result <- heavyComputation
  _ <- evaluate result  -- force to WHNF, then discard so the block has type IO ()
  pure ())
nf :: NFData b => (a -> b) -> a -> BenchAction
Benchmark a pure function applied to an argument, forcing the result to
normal form (NF) using rnf from Control.DeepSeq.
This ensures the entire result structure is evaluated.
Example:
benchGolden "fib 30" (nf fib 30)
whnf :: (a -> b) -> a -> BenchAction
Benchmark a pure function applied to an argument, forcing the result to weak head normal form (WHNF) only. This evaluates just the outermost constructor.
Example:
benchGolden "replicate" (whnf (replicate 1000) 42)
nfIO :: NFData a => IO a -> BenchAction
Benchmark an IO action, forcing the result to normal form.
Example:
benchGolden "readFile" (nfIO $ readFile "data.txt")
whnfIO :: IO a -> BenchAction
Benchmark an IO action, forcing the result to weak head normal form.
Example:
benchGolden "getLine" (whnfIO getLine)