| Copyright | (c) Marco Zocca 2026 |
|---|---|
| License | MIT |
| Maintainer | @ocramz |
| Safe Haskell | None |
| Language | Haskell2010 |
Test.Hspec.BenchGolden.Runner
Description
This module handles running benchmarks and comparing results against golden files. It includes:
- Benchmark execution with warm-up iterations
- Golden file IO (reading and writing JSON statistics)
- Tolerance-based comparison with variance warnings
- Support for updating baselines via GOLDS_GYM_ACCEPT environment variable
- Evaluation strategies to control how values are forced (nf variants)
Evaluation Strategies
Benchmarks require explicit evaluation strategies to prevent GHC from optimizing away computations or sharing results across iterations:
- nf: force the result to normal form (deep, full evaluation)
- nfIO: execute an IO action and force its result to normal form
- nfAppIO: apply a function, execute the resulting IO action, and force the result to normal form
- io: run a plain IO action without additional forcing
These are vendored from tasty-bench under the MIT license, (c) 2021 Andrew Lelechenko.
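To see why the nf variants matter, here is a minimal, self-contained illustration of WHNF versus NF using Control.DeepSeq (the same rnf machinery the nf strategies rely on; deepseq ships with GHC). This is a sketch for intuition, not code from this library:

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 5 :: Int]
  -- seq forces only to weak head normal form (WHNF): the first
  -- cons cell is exposed, but the elements remain unevaluated thunks
  xs `seq` return ()
  -- force evaluates the entire structure via rnf, which is what the
  -- nf-style strategies guarantee before the timer stops
  ys <- evaluate (force xs)
  print (sum ys)
```

Without deep forcing, a benchmark of `map (* 2)` would time only the allocation of the first cons cell, not the actual work.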
Synopsis
- runBenchGolden :: BenchGolden -> IO BenchResult
- runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runSweepPoint :: Show a => String -> BenchConfig -> Text -> a -> BenchAction -> IO (BenchResult, GoldenStats)
- runSweep :: Show a => String -> BenchConfig -> Text -> [a] -> (a -> BenchAction) -> IO [(a, BenchResult, GoldenStats)]
- readGoldenFile :: FilePath -> IO (Either String GoldenStats)
- writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- getGoldenPath :: FilePath -> FilePath -> String -> FilePath
- getActualPath :: FilePath -> FilePath -> String -> FilePath
- compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult
- checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning]
- calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double])
- calculateTrimmedMean :: Double -> Vector Double -> Double
- calculateMAD :: Vector Double -> Double -> Double
- calculateIQR :: Vector Double -> Double
- detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double]
- shouldUpdateGolden :: IO Bool
- shouldSkipBenchmarks :: IO Bool
- setAcceptGoldens :: Bool -> IO ()
- setSkipBenchmarks :: Bool -> IO ()
- io :: IO () -> BenchAction
- nf :: NFData b => (a -> b) -> a -> BenchAction
- nfIO :: NFData a => IO a -> BenchAction
- nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction
Running Benchmarks
runBenchGolden :: BenchGolden -> IO BenchResult Source #
Run a benchmark golden test.
This function:
- Runs warm-up iterations (discarded)
- Runs the actual benchmark
- Writes actual results to the .actual file
- If no golden exists, creates it (first run)
- Otherwise, compares against golden with tolerance
The result includes any warnings (e.g., variance changes).
runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #
Run a benchmark and collect statistics.
Uses raw timing collection with proper inner iteration counts to ensure the SPEC trick in nf/nfIO prevents thunk sharing.
runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #
Run a benchmark with raw timing collection for robust statistics.
This function times running all iterations in a single batch, then divides to get per-iteration timing. The SPEC trick in nf/nfIO prevents sharing within the batch.
We collect multiple samples by running the full batch multiple times, ensuring accurate measurements even with GHC's -O2 optimizations.
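The batch-then-divide measurement described above can be sketched as follows. This is a simplified, self-contained illustration using GHC.Clock from base; the name `timeBatch` is illustrative and not part of this library's API:

```haskell
import GHC.Clock (getMonotonicTime)  -- monotonic clock, seconds as Double
import Control.Monad (replicateM_)

-- Time n iterations as one batch, then divide to obtain a
-- per-iteration figure. Running the whole batch several times
-- yields the multiple samples used for robust statistics.
timeBatch :: Int -> IO () -> IO Double
timeBatch n act = do
  t0 <- getMonotonicTime
  replicateM_ n act
  t1 <- getMonotonicTime
  return ((t1 - t0) / fromIntegral n)

main :: IO ()
main = do
  perIter <- timeBatch 1000 (return ())
  print (perIter >= 0)
```

Batching amortizes clock-read overhead, which matters for operations whose runtime is comparable to the timer's resolution.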
Parameter Sweeps
runSweepPoint Source #
Arguments
| :: Show a | |
| => String | Base sweep name |
| -> BenchConfig | |
| -> Text | Parameter name |
| -> a | Parameter value |
| -> BenchAction | |
| -> IO (BenchResult, GoldenStats) |
Run a single point of a parameter sweep.
This is similar to runBenchGolden but returns the GoldenStats along
with the BenchResult, allowing the caller to accumulate stats for CSV export.
Each point is saved to its own golden file with the parameter value
included in the filename (e.g., sort-scaling_n=1000.golden).
runSweep Source #
Arguments
| :: Show a | |
| => String | Sweep name |
| -> BenchConfig | |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action generator |
| -> IO [(a, BenchResult, GoldenStats)] |
Run a full parameter sweep and write CSV output.
This runs benchmarks for all parameter values, saves individual golden files, and writes a single CSV file with all results for analysis.
The CSV file is placed at:
<outputDir>/<sweep-name>-<arch-id>.csv
Golden File Operations
readGoldenFile :: FilePath -> IO (Either String GoldenStats) Source #
Read a golden file.
writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #
Write a golden file.
writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #
Write an actual results file.
getActualPath :: FilePath -> FilePath -> String -> FilePath Source #
Get the path for an actual results file.
Comparison
compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult Source #
Compare actual stats against golden stats.
Returns a BenchResult indicating whether the benchmark passed,
regressed, or improved, along with any warnings.
Hybrid Tolerance Strategy
The comparison uses BOTH percentage and absolute tolerance (when configured):
- Calculate the percentage difference: ((actual - golden) / golden) * 100
- Pass if abs(percentDiff) <= tolerancePercent (percentage check)
- OR pass if abs(actual - golden) <= absoluteToleranceMs (absolute check)
This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
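The hybrid check can be sketched as below. Parameter names are illustrative; the library's BenchConfig fields may differ:

```haskell
-- Hypothetical sketch of the hybrid tolerance check described above.
passesTolerance
  :: Double        -- tolerance in percent
  -> Maybe Double  -- optional absolute tolerance, in ms
  -> Double        -- golden mean (ms)
  -> Double        -- actual mean (ms)
  -> Bool
passesTolerance tolPct mAbsTol golden actual =
  let percentDiff = (actual - golden) / golden * 100
      pctOk = abs percentDiff <= tolPct
      absOk = maybe False (\t -> abs (actual - golden) <= t) mAbsTol
  in pctOk || absOk

main :: IO ()
main = do
  -- 0.5 ms vs a 0.4 ms golden is 25% off, but only 0.1 ms in
  -- absolute terms: the absolute check rescues it
  print (passesTolerance 10 (Just 0.2) 0.4 0.5)
  -- with no absolute tolerance configured, the same run fails
  print (passesTolerance 10 Nothing 0.4 0.5)
```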
checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning] Source #
Check for variance changes and generate warnings.
Robust Statistics
calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double]) Source #
Calculate robust statistics from raw timing data.
Returns: (trimmed mean, MAD, IQR, outliers)
calculateTrimmedMean :: Double -> Vector Double -> Double Source #
Calculate trimmed mean by removing specified percentage from each tail.
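A self-contained sketch of the trimmed mean over plain lists (the library operates on Data.Vector):

```haskell
import Data.List (sort)

-- Drop the given fraction of observations from each tail of the
-- sorted sample, then take the arithmetic mean of what remains.
trimmedMean :: Double -> [Double] -> Double
trimmedMean frac xs =
  let n = length xs
      k = floor (frac * fromIntegral n)       -- count trimmed per tail
      kept = take (n - 2 * k) (drop k (sort xs))
  in sum kept / fromIntegral (length kept)

main :: IO ()
main = print (trimmedMean 0.25 [1, 2, 3, 1000])
```

With 25% trimming, the extreme value 1000 is discarded along with the minimum, leaving the mean of [2, 3], i.e. 2.5.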
calculateMAD :: Vector Double -> Double -> Double Source #
Calculate Median Absolute Deviation (MAD).
MAD = median(|x_i - median(x)|)
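The formula above translates directly into a list-based sketch (the library's version takes a Vector and a precomputed median):

```haskell
import Data.List (sort)

median :: [Double] -> Double
median xs =
  let s = sort xs
      n = length s
  in if odd n
       then s !! (n `div` 2)
       else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- MAD = median of absolute deviations from the median
mad :: [Double] -> Double
mad xs = let m = median xs in median (map (\x -> abs (x - m)) xs)

main :: IO ()
main = print (mad [1, 2, 3, 4, 100])
```

Unlike the standard deviation, the MAD is barely affected by the outlier 100, which is why it is preferred for noisy timing data.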
detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double] Source #
Detect outliers using MAD-based threshold.
An observation is an outlier if: |x - median| > threshold * MAD
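The rule above, sketched over plain lists with a simplified signature (the library's detectOutliers takes the precomputed median and MAD as separate arguments):

```haskell
import Data.List (sort)

median :: [Double] -> Double
median xs =
  let s = sort xs
      n = length s
  in if odd n
       then s !! (n `div` 2)
       else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- Keep every observation whose absolute deviation from the median
-- exceeds threshold * MAD.
detectOutliers :: Double -> [Double] -> [Double]
detectOutliers threshold xs =
  let m = median xs
      madv = median (map (\x -> abs (x - m)) xs)
  in filter (\x -> abs (x - m) > threshold * madv) xs

main :: IO ()
main = print (detectOutliers 3 [1, 2, 3, 4, 100])
```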
Environment
shouldUpdateGolden :: IO Bool Source #
Check if golden files should be updated.
Returns True if the GOLDS_GYM_ACCEPT environment variable is set.
Usage:
GOLDS_GYM_ACCEPT=1 cabal test
GOLDS_GYM_ACCEPT=1 stack test
shouldSkipBenchmarks :: IO Bool Source #
Check if benchmarks should be skipped entirely.
Returns True if the GOLDS_GYM_SKIP environment variable is set.
Useful for CI environments where benchmark hardware is inconsistent.
Usage:
GOLDS_GYM_SKIP=1 cabal test
GOLDS_GYM_SKIP=1 stack test
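Both checks can be sketched with System.Environment from base; `flagSet` is an illustrative helper, and the assumption here is that any value (including an empty one) counts as "set":

```haskell
import System.Environment (lookupEnv, setEnv)
import Data.Maybe (isJust)

-- A flag counts as set whenever the variable is present at all.
flagSet :: String -> IO Bool
flagSet name = isJust <$> lookupEnv name

main :: IO ()
main = do
  setEnv "GOLDS_GYM_ACCEPT" "1"
  flagSet "GOLDS_GYM_ACCEPT" >>= print
```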
setAcceptGoldens :: Bool -> IO () Source #
Set the accept goldens flag (called from BenchGolden Example instance).
setSkipBenchmarks :: Bool -> IO () Source #
Set the skip benchmarks flag (called from BenchGolden Example instance).
Benchmarkable Constructors
io :: IO () -> BenchAction Source #
Benchmark an IO action, discarding the result.
This is for backward compatibility with code that uses IO () actions.
Example:
benchGolden "compute" (io $ do result <- heavyComputation evaluate result)
nf :: NFData b => (a -> b) -> a -> BenchAction Source #
Benchmark a pure function applied to an argument, forcing the result to
normal form (NF) using rnf from Control.DeepSeq.
This ensures the entire result structure is evaluated.
Example:
benchGolden "fib 30" (nf fib 30)