Copyright	(c) 2026
License	MIT
Maintainer	your.email@example.com
Safe Haskell	None
Language	Haskell2010

Test.Hspec.BenchGolden

Contents

Spec Combinators
Configuration
Types
Low-Level API
Re-exports
Orphan instances

Description

Overview

golds-gym is a framework for golden testing of performance benchmarks. It integrates with hspec and uses benchpress for lightweight timing measurements.

Optionally, benchmarks can use robust statistics to mitigate the impact of outliers.

Quick Start

import Test.Hspec
import Test.Hspec.BenchGolden

main :: IO ()
main = hspec $ do
  describe "Performance" $ do
    benchGolden "my algorithm" $
      return $ myAlgorithm input

How It Works

On first run, the benchmark is executed and results are saved to a golden file as the baseline.

On subsequent runs, the benchmark is executed and compared against the baseline using a configurable tolerance (default ±15%).
If the mean time deviates from the baseline by more than the tolerance, the test fails and reports either a regression (slower) or an improvement (faster).
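The flow above can be sketched with base only. This is an illustration, not the golds-gym implementation: the Outcome type, percentChange, and compareRun are hypothetical names that loosely mirror the BenchResult constructors, and the sign convention (a positive change means slower) is an assumption.

```haskell
-- Hypothetical sketch of the first-run / subsequent-run decision.
-- None of these names come from golds-gym.

-- | Simplified outcome, loosely mirroring BenchResult.
data Outcome
  = FirstRun            -- no baseline yet; the current run becomes the baseline
  | Pass                -- mean time is within tolerance of the baseline
  | Regression Double   -- percent change, positive (slower than baseline)
  | Improvement Double  -- percent change, negative (faster than baseline)
  deriving (Show, Eq)

-- | Percent change of the current mean relative to the baseline mean.
percentChange :: Double -> Double -> Double
percentChange baseline current = (current - baseline) / baseline * 100

-- | Compare the current mean against an optional stored baseline.
compareRun
  :: Double        -- ^ tolerance in percent, e.g. 15
  -> Maybe Double  -- ^ baseline mean, Nothing on the first run
  -> Double        -- ^ current mean
  -> Outcome
compareRun _ Nothing _ = FirstRun
compareRun tolerancePct (Just baseline) current
  | abs change <= tolerancePct = Pass
  | change > 0                 = Regression change
  | otherwise                  = Improvement change
  where
    change = percentChange baseline current
```

Loaded into GHCi, compareRun 15 (Just 8) 10 evaluates to Regression 25.0, because 10 is 25% slower than 8 and 25 exceeds the 15% tolerance.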

Architecture-Specific Baselines

Golden files are stored per-architecture to ensure benchmarks are only compared against equivalent hardware. The architecture identifier includes CPU type, OS, and CPU model.
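As a rough sketch of how such an identifier might be assembled: mkArchId and localArchId below are hypothetical helpers, not part of golds-gym. The CPU-OS-model ordering and the underscore substitution are assumptions inferred from the documented example "aarch64-darwin-Apple_M1".

```haskell
import Data.List (intercalate)
import Data.Maybe (maybeToList)
import System.Info (arch, os)

-- | Join CPU type, OS, and an optional CPU model into one identifier.
-- Spaces in the model become underscores (assumed from the documented
-- example "aarch64-darwin-Apple_M1").
mkArchId :: String -> String -> Maybe String -> String
mkArchId cpu osName model =
  intercalate "-" ([cpu, osName] ++ maybeToList (fmap sanitize model))
  where
    sanitize = map (\c -> if c == ' ' then '_' else c)

-- | Identifier for the current machine using only what base exposes.
-- base has no portable CPU-model query, so the model is left out here.
localArchId :: String
localArchId = mkArchId arch os Nothing
```

For instance, mkArchId "aarch64" "darwin" (Just "Apple M1") yields "aarch64-darwin-Apple_M1".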

Configuration

Use benchGoldenWith with a custom BenchConfig to adjust:

Number of iterations
Warm-up iterations
Tolerance percentage
Absolute tolerance (hybrid tolerance strategy)
Variance warnings
Robust statistics mode (trimmed mean, MAD, outlier detection)

Tolerance Configuration

The framework supports two tolerance mechanisms that work together:

Percentage tolerance (tolerancePercent): Checks if the mean time change is within ±X% of the baseline. This is the traditional approach and works well for operations that take more than a few milliseconds.
Absolute tolerance (absoluteToleranceMs): Checks if the absolute time difference is within X milliseconds. This prevents false failures for extremely fast operations (< 1ms) where measurement noise causes large percentage variations despite negligible absolute differences.

By default, benchmarks pass if EITHER tolerance is satisfied:

pass = (abs(percentChange) <= 15%) OR (absTimeDiff <= 0.01 ms)

This hybrid strategy combines the benefits of both approaches:

For fast operations (< 1ms): Absolute tolerance dominates, preventing false failures caused by measurement noise
For slow operations (> 1ms): Percentage tolerance dominates, catching real regressions
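The hybrid rule can be written as a small pure predicate. This is a sketch under the assumption that both checks use the absolute difference between the baseline and current means; withinTolerance is a hypothetical name, not a golds-gym export.

```haskell
-- | Hybrid check: pass when EITHER tolerance is satisfied.
-- Hypothetical helper that mirrors the documented rule.
withinTolerance
  :: Double        -- ^ tolerancePercent, e.g. 15.0
  -> Maybe Double  -- ^ absoluteToleranceMs, e.g. Just 0.01
  -> Double        -- ^ baseline mean in milliseconds
  -> Double        -- ^ current mean in milliseconds
  -> Bool
withinTolerance tolPct absTol baseline current = percentOk || absoluteOk
  where
    diff       = abs (current - baseline)
    percentOk  = diff / baseline * 100 <= tolPct
    -- With Nothing, only the percentage check applies.
    absoluteOk = maybe False (diff <=) absTol
```

A fast operation that moves from 0.010 ms to 0.015 ms changes by 50% but only by 0.005 ms, so withinTolerance 15 (Just 0.01) 0.010 0.015 is True, while the percentage-only variant withinTolerance 15 Nothing 0.010 0.015 is False. A slow operation that moves from 100 ms to 120 ms fails both checks.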

To disable absolute tolerance and use percentage-only comparison:

benchGoldenWith defaultBenchConfig
  { absoluteToleranceMs = Nothing
  }
  "benchmark" $ ...

To adjust the absolute tolerance threshold:

benchGoldenWith defaultBenchConfig
  { absoluteToleranceMs = Just 0.001  -- 1 microsecond (very strict)
  }
  "benchmark" $ ...

Environment Variables

GOLDS_GYM_ACCEPT=1 - Regenerate all golden files
GOLDS_GYM_SKIP=1 - Skip all benchmark tests
GOLDS_GYM_ARCH=custom-id - Override architecture detection
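A minimal sketch of how these variables could be interpreted, using only base. EnvSettings, parseEnv, and readEnvSettings are hypothetical names; treating exactly "1" as the enabling value is an assumption based on the forms shown above, and the library's real parsing may differ.

```haskell
import System.Environment (getEnvironment)

-- | Settings derived from the documented environment variables.
-- Hypothetical type; not part of golds-gym.
data EnvSettings = EnvSettings
  { acceptGolden   :: Bool          -- GOLDS_GYM_ACCEPT=1
  , skipBenchmarks :: Bool          -- GOLDS_GYM_SKIP=1
  , archOverride   :: Maybe String  -- GOLDS_GYM_ARCH=custom-id
  } deriving (Show, Eq)

-- | Interpret a list of name/value pairs.
-- Assumption: only the exact value "1" switches a flag on.
parseEnv :: [(String, String)] -> EnvSettings
parseEnv env = EnvSettings
  { acceptGolden   = lookup "GOLDS_GYM_ACCEPT" env == Just "1"
  , skipBenchmarks = lookup "GOLDS_GYM_SKIP" env == Just "1"
  , archOverride   = lookup "GOLDS_GYM_ARCH" env
  }

-- | Read the settings from the process environment.
readEnvSettings :: IO EnvSettings
readEnvSettings = parseEnv <$> getEnvironment
```

Keeping parseEnv pure makes the interpretation easy to test without touching the real environment.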

Synopsis

benchGolden :: String -> IO () -> Spec
benchGoldenWith :: BenchConfig -> String -> IO () -> Spec
benchGoldenIO :: String -> IO () -> Spec
benchGoldenIOWith :: BenchConfig -> String -> IO () -> Spec
data BenchConfig = BenchConfig {
- iterations :: !Int
- warmupIterations :: !Int
- tolerancePercent :: !Double
- absoluteToleranceMs :: !(Maybe Double)
- warnOnVarianceChange :: !Bool
- varianceTolerancePercent :: !Double
- outputDir :: !FilePath
- failOnFirstRun :: !Bool
- useRobustStatistics :: !Bool
- trimPercent :: !Double
- outlierThreshold :: !Double
}
defaultBenchConfig :: BenchConfig
data BenchGolden = BenchGolden {
- benchName :: !String
- benchAction :: !(IO ())
- benchConfig :: !BenchConfig
}
data GoldenStats = GoldenStats {
- statsMean :: !Double
- statsStddev :: !Double
- statsMedian :: !Double
- statsMin :: !Double
- statsMax :: !Double
- statsPercentiles :: ![(Int, Double)]
- statsArch :: !Text
- statsTimestamp :: !UTCTime
- statsTrimmedMean :: !Double
- statsMAD :: !Double
- statsIQR :: !Double
- statsOutliers :: ![Double]
}
data BenchResult
- = FirstRun !GoldenStats
- | Pass !GoldenStats !GoldenStats ![Warning]
- | Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- | Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
data Warning
- = VarianceIncreased !Double !Double !Double !Double
- | VarianceDecreased !Double !Double !Double !Double
- | HighVariance !Double
- | OutliersDetected !Int ![Double]
data ArchConfig = ArchConfig {
- archId :: !Text
- archOS :: !Text
- archCPU :: !Text
- archModel :: !(Maybe Text)
}
runBenchGolden :: BenchGolden -> IO BenchResult
module Test.Hspec.BenchGolden.Arch

Spec Combinators

benchGolden

Arguments

:: String	Name of the benchmark
-> IO ()	The IO action to benchmark
-> Spec

Create a benchmark golden test with default configuration.

This is the simplest way to add a benchmark test:

describe "Sorting" $ do
  benchGolden "quicksort 1000 elements" $
    return $ quicksort [1000, 999..1]

Default configuration:

100 iterations
5 warm-up iterations
15% tolerance
Variance warnings enabled
Standard statistics (not robust mode)

benchGoldenWith

Arguments

:: BenchConfig	Configuration parameters
-> String	Name of the benchmark
-> IO ()	The IO action to benchmark
-> Spec

Create a benchmark golden test with custom configuration.

Examples:

-- Tighter tolerance for critical code
benchGoldenWith defaultBenchConfig
  { iterations = 500
  , tolerancePercent = 5.0
  , warmupIterations = 20
  }
  "hot loop" $
  return $ criticalFunction input

-- Robust statistics mode for noisy environments
benchGoldenWith defaultBenchConfig
  { useRobustStatistics = True
  , trimPercent = 10.0
  , outlierThreshold = 3.0
  }
  "benchmark with outliers" $
  return $ computation input

benchGoldenIO

Arguments

:: String	Name of the benchmark
-> IO ()	The IO action to benchmark
-> Spec

Create a benchmark golden test for an IO action.

This is an alias for benchGolden that makes it clear the action involves IO (e.g., file operations, network calls).

benchGoldenIO "file read" $ do
  contents <- readFile "large-file.txt"
  _ <- evaluate (length contents)  -- force the read; discard the Int so the action is IO ()
  pure ()

Note: For IO actions in noisy environments (CI, shared systems), consider using benchGoldenIOWith with useRobustStatistics = True.

benchGoldenIOWith

Arguments

:: BenchConfig	Configuration parameters
-> String	Name of the benchmark
-> IO ()	The IO action to benchmark
-> Spec

Create an IO benchmark golden test with custom configuration.
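A usage sketch, following the signatures and field names listed in the Synopsis. It enables robust statistics for a file-read benchmark, as the note under benchGoldenIO suggests for noisy environments. The spec name "IO benchmarks" is an arbitrary label, and "large-file.txt" is the placeholder path from the benchGoldenIO example.

```haskell
import Control.Exception (evaluate)
import Test.Hspec
import Test.Hspec.BenchGolden

main :: IO ()
main = hspec $
  describe "IO benchmarks" $
    benchGoldenIOWith defaultBenchConfig
      { useRobustStatistics = True  -- trimmed mean / MAD instead of mean / stddev
      , trimPercent = 10.0          -- trim 10% from each tail
      }
      "file read" $ do
        contents <- readFile "large-file.txt"
        -- Force the whole file to be read, then discard the length
        -- so the action has type IO ().
        _ <- evaluate (length contents)
        pure ()
```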

Configuration

data BenchConfig

Configurable parameters for benchmark execution and comparison.

Constructors

BenchConfig

Fields

iterations :: !Int
Number of benchmark iterations to run
warmupIterations :: !Int
Number of warm-up iterations (discarded before measurement)
tolerancePercent :: !Double
Allowed deviation in mean time (as percentage, e.g., 15.0 = ±15%)
absoluteToleranceMs :: !(Maybe Double)
Minimum absolute tolerance in milliseconds (e.g., 0.01 = 10 microseconds). When set, benchmarks pass if EITHER the percentage difference is within tolerancePercent OR the absolute time difference is within this threshold. This prevents false failures for extremely fast operations (< 1ms) where measurement noise causes large percentage variations despite negligible absolute time differences. Set to Nothing to disable (percentage-only).
warnOnVarianceChange :: !Bool
Whether to emit warnings when stddev changes significantly
varianceTolerancePercent :: !Double
Allowed deviation in stddev (as percentage)
outputDir :: !FilePath
Directory for storing golden files
failOnFirstRun :: !Bool
Whether to fail if no golden file exists yet
useRobustStatistics :: !Bool
Use robust statistics (trimmed mean, MAD) instead of mean/stddev
trimPercent :: !Double
Percentage to trim from each tail for trimmed mean (e.g., 10.0 = 10%)
outlierThreshold :: !Double
MAD multiplier for outlier detection (e.g., 3.0 = 3 MADs from median)
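The robust-statistics fields can be illustrated with a base-only sketch of a trimmed mean, the median absolute deviation (MAD), and MAD-based outlier detection. These are textbook definitions written for illustration, not golds-gym's code. The function names are hypothetical, the MAD is left unscaled (an assumption), and all functions assume a non-empty sample and a trim percentage below 50.

```haskell
import Data.List (sort)

-- | Median of a non-empty sample.
median :: [Double] -> Double
median xs
  | even n    = (s !! (h - 1) + s !! h) / 2
  | otherwise = s !! h
  where
    s = sort xs
    n = length xs
    h = n `div` 2

-- | Mean after dropping trimPct percent of the samples from each tail
-- (compare trimPercent, e.g. 10.0 = 10%).
trimmedMean :: Double -> [Double] -> Double
trimmedMean trimPct xs = sum kept / fromIntegral (length kept)
  where
    n    = length xs
    k    = floor (trimPct * fromIntegral n / 100) :: Int
    kept = take (n - 2 * k) (drop k (sort xs))

-- | Median absolute deviation from the median (unscaled).
mad :: [Double] -> Double
mad xs = median (map (\x -> abs (x - m)) xs)
  where
    m = median xs

-- | Samples more than 'threshold' MADs away from the median
-- (compare outlierThreshold, e.g. 3.0).
outliers :: Double -> [Double] -> [Double]
outliers threshold xs = filter far xs
  where
    m     = median xs
    d     = mad xs
    far x = abs (x - m) > threshold * d
```

For the sample [1, 2, 3, 4, 5, 6, 7, 8, 9, 100], trimming 10% drops 1 and 100 and gives a trimmed mean of 5.5, where the plain mean is 14.5. The median is 5.5, the MAD is 2.5, and with a threshold of 3.0 the only outlier is 100.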

Instances

Generic BenchConfig

Types

data BenchGolden

Constructors

BenchGolden

Fields

benchName :: !String
Name of the benchmark (used for golden file naming)
benchAction :: !(IO ())
The IO action to benchmark
benchConfig :: !BenchConfig
Configuration parameters

data BenchResult

Constructors

FirstRun !GoldenStats	No golden file existed; baseline created
Pass !GoldenStats !GoldenStats ![Warning]	Benchmark passed (golden stats, actual stats, warnings)
Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double)	Performance regression (golden, actual, percent change, tolerance, absolute tolerance)
Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double)	Performance improvement (golden, actual, percent change, tolerance, absolute tolerance)

Instances

Show BenchResult
Defined in Test.Hspec.BenchGolden.Types
Methods
showsPrec :: Int -> BenchResult -> ShowS
show :: BenchResult -> String
showList :: [BenchResult] -> ShowS
Eq BenchResult
Defined in Test.Hspec.BenchGolden.Types
Methods
(==) :: BenchResult -> BenchResult -> Bool
(/=) :: BenchResult -> BenchResult -> Bool

data Warning

Constructors

VarianceIncreased !Double !Double !Double !Double	Stddev increased (golden, actual, percent change, tolerance)
VarianceDecreased !Double !Double !Double !Double	Stddev decreased significantly (golden, actual, percent change, tolerance)
HighVariance !Double	Current run has unusually high variance
OutliersDetected !Int ![Double]	Outliers detected (count, list of outlier timings)

Instances

Show Warning
Defined in Test.Hspec.BenchGolden.Types
Methods
showsPrec :: Int -> Warning -> ShowS
show :: Warning -> String
showList :: [Warning] -> ShowS
Eq Warning
Defined in Test.Hspec.BenchGolden.Types
Methods
(==) :: Warning -> Warning -> Bool
(/=) :: Warning -> Warning -> Bool

data ArchConfig

Constructors

ArchConfig

Fields

archId :: !Text
Unique identifier (e.g., "aarch64-darwin-Apple_M1")
archOS :: !Text
Operating system (e.g., "darwin", "linux")
archCPU :: !Text
CPU architecture (e.g., "aarch64", "x86_64")
archModel :: !(Maybe Text)
CPU model if available (e.g., "Apple M1", "Intel Core i7")
