| Copyright | (c) 2026 |
|---|---|
| License | MIT |
| Maintainer | @ocramz |
| Safe Haskell | None |
| Language | Haskell2010 |
Test.Hspec.BenchGolden
Description
Overview
golds-gym is a framework for golden testing of performance benchmarks.
It integrates with hspec and uses CPU time measurements for benchmarking.
Benchmarks can use robust statistics to mitigate the impact of outliers.
The library can be used both to assert that performance does not regress, and to set expectations
for improvements across project versions (see benchGoldenWithExpectation).
There are also combinators for parameter sweep benchmarks that generate CSV files for analysis and plotting,
see benchGoldenSweep and benchGoldenSweepWith.
Quick Start
import Test.Hspec
import Test.Hspec.BenchGolden
import Data.List (sort)
main :: IO ()
main = hspec $ do
describe "Performance" $ do
-- Pure function with normal form evaluation
benchGolden "list sorting" $
nf (\n -> sort [n, n-1 .. 1]) 1000
-- IO action with result forced to normal form
benchGolden "file read" $
nfIO (readFile "data.txt")
Evaluation strategies control how values are forced:
- nf - Force to normal form (deep evaluation; use for most cases)
- nfIO - Variant for IO actions
- nfAppIO - For functions returning IO
- io - Plain IO action without forcing
Without proper evaluation strategies, GHC may optimize away computations or share results across iterations, making benchmarks meaningless.
Best Practices: Avoiding Shared Thunks
CRITICAL: When benchmarking with data structures, ensure the data is reconstructed on each iteration to avoid measuring shared, cached results.
❌ Anti-pattern (shared list across iterations):
benchGolden "sum" $ nf sum [1..1000000]
The list [1..1000000] is constructed once and shared across all iterations.
This allocates the entire list in memory, creates GC pressure, and prevents
list fusion. The first iteration evaluates the shared thunk, and subsequent
iterations measure cached results.
✅ Correct pattern (list reconstructed per iteration):
benchGolden "sum" $ nf (\n -> sum [1..n]) 1000000
The lambda wrapper ensures the list is reconstructed on every iteration, measuring the true cost of both construction and computation.
Other considerations:
- Ensure return types are inhabited enough to depend on all computations (avoid b ~ (), where GHC might optimize away the payload)
- For inlinable functions, ensure full saturation: prefer nf (\n -> f n) x over nf f x to guarantee inlining and rewrite rules fire
- Use NFData constraints where applicable to ensure deep evaluation
How It Works
- On first run, the benchmark is executed and results are saved to a golden file as the baseline.
- On subsequent runs, the benchmark is executed and compared against the baseline using a configurable tolerance or expectation combinators.
Architecture-Specific Baselines
Golden files are stored per-architecture to ensure benchmarks are only compared against equivalent hardware. The architecture identifier includes CPU type, OS, and CPU model.
Configuration
Use benchGoldenWith or benchGoldenWithExpectation with a custom BenchConfig:
Tolerance Configuration
The framework supports two tolerance mechanisms that work together:
- Percentage tolerance (tolerancePercent): checks whether the mean time change is within ±X% of the baseline. This is the traditional approach and works well for operations that take more than a few milliseconds.
- Absolute tolerance (absoluteToleranceMs): checks whether the absolute time difference is within X milliseconds. This prevents false failures for extremely fast operations (< 1 ms), where measurement noise causes large percentage variations despite negligible absolute differences.
By default, benchmarks pass if EITHER tolerance is satisfied:
pass = (abs percentChange <= 15%) OR (absTimeDiff <= 0.01 ms)
This hybrid strategy combines the benefits of both approaches:
- For fast operations (< 1ms): Absolute tolerance dominates, preventing noise
- For slow operations (> 1ms): Percentage tolerance dominates, catching real regressions
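The hybrid rule can be sketched as a small pure predicate. This is an illustration only, assuming just the two thresholds; hybridPass and percentChange are hypothetical helper names, not part of the library's exported API:

```haskell
-- Sketch of the hybrid pass rule: a benchmark passes when EITHER the
-- relative change or the absolute change is within its threshold.
-- `hybridPass` and `percentChange` are hypothetical helpers.

-- | Relative change of actual vs. baseline, in percent.
percentChange :: Double -> Double -> Double
percentChange baseline actual = (actual - baseline) / baseline * 100

-- | Pass when within +/- tolPct percent OR within +/- tolAbsMs milliseconds.
hybridPass :: Double -> Double -> Double -> Double -> Bool
hybridPass tolPct tolAbsMs baseline actual =
     abs (percentChange baseline actual) <= tolPct
  || abs (actual - baseline) <= tolAbsMs
```

With the defaults (15.0, 0.01), a 0.005 ms operation that jitters to 0.009 ms still passes (80% relative change, but only 0.004 ms absolute), while a 100 ms operation that slows to 120 ms fails.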
To disable absolute tolerance and use percentage-only comparison:
benchGoldenWith defaultBenchConfig
{ absoluteToleranceMs = Nothing
}
"benchmark" $ ...
To adjust the absolute tolerance threshold:
benchGoldenWith defaultBenchConfig
{ absoluteToleranceMs = Just 0.001 -- 1 microsecond (very strict)
}
"benchmark" $ ...
Synopsis
- benchGolden :: String -> BenchAction -> Spec
- benchGoldenWith :: BenchConfig -> String -> BenchAction -> Spec
- benchGoldenWithExpectation :: String -> BenchConfig -> [Expectation] -> BenchAction -> Spec
- benchGoldenSweep :: Show a => String -> Text -> [a] -> (a -> BenchAction) -> Spec
- benchGoldenSweepWith :: Show a => BenchConfig -> String -> Text -> [a] -> (a -> BenchAction) -> Spec
- data BenchConfig = BenchConfig {}
- defaultBenchConfig :: BenchConfig
- data BenchGolden = BenchGolden {
- benchName :: !String
- benchAction :: !BenchAction
- benchConfig :: !BenchConfig
- newtype BenchAction = BenchAction {
- runBenchAction :: Word64 -> IO ()
- data GoldenStats = GoldenStats {
- statsMean :: !Double
- statsStddev :: !Double
- statsMedian :: !Double
- statsMin :: !Double
- statsMax :: !Double
- statsPercentiles :: ![(Int, Double)]
- statsArch :: !Text
- statsTimestamp :: !UTCTime
- statsTrimmedMean :: !Double
- statsMAD :: !Double
- statsIQR :: !Double
- statsOutliers :: ![Double]
- data BenchResult
- = FirstRun !GoldenStats
- | Pass !GoldenStats !GoldenStats ![Warning]
- | Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- | Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- data Warning
- = VarianceIncreased !Double !Double !Double !Double
- | VarianceDecreased !Double !Double !Double !Double
- | HighVariance !Double
- | OutliersDetected !Int ![Double]
- data ArchConfig = ArchConfig {}
- nf :: NFData b => (a -> b) -> a -> BenchAction
- nfIO :: NFData a => IO a -> BenchAction
- nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction
- io :: IO () -> BenchAction
- runBenchGolden :: BenchGolden -> IO BenchResult
- runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runSweep :: Show a => String -> BenchConfig -> Text -> [a] -> (a -> BenchAction) -> IO [(a, BenchResult, GoldenStats)]
- runSweepPoint :: Show a => String -> BenchConfig -> Text -> a -> BenchAction -> IO (BenchResult, GoldenStats)
- compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult
- checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning]
- calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double])
- calculateTrimmedMean :: Double -> Vector Double -> Double
- calculateMAD :: Vector Double -> Double -> Double
- calculateIQR :: Vector Double -> Double
- detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double]
- readGoldenFile :: FilePath -> IO (Either String GoldenStats)
- writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- getGoldenPath :: FilePath -> FilePath -> String -> FilePath
- getActualPath :: FilePath -> FilePath -> String -> FilePath
- expect :: Lens' GoldenStats Double -> Tolerance -> Expectation
- pattern And :: !Expectation -> !Expectation -> Expectation
- pattern ExpectStat :: !(Lens' GoldenStats Double) -> !Tolerance -> Expectation
- pattern Or :: !Expectation -> !Expectation -> Expectation
- data Tolerance
- metricFor :: BenchConfig -> Lens' GoldenStats Double
- varianceFor :: BenchConfig -> Lens' GoldenStats Double
- _statsMean :: Lens' GoldenStats Double
- _statsStddev :: Lens' GoldenStats Double
- _statsMedian :: Lens' GoldenStats Double
- _statsMin :: Lens' GoldenStats Double
- _statsMax :: Lens' GoldenStats Double
- _statsTrimmedMean :: Lens' GoldenStats Double
- _statsMAD :: Lens' GoldenStats Double
- _statsIQR :: Lens' GoldenStats Double
- expectStat :: Lens' GoldenStats Double -> Tolerance -> Expectation
- checkExpectation :: Expectation -> GoldenStats -> GoldenStats -> Bool
- withinPercent :: Double -> Double -> Double -> Bool
- withinAbsolute :: Double -> Double -> Double -> Bool
- withinHybrid :: Double -> Double -> Double -> Double -> Bool
- mustImprove :: Double -> Double -> Double -> Bool
- mustRegress :: Double -> Double -> Double -> Bool
- (@~) :: Double -> Double -> Double -> Bool
- (@<) :: Double -> Double -> Double -> Bool
- (@<<) :: Double -> Double -> Double -> Bool
- (@>>) :: Double -> Double -> Double -> Bool
- (&&~) :: Expectation -> Expectation -> Expectation
- (||~) :: Expectation -> Expectation -> Expectation
- percentDiff :: Double -> Double -> Double
- absDiff :: Double -> Double -> Double
- toleranceFromExpectation :: Expectation -> (Double, Maybe Double)
- toleranceValues :: Tolerance -> (Double, Maybe Double)
- module Test.Hspec.BenchGolden.Arch
- module Test.Hspec.BenchGolden.CSV
Spec Combinators
Arguments
| :: String | Name of the benchmark |
| -> BenchAction | The benchmarkable action |
| -> Spec |
Create a benchmark golden test with default configuration.
This is the simplest way to add a benchmark test:
describe "Sorting" $ do
  benchGolden "quicksort 1000 elements" $
    nf quicksort [1000, 999 .. 1]
Use evaluation strategy combinators to control how values are forced:
- nf - Normal form (deep evaluation)
- nfIO - Normal form for IO actions
- nfAppIO - Normal form for functions returning IO
- io - Plain IO action (for backward compatibility)
Default configuration:
- 100 iterations
- 5 warm-up iterations
- 15% tolerance
- Variance warnings enabled
- Standard statistics (not robust mode)
Arguments
| :: BenchConfig | Configuration parameters |
| -> String | Name of the benchmark |
| -> BenchAction | The benchmarkable action |
| -> Spec |
Create a benchmark golden test with custom configuration.
Examples:
-- Tighter tolerance for critical code
benchGoldenWith defaultBenchConfig
{ iterations = 500
, tolerancePercent = 5.0
, warmupIterations = 20
}
"hot loop" $
nf criticalFunction input
-- Robust statistics mode for noisy environments
benchGoldenWith defaultBenchConfig
{ useRobustStatistics = True
, trimPercent = 10.0
, outlierThreshold = 3.0
}
"benchmark with outliers" $
nf computation input
benchGoldenWithExpectation Source #
Arguments
| :: String | Name of the benchmark |
| -> BenchConfig | Configuration parameters |
| -> [Expectation] | List of expectations (all must pass) |
| -> BenchAction | The benchmarkable action |
| -> Spec |
Create a benchmark golden test with custom lens-based expectations.
This combinator allows you to specify custom performance expectations using
lenses and tolerance combinators. Expectations can be composed using boolean
operators (&&~, ||~).
Examples:
-- Median-based comparison (more robust to outliers)
benchGoldenWithExpectation "median test" defaultBenchConfig
  [expect _statsMedian (Percent 10.0)]
  (nf sort [1000, 999 .. 1])

-- Multiple metrics must pass (AND composition)
benchGoldenWithExpectation "strict test" defaultBenchConfig
  [ expect _statsMean (Percent 15.0) &&~ expect _statsMAD (Percent 50.0) ]
  (nf algorithm input)

-- Either metric can pass (OR composition)
benchGoldenWithExpectation "flexible test" defaultBenchConfig
  [ expect _statsMedian (Percent 10.0) ||~ expect _statsMin (Absolute 0.01) ]
  (nf fastOp input)

-- Expect performance improvement (must be faster)
benchGoldenWithExpectation "optimization" defaultBenchConfig
  [expect _statsMean (MustImprove 10.0)]  -- must be ≥10% faster
  (nf optimizedVersion input)

-- Expect controlled regression (for feature additions)
benchGoldenWithExpectation "new feature" defaultBenchConfig
  [expect _statsMean (MustRegress 5.0)]  -- accept 5-20% slowdown
  (nf newFeature input)

-- Low variance requirement
benchGoldenWithExpectation "stable perf" defaultBenchConfig
  [ expect _statsMean (Percent 15.0) &&~ expect _statsIQR (Absolute 0.1) ]
  (nfIO stableOperation)
Note: Expectations are checked against golden files. On first run, a baseline
is created. Use GOLDS_GYM_ACCEPT=1 to regenerate baselines.
Parameter Sweeps
Arguments
| :: Show a | |
| => String | Sweep name (used for CSV filename and golden file prefix) |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action parameterized by sweep value |
| -> Spec |
Create a parameter sweep benchmark with default configuration.
This combinator runs the same benchmark with multiple parameter values, saving individual golden files for each point and producing a single CSV file for analysis and plotting.
Example:
describe "Scaling Tests" $ do
benchGoldenSweep "sort-scaling" "n" [1000, 5000, 10000, 50000] $
\n -> nf sort [n, n-1..1]
This produces:
- Golden files: .golden/<arch>/sort-scaling_n=1000.golden, etc.
- CSV file: .golden/sort-scaling-<arch>.csv
Arguments
| :: Show a | |
| => BenchConfig | Configuration parameters |
| -> String | Sweep name |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action parameterized by sweep value |
| -> Spec |
Create a parameter sweep benchmark with custom configuration.
Example:
describe "Performance Scaling" $ do
benchGoldenSweepWith
defaultBenchConfig { iterations = 500, tolerancePercent = 10.0 }
"algorithm-scaling" "size" [100, 500, 1000, 5000] $
\size -> nf myAlgorithm (generateInput size)
The CSV file includes columns for timestamp, parameter value, and all standard statistics (mean, stddev, median, min, max, etc.).
Configuration
data BenchConfig Source #
Configurable parameters for benchmark execution and comparison.
Constructors
| BenchConfig | |
Fields
| |
Instances
defaultBenchConfig :: BenchConfig Source #
Default benchmark configuration with sensible defaults.
- 100 iterations
- 5 warm-up iterations
- 15% tolerance on mean time
- 0.01 ms (10 microseconds) absolute tolerance - prevents false failures for fast operations
- Variance warnings enabled at 50% tolerance
- Output to the .golden/ directory
- Success on first run (creates baseline)
Hybrid Tolerance Strategy
The default configuration uses BOTH percentage and absolute tolerance:
- Benchmarks pass if mean time is within ±15% OR within ±0.01ms
- This prevents measurement noise from failing fast operations (< 1ms)
- For slower operations (> 1ms), percentage tolerance dominates
Set absoluteToleranceMs = Nothing for percentage-only comparison.
Types
data BenchGolden Source #
Configuration for a single benchmark golden test.
Constructors
| BenchGolden | |
Fields
| |
Instances
| Example BenchGolden Source # | Instance for BenchGolden without arguments. | ||||
Defined in Test.Hspec.BenchGolden Associated Types
Methods evaluateExample :: BenchGolden -> Params -> (ActionWith (Arg BenchGolden) -> IO ()) -> ProgressCallback -> IO Result # | |||||
| Example (arg -> BenchGolden) Source # | Instance for BenchGolden with an argument. This allows benchmarks to receive setup data from | ||||
Defined in Test.Hspec.BenchGolden Associated Types
Methods evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result # | |||||
| type Arg BenchGolden Source # | |||||
Defined in Test.Hspec.BenchGolden | |||||
| type Arg (arg -> BenchGolden) Source # | |||||
Defined in Test.Hspec.BenchGolden | |||||
newtype BenchAction Source #
A benchmarkable action that can be run multiple times.
The Word64 parameter represents the number of iterations to execute.
Constructors
| BenchAction | |
Fields
| |
data GoldenStats Source #
Statistics stored in golden files.
These represent the baseline performance characteristics of a benchmark on a specific architecture.
Constructors
| GoldenStats | |
Fields
| |
Instances
| FromJSON GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| ToJSON GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods toJSON :: GoldenStats -> Value # toEncoding :: GoldenStats -> Encoding # toJSONList :: [GoldenStats] -> Value # toEncodingList :: [GoldenStats] -> Encoding # omitField :: GoldenStats -> Bool # | |||||
| Generic GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Associated Types
| |||||
| Show GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods showsPrec :: Int -> GoldenStats -> ShowS # show :: GoldenStats -> String # showList :: [GoldenStats] -> ShowS # | |||||
| Eq GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| type Rep GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types type Rep GoldenStats = D1 ('MetaData "GoldenStats" "Test.Hspec.BenchGolden.Types" "golds-gym-0.7.0.0-5WxaqlnGMSWFcxivvKHgXD" 'False) (C1 ('MetaCons "GoldenStats" 'PrefixI 'True) (((S1 ('MetaSel ('Just "statsMean") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "statsStddev") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "statsMedian") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double))) :*: (S1 ('MetaSel ('Just "statsMin") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "statsMax") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "statsPercentiles") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 [(Int, Double)])))) :*: ((S1 ('MetaSel ('Just "statsArch") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: (S1 ('MetaSel ('Just "statsTimestamp") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 UTCTime) :*: S1 ('MetaSel ('Just "statsTrimmedMean") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double))) :*: (S1 ('MetaSel ('Just "statsMAD") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "statsIQR") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "statsOutliers") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 [Double])))))) | |||||
data BenchResult Source #
Result of running a benchmark and comparing against golden.
Constructors
| FirstRun !GoldenStats | No golden file existed; baseline created |
| Pass !GoldenStats !GoldenStats ![Warning] | Benchmark passed (golden stats, actual stats, warnings) |
| Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double) | Performance regression (golden, actual, percent change, tolerance, absolute tolerance) |
| Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double) | Performance improvement (golden, actual, percent change, tolerance, absolute tolerance) |
Instances
| Show BenchResult Source # | |
Defined in Test.Hspec.BenchGolden.Types Methods showsPrec :: Int -> BenchResult -> ShowS # show :: BenchResult -> String # showList :: [BenchResult] -> ShowS # | |
| Eq BenchResult Source # | |
Defined in Test.Hspec.BenchGolden.Types | |
Warnings that may be emitted during benchmark comparison.
Constructors
| VarianceIncreased !Double !Double !Double !Double | Stddev increased (golden, actual, percent change, tolerance) |
| VarianceDecreased !Double !Double !Double !Double | Stddev decreased significantly (golden, actual, percent change, tolerance) |
| HighVariance !Double | Current run has unusually high variance |
| OutliersDetected !Int ![Double] | Outliers detected (count, list of outlier timings) |
data ArchConfig Source #
Machine architecture configuration.
Used to generate unique identifiers for golden file directories, ensuring benchmarks are only compared against equivalent hardware.
Constructors
| ArchConfig | |
Instances
| FromJSON ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| ToJSON ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods toJSON :: ArchConfig -> Value # toEncoding :: ArchConfig -> Encoding # toJSONList :: [ArchConfig] -> Value # toEncodingList :: [ArchConfig] -> Encoding # omitField :: ArchConfig -> Bool # | |||||
| Generic ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Associated Types
| |||||
| Show ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods showsPrec :: Int -> ArchConfig -> ShowS # show :: ArchConfig -> String # showList :: [ArchConfig] -> ShowS # | |||||
| Eq ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| type Rep ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types type Rep ArchConfig = D1 ('MetaData "ArchConfig" "Test.Hspec.BenchGolden.Types" "golds-gym-0.7.0.0-5WxaqlnGMSWFcxivvKHgXD" 'False) (C1 ('MetaCons "ArchConfig" 'PrefixI 'True) ((S1 ('MetaSel ('Just "archId") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: S1 ('MetaSel ('Just "archOS") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text)) :*: (S1 ('MetaSel ('Just "archCPU") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: S1 ('MetaSel ('Just "archModel") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 (Maybe Text))))) | |||||
Benchmarkable Constructors
nf :: NFData b => (a -> b) -> a -> BenchAction Source #
Benchmark a pure function applied to an argument, forcing the result to
normal form (NF) using rnf from Control.DeepSeq.
This ensures the entire result structure is evaluated.
Example:
benchGolden "fib 30" (nf fib 30)
nfIO :: NFData a => IO a -> BenchAction Source #
Benchmark an IO action, forcing the result to normal form.
Example:
benchGolden "readFile" (nfIO $ readFile "data.txt")
nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction Source #
Benchmark a function that performs IO, forcing the result to normal form.
Example:
benchGolden "lookup in map" (nfAppIO lookupInDB "key")
io :: IO () -> BenchAction Source #
Benchmark an IO action, discarding the result.
This is for backward compatibility with code that uses IO () actions.
Example:
benchGolden "compute" $ io $ do
  result <- heavyComputation
  _ <- evaluate result
  pure ()
Low-Level API
runBenchGolden :: BenchGolden -> IO BenchResult Source #
Run a benchmark golden test.
This function:
- Runs warm-up iterations (discarded)
- Runs the actual benchmark
- Writes actual results to
.actualfile - If no golden exists, creates it (first run)
- Otherwise, compares against golden with tolerance
The result includes any warnings (e.g., variance changes).
Standalone Runner API
These functions can be used independently of hspec for programmatic benchmarking workflows.
runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #
Run a benchmark and collect statistics.
Uses raw timing collection with proper inner iteration counts to ensure the SPEC trick in nf/nfIO prevents thunk sharing.
runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #
Run a benchmark with raw timing collection for robust statistics.
This function times running all iterations in a single batch, then divides to get per-iteration timing. The SPEC trick in nf/nfIO prevents sharing within the batch.
We collect multiple samples by running the full batch multiple times, ensuring accurate measurements even with GHC's -O2 optimizations.
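The batch-timing idea can be approximated with just base, as a rough sketch: time one whole batch of iterations, then divide by the iteration count. timeBatchMs is a hypothetical helper; the library's actual runner additionally handles warm-up, multiple samples, and thunk-sharing prevention via the SPEC trick.

```haskell
import Control.Monad (replicateM_)
import System.CPUTime (getCPUTime)

-- Rough sketch of batch timing: run the full batch once, then divide the
-- elapsed CPU time by the iteration count to estimate per-iteration cost
-- in milliseconds. `timeBatchMs` is a hypothetical helper, not library API.
timeBatchMs :: Int -> IO () -> IO Double
timeBatchMs iters action = do
  start <- getCPUTime                               -- CPU time in picoseconds
  replicateM_ iters action
  end <- getCPUTime
  let elapsedMs = fromIntegral (end - start) / 1e9  -- picoseconds -> ms
  pure (elapsedMs / fromIntegral iters)
```

Timing a batch rather than each iteration individually keeps clock-read overhead out of the measurement, which matters for sub-microsecond actions.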
Arguments
| :: Show a | |
| => String | Sweep name |
| -> BenchConfig | |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action generator |
| -> IO [(a, BenchResult, GoldenStats)] |
Run a full parameter sweep and write CSV output.
This runs benchmarks for all parameter values, saves individual golden files, and writes a single CSV file with all results for analysis.
The CSV file is placed at:
<outputDir>/<sweep-name>-<arch-id>.csv
Arguments
| :: Show a | |
| => String | Base sweep name |
| -> BenchConfig | |
| -> Text | Parameter name |
| -> a | Parameter value |
| -> BenchAction | |
| -> IO (BenchResult, GoldenStats) |
Run a single point of a parameter sweep.
This is similar to runBenchGolden but returns the GoldenStats along
with the BenchResult, allowing the caller to accumulate stats for CSV export.
Each point is saved to its own golden file with the parameter value
included in the filename (e.g., sort-scaling_n=1000.golden).
Comparison Utilities
compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult Source #
Compare actual stats against golden stats.
Returns a BenchResult indicating whether the benchmark passed,
regressed, or improved, along with any warnings.
Hybrid Tolerance Strategy
The comparison uses BOTH percentage and absolute tolerance (when configured):
- Calculate the percentage difference: ((actual - golden) / golden) * 100
- Pass if abs(percentDiff) <= tolerancePercent (percentage check)
- OR pass if abs(actual - golden) <= absoluteToleranceMs (absolute check)
This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning] Source #
Check for variance changes and generate warnings.
Robust Statistics
calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double]) Source #
Calculate robust statistics from raw timing data.
Returns: (trimmed mean, MAD, IQR, outliers)
calculateTrimmedMean :: Double -> Vector Double -> Double Source #
Calculate trimmed mean by removing specified percentage from each tail.
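A minimal list-based sketch of the trimmed-mean calculation, assuming the percentage is dropped from each tail. trimmedMean here is illustrative only; the library's version operates on Vector Double.

```haskell
import Data.List (sort)

-- Illustrative trimmed mean: sort the samples, drop `pct` percent of them
-- from each tail, and average what remains. Not the library's implementation.
trimmedMean :: Double -> [Double] -> Double
trimmedMean pct xs =
  let sorted = sort xs
      n      = length xs
      k      = floor (fromIntegral n * pct / 100)  -- samples dropped per tail
      kept   = take (n - 2 * k) (drop k sorted)
  in sum kept / fromIntegral (length kept)
```

For example, trimming 10% from each tail of ten samples drops the single smallest and single largest before averaging.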
calculateMAD :: Vector Double -> Double -> Double Source #
Calculate Median Absolute Deviation (MAD).
MAD = median(|x_i - median(x)|)
detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double] Source #
Detect outliers using MAD-based threshold.
An observation is an outlier if: |x - median| > threshold * MAD
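The MAD and the outlier rule above can be sketched with plain lists. These standalone definitions are illustrative, assuming non-empty input; the library's versions work on Vector Double.

```haskell
import Data.List (sort)

-- Illustrative list-based versions of MAD and the MAD-based outlier rule.

-- | Median of a non-empty list.
median :: [Double] -> Double
median xs =
  let s = sort xs
      n = length s
  in if odd n
       then s !! (n `div` 2)
       else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- | MAD = median of absolute deviations from the median.
mad :: [Double] -> Double
mad xs = let m = median xs in median (map (\x -> abs (x - m)) xs)

-- | Outliers: samples with |x - median| > threshold * MAD.
outliers :: Double -> [Double] -> [Double]
outliers threshold xs =
  let m = median xs
      d = mad xs
  in filter (\x -> abs (x - m) > threshold * d) xs
```

Because both the center and the spread are medians, a single extreme timing barely moves the threshold, which is why MAD-based detection is robust where stddev-based detection is not.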
Golden File I/O
readGoldenFile :: FilePath -> IO (Either String GoldenStats) Source #
Read a golden file.
writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #
Write a golden file.
writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #
Write an actual results file.
getActualPath :: FilePath -> FilePath -> String -> FilePath Source #
Get the path for an actual results file.
Lens-Based Expectations
expect :: Lens' GoldenStats Double -> Tolerance -> Expectation Source #
Create an expectation for a specific statistic field.
Example:
expect _statsMedian (Percent 10.0)
expect _statsIQR (Absolute 0.5)
expect _statsMean (Hybrid 15.0 0.01)
expect _statsMean (MustImprove 10.0)
pattern And :: !Expectation -> !Expectation -> Expectation Source #
Both expectations must pass
pattern ExpectStat :: !(Lens' GoldenStats Double) -> !Tolerance -> Expectation Source #
Expect a specific field to be within tolerance
pattern Or :: !Expectation -> !Expectation -> Expectation Source #
Either expectation can pass
Tolerance specification for performance comparison.
Constructors
| Percent !Double | Percentage tolerance |
| Absolute !Double | Absolute tolerance in milliseconds |
| Hybrid !Double !Double | Hybrid tolerance: pass if EITHER the percentage OR the absolute tolerance is satisfied |
| MustImprove !Double | Must be faster by at least this percentage |
| MustRegress !Double | Must be slower by at least this percentage |
metricFor :: BenchConfig -> Lens' GoldenStats Double Source #
Select the appropriate central tendency metric based on configuration.
Returns:
- _statsTrimmedMean if useRobustStatistics is True
- _statsMean otherwise
Example:
let lens = metricFor config
baseline = golden ^. lens
current = actual ^. lens
varianceFor :: BenchConfig -> Lens' GoldenStats Double Source #
Select the appropriate dispersion metric based on configuration.
Returns:
- _statsMAD if useRobustStatistics is True
- _statsStddev otherwise
Example:
let vLens = varianceFor config
goldenVar = golden ^. vLens
actualVar = actual ^. vLens
_statsMean :: Lens' GoldenStats Double Source #
Lens for mean execution time in milliseconds.
_statsStddev :: Lens' GoldenStats Double Source #
Lens for standard deviation in milliseconds.
_statsMedian :: Lens' GoldenStats Double Source #
Lens for median execution time in milliseconds.
_statsTrimmedMean :: Lens' GoldenStats Double Source #
Lens for trimmed mean (with tails removed) in milliseconds.
_statsMAD :: Lens' GoldenStats Double Source #
Lens for median absolute deviation (MAD) in milliseconds.
_statsIQR :: Lens' GoldenStats Double Source #
Lens for interquartile range (IQR = Q3 - Q1) in milliseconds.
expectStat :: Lens' GoldenStats Double -> Tolerance -> Expectation Source #
Create an expectation using a custom lens.
This is an alias for expect for compatibility.
checkExpectation :: Expectation -> GoldenStats -> GoldenStats -> Bool Source #
withinPercent :: Double -> Double -> Double -> Bool Source #
Check if value is within percentage tolerance.
withinPercent 15.0 baseline actual -- within ±15%
withinAbsolute :: Double -> Double -> Double -> Bool Source #
Check if value is within absolute tolerance (milliseconds).
withinAbsolute 0.01 baseline actual -- within ±0.01ms
withinHybrid :: Double -> Double -> Double -> Double -> Bool Source #
Check if value satisfies hybrid tolerance (percentage OR absolute).
withinHybrid 15.0 0.01 baseline actual -- within ±15% OR ±0.01ms
mustImprove :: Double -> Double -> Double -> Bool Source #
Check if actual is faster than baseline by at least the given percentage.
mustImprove 10.0 baseline actual -- must be ≥10% faster
mustRegress :: Double -> Double -> Double -> Bool Source #
Check if actual is slower than baseline by at least the given percentage.
mustRegress 5.0 baseline actual -- must be ≥5% slower
(@~) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for percentage tolerance check.
baseline @~ 15.0 $ actual -- within ±15%
(@<) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for absolute tolerance check.
baseline @< 0.01 $ actual -- within ±0.01ms
(@<<) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for "must improve" check.
baseline @<< 10.0 $ actual -- must be ≥10% faster
(@>>) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for "must regress" check.
baseline @>> 5.0 $ actual -- must be ≥5% slower
(&&~) :: Expectation -> Expectation -> Expectation infixr 3 Source #
AND composition of expectations (both must pass).
expect _statsMean (Percent 15.0) &&~ expect _statsMAD (Percent 50.0)
(||~) :: Expectation -> Expectation -> Expectation infixr 2 Source #
OR composition of expectations (either can pass).
expect _statsMedian (Percent 10.0) ||~ expect _statsMin (Absolute 0.01)
percentDiff :: Double -> Double -> Double Source #
Calculate percentage difference between baseline and actual.
Returns: ((actual - baseline) / baseline) * 100
- Positive = regression (slower)
- Negative = improvement (faster)
- Zero = no change
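The documented formula, and the "must improve" check built on top of it, can be sketched as follows. The primed names are used to make clear these are standalone illustrations rather than the library's definitions:

```haskell
-- Sketch of the documented formulas; `percentDiff'` and `mustImprove'`
-- are illustrative stand-ins for the library's percentDiff/mustImprove.

-- | ((actual - baseline) / baseline) * 100; positive means slower.
percentDiff' :: Double -> Double -> Double
percentDiff' baseline actual = (actual - baseline) / baseline * 100

-- | Actual must be at least `pct` percent faster than baseline,
-- i.e. the percentage difference must be at or below -pct.
mustImprove' :: Double -> Double -> Double -> Bool
mustImprove' pct baseline actual = percentDiff' baseline actual <= negate pct
```

So a drop from 100 ms to 90 ms is a -10% difference and satisfies mustImprove' 10.0, while 95 ms (-5%) does not.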
absDiff :: Double -> Double -> Double Source #
Calculate absolute difference between baseline and actual.
Returns: abs(actual - baseline)
toleranceFromExpectation :: Expectation -> (Double, Maybe Double) Source #
Extract tolerance description from an expectation for error messages. For compound expectations (And/Or), returns the first tolerance found.
toleranceValues :: Tolerance -> (Double, Maybe Double) Source #
Extract percentage and optional absolute tolerance from a Tolerance.
Re-exports
module Test.Hspec.BenchGolden.Arch
module Test.Hspec.BenchGolden.CSV
Orphan instances
| Example BenchGolden Source # | Instance for BenchGolden without arguments. | ||||
Associated Types
Methods evaluateExample :: BenchGolden -> Params -> (ActionWith (Arg BenchGolden) -> IO ()) -> ProgressCallback -> IO Result # | |||||
| Example (arg -> BenchGolden) Source # | Instance for BenchGolden with an argument. This allows benchmarks to receive setup data from | ||||
Associated Types
Methods evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result # | |||||