| Copyright | (c) 2026 |
|---|---|
| License | MIT |
| Maintainer | your.email@example.com |
| Safe Haskell | None |
| Language | Haskell2010 |
Test.Hspec.BenchGolden
Description
Overview
golds-gym is a framework for golden testing of performance benchmarks.
It integrates with hspec and uses benchpress for lightweight timing measurements.
Optionally, benchmarks can use robust statistics to mitigate the impact of outliers.
Quick Start
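import Control.Exception (evaluate)
import Control.Monad (void)
import Test.Hspec
import Test.Hspec.BenchGolden

main :: IO ()
main = hspec $ do
  describe "Performance" $ do
    -- myAlgorithm and input stand for your own code and data.
    -- evaluate forces the result (to weak head normal form) so the work is actually timed.
    benchGolden "my algorithm" $
      void $ evaluate (myAlgorithm input)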
How It Works
- On first run, the benchmark is executed and results are saved to a golden file as the baseline.
- On subsequent runs, the benchmark is executed and compared against the baseline using a configurable tolerance (default ±15%).
- If the mean time differs from the baseline by more than the tolerance, the test fails, reporting either a regression (slower) or an improvement (faster).
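A minimal sketch of the decision described above, using only the percentage tolerance. `Verdict` and `classify` are hypothetical names, not part of golds-gym, and the percent change is assumed to be measured relative to the baseline mean.

```haskell
-- Hypothetical stand-in for the outcome of one run; not the library's BenchResult.
data Verdict
  = NewBaseline      -- no golden file yet: this run becomes the baseline
  | WithinTolerance  -- mean time is within the tolerance of the baseline
  | Slower Double    -- regression; carries the percent change
  | Faster Double    -- improvement; carries the percent change
  deriving (Show, Eq)

-- | Classify a run from the baseline mean (if any), the current mean,
-- and a percentage tolerance such as 15.0.
classify :: Maybe Double -> Double -> Double -> Verdict
classify Nothing _ _ = NewBaseline
classify (Just baseline) current tolPct
  | abs change <= tolPct = WithinTolerance
  | change > 0 = Slower change
  | otherwise = Faster change
  where
    -- percent change relative to the baseline mean
    change = (current - baseline) / baseline * 100
```

For example, with a 10 ms baseline and a 15% tolerance, an 11 ms run is within tolerance, while a 12 ms run is reported as a 20% regression.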
Architecture-Specific Baselines
Golden files are stored per-architecture to ensure benchmarks are only compared against equivalent hardware. The architecture identifier includes CPU type, OS, and CPU model.
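The exact identifier format golds-gym uses is not shown on this page; the sketch below only illustrates the idea of joining CPU type, OS, and an optional CPU model into a path-safe directory name. `mkArchId` and `currentArchId` are hypothetical helpers; `System.Info.arch` and `System.Info.os` come from base, which does not report the CPU model.

```haskell
import Data.Char (isAlphaNum)
import Data.List (intercalate)
import Data.Maybe (maybeToList)
import qualified System.Info

-- | Hypothetical: build a directory-safe identifier from CPU type, OS,
-- and an optional CPU model name, joined with '-'.
mkArchId :: String -> String -> Maybe String -> String
mkArchId cpu osName model =
  intercalate "-" (map sanitize (cpu : osName : maybeToList model))
  where
    -- Replace characters that are awkward in paths (spaces, slashes, ...) with '_'.
    sanitize = map (\c -> if isAlphaNum c || c == '_' then c else '_')

-- | Identifier for the current machine, without a CPU model.
currentArchId :: String
currentArchId = mkArchId System.Info.arch System.Info.os Nothing
```

With this scheme, `mkArchId "x86_64" "linux" (Just "Intel Core i7")` yields `x86_64-linux-Intel_Core_i7`.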
Configuration
Use benchGoldenWith with a custom BenchConfig to adjust:
- Number of iterations
- Warm-up iterations
- Tolerance percentage
- Absolute tolerance (hybrid tolerance strategy)
- Variance warnings
- Robust statistics mode (trimmed mean, MAD, outlier detection)
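Robust mode is described as using a trimmed mean, MAD, and outlier detection. As a rough illustration of the first of these, here is a trimmed mean; reading `trimPercent` as a per-tail percentage is an assumption, and `trimmedMean` is a hypothetical helper rather than the library's code.

```haskell
import Data.List (sort)

-- | Trimmed mean: sort the samples, drop pct percent from each end, and
-- average what remains. Assumes pct is per tail (10.0 drops the lowest 10%
-- and the highest 10%).
trimmedMean :: Double -> [Double] -> Double
trimmedMean _ [] = 0 / 0 -- NaN: no samples to average
trimmedMean pct xs = sum kept / fromIntegral (length kept)
  where
    sorted = sort xs
    n = length xs
    -- samples cut from each tail, capped so at least one sample survives
    k = min ((n - 1) `div` 2) (floor (fromIntegral n * pct / 100))
    kept = take (n - 2 * k) (drop k sorted)
```

On the samples `[1, 2, ..., 9, 100]` with 10% trimming, the plain mean is 14.5 but the trimmed mean is 5.5, because the single spike at 100 (and the lowest sample) is discarded.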
Tolerance Configuration
The framework supports two tolerance mechanisms that work together:
- Percentage tolerance (tolerancePercent): Checks whether the change in mean time is within ±X% of the baseline. This is the traditional approach and works well for operations that take more than a few milliseconds.
- Absolute tolerance (absoluteToleranceMs): Checks whether the absolute time difference is within X milliseconds. This prevents false failures for extremely fast operations (< 1ms), where measurement noise causes large percentage variations despite negligible absolute differences.
By default, benchmarks pass if EITHER tolerance is satisfied:
pass = (|percentChange| <= 15%) OR (|absTimeDiff| <= 0.01 ms)
This hybrid strategy combines the benefits of both approaches:
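- For very fast operations: absolute tolerance dominates, so measurement noise does not cause false failures (with the defaults, this holds for means below about 0.067 ms, where 15% of the mean is less than 0.01 ms)
- For slower operations: percentage tolerance dominates, catching real regressions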
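The OR rule above can be sketched as a pure function. `passesTolerance` is a hypothetical name, not the library's internals; it assumes the percent change is relative to the baseline mean and that `Nothing` disables the absolute check, as with `absoluteToleranceMs`.

```haskell
-- | Hybrid tolerance: pass if the percent change of the mean is within
-- tolPct, OR (when an absolute tolerance is set) the absolute difference
-- in milliseconds is within absTolMs.
passesTolerance
  :: Double       -- ^ tolerance in percent, e.g. 15.0
  -> Maybe Double -- ^ absolute tolerance in ms, e.g. Just 0.01
  -> Double       -- ^ baseline mean (ms)
  -> Double       -- ^ current mean (ms)
  -> Bool
passesTolerance tolPct absTolMs baseline current =
  withinPercent || withinAbsolute
  where
    diffMs = abs (current - baseline)
    withinPercent = diffMs / baseline * 100 <= tolPct
    -- Nothing means percentage-only comparison
    withinAbsolute = maybe False (diffMs <=) absTolMs
```

With a 0.002 ms baseline, a run at 0.008 ms is a 300% change but only 0.006 ms in absolute terms, so it passes under the defaults and fails once the absolute tolerance is set to `Nothing`.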
To disable absolute tolerance and use percentage-only comparison:
benchGoldenWith defaultBenchConfig
    { absoluteToleranceMs = Nothing
    }
  "benchmark" $ ...
To adjust the absolute tolerance threshold:
benchGoldenWith defaultBenchConfig
    { absoluteToleranceMs = Just 0.001  -- 1 microsecond (very strict)
    }
  "benchmark" $ ...
Environment Variables
- GOLDS_GYM_ACCEPT=1: Regenerate all golden files
- GOLDS_GYM_SKIP=1: Skip all benchmark tests
- GOLDS_GYM_ARCH=custom-id: Override architecture detection
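If your own test code needs to react to these switches (for example, to avoid building large benchmark inputs when benchmarks are skipped anyway), a small helper along these lines could read them. `envFlagSet` and `archOverride` are hypothetical; treating exactly `"1"` as enabled mirrors the documented values, and how golds-gym itself parses other values is not covered here.

```haskell
import System.Environment

-- | Hypothetical helper: True when the variable is set to exactly "1".
envFlagSet :: String -> IO Bool
envFlagSet name = (== Just "1") <$> lookupEnv name

-- | The architecture override from GOLDS_GYM_ARCH, if present.
archOverride :: IO (Maybe String)
archOverride = lookupEnv "GOLDS_GYM_ARCH"
```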
Synopsis
- benchGolden :: String -> IO () -> Spec
- benchGoldenWith :: BenchConfig -> String -> IO () -> Spec
- benchGoldenIO :: String -> IO () -> Spec
- benchGoldenIOWith :: BenchConfig -> String -> IO () -> Spec
- data BenchConfig = BenchConfig {}
- defaultBenchConfig :: BenchConfig
- data BenchGolden = BenchGolden
    { benchName :: !String
    , benchAction :: !(IO ())
    , benchConfig :: !BenchConfig
    }
- data GoldenStats = GoldenStats
    { statsMean :: !Double
    , statsStddev :: !Double
    , statsMedian :: !Double
    , statsMin :: !Double
    , statsMax :: !Double
    , statsPercentiles :: ![(Int, Double)]
    , statsArch :: !Text
    , statsTimestamp :: !UTCTime
    , statsTrimmedMean :: !Double
    , statsMAD :: !Double
    , statsIQR :: !Double
    , statsOutliers :: ![Double]
    }
- data BenchResult
- = FirstRun !GoldenStats
- | Pass !GoldenStats !GoldenStats ![Warning]
- | Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- | Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- data Warning
- = VarianceIncreased !Double !Double !Double !Double
- | VarianceDecreased !Double !Double !Double !Double
- | HighVariance !Double
- | OutliersDetected !Int ![Double]
- data ArchConfig = ArchConfig {}
- runBenchGolden :: BenchGolden -> IO BenchResult
- module Test.Hspec.BenchGolden.Arch
Spec Combinators
benchGolden :: String -> IO () -> Spec
Create a benchmark golden test with default configuration.
This is the simplest way to add a benchmark test:
describe "Sorting" $ do
  benchGolden "quicksort 1000 elements" $
    void $ evaluate (sum (quicksort [1000, 999 .. 1]))  -- sum forces every element
Default configuration:
- 100 iterations
- 5 warm-up iterations
- 15% tolerance on mean time, or 0.01 ms absolute tolerance (see defaultBenchConfig)
- Variance warnings enabled
- Standard statistics (not robust mode)
benchGoldenWith
  :: BenchConfig  -- ^ Configuration parameters
  -> String       -- ^ Name of the benchmark
  -> IO ()        -- ^ The IO action to benchmark
  -> Spec
Create a benchmark golden test with custom configuration.
Examples:
-- Tighter tolerance for critical code
benchGoldenWith defaultBenchConfig
    { iterations = 500
    , tolerancePercent = 5.0
    , warmupIterations = 20
    }
  "hot loop" $
    void $ evaluate (criticalFunction input)
-- Robust statistics mode for noisy environments
benchGoldenWith defaultBenchConfig
    { useRobustStatistics = True
    , trimPercent = 10.0
    , outlierThreshold = 3.0
    }
  "benchmark with outliers" $
    void $ evaluate (computation input)
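The `outlierThreshold = 3.0` setting above suggests a distance cutoff. A common MAD-based rule is sketched below: a sample is an outlier if it lies more than `threshold` MADs from the median. Whether golds-gym uses exactly this rule (rather than, say, a scaled MAD or the IQR) is an assumption, and `median`, `mad`, and `outliers` are illustrative helpers.

```haskell
import Data.List (sort)

-- | Median of a non-empty list.
median :: [Double] -> Double
median xs
  | even n = (s !! (h - 1) + s !! h) / 2
  | otherwise = s !! h
  where
    s = sort xs
    n = length xs
    h = n `div` 2

-- | Median absolute deviation: the median of |x - median xs|.
mad :: [Double] -> Double
mad xs = median [abs (x - m) | x <- xs]
  where
    m = median xs

-- | Samples more than `threshold` MADs away from the median.
outliers :: Double -> [Double] -> [Double]
outliers threshold xs
  | d == 0 = [] -- MAD is zero: the ratio is undefined, so flag nothing
  | otherwise = [x | x <- xs, abs (x - m) / d > threshold]
  where
    m = median xs
    d = mad xs
```

For the timings `[10, 11, 9, 10, 12, 10, 50]`, the median is 10 and the MAD is 1, so with a threshold of 3.0 only the 50 is flagged.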
benchGoldenIO :: String -> IO () -> Spec
Create a benchmark golden test for an IO action.
This is an alias for benchGolden that makes it clear the action
involves IO (e.g., file operations, network calls).
benchGoldenIO "file read" $ do
  contents <- readFile "large-file.txt"
  void $ evaluate (length contents)  -- evaluate from Control.Exception, void from Control.Monad
Note: For IO actions in noisy environments (CI, shared systems),
consider using benchGoldenIOWith with useRobustStatistics = True.
benchGoldenIOWith
  :: BenchConfig  -- ^ Configuration parameters
  -> String       -- ^ Name of the benchmark
  -> IO ()        -- ^ The IO action to benchmark
  -> Spec
Create an IO benchmark golden test with custom configuration.
Configuration
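data BenchConfig
Configurable parameters for benchmark execution and comparison.
Constructors
BenchConfig
Fields include iterations, warmupIterations, tolerancePercent, absoluteToleranceMs, useRobustStatistics, trimPercent and outlierThreshold (see the examples above).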
defaultBenchConfig :: BenchConfig
Default benchmark configuration with sensible defaults.
- 100 iterations
- 5 warm-up iterations
- 15% tolerance on mean time
- 0.01 ms (10 microseconds) absolute tolerance, which prevents false failures for very fast operations
- Variance warnings enabled at 50% tolerance
- Output to the .golden/ directory
- Success on first run (creates baseline)
Hybrid Tolerance Strategy
The default configuration uses BOTH percentage and absolute tolerance:
- Benchmarks pass if mean time is within ±15% OR within ±0.01ms
- This prevents measurement noise from failing very fast operations: the absolute bound is the looser one whenever the mean is below about 0.067 ms (0.01 ms / 0.15)
- For slower operations, percentage tolerance dominates
Set absoluteToleranceMs = Nothing for percentage-only comparison.
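The point where the two bounds meet follows from the defaults: 15% of the mean equals 0.01 ms when the mean is 0.01 / 0.15 ≈ 0.067 ms. A one-line helper (hypothetical, for illustration only) makes the arithmetic explicit for other settings.

```haskell
-- | Mean time (ms) at which tolPct percent of the mean equals absTolMs.
-- Below this mean the absolute tolerance is the looser bound; above it,
-- the percentage tolerance is.
crossoverMs :: Double -> Double -> Double
crossoverMs tolPct absTolMs = absTolMs / (tolPct / 100)
```

With tolerancePercent = 5.0 and absoluteToleranceMs = Just 0.001, the crossover drops to 0.02 ms.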
Types
data BenchGolden
Configuration for a single benchmark golden test.
Constructors
BenchGolden
Fields
- benchName :: !String
- benchAction :: !(IO ())
- benchConfig :: !BenchConfig
Instances
- Example BenchGolden: Instance for BenchGolden without arguments. Defined in Test.Hspec.BenchGolden.
  Associated type: type Arg BenchGolden (defined in Test.Hspec.BenchGolden)
  Method: evaluateExample :: BenchGolden -> Params -> (ActionWith (Arg BenchGolden) -> IO ()) -> ProgressCallback -> IO Result
- Example (arg -> BenchGolden): Instance for BenchGolden with an argument. This allows benchmarks to receive setup data from
  Defined in Test.Hspec.BenchGolden.
  Associated type: type Arg (arg -> BenchGolden) (defined in Test.Hspec.BenchGolden)
  Method: evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result
data GoldenStats
Statistics stored in golden files.
These represent the baseline performance characteristics of a benchmark on a specific architecture.
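As a rough illustration of how the core fields could be computed from raw timings, the sketch below derives mean, standard deviation, median, minimum, and maximum. `Summary` and `summarize` are hypothetical; in particular, using the sample (n - 1) standard deviation is an assumption about how statsStddev is defined.

```haskell
import Data.List (sort)

-- | Hypothetical record mirroring a few GoldenStats fields (times in ms).
data Summary = Summary
  { sMean :: Double
  , sStddev :: Double
  , sMedian :: Double
  , sMin :: Double
  , sMax :: Double
  }
  deriving (Show, Eq)

-- | Summarise a non-empty list of timings.
summarize :: [Double] -> Summary
summarize xs =
  Summary
    { sMean = mu
    -- sample standard deviation: n - 1 denominator, 0 for a single sample
    , sStddev = sqrt (sum [(x - mu) ^ (2 :: Int) | x <- xs] / fromIntegral (max 1 (n - 1)))
    , sMedian = med
    , sMin = minimum xs
    , sMax = maximum xs
    }
  where
    n = length xs
    sorted = sort xs
    mu = sum xs / fromIntegral n
    med
      | even n = (sorted !! (n `div` 2 - 1) + sorted !! (n `div` 2)) / 2
      | otherwise = sorted !! (n `div` 2)
```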
Constructors
GoldenStats
Fields
- statsMean, statsStddev, statsMedian, statsMin, statsMax :: !Double
- statsPercentiles :: ![(Int, Double)]
- statsArch :: !Text
- statsTimestamp :: !UTCTime
- statsTrimmedMean, statsMAD, statsIQR :: !Double
- statsOutliers :: ![Double]
Instances
- FromJSON GoldenStats, defined in Test.Hspec.BenchGolden.Types
- ToJSON GoldenStats, defined in Test.Hspec.BenchGolden.Types. Methods: toJSON, toEncoding, toJSONList, toEncodingList, omitField
- Generic GoldenStats, defined in Test.Hspec.BenchGolden.Types
- Show GoldenStats, defined in Test.Hspec.BenchGolden.Types. Methods: showsPrec, show, showList
- Eq GoldenStats, defined in Test.Hspec.BenchGolden.Types
- type Rep GoldenStats, defined in Test.Hspec.BenchGolden.Types: the derived generic representation of the GoldenStats record (golds-gym-0.2.0.0), covering all twelve strict fields from statsMean through statsOutliers
data BenchResult
Result of running a benchmark and comparing against golden.
Constructors
| FirstRun !GoldenStats | No golden file existed; baseline created |
| Pass !GoldenStats !GoldenStats ![Warning] | Benchmark passed (golden stats, actual stats, warnings) |
| Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double) | Performance regression (golden, actual, percent change, tolerance, absolute tolerance) |
| Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double) | Performance improvement (golden, actual, percent change, tolerance, absolute tolerance) |
Instances
- Show BenchResult, defined in Test.Hspec.BenchGolden.Types. Methods: showsPrec, show, showList
- Eq BenchResult, defined in Test.Hspec.BenchGolden.Types
data Warning
Warnings that may be emitted during benchmark comparison.
Constructors
| VarianceIncreased !Double !Double !Double !Double | Stddev increased (golden, actual, percent change, tolerance) |
| VarianceDecreased !Double !Double !Double !Double | Stddev decreased significantly (golden, actual, percent change, tolerance) |
| HighVariance !Double | Current run has unusually high variance |
| OutliersDetected !Int ![Double] | Outliers detected (count, list of outlier timings) |
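The two variance constructors carry (golden, actual, percent change, tolerance). A minimal sketch of such a check, assuming the percent change is taken relative to the golden standard deviation and compared against the 50% default from defaultBenchConfig; `VarianceChange` and `checkVariance` are hypothetical local names.

```haskell
-- | Hypothetical local mirror of the two variance-change warnings:
-- golden stddev, actual stddev, percent change, tolerance.
data VarianceChange
  = Increased Double Double Double Double
  | Decreased Double Double Double Double
  deriving (Show, Eq)

-- | Compare golden and actual standard deviations against tolPct.
checkVariance :: Double -> Double -> Double -> Maybe VarianceChange
checkVariance tolPct goldenSd actualSd
  | goldenSd <= 0 = Nothing -- no meaningful baseline spread to compare against
  | change > tolPct = Just (Increased goldenSd actualSd change tolPct)
  | change < negate tolPct = Just (Decreased goldenSd actualSd change tolPct)
  | otherwise = Nothing
  where
    change = (actualSd - goldenSd) / goldenSd * 100
```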
data ArchConfig
Machine architecture configuration.
Used to generate unique identifiers for golden file directories, ensuring benchmarks are only compared against equivalent hardware.
Constructors
ArchConfig
Fields
- archId :: !Text
- archOS :: !Text
- archCPU :: !Text
- archModel :: !(Maybe Text)
Instances
- FromJSON ArchConfig, defined in Test.Hspec.BenchGolden.Types
- ToJSON ArchConfig, defined in Test.Hspec.BenchGolden.Types. Methods: toJSON, toEncoding, toJSONList, toEncodingList, omitField
- Generic ArchConfig, defined in Test.Hspec.BenchGolden.Types
- Show ArchConfig, defined in Test.Hspec.BenchGolden.Types. Methods: showsPrec, show, showList
- Eq ArchConfig, defined in Test.Hspec.BenchGolden.Types
- type Rep ArchConfig, defined in Test.Hspec.BenchGolden.Types: the derived generic representation of the ArchConfig record (golds-gym-0.2.0.0), covering the four strict fields above
Low-Level API
runBenchGolden :: BenchGolden -> IO BenchResult
Run a benchmark golden test.
This function:
- Runs warm-up iterations (discarded)
- Runs the actual benchmark
- Writes actual results to a .actual file
- If no golden file exists, creates it (first run)
- Otherwise, compares against the golden file with tolerance
The result includes any warnings (e.g., variance changes).
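The first two steps amount to a warm-up loop followed by a timed loop. golds-gym uses benchpress for its measurements; the sketch below swaps in `GHC.Clock.getMonotonicTimeNSec` from base to stay self-contained, and `timeIterations` is a hypothetical name. It records one duration per iteration, in milliseconds, and does not force anything the action returns, so the action must do its own forcing (for example with `evaluate`).

```haskell
import Control.Monad (replicateM, replicateM_)
import GHC.Clock (getMonotonicTimeNSec)

-- | Run the action `warmup` times without timing it, then time each of
-- `n` iterations and return the durations in milliseconds.
timeIterations :: Int -> Int -> IO () -> IO [Double]
timeIterations warmup n action = do
  replicateM_ warmup action -- warm-up runs: results discarded
  replicateM n $ do
    start <- getMonotonicTimeNSec
    action
    end <- getMonotonicTimeNSec
    pure (fromIntegral (end - start) / 1e6) -- nanoseconds -> milliseconds
```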
Re-exports
module Test.Hspec.BenchGolden.Arch
Orphan instances
- Example BenchGolden (instance for BenchGolden without arguments)
- Example (arg -> BenchGolden) (instance for BenchGolden with an argument)