golds-gym-0.2.0.0: Golden testing framework for performance benchmarks
Copyright: (c) 2026
License: MIT
Maintainer: your.email@example.com
Safe Haskell: None
Language: Haskell2010

Test.Hspec.BenchGolden

Description

Overview

golds-gym is a framework for golden testing of performance benchmarks. It integrates with hspec and uses benchpress for lightweight timing measurements.

Optionally, benchmarks can use robust statistics to mitigate the impact of outliers.

Quick Start

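import Control.Exception (evaluate)
import Control.Monad (void)
import Test.Hspec
import Test.Hspec.BenchGolden

main :: IO ()
main = hspec $ do
  describe "Performance" $ do
    benchGolden "my algorithm" $
      -- force the result; a bare 'return' would leave it as an unevaluated thunk
      void $ evaluate (myAlgorithm input)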

How It Works

  1. On first run, the benchmark is executed and results are saved to a golden file as the baseline.
  2. On subsequent runs, the benchmark is executed and compared against the baseline using a configurable tolerance (default ±15%).
  3. If the mean time falls outside the tolerance, the test fails and reports either a regression or an improvement.

Architecture-Specific Baselines

Golden files are stored per-architecture to ensure benchmarks are only compared against equivalent hardware. The architecture identifier includes CPU type, OS, and CPU model.
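
As a rough illustration of how such an identifier could be assembled, the sketch below joins the three components with hyphens. The names mkArchId and localArchId are hypothetical and not part of golds-gym, and replacing spaces in the model name with underscores is an assumption; the library's actual detection logic may differ.

```haskell
import System.Info (arch, os)

-- | Join CPU architecture, OS, and optional CPU model into one identifier.
-- Spaces in the model name become underscores (an assumption, chosen so
-- the result can be used as a directory name).
mkArchId :: String -> String -> Maybe String -> String
mkArchId cpu osName model =
  cpu ++ "-" ++ osName ++ maybe "" (\m -> "-" ++ map sanitize m) model
  where
    sanitize ' ' = '_'
    sanitize c   = c

-- | Identifier for the current machine, without a CPU model.
-- 'arch' and 'os' come from System.Info in base.
localArchId :: String
localArchId = mkArchId arch os Nothing
```

For example, mkArchId "aarch64" "darwin" (Just "Apple M1") yields "aarch64-darwin-Apple_M1".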

Configuration

Use benchGoldenWith with a custom BenchConfig to adjust:

  • Number of iterations
  • Warm-up iterations
  • Tolerance percentage
  • Absolute tolerance (hybrid tolerance strategy)
  • Variance warnings
  • Robust statistics mode (trimmed mean, MAD, outlier detection)
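
The three robust statistics named in the last item can be sketched in a few lines of plain Haskell. This is only an illustration of the general techniques, not the library's implementation: the function names are hypothetical, the trim percentage is assumed to apply to each tail, and the outlier threshold is assumed to be a multiple of the MAD.

```haskell
import Data.List (sort)

-- | Median of a non-empty list.
median :: [Double] -> Double
median xs
  | even n    = (s !! (h - 1) + s !! h) / 2
  | otherwise = s !! h
  where
    s = sort xs
    n = length s
    h = n `div` 2

-- | Mean after dropping the given percentage of samples from EACH tail.
-- Assumes a non-empty list and a percentage below 50.
trimmedMean :: Double -> [Double] -> Double
trimmedMean pct xs = sum kept / fromIntegral (length kept)
  where
    s    = sort xs
    k    = floor (pct / 100 * fromIntegral (length s)) :: Int
    kept = take (length s - 2 * k) (drop k s)

-- | Median absolute deviation from the median.
mad :: [Double] -> Double
mad xs = median [abs (x - m) | x <- xs]
  where
    m = median xs

-- | Samples lying more than @threshold@ MADs away from the median.
outliers :: Double -> [Double] -> [Double]
outliers threshold xs = [x | x <- xs, abs (x - m) > threshold * d]
  where
    m = median xs
    d = mad xs
```

With timings [1, 2, 3, 4, 100], the plain mean is 22, whereas trimmedMean 20 gives 3 and outliers 3 flags only the 100 sample.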

Tolerance Configuration

The framework supports two tolerance mechanisms that work together:

  1. Percentage tolerance (tolerancePercent): Checks if the mean time change is within ±X% of the baseline. This is the traditional approach and works well for operations that take more than a few milliseconds.
  2. Absolute tolerance (absoluteToleranceMs): Checks if the absolute time difference is within X milliseconds. This prevents false failures for extremely fast operations (< 1ms) where measurement noise causes large percentage variations despite negligible absolute differences.

By default, benchmarks pass if EITHER tolerance is satisfied:

pass = (|percentChange| <= 15%) OR (absTimeDiff <= 0.01 ms)

This hybrid strategy combines the benefits of both approaches:

  • For fast operations (< 1ms): Absolute tolerance dominates, preventing noise-induced failures
  • For slow operations (> 1ms): Percentage tolerance dominates, catching real regressions
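
A minimal sketch of the hybrid check, assuming it is implemented exactly as the pass formula above states. The names withinTolerance and crossoverMs are hypothetical helpers for illustration, not library functions.

```haskell
-- | Does a new mean time pass the hybrid check?
withinTolerance
  :: Double        -- ^ percentage tolerance, e.g. 15.0
  -> Maybe Double  -- ^ absolute tolerance in ms, e.g. Just 0.01
  -> Double        -- ^ baseline mean in ms (assumed > 0)
  -> Double        -- ^ actual mean in ms
  -> Bool
withinTolerance pct absTol baseline actual = pctOk || absOk
  where
    diff  = abs (actual - baseline)
    pctOk = diff / baseline * 100 <= pct
    absOk = maybe False (diff <=) absTol

-- | Baseline (ms) at which both checks allow the same time difference.
-- Below it the absolute tolerance is the looser check; above it the
-- percentage tolerance is.
crossoverMs :: Double -> Double -> Double
crossoverMs pct absTolMs = absTolMs / (pct / 100)
```

For instance, a 0.020 ms baseline measured at 0.028 ms is a 40% change but only a 0.008 ms difference, so it passes with the default Just 0.01 and fails with Nothing. Under this formula and the default values, the two checks meet at a baseline of 0.01 / 0.15, roughly 0.067 ms.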

To disable absolute tolerance and use percentage-only comparison:

benchGoldenWith defaultBenchConfig
  { absoluteToleranceMs = Nothing
  }
  "benchmark" $ ...

To adjust the absolute tolerance threshold:

benchGoldenWith defaultBenchConfig
  { absoluteToleranceMs = Just 0.001  -- 1 microsecond (very strict)
  }
  "benchmark" $ ...

Environment Variables

  • GOLDS_GYM_ACCEPT=1 - Regenerate all golden files
  • GOLDS_GYM_SKIP=1 - Skip all benchmark tests
  • GOLDS_GYM_ARCH=custom-id - Override architecture detection
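
For readers wiring these variables into their own tooling, the following sketch shows one way to read them with lookupEnv from base. The helper names are hypothetical, and treating only the value "1" as enabled is an assumption; the documentation does not say which other values the library accepts.

```haskell
import System.Environment (lookupEnv)

-- | True when the variable is set to exactly "1", the value shown above.
flagEnabled :: Maybe String -> Bool
flagEnabled = (== Just "1")

-- | Read the three variables from the process environment:
-- (accept requested, skip requested, architecture override).
readGymEnv :: IO (Bool, Bool, Maybe String)
readGymEnv = do
  accept <- flagEnabled <$> lookupEnv "GOLDS_GYM_ACCEPT"
  skip   <- flagEnabled <$> lookupEnv "GOLDS_GYM_SKIP"
  archId <- lookupEnv "GOLDS_GYM_ARCH"
  pure (accept, skip, archId)
```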

Spec Combinators

benchGolden

Arguments

  :: String    Name of the benchmark
  -> IO ()     The IO action to benchmark
  -> Spec

Create a benchmark golden test with default configuration.

This is the simplest way to add a benchmark test:

describe "Sorting" $ do
  benchGolden "quicksort 1000 elements" $
    -- force the whole list; a bare 'return' would leave it unevaluated
    void $ evaluate (length (quicksort [1000, 999..1]))

Default configuration:

  • 100 iterations
  • 5 warm-up iterations
  • 15% tolerance
  • Variance warnings enabled
  • Standard statistics (not robust mode)

benchGoldenWith

Arguments

  :: BenchConfig    Configuration parameters
  -> String         Name of the benchmark
  -> IO ()          The IO action to benchmark
  -> Spec

Create a benchmark golden test with custom configuration.

Examples:

-- Tighter tolerance for critical code
benchGoldenWith defaultBenchConfig
  { iterations = 500
  , tolerancePercent = 5.0
  , warmupIterations = 20
  }
  "hot loop" $
  void $ evaluate (criticalFunction input)

-- Robust statistics mode for noisy environments
benchGoldenWith defaultBenchConfig
  { useRobustStatistics = True
  , trimPercent = 10.0
  , outlierThreshold = 3.0
  }
  "benchmark with outliers" $
  void $ evaluate (computation input)

benchGoldenIO

Arguments

  :: String    Name of the benchmark
  -> IO ()     The IO action to benchmark
  -> Spec

Create a benchmark golden test for an IO action.

This is an alias for benchGolden that makes it clear the action involves IO (e.g., file operations, network calls).

benchGoldenIO "file read" $ do
  contents <- readFile "large-file.txt"
  void $ evaluate (length contents)

Note: For IO actions in noisy environments (CI, shared systems), consider using benchGoldenIOWith with useRobustStatistics = True.

benchGoldenIOWith

Arguments

  :: BenchConfig    Configuration parameters
  -> String         Name of the benchmark
  -> IO ()          The IO action to benchmark
  -> Spec

Create an IO benchmark golden test with custom configuration.

Configuration

data BenchConfig

Configurable parameters for benchmark execution and comparison.

Constructors

BenchConfig 

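Fields

  • iterations :: !Int

    Number of iterations

  • warmupIterations :: !Int

    Number of warm-up iterations

  • tolerancePercent :: !Double

    Percentage tolerance on mean time

  • absoluteToleranceMs :: !(Maybe Double)

    Absolute tolerance in milliseconds; Nothing gives percentage-only comparison

  • warnOnVarianceChange :: !Bool

    Whether variance warnings are enabled

  • varianceTolerancePercent :: !Double

    Percentage tolerance for variance warnings

  • outputDir :: !FilePath

    Directory for golden files

  • failOnFirstRun :: !Bool

    Whether the first run fails instead of succeeding after creating the baseline

  • useRobustStatistics :: !Bool

    Whether robust statistics mode is enabled

  • trimPercent :: !Double

    Trim percentage for the trimmed mean (robust statistics mode)

  • outlierThreshold :: !Double

    Threshold for outlier detection (robust statistics mode)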

Instances

Generic BenchConfig

Defined in Test.Hspec.BenchGolden.Types

Associated Types

type Rep BenchConfig = D1 ('MetaData "BenchConfig" "Test.Hspec.BenchGolden.Types" "golds-gym-0.2.0.0-7NJIEaTIpAGIPUbJaP5I3x" 'False) (C1 ('MetaCons "BenchConfig" 'PrefixI 'True) (((S1 ('MetaSel ('Just "iterations") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Int) :*: S1 ('MetaSel ('Just "warmupIterations") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Int)) :*: (S1 ('MetaSel ('Just "tolerancePercent") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "absoluteToleranceMs") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 (Maybe Double)) :*: S1 ('MetaSel ('Just "warnOnVarianceChange") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Bool)))) :*: ((S1 ('MetaSel ('Just "varianceTolerancePercent") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "outputDir") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 FilePath) :*: S1 ('MetaSel ('Just "failOnFirstRun") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Bool))) :*: (S1 ('MetaSel ('Just "useRobustStatistics") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Bool) :*: (S1 ('MetaSel ('Just "trimPercent") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "outlierThreshold") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double))))))
Show BenchConfig

Defined in Test.Hspec.BenchGolden.Types

Eq BenchConfig

Defined in Test.Hspec.BenchGolden.Types

defaultBenchConfig :: BenchConfig

Default benchmark configuration with sensible defaults.

  • 100 iterations
  • 5 warm-up iterations
  • 15% tolerance on mean time
  • 0.01 ms (10 microseconds) absolute tolerance - prevents false failures for fast operations
  • Variance warnings enabled at 50% tolerance
  • Output to .golden/ directory
  • Success on first run (creates baseline)

Hybrid Tolerance Strategy

The default configuration uses BOTH percentage and absolute tolerance:

  • Benchmarks pass if mean time is within ±15% OR within ±0.01ms
  • This prevents measurement noise from failing fast operations (< 1ms)
  • For slower operations (> 1ms), percentage tolerance dominates

Set absoluteToleranceMs = Nothing for percentage-only comparison.

Types

data BenchGolden

Configuration for a single benchmark golden test.

Constructors

BenchGolden 

Fields

Instances

Example BenchGolden

Instance for BenchGolden without arguments.

Defined in Test.Hspec.BenchGolden

Associated Types

type Arg BenchGolden = ()

Example (arg -> BenchGolden)

Instance for BenchGolden with an argument.

This allows benchmarks to receive setup data from before or around combinators.

Defined in Test.Hspec.BenchGolden

Associated Types

type Arg (arg -> BenchGolden) = arg

Methods

evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result

data GoldenStats

Statistics stored in golden files.

These represent the baseline performance characteristics of a benchmark on a specific architecture.

Constructors

GoldenStats 

Fields

Instances

FromJSON GoldenStats

Defined in Test.Hspec.BenchGolden.Types

ToJSON GoldenStats

Defined in Test.Hspec.BenchGolden.Types

Generic GoldenStats

Defined in Test.Hspec.BenchGolden.Types

Associated Types

type Rep GoldenStats

Show GoldenStats

Defined in Test.Hspec.BenchGolden.Types

Eq GoldenStats

Defined in Test.Hspec.BenchGolden.Types

data BenchResult

Result of running a benchmark and comparing against golden.

Constructors

FirstRun !GoldenStats

No golden file existed; baseline created

Pass !GoldenStats !GoldenStats ![Warning]

Benchmark passed (golden stats, actual stats, warnings)

Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double)

Performance regression (golden, actual, percent change, tolerance, absolute tolerance)

Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double)

Performance improvement (golden, actual, percent change, tolerance, absolute tolerance)

Instances

Show BenchResult

Defined in Test.Hspec.BenchGolden.Types

Eq BenchResult

Defined in Test.Hspec.BenchGolden.Types

data Warning

Warnings that may be emitted during benchmark comparison.

Constructors

VarianceIncreased !Double !Double !Double !Double

Stddev increased (golden, actual, percent change, tolerance)

VarianceDecreased !Double !Double !Double !Double

Stddev decreased significantly (golden, actual, percent change, tolerance)

HighVariance !Double

Current run has unusually high variance

OutliersDetected !Int ![Double]

Outliers detected (count, list of outlier timings)
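
To make the field order of the two variance constructors concrete, here is a small sketch of how such a warning might be derived from two standard deviations. VarianceWarning is a local stand-in type and checkVariance is a hypothetical helper; the comparison rule (signed percent change against a symmetric tolerance) is an assumption rather than the library's documented behaviour.

```haskell
-- | Local stand-in for the two variance constructors documented above.
-- Fields: golden stddev, actual stddev, percent change, tolerance.
data VarianceWarning
  = VarianceIncreased Double Double Double Double
  | VarianceDecreased Double Double Double Double
  deriving (Show, Eq)

-- | Compare standard deviations; Nothing means the change is within tolerance.
-- Arguments: tolerance (%), golden stddev (assumed > 0), actual stddev.
checkVariance :: Double -> Double -> Double -> Maybe VarianceWarning
checkVariance tol golden actual
  | change > tol        = Just (VarianceIncreased golden actual change tol)
  | change < negate tol = Just (VarianceDecreased golden actual change tol)
  | otherwise           = Nothing
  where
    change = (actual - golden) / golden * 100
```

With a 50% tolerance, a stddev that doubles from 1.0 to 2.0 is a 100% change and produces VarianceIncreased 1.0 2.0 100.0 50.0.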

Instances

Show Warning

Defined in Test.Hspec.BenchGolden.Types

Eq Warning

Defined in Test.Hspec.BenchGolden.Types

Methods

(==) :: Warning -> Warning -> Bool

(/=) :: Warning -> Warning -> Bool

data ArchConfig

Machine architecture configuration.

Used to generate unique identifiers for golden file directories, ensuring benchmarks are only compared against equivalent hardware.

Constructors

ArchConfig 

Fields

  • archId :: !Text

    Unique identifier (e.g., "aarch64-darwin-Apple_M1")

  • archOS :: !Text

    Operating system (e.g., "darwin", "linux")

  • archCPU :: !Text

    CPU architecture (e.g., "aarch64", "x86_64")

  • archModel :: !(Maybe Text)

    CPU model if available (e.g., "Apple M1", "Intel Core i7")

Instances

FromJSON ArchConfig

Defined in Test.Hspec.BenchGolden.Types

ToJSON ArchConfig

Defined in Test.Hspec.BenchGolden.Types

Generic ArchConfig

Defined in Test.Hspec.BenchGolden.Types

Associated Types

type Rep ArchConfig = D1 ('MetaData "ArchConfig" "Test.Hspec.BenchGolden.Types" "golds-gym-0.2.0.0-7NJIEaTIpAGIPUbJaP5I3x" 'False) (C1 ('MetaCons "ArchConfig" 'PrefixI 'True) ((S1 ('MetaSel ('Just "archId") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: S1 ('MetaSel ('Just "archOS") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text)) :*: (S1 ('MetaSel ('Just "archCPU") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: S1 ('MetaSel ('Just "archModel") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 (Maybe Text)))))
Show ArchConfig

Defined in Test.Hspec.BenchGolden.Types

Eq ArchConfig

Defined in Test.Hspec.BenchGolden.Types

Low-Level API

runBenchGolden :: BenchGolden -> IO BenchResult

Run a benchmark golden test.

This function:

  1. Runs warm-up iterations (discarded)
  2. Runs the actual benchmark
  3. Writes the actual results to an .actual file
  4. If no golden file exists, creates it (first run)
  5. Otherwise, compares against the golden file using the configured tolerance

The result includes any warnings (e.g., variance changes).
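
The decision made in steps 4 and 5 can be sketched as a pure function. Outcome is a simplified stand-in for BenchResult that carries only mean times, compareWithGolden is a hypothetical name, and the sketch uses the percentage tolerance alone, leaving out the absolute tolerance and warnings.

```haskell
-- | Simplified stand-in for BenchResult, carrying mean times (ms) only.
data Outcome
  = FirstRun Double                   -- new baseline mean
  | Pass Double Double                -- golden mean, actual mean
  | Regression Double Double Double   -- golden, actual, percent change
  | Improvement Double Double Double  -- golden, actual, percent change
  deriving (Show, Eq)

-- | Decide the outcome from the tolerance (%), an optional baseline mean
-- (assumed > 0 when present), and the measured mean.
compareWithGolden :: Double -> Maybe Double -> Double -> Outcome
compareWithGolden _   Nothing       actual = FirstRun actual
compareWithGolden tol (Just golden) actual
  | abs change <= tol = Pass golden actual
  | change > 0        = Regression golden actual change
  | otherwise         = Improvement golden actual change
  where
    change = (actual - golden) / golden * 100
```

For example, with a 15% tolerance and an 8 ms baseline, a 10 ms measurement is a 25% slowdown and yields Regression 8.0 10.0 25.0, while 8.5 ms yields Pass 8.0 8.5.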

Re-exports

Orphan instances

Example BenchGolden

Instance for BenchGolden without arguments.

Defined in Test.Hspec.BenchGolden

Associated Types

type Arg BenchGolden = ()

Example (arg -> BenchGolden)

Instance for BenchGolden with an argument.

This allows benchmarks to receive setup data from before or around combinators.

Defined in Test.Hspec.BenchGolden

Associated Types

type Arg (arg -> BenchGolden) = arg

Methods

evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result