| Copyright | (c) 2026 |
|---|---|
| License | MIT |
| Maintainer | @ocramz |
| Safe Haskell | None |
| Language | Haskell2010 |
Test.Hspec.BenchGolden
Description
Overview
golds-gym is a framework for golden testing of performance benchmarks.
It integrates with hspec and uses CPU time measurements for benchmarking.
Benchmarks can use robust statistics to mitigate the impact of outliers.
The library can be used both to assert that performance does not regress, and to set expectations
for improvements across project versions (see benchGoldenWithExpectation).
There are also combinators for parameter sweep benchmarks that generate CSV files for analysis and plotting,
see benchGoldenSweep and benchGoldenSweepWith.
Quick Start
import Test.Hspec
import Test.Hspec.BenchGolden
import Data.List (sort)
main :: IO ()
main = hspec $ do
describe "Performance" $ do
-- Pure function with normal form evaluation
benchGolden "list sorting" $
nf (\n -> sort [n, n-1 .. 1]) 1000
-- IO action with result forced to normal form
benchGolden "file read" $
nfIO (readFile "data.txt")
Evaluation strategies control how values are forced:
- nf - Force to normal form (deep evaluation; use for most cases)
- nfIO - Variant for IO actions
- nfAppIO - For functions returning IO
- io - Plain IO action without forcing
Without proper evaluation strategies, GHC may optimize away computations or share results across iterations, making benchmarks meaningless.
Best Practices: Avoiding Shared Thunks
CRITICAL: When benchmarking with data structures, ensure the data is reconstructed on each iteration to avoid measuring shared, cached results.
❌ Anti-pattern (shared list across iterations):
benchGolden "sum" $ nf sum [1..1000000]
The list [1..1000000] is constructed once and shared across all iterations.
This allocates the entire list in memory, creates GC pressure, and prevents
list fusion. The first iteration evaluates the shared thunk, and subsequent
iterations measure cached results.
✅ Correct pattern (list reconstructed per iteration):
benchGolden "sum" $ nf (\n -> sum [1..n]) 1000000
The lambda wrapper ensures the list is reconstructed on every iteration, measuring the true cost of both construction and computation.
Other considerations:
- Ensure return types are inhabited enough to depend on all computations (avoid b ~ (), where GHC might optimize away the payload)
- For inlinable functions, ensure full saturation: prefer nf (\n -> f n) x over nf f x to guarantee inlining and rewrite rules fire
- Use NFData constraints where applicable to ensure deep evaluation
How It Works
- On first run, the benchmark is executed and results are saved to a golden file as the baseline.
- On subsequent runs, the benchmark is executed and compared against the baseline using a configurable tolerance or expectation combinators.
Architecture-Specific Baselines
Golden files are stored per-architecture to ensure benchmarks are only compared against equivalent hardware. The architecture identifier includes CPU type, OS, and CPU model.
Configuration
Use benchGoldenWith or benchGoldenWithExpectation with a custom BenchConfig:
Tolerance Configuration
The framework supports two tolerance mechanisms that work together:
- Percentage tolerance (tolerancePercent): checks whether the mean time change is within ±X% of the baseline. This is the traditional approach and works well for operations that take more than a few milliseconds.
- Absolute tolerance (absoluteToleranceMs): checks whether the absolute time difference is within X milliseconds. This prevents false failures for extremely fast operations (< 1 ms), where measurement noise causes large percentage variations despite negligible absolute differences.
By default, benchmarks pass if EITHER tolerance is satisfied:
pass = (abs percentChange <= 15%) OR (absTimeDiff <= 0.01 ms)
This hybrid strategy combines the benefits of both approaches:
- For fast operations (< 1ms): Absolute tolerance dominates, preventing noise
- For slow operations (> 1ms): Percentage tolerance dominates, catching real regressions
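The hybrid rule can be sketched as a small pure predicate. This is an illustration only, assuming just the two thresholds; hybridPass and percentChange are hypothetical helper names, not part of the library's exported API:

```haskell
-- Sketch of the hybrid pass rule: a benchmark passes when EITHER the
-- relative change or the absolute change is within its threshold.
-- `hybridPass` and `percentChange` are hypothetical helpers.

-- | Relative change of actual vs. baseline, in percent.
percentChange :: Double -> Double -> Double
percentChange baseline actual = (actual - baseline) / baseline * 100

-- | Pass when within +/- tolPct percent OR within +/- tolAbsMs milliseconds.
hybridPass :: Double -> Double -> Double -> Double -> Bool
hybridPass tolPct tolAbsMs baseline actual =
     abs (percentChange baseline actual) <= tolPct
  || abs (actual - baseline) <= tolAbsMs
```

With the defaults (15.0, 0.01), a 0.005 ms operation that jitters to 0.009 ms still passes (80% relative change, but only 0.004 ms absolute), while a 100 ms operation that slows to 120 ms fails.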
To disable absolute tolerance and use percentage-only comparison:
benchGoldenWith defaultBenchConfig
{ absoluteToleranceMs = Nothing
}
"benchmark" $ ...
To adjust the absolute tolerance threshold:
benchGoldenWith defaultBenchConfig
{ absoluteToleranceMs = Just 0.001 -- 1 microsecond (very strict)
}
"benchmark" $ ...
Synopsis
- benchGolden :: String -> BenchAction -> Spec
- benchGoldenWith :: BenchConfig -> String -> BenchAction -> Spec
- benchGoldenWithExpectation :: String -> BenchConfig -> [Expectation] -> BenchAction -> Spec
- benchGoldenSweep :: Show a => String -> Text -> [a] -> (a -> BenchAction) -> Spec
- benchGoldenSweepWith :: Show a => BenchConfig -> String -> Text -> [a] -> (a -> BenchAction) -> Spec
- data BenchConfig = BenchConfig {}
- defaultBenchConfig :: BenchConfig
- data BenchGolden = BenchGolden {
- benchName :: !String
- benchAction :: !BenchAction
- benchConfig :: !BenchConfig
- newtype BenchAction = BenchAction {
- runBenchAction :: Word64 -> IO ()
- data GoldenStats = GoldenStats {
- statsMean :: !Double
- statsStddev :: !Double
- statsMedian :: !Double
- statsMin :: !Double
- statsMax :: !Double
- statsPercentiles :: ![(Int, Double)]
- statsArch :: !Text
- statsTimestamp :: !UTCTime
- statsTrimmedMean :: !Double
- statsMAD :: !Double
- statsIQR :: !Double
- statsOutliers :: ![Double]
- data BenchResult
- = FirstRun !GoldenStats
- | Pass !GoldenStats !GoldenStats ![Warning]
- | Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- | Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double)
- data Warning
- = VarianceIncreased !Double !Double !Double !Double
- | VarianceDecreased !Double !Double !Double !Double
- | HighVariance !Double
- | OutliersDetected !Int ![Double]
- data ArchConfig = ArchConfig {}
- nf :: NFData b => (a -> b) -> a -> BenchAction
- nfIO :: NFData a => IO a -> BenchAction
- nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction
- io :: IO () -> BenchAction
- runBenchGolden :: BenchGolden -> IO BenchResult
- runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats
- runSweep :: Show a => String -> BenchConfig -> Text -> [a] -> (a -> BenchAction) -> IO [(a, BenchResult, GoldenStats)]
- runSweepPoint :: Show a => String -> BenchConfig -> Text -> a -> BenchAction -> IO (BenchResult, GoldenStats)
- compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult
- checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning]
- calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double])
- calculateTrimmedMean :: Double -> Vector Double -> Double
- calculateMAD :: Vector Double -> Double -> Double
- calculateIQR :: Vector Double -> Double
- detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double]
- readGoldenFile :: FilePath -> IO (Either String GoldenStats)
- writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO ()
- getGoldenPath :: FilePath -> FilePath -> String -> FilePath
- getActualPath :: FilePath -> FilePath -> String -> FilePath
- expect :: Lens' GoldenStats Double -> Tolerance -> Expectation
- pattern And :: !Expectation -> !Expectation -> Expectation
- pattern ExpectStat :: !(Lens' GoldenStats Double) -> !Tolerance -> Expectation
- pattern Or :: !Expectation -> !Expectation -> Expectation
- data Tolerance
- metricFor :: BenchConfig -> Lens' GoldenStats Double
- varianceFor :: BenchConfig -> Lens' GoldenStats Double
- _statsMean :: Lens' GoldenStats Double
- _statsStddev :: Lens' GoldenStats Double
- _statsMedian :: Lens' GoldenStats Double
- _statsMin :: Lens' GoldenStats Double
- _statsMax :: Lens' GoldenStats Double
- _statsTrimmedMean :: Lens' GoldenStats Double
- _statsMAD :: Lens' GoldenStats Double
- _statsIQR :: Lens' GoldenStats Double
- expectStat :: Lens' GoldenStats Double -> Tolerance -> Expectation
- checkExpectation :: Expectation -> GoldenStats -> GoldenStats -> Bool
- withinPercent :: Double -> Double -> Double -> Bool
- withinAbsolute :: Double -> Double -> Double -> Bool
- withinHybrid :: Double -> Double -> Double -> Double -> Bool
- mustImprove :: Double -> Double -> Double -> Bool
- mustRegress :: Double -> Double -> Double -> Bool
- (@~) :: Double -> Double -> Double -> Bool
- (@<) :: Double -> Double -> Double -> Bool
- (@<<) :: Double -> Double -> Double -> Bool
- (@>>) :: Double -> Double -> Double -> Bool
- (&&~) :: Expectation -> Expectation -> Expectation
- (||~) :: Expectation -> Expectation -> Expectation
- percentDiff :: Double -> Double -> Double
- absDiff :: Double -> Double -> Double
- toleranceFromExpectation :: Expectation -> (Double, Maybe Double)
- toleranceValues :: Tolerance -> (Double, Maybe Double)
- module Test.Hspec.BenchGolden.Arch
- module Test.Hspec.BenchGolden.CSV
Spec Combinators
Arguments
| :: String | Name of the benchmark |
| -> BenchAction | The benchmarkable action |
| -> Spec |
Create a benchmark golden test with default configuration.
This is the simplest way to add a benchmark test:
describe "Sorting" $ do
  benchGolden "quicksort 1000 elements" $
    nf quicksort [1000, 999 .. 1]
Use evaluation strategy combinators to control how values are forced:
- nf - Normal form (deep evaluation)
- nfIO - Normal form for IO actions
- nfAppIO - Normal form for functions returning IO
- io - Plain IO action (for backward compatibility)
Default configuration:
- 100 iterations
- 5 warm-up iterations
- 15% tolerance
- Variance warnings enabled
- Standard statistics (not robust mode)
Arguments
| :: BenchConfig | Configuration parameters |
| -> String | Name of the benchmark |
| -> BenchAction | The benchmarkable action |
| -> Spec |
Create a benchmark golden test with custom configuration.
Examples:
-- Tighter tolerance for critical code
benchGoldenWith defaultBenchConfig
{ iterations = 500
, tolerancePercent = 5.0
, warmupIterations = 20
}
"hot loop" $
nf criticalFunction input
-- Robust statistics mode for noisy environments
benchGoldenWith defaultBenchConfig
{ useRobustStatistics = True
, trimPercent = 10.0
, outlierThreshold = 3.0
}
"benchmark with outliers" $
nf computation input
benchGoldenWithExpectation Source #
Arguments
| :: String | Name of the benchmark |
| -> BenchConfig | Configuration parameters |
| -> [Expectation] | List of expectations (all must pass) |
| -> BenchAction | The benchmarkable action |
| -> Spec |
Create a benchmark golden test with custom lens-based expectations.
This combinator allows you to specify custom performance expectations using
lenses and tolerance combinators. Expectations can be composed using boolean
operators (&&~, ||~).
Examples:
-- Median-based comparison (more robust to outliers)
benchGoldenWithExpectation "median test" defaultBenchConfig
  [expect _statsMedian (Percent 10.0)]
  (nf sort [1000, 999 .. 1])

-- Multiple metrics must pass (AND composition)
benchGoldenWithExpectation "strict test" defaultBenchConfig
  [ expect _statsMean (Percent 15.0) &&~ expect _statsMAD (Percent 50.0) ]
  (nf algorithm input)

-- Either metric can pass (OR composition)
benchGoldenWithExpectation "flexible test" defaultBenchConfig
  [ expect _statsMedian (Percent 10.0) ||~ expect _statsMin (Absolute 0.01) ]
  (nf fastOp input)

-- Expect performance improvement (must be faster)
benchGoldenWithExpectation "optimization" defaultBenchConfig
  [expect _statsMean (MustImprove 10.0)]  -- must be ≥10% faster
  (nf optimizedVersion input)

-- Expect controlled regression (for feature additions)
benchGoldenWithExpectation "new feature" defaultBenchConfig
  [expect _statsMean (MustRegress 5.0)]  -- accept 5-20% slowdown
  (nf newFeature input)

-- Low variance requirement
benchGoldenWithExpectation "stable perf" defaultBenchConfig
  [ expect _statsMean (Percent 15.0) &&~ expect _statsIQR (Absolute 0.1) ]
  (nfIO stableOperation)
Note: Expectations are checked against golden files. On first run, a baseline
is created. Use GOLDS_GYM_ACCEPT=1 to regenerate baselines.
Parameter Sweeps
Arguments
| :: Show a | |
| => String | Sweep name (used for CSV filename and golden file prefix) |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action parameterized by sweep value |
| -> Spec |
Create a parameter sweep benchmark with default configuration.
This combinator runs the same benchmark with multiple parameter values, saving individual golden files for each point and producing a single CSV file for analysis and plotting.
Example:
describe "Scaling Tests" $ do
benchGoldenSweep "sort-scaling" "n" [1000, 5000, 10000, 50000] $
\n -> nf sort [n, n-1..1]
This produces:
- Golden files: .golden/<arch>/sort-scaling_n=1000.golden, etc.
- CSV file: .golden/sort-scaling-<arch>.csv
Arguments
| :: Show a | |
| => BenchConfig | Configuration parameters |
| -> String | Sweep name |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action parameterized by sweep value |
| -> Spec |
Create a parameter sweep benchmark with custom configuration.
Example:
describe "Performance Scaling" $ do
benchGoldenSweepWith
defaultBenchConfig { iterations = 500, tolerancePercent = 10.0 }
"algorithm-scaling" "size" [100, 500, 1000, 5000] $
\size -> nf myAlgorithm (generateInput size)
The CSV file includes columns for timestamp, parameter value, and all standard statistics (mean, stddev, median, min, max, etc.).
Configuration
data BenchConfig Source #
Configurable parameters for benchmark execution and comparison.
Constructors
| BenchConfig | |
Fields
| |
Instances
defaultBenchConfig :: BenchConfig Source #
Default benchmark configuration with sensible defaults.
- 100 iterations
- 5 warm-up iterations
- 15% tolerance on mean time
- 0.01 ms (10 microseconds) absolute tolerance - prevents false failures for fast operations
- Variance warnings enabled at 50% tolerance
- Output to the .golden/ directory
- Success on first run (creates baseline)
Hybrid Tolerance Strategy
The default configuration uses BOTH percentage and absolute tolerance:
- Benchmarks pass if mean time is within ±15% OR within ±0.01ms
- This prevents measurement noise from failing fast operations (< 1ms)
- For slower operations (> 1ms), percentage tolerance dominates
Set absoluteToleranceMs = Nothing for percentage-only comparison.
Types
data BenchGolden Source #
Configuration for a single benchmark golden test.
Constructors
| BenchGolden | |
Fields
| |
Instances
| Example BenchGolden Source # | Instance for BenchGolden without arguments. | ||||
Defined in Test.Hspec.BenchGolden Associated Types
Methods evaluateExample :: BenchGolden -> Params -> (ActionWith (Arg BenchGolden) -> IO ()) -> ProgressCallback -> IO Result # | |||||
| Example (arg -> BenchGolden) Source # | Instance for BenchGolden with an argument. This allows benchmarks to receive setup data from | ||||
Defined in Test.Hspec.BenchGolden Associated Types
Methods evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result # | |||||
| type Arg BenchGolden Source # | |||||
Defined in Test.Hspec.BenchGolden | |||||
| type Arg (arg -> BenchGolden) Source # | |||||
Defined in Test.Hspec.BenchGolden | |||||
newtype BenchAction Source #
A benchmarkable action that can be run multiple times.
The Word64 parameter represents the number of iterations to execute.
Constructors
| BenchAction | |
Fields
| |
data GoldenStats Source #
Statistics stored in golden files.
These represent the baseline performance characteristics of a benchmark on a specific architecture.
Constructors
| GoldenStats | |
Fields
| |
Instances
| FromJSON GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| ToJSON GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods toJSON :: GoldenStats -> Value # toEncoding :: GoldenStats -> Encoding # toJSONList :: [GoldenStats] -> Value # toEncodingList :: [GoldenStats] -> Encoding # omitField :: GoldenStats -> Bool # | |||||
| Generic GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Associated Types
| |||||
| Show GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods showsPrec :: Int -> GoldenStats -> ShowS # show :: GoldenStats -> String # showList :: [GoldenStats] -> ShowS # | |||||
| Eq GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| type Rep GoldenStats Source # | |||||
Defined in Test.Hspec.BenchGolden.Types type Rep GoldenStats = D1 ('MetaData "GoldenStats" "Test.Hspec.BenchGolden.Types" "golds-gym-0.7.0.0-5WxaqlnGMSWFcxivvKHgXD" 'False) (C1 ('MetaCons "GoldenStats" 'PrefixI 'True) (((S1 ('MetaSel ('Just "statsMean") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "statsStddev") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "statsMedian") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double))) :*: (S1 ('MetaSel ('Just "statsMin") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "statsMax") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "statsPercentiles") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 [(Int, Double)])))) :*: ((S1 ('MetaSel ('Just "statsArch") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: (S1 ('MetaSel ('Just "statsTimestamp") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 UTCTime) :*: S1 ('MetaSel ('Just "statsTrimmedMean") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double))) :*: (S1 ('MetaSel ('Just "statsMAD") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: (S1 ('MetaSel ('Just "statsIQR") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Double) :*: S1 ('MetaSel ('Just "statsOutliers") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 [Double])))))) | |||||
data BenchResult Source #
Result of running a benchmark and comparing against golden.
Constructors
| FirstRun !GoldenStats | No golden file existed; baseline created |
| Pass !GoldenStats !GoldenStats ![Warning] | Benchmark passed (golden stats, actual stats, warnings) |
| Regression !GoldenStats !GoldenStats !Double !Double !(Maybe Double) | Performance regression (golden, actual, percent change, tolerance, absolute tolerance) |
| Improvement !GoldenStats !GoldenStats !Double !Double !(Maybe Double) | Performance improvement (golden, actual, percent change, tolerance, absolute tolerance) |
Instances
| Show BenchResult Source # | |
Defined in Test.Hspec.BenchGolden.Types Methods showsPrec :: Int -> BenchResult -> ShowS # show :: BenchResult -> String # showList :: [BenchResult] -> ShowS # | |
| Eq BenchResult Source # | |
Defined in Test.Hspec.BenchGolden.Types | |
Warnings that may be emitted during benchmark comparison.
Constructors
| VarianceIncreased !Double !Double !Double !Double | Stddev increased (golden, actual, percent change, tolerance) |
| VarianceDecreased !Double !Double !Double !Double | Stddev decreased significantly (golden, actual, percent change, tolerance) |
| HighVariance !Double | Current run has unusually high variance |
| OutliersDetected !Int ![Double] | Outliers detected (count, list of outlier timings) |
data ArchConfig Source #
Machine architecture configuration.
Used to generate unique identifiers for golden file directories, ensuring benchmarks are only compared against equivalent hardware.
Constructors
| ArchConfig | |
Instances
| FromJSON ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| ToJSON ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods toJSON :: ArchConfig -> Value # toEncoding :: ArchConfig -> Encoding # toJSONList :: [ArchConfig] -> Value # toEncodingList :: [ArchConfig] -> Encoding # omitField :: ArchConfig -> Bool # | |||||
| Generic ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Associated Types
| |||||
| Show ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types Methods showsPrec :: Int -> ArchConfig -> ShowS # show :: ArchConfig -> String # showList :: [ArchConfig] -> ShowS # | |||||
| Eq ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types | |||||
| type Rep ArchConfig Source # | |||||
Defined in Test.Hspec.BenchGolden.Types type Rep ArchConfig = D1 ('MetaData "ArchConfig" "Test.Hspec.BenchGolden.Types" "golds-gym-0.7.0.0-5WxaqlnGMSWFcxivvKHgXD" 'False) (C1 ('MetaCons "ArchConfig" 'PrefixI 'True) ((S1 ('MetaSel ('Just "archId") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: S1 ('MetaSel ('Just "archOS") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text)) :*: (S1 ('MetaSel ('Just "archCPU") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Text) :*: S1 ('MetaSel ('Just "archModel") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 (Maybe Text))))) | |||||
Benchmarkable Constructors
nf :: NFData b => (a -> b) -> a -> BenchAction Source #
Benchmark a pure function applied to an argument, forcing the result to
normal form (NF) using rnf from Control.DeepSeq.
This ensures the entire result structure is evaluated.
Example:
benchGolden "fib 30" (nf fib 30)
nfIO :: NFData a => IO a -> BenchAction Source #
Benchmark an IO action, forcing the result to normal form.
Example:
benchGolden "readFile" (nfIO $ readFile "data.txt")
nfAppIO :: NFData b => (a -> IO b) -> a -> BenchAction Source #
Benchmark a function that performs IO, forcing the result to normal form.
Example:
benchGolden "lookup in map" (nfAppIO lookupInDB "key")
io :: IO () -> BenchAction Source #
Benchmark an IO action, discarding the result.
This is for backward compatibility with code that uses IO () actions.
Example:
benchGolden "compute" $ io $ do
  result <- heavyComputation
  _ <- evaluate result
  pure ()
Low-Level API
runBenchGolden :: BenchGolden -> IO BenchResult Source #
Run a benchmark golden test.
This function:
- Runs warm-up iterations (discarded)
- Runs the actual benchmark
- Writes actual results to
.actualfile - If no golden exists, creates it (first run)
- Otherwise, compares against golden with tolerance
The result includes any warnings (e.g., variance changes).
Standalone Runner API
These functions can be used independently of hspec for programmatic benchmarking workflows.
runBenchmark :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #
Run a benchmark and collect statistics.
Uses raw timing collection with proper inner iteration counts to ensure the SPEC trick in nf/nfIO prevents thunk sharing.
runBenchmarkWithRawTimings :: String -> BenchAction -> BenchConfig -> ArchConfig -> IO GoldenStats Source #
Run a benchmark with raw timing collection for robust statistics.
This function times running all iterations in a single batch, then divides to get per-iteration timing. The SPEC trick in nf/nfIO prevents sharing within the batch.
We collect multiple samples by running the full batch multiple times, ensuring accurate measurements even with GHC's -O2 optimizations.
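The batch-timing idea can be approximated with just base, as a rough sketch: time one whole batch of iterations, then divide by the iteration count. timeBatchMs is a hypothetical helper; the library's actual runner additionally handles warm-up, multiple samples, and thunk-sharing prevention via the SPEC trick.

```haskell
import Control.Monad (replicateM_)
import System.CPUTime (getCPUTime)

-- Rough sketch of batch timing: run the full batch once, then divide the
-- elapsed CPU time by the iteration count to estimate per-iteration cost
-- in milliseconds. `timeBatchMs` is a hypothetical helper, not library API.
timeBatchMs :: Int -> IO () -> IO Double
timeBatchMs iters action = do
  start <- getCPUTime                               -- CPU time in picoseconds
  replicateM_ iters action
  end <- getCPUTime
  let elapsedMs = fromIntegral (end - start) / 1e9  -- picoseconds -> ms
  pure (elapsedMs / fromIntegral iters)
```

Timing a batch rather than each iteration individually keeps clock-read overhead out of the measurement, which matters for sub-microsecond actions.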
Arguments
| :: Show a | |
| => String | Sweep name |
| -> BenchConfig | |
| -> Text | Parameter name (for CSV column header) |
| -> [a] | Parameter values to sweep over |
| -> (a -> BenchAction) | Action generator |
| -> IO [(a, BenchResult, GoldenStats)] |
Run a full parameter sweep and write CSV output.
This runs benchmarks for all parameter values, saves individual golden files, and writes a single CSV file with all results for analysis.
The CSV file is placed at:
<outputDir>/<sweep-name>-<arch-id>.csv
Arguments
| :: Show a | |
| => String | Base sweep name |
| -> BenchConfig | |
| -> Text | Parameter name |
| -> a | Parameter value |
| -> BenchAction | |
| -> IO (BenchResult, GoldenStats) |
Run a single point of a parameter sweep.
This is similar to runBenchGolden but returns the GoldenStats along
with the BenchResult, allowing the caller to accumulate stats for CSV export.
Each point is saved to its own golden file with the parameter value
included in the filename (e.g., sort-scaling_n=1000.golden).
Comparison Utilities
compareStats :: BenchConfig -> GoldenStats -> GoldenStats -> BenchResult Source #
Compare actual stats against golden stats.
Returns a BenchResult indicating whether the benchmark passed,
regressed, or improved, along with any warnings.
Hybrid Tolerance Strategy
The comparison uses BOTH percentage and absolute tolerance (when configured):
- Calculate the percentage difference: ((actual - golden) / golden) * 100
- Pass if abs(percentDiff) <= tolerancePercent (percentage check)
- OR pass if abs(actual - golden) <= absoluteToleranceMs (absolute check)
This prevents false failures for sub-millisecond operations where measurement noise creates large percentage variations despite negligible absolute differences.
checkVariance :: BenchConfig -> GoldenStats -> GoldenStats -> [Warning] Source #
Check for variance changes and generate warnings.
Robust Statistics
calculateRobustStats :: BenchConfig -> Vector Double -> Double -> (Double, Double, Double, [Double]) Source #
Calculate robust statistics from raw timing data.
Returns: (trimmed mean, MAD, IQR, outliers)
calculateTrimmedMean :: Double -> Vector Double -> Double Source #
Calculate trimmed mean by removing specified percentage from each tail.
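A minimal list-based sketch of the trimmed-mean calculation, assuming the percentage is dropped from each tail. trimmedMean here is illustrative only; the library's version operates on Vector Double.

```haskell
import Data.List (sort)

-- Illustrative trimmed mean: sort the samples, drop `pct` percent of them
-- from each tail, and average what remains. Not the library's implementation.
trimmedMean :: Double -> [Double] -> Double
trimmedMean pct xs =
  let sorted = sort xs
      n      = length xs
      k      = floor (fromIntegral n * pct / 100)  -- samples dropped per tail
      kept   = take (n - 2 * k) (drop k sorted)
  in sum kept / fromIntegral (length kept)
```

For example, trimming 10% from each tail of ten samples drops the single smallest and single largest before averaging.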
calculateMAD :: Vector Double -> Double -> Double Source #
Calculate Median Absolute Deviation (MAD).
MAD = median(|x_i - median(x)|)
detectOutliers :: Double -> Vector Double -> Double -> Double -> [Double] Source #
Detect outliers using MAD-based threshold.
An observation is an outlier if: |x - median| > threshold * MAD
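The MAD and the outlier rule above can be sketched with plain lists. These standalone definitions are illustrative, assuming non-empty input; the library's versions work on Vector Double.

```haskell
import Data.List (sort)

-- Illustrative list-based versions of MAD and the MAD-based outlier rule.

-- | Median of a non-empty list.
median :: [Double] -> Double
median xs =
  let s = sort xs
      n = length s
  in if odd n
       then s !! (n `div` 2)
       else (s !! (n `div` 2 - 1) + s !! (n `div` 2)) / 2

-- | MAD = median of absolute deviations from the median.
mad :: [Double] -> Double
mad xs = let m = median xs in median (map (\x -> abs (x - m)) xs)

-- | Outliers: samples with |x - median| > threshold * MAD.
outliers :: Double -> [Double] -> [Double]
outliers threshold xs =
  let m = median xs
      d = mad xs
  in filter (\x -> abs (x - m) > threshold * d) xs
```

Because both the center and the spread are medians, a single extreme timing barely moves the threshold, which is why MAD-based detection is robust where stddev-based detection is not.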
Golden File I/O
readGoldenFile :: FilePath -> IO (Either String GoldenStats) Source #
Read a golden file.
writeGoldenFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #
Write a golden file.
writeActualFile :: FilePath -> FilePath -> String -> GoldenStats -> IO () Source #
Write an actual results file.
getActualPath :: FilePath -> FilePath -> String -> FilePath Source #
Get the path for an actual results file.
Lens-Based Expectations
expect :: Lens' GoldenStats Double -> Tolerance -> Expectation Source #
Create an expectation for a specific statistic field.
Example:
expect _statsMedian (Percent 10.0)
expect _statsIQR (Absolute 0.5)
expect _statsMean (Hybrid 15.0 0.01)
expect _statsMean (MustImprove 10.0)
pattern And :: !Expectation -> !Expectation -> Expectation Source #
Both expectations must pass
pattern ExpectStat :: !(Lens' GoldenStats Double) -> !Tolerance -> Expectation Source #
Expect a specific field to be within tolerance
pattern Or :: !Expectation -> !Expectation -> Expectation Source #
Either expectation can pass
Tolerance specification for performance comparison.
Constructors
| Percent !Double | Percentage tolerance |
| Absolute !Double | Absolute tolerance in milliseconds |
| Hybrid !Double !Double | Hybrid tolerance: pass if EITHER the percentage OR the absolute tolerance is satisfied |
| MustImprove !Double | Must be faster by at least this percentage |
| MustRegress !Double | Must be slower by at least this percentage |
metricFor :: BenchConfig -> Lens' GoldenStats Double Source #
Select the appropriate central tendency metric based on configuration.
Returns:
- _statsTrimmedMean if useRobustStatistics is True
- _statsMean otherwise
Example:
let lens = metricFor config
baseline = golden ^. lens
current = actual ^. lens
varianceFor :: BenchConfig -> Lens' GoldenStats Double Source #
Select the appropriate dispersion metric based on configuration.
Returns:
- _statsMAD if useRobustStatistics is True
- _statsStddev otherwise
Example:
let vLens = varianceFor config
goldenVar = golden ^. vLens
actualVar = actual ^. vLens
_statsMean :: Lens' GoldenStats Double Source #
Lens for mean execution time in milliseconds.
_statsStddev :: Lens' GoldenStats Double Source #
Lens for standard deviation in milliseconds.
_statsMedian :: Lens' GoldenStats Double Source #
Lens for median execution time in milliseconds.
_statsTrimmedMean :: Lens' GoldenStats Double Source #
Lens for trimmed mean (with tails removed) in milliseconds.
_statsMAD :: Lens' GoldenStats Double Source #
Lens for median absolute deviation (MAD) in milliseconds.
_statsIQR :: Lens' GoldenStats Double Source #
Lens for interquartile range (IQR = Q3 - Q1) in milliseconds.
expectStat :: Lens' GoldenStats Double -> Tolerance -> Expectation Source #
Create an expectation using a custom lens.
This is an alias for expect for compatibility.
checkExpectation :: Expectation -> GoldenStats -> GoldenStats -> Bool Source #
withinPercent :: Double -> Double -> Double -> Bool Source #
Check if value is within percentage tolerance.
withinPercent 15.0 baseline actual -- within ±15%
withinAbsolute :: Double -> Double -> Double -> Bool Source #
Check if value is within absolute tolerance (milliseconds).
withinAbsolute 0.01 baseline actual -- within ±0.01ms
withinHybrid :: Double -> Double -> Double -> Double -> Bool Source #
Check if value satisfies hybrid tolerance (percentage OR absolute).
withinHybrid 15.0 0.01 baseline actual -- within ±15% OR ±0.01ms
mustImprove :: Double -> Double -> Double -> Bool Source #
Check if actual is faster than baseline by at least the given percentage.
mustImprove 10.0 baseline actual -- must be ≥10% faster
mustRegress :: Double -> Double -> Double -> Bool Source #
Check if actual is slower than baseline by at least the given percentage.
mustRegress 5.0 baseline actual -- must be ≥5% slower
(@~) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for percentage tolerance check.
baseline @~ 15.0 $ actual -- within ±15%
(@<) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for absolute tolerance check.
baseline @< 0.01 $ actual -- within ±0.01ms
(@<<) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for "must improve" check.
baseline @<< 10.0 $ actual -- must be ≥10% faster
(@>>) :: Double -> Double -> Double -> Bool infixl 4 Source #
Infix operator for "must regress" check.
baseline @>> 5.0 $ actual -- must be ≥5% slower
(&&~) :: Expectation -> Expectation -> Expectation infixr 3 Source #
AND composition of expectations (both must pass).
expect _statsMean (Percent 15.0) &&~ expect _statsMAD (Percent 50.0)
(||~) :: Expectation -> Expectation -> Expectation infixr 2 Source #
OR composition of expectations (either can pass).
expect _statsMedian (Percent 10.0) ||~ expect _statsMin (Absolute 0.01)
percentDiff :: Double -> Double -> Double Source #
Calculate percentage difference between baseline and actual.
Returns: ((actual - baseline) / baseline) * 100
- Positive = regression (slower)
- Negative = improvement (faster)
- Zero = no change
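The documented formula, and the "must improve" check built on top of it, can be sketched as follows. The primed names are used to make clear these are standalone illustrations rather than the library's definitions:

```haskell
-- Sketch of the documented formulas; `percentDiff'` and `mustImprove'`
-- are illustrative stand-ins for the library's percentDiff/mustImprove.

-- | ((actual - baseline) / baseline) * 100; positive means slower.
percentDiff' :: Double -> Double -> Double
percentDiff' baseline actual = (actual - baseline) / baseline * 100

-- | Actual must be at least `pct` percent faster than baseline,
-- i.e. the percentage difference must be at or below -pct.
mustImprove' :: Double -> Double -> Double -> Bool
mustImprove' pct baseline actual = percentDiff' baseline actual <= negate pct
```

So a drop from 100 ms to 90 ms is a -10% difference and satisfies mustImprove' 10.0, while 95 ms (-5%) does not.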
absDiff :: Double -> Double -> Double Source #
Calculate absolute difference between baseline and actual.
Returns: abs(actual - baseline)
toleranceFromExpectation :: Expectation -> (Double, Maybe Double) Source #
Extract tolerance description from an expectation for error messages. For compound expectations (And/Or), returns the first tolerance found.
toleranceValues :: Tolerance -> (Double, Maybe Double) Source #
Extract percentage and optional absolute tolerance from a Tolerance.
Re-exports
module Test.Hspec.BenchGolden.Arch
module Test.Hspec.BenchGolden.CSV
Orphan instances
| Example BenchGolden Source # | Instance for BenchGolden without arguments. | ||||
Associated Types
Methods evaluateExample :: BenchGolden -> Params -> (ActionWith (Arg BenchGolden) -> IO ()) -> ProgressCallback -> IO Result # | |||||
| Example (arg -> BenchGolden) Source # | Instance for BenchGolden with an argument. This allows benchmarks to receive setup data from | ||||
Associated Types
Methods evaluateExample :: (arg -> BenchGolden) -> Params -> (ActionWith (Arg (arg -> BenchGolden)) -> IO ()) -> ProgressCallback -> IO Result # | |||||