[!Note]
Note: This project was built almost entirely with AI; see
How this was built for the prompts.
A human did, however, read over the readme and find it acceptable.
tasty-cache
A Tasty ingredient that skips
tests whose source hasn't changed since the last passing run, using GHC HIE
files for fine-grained dependency tracking.
Quick start
1. Emit HIE files — add to both your library and test-suite
stanzas in your .cabal file:
library
  -- ... your other fields ...
  ghc-options: -fwrite-ide-info -hiedir .hie

test-suite tests
  -- ... your other fields ...
  ghc-options: -fwrite-ide-info -hiedir .hie
Both stanzas need these flags because the cache reads the HIE files for your
library modules (to follow dependency chains) and your test module (to find
the test body source).
2. Add to .gitignore:
.hie/
.cache/
3. Replace defaultMain and wrap the test groups you want cached:
import Test.Tasty (TestTree, testGroup)
import Test.Tasty.HUnit (testCase, (@?=))
import Test.Tasty.HieCache (defaultMainWithHieCache, cacheable)

main :: IO ()
main = defaultMainWithHieCache tests

tests :: TestTree
tests = testGroup "all"
  [ cacheable $ testGroup "pure unit tests"
      [ testCase "add 1 2 == 3" $ add 1 2 @?= 3
      , testCase "factorial 5" $ factorial 5 @?= 120
      ]
  , testGroup "integration tests" -- no cacheable → always runs
      [ testCase "..." $ ...
      ]
  ]
cacheable works on any TestTree — testGroup, testCase, testProperty
(QuickCheck), testSpec (Hspec), or any other Tasty provider. Wrap at
whatever granularity makes sense.
Only tests wrapped with cacheable are ever skipped. Unwrapped tests run
unconditionally on every invocation, so leave cacheable off any test with
side effects, network access, or flaky behaviour.
Why is caching opt-in? Integration tests, database tests, and other
effectful tests should always run — their correctness depends on external
state, not just source bytes. cacheable is a deliberate signal that a test
is pure and repeatable.
Requires GHC >= 9.4 (tested on 9.4, 9.6, 9.8, 9.10, 9.12, 9.14).
What if I forget the flags?
If the .hie directory doesn't exist, the ingredient logs a warning and runs
all tests normally — no crash, no silent skipping. You'll see:
HieCache: no .hie directory, running all tests
If fingerprinting fails for any other reason (unreadable HIE file, parse
error), the ingredient falls back to running all tests and logs the error to
stderr.
Nix
This repo is a flake that exposes tasty-cache as a nixpkgs-idiomatic
Haskell package, derived directly from the cabal file, built and tested
against every supported GHC version (9.4, 9.6, 9.8, 9.10, 9.12,
9.14).
Consume from another flake
Add tasty-cache as an input and apply its overlay; the library is
injected into every supported pkgs.haskell.packages.ghc<v> set, so you
can pick whichever GHC your project targets and pull tasty-cache in
with ghcWithPackages like any other Haskell dependency:
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    tasty-cache.url = "github:silky/tasty-cache";
  };

  outputs = { self, nixpkgs, tasty-cache }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = [ tasty-cache.overlays.default ];
      };
      ghc = pkgs.haskell.packages.ghc910; # or ghc94/96/98/912/914
    in
    {
      packages.${system}.example =
        ghc.ghcWithPackages (p: [ p.tasty-cache ]);
    };
}
The same overlay also lets your own callCabal2nix / callPackage-based
Haskell builds depend on tasty-cache by name.
Build and develop locally
nix build # build the library against the default GHC
nix develop # dev shell with cabal-install, hiedb, and every
# Haskell dep needed for the library + test-suite
nix flake check # treefmt + build & run the test-suite on every
# supported GHC version (the full test matrix)
nix fmt # format Nix and Haskell sources
Inside nix develop:
cabal build
cabal test
flake.nix exposes per-GHC outputs:
nix build .#tasty-cache-ghc94 # or -ghc96, -ghc98, -ghc910, -ghc912, -ghc914
nix build .#checks.x86_64-linux.tasty-cache-ghc910-tests
The package shipped via the overlay is wrapped with dontCheck so
consumers don't pay the test cost transitively; coverage is preserved by
the per-version tasty-cache-ghc<v>-tests derivations under checks.
How it works
GHC can emit HIE (Haskell Interface Extended) files — binary files
containing the full typed AST of each compiled module, including the source
bytes and a record of every identifier's definition site and every use site.
tasty-cache reads these files to compute a fingerprint for each
cacheable test:
fingerprint = hash(body_hash, dep_hash, cabal_hash)
body_hash = hash of the testCase expression's source bytes
dep_hash = hash of the source bytes of every declaration transitively
reachable from the test body via the HIE identifier graph
cabal_hash = hash of all .cabal files in the project root
The transitive dependency set is computed by BFS over the HIE identifier
graph, starting from the names used in the test body and following Use
references through every reachable declaration in every library module.
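The traversal can be pictured with a small, self-contained sketch. It is not the library's code: the UseGraph type, the reachable function, and the use of plain strings for names are assumptions made for illustration, standing in for the real HIE identifier graph. The example graph mirrors the Diamond scenario from the bundled tests.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical stand-in for the HIE identifier graph: each declaration
-- name maps to the names its body uses.
type UseGraph = Map.Map String [String]

-- | Breadth-first transitive closure, starting from the names used in a
-- test body. The 'seen' set makes mutual recursion terminate.
reachable :: UseGraph -> [String] -> Set.Set String
reachable graph = go Set.empty
  where
    go seen [] = seen
    go seen (n : queue)
      | n `Set.member` seen = go seen queue
      | otherwise =
          let next = Map.findWithDefault [] n graph
           in go (Set.insert n seen) (queue ++ next)

-- | The diamond from the bundled scenarios: combined -> partA/partB -> base.
diamond :: UseGraph
diamond =
  Map.fromList
    [ ("combined", ["partA", "partB"])
    , ("partA", ["base"])
    , ("partB", ["base"])
    , ("base", [])
    , ("unrelated", [])
    ]
```

Loaded in GHCi, reachable diamond ["combined"] yields the four diamond names and leaves "unrelated" out, which is why editing base invalidates every test in that group while an unrelated edit invalidates none.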
On each run the ingredient compares fingerprints against a cache
(.cache/hie-tasty-cache). Tests whose fingerprint is unchanged are replaced
with an instant-pass placeholder (OK (cached)); only stale tests execute.
The cache is updated per-test as each passing test completes, so a
partially-failing run still advances the cache for the tests that passed.
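As a rough illustration of the fingerprint formula, the sketch below combines three component hashes into one value. The README does not say which hash function tasty-cache uses, so FNV-1a here is purely an assumed stand-in, and the fnv1a and fingerprint names are hypothetical.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- | 64-bit FNV-1a over a string. An assumed stand-in for whatever hash
-- the library really uses.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- | fingerprint = hash(body_hash, dep_hash, cabal_hash), where the
-- arguments are the test body source, the sources of all reachable
-- declarations, and the contents of the .cabal files.
fingerprint :: String -> [String] -> [String] -> Word64
fingerprint body deps cabals =
  fnv1a (unwords (map (show . fnv1a) [body, concat deps, concat cabals]))
```

Because every component feeds the final hash, a change to the test body, to any reachable declaration, or to a .cabal file produces a different fingerprint, while edits outside those inputs leave it untouched.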
Output
First run — cache is empty, all tests execute:
scenarios
Lib (basic direct dependency)
add 1 2 == 3: OK
add 0 0 == 0: OK
factorial 5 == 120: OK
Parity (mutual recursion — always runs)
isEven 0: OK
...
All 45 tests passed (0.02s)
Second run — nothing changed; cacheable groups are served from cache,
unwrapped groups run again:
HieCache: skipping 20 cached test(s)
scenarios
Lib (basic direct dependency)
add 1 2 == 3: OK (cached)
add 0 0 == 0: OK (cached)
factorial 5 == 120: OK (cached)
Parity (mutual recursion — always runs)
isEven 0: OK
isEven 4: OK
...
Expr (GADT)
eval Lit: OK (cached)
...
Diamond (transitive deps)
base 5 == 6: OK (cached)
...
Arithmetic (Template Haskell — always runs)
add5 3 == 8: OK
...
All 45 tests passed (0.00s)
After editing factorial — only the factorial test re-runs within the
cacheable group; add tests remain cached:
HieCache: skipping 19 cached test(s)
scenarios
Lib (basic direct dependency)
add 1 2 == 3: OK (cached)
add 0 0 == 0: OK (cached)
factorial 5 == 120: OK
What gets invalidated
The dep hash covers the transitive closure of the HIE identifier graph:
| Change | Tests that re-run |
| --- | --- |
| Edit factorial | Tests that call factorial (directly or transitively) |
| Edit add | Tests that call add; factorial tests are unaffected |
| Edit base in a diamond dependency | All tests depending on base, partA, partB, combined |
| Edit isEven | isEven tests and isOdd tests (since isOdd calls isEven) |
| Edit a TH template body (adderExpr) | Tests using that splice (add5, add10) but not others (timesBy3) |
| Change #define SCALE_FACTOR | All tests in that CPP module (whole-file hashed) |
| Add/remove a {-# LANGUAGE #-} pragma | All tests that transitively depend on that module |
| Edit a .cabal file | All cacheable tests (cabal hash covers default-extensions etc.) |
| Edit an unrelated function | Nothing; those tests stay cached |
| Any change to a non-cacheable test | That test always runs anyway |
Disabling the cache
To force every test to run — ignoring all cached results and cacheable
labels — pass --disable-tasty-cache:
cabal test --test-options="--disable-tasty-cache"
You'll see:
HieCache: caching disabled, running all tests
This is useful for CI jobs that must run the full suite, or when you suspect
a stale cache and want a clean baseline without deleting .cache/hie-tasty-cache.
Advanced usage
If you are composing Tasty ingredients manually (e.g. alongside tasty-rerun
or a custom reporter), use hieCacheIngredient directly instead of
defaultMainWithHieCache:
import Test.Tasty (defaultIngredients, defaultMainWithIngredients)
import Test.Tasty.HieCache (hieCacheIngredient)

main :: IO ()
main = defaultMainWithIngredients myIngredients tests
  where
    myIngredients =
      hieCacheIngredient defaultIngredients
        : defaultIngredients
        ++ [myCustomIngredient]
hieCacheIngredient takes the list of sub-ingredients it should delegate
actual test execution to. Pass your full ingredient list so that all normal
Tasty behaviour (parallel execution, filtering with -p, XML output, etc.)
continues to work.
Caveats for real-world projects
occName collision (the most common false positive)
Dependencies are matched by occurrence name — the bare string "show",
"==", "compare", "fmap" — rather than by GHC's fully-qualified Name
unique. This means every module that defines a binding with the same short
name contributes to the dep map, and the BFS follows all of them.
In practice: any project using deriving Show, Eq, Ord, or Functor
will see over-broad invalidations. Adding deriving Show to a new type
anywhere in the project causes the BFS to follow "show" into that module too,
and tests that transitively call show on any type will be re-run
unnecessarily.
This collision never causes a false negative (an affected test never wrongly
stays cached), but the cache hit rate may be lower than expected in
projects with many derived instances.
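The effect is easy to reproduce in miniature. In this sketch the Decl tuple, the byOccName function, and the module names are all hypothetical; it only models the keying choice described above, not the library's actual dep map.

```haskell
import qualified Data.Map.Strict as Map

-- | A declaration as (defining module, bare occurrence name, source text).
type Decl = (String, String, String)

-- | Dep map keyed by bare occurrence name: bindings that share a short
-- name land under one key, whichever module defines them.
byOccName :: [Decl] -> Map.Map String [String]
byOccName decls =
  Map.fromListWith
    (flip (++))
    [(occ, [modu ++ ": " ++ src]) | (modu, occ, src) <- decls]

-- | Two derived Show instances in unrelated modules, plus one ordinary
-- function.
exampleDecls :: [Decl]
exampleDecls =
  [ ("Shape", "show", "instance Show Shape")
  , ("Colour", "show", "instance Show Colour")
  , ("Lib", "add", "add x y = x + y")
  ]
```

Looking up "show" in byOccName exampleDecls returns both instances, so a test that only ever shows a Shape still has the Colour instance in its dependency set and re-runs when that instance changes.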
HLS interaction
HLS (Haskell Language Server) also writes HIE files to the .hie directory.
HIE files are deterministic for a given source file and set of flags, so in
normal usage HLS and cabal test produce identical files and there is no
conflict.
However, if HLS is configured with different ghc-options than the
test-suite stanza (e.g. HLS omits -O, or uses a different set of language
extensions via haskell-language-server.json), the HIE files written by HLS
may differ from those produced by cabal test, so fingerprints end up being
computed from AST data that does not match the test build. If you observe
unexpected cache misses or hits, check that both HLS and cabal test use the
same flags.
Parallel test runs
tasty-cache updates the in-memory cache with modifyIORef' as each test
passes, then writes it to disk once at the end. If Tasty runs tests in
parallel (the default), two tests passing concurrently will race on the
IORef — each reads the current map, adds its key, and writes back, and one
write can overwrite the other's entry. The affected tests will simply re-run
on the next invocation rather than being cached. The cache is never wrong
as a result, only incomplete.
Separately, running two cabal test processes concurrently (e.g. in a CI
matrix) will race on the cache file on disk; the last write wins.
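For the in-memory race, the usual remedy in Haskell is to make the read-modify-write a single atomic step. The sketch below shows that general technique with atomicModifyIORef' from base; it is not the library's current code, and recordPass and concurrentPasses are hypothetical names.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- | Record a passing test with one atomic read-modify-write, so
-- concurrent updates cannot overwrite each other.
recordPass :: IORef (Map.Map String String) -> String -> String -> IO ()
recordPass ref name fp =
  atomicModifyIORef' ref (\m -> (Map.insert name fp m, ()))

-- | Record n passes from n threads and report how many entries survive.
concurrentPasses :: Int -> IO Int
concurrentPasses n = do
  ref <- newIORef Map.empty
  done <- newEmptyMVar
  forM_ [1 .. n] $ \i ->
    forkIO (recordPass ref ("test" ++ show i) "fp" >> putMVar done ())
  replicateM_ n (takeMVar done)
  Map.size <$> readIORef ref
```

With the atomic update every thread's entry is kept, so concurrentPasses n returns n. This addresses only the in-process race; two separate processes writing the same cache file would still need file-level coordination.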
GHC version compatibility
Tested on GHC 9.4, 9.6, 9.8, 9.10, 9.12, and 9.14 — the full nix-built
matrix is exercised by nix flake check (see Nix above). The
implementation imports GHC.Iface.Ext.* and GHC.Types.*, which are
internal GHC APIs with no stability guarantee, but in practice the
specific symbols used here have been stable across the entire 9.4 → 9.14
range. The last breaking rename in this surface area was
HieTypes.nodeInfo → GHC.Iface.Ext.Types.sourcedNodeInfo between GHC
8.10 and 9.0; a similar rename in a future release would re-break things.
Cache location
.cache/hie-tasty-cache — a plain-text file. Safe to delete at any time;
deleting it causes all cacheable tests to run on the next invocation.
Test scenarios
The bundled test suite (test/Main.hs) contains 45 tests across 7 modules,
demonstrating the range of dependency patterns the cache handles:
| Module | cacheable? | What it demonstrates |
| --- | --- | --- |
| Lib | yes | Basic direct dep: editing factorial doesn't invalidate add tests |
| Parity | no | Always runs; mutual recursion (isEven/isOdd call each other) |
| Expr | yes | GADT: eval and pretty are independent; editing one doesn't invalidate the other |
| Diamond | yes | Diamond deps: combined → partA/partB → base; editing base invalidates all four |
| Arithmetic | no | Always runs; Template Haskell splice dependency tracking |
| CPPDemo | no | Always runs; CPP #define changes caught via whole-file hashing |
| FalseNegatives | no | Demonstrates false-negative scenarios in the caching logic (see below) |
Known limitations
False negatives (tests skip when they should run)
The FalseNegatives test module (test/FalseNegatives.hs) contains unit tests
that demonstrate each of the scenarios below. Run cabal test to see them.
Missing fingerprint treated as cached (fixed). Previously, if a test's
name could not be located in the HIE source (dynamically constructed names,
unusual formatting, or leafMap collision — see below), its fingerprint was
absent. Since an absent fingerprint compared equal to an absent cache entry
(Nothing /= Nothing is False), the test was treated as cached and never ran
— not even on the very first invocation. This has been fixed: tests with no
computable fingerprint are now always treated as stale and run unconditionally.
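The difference between the old and new behaviour comes down to how two Maybe values are compared. A minimal sketch, assuming a cache that maps test names to fingerprints (the type aliases and function names are illustrative, not the library's):

```haskell
import qualified Data.Map.Strict as Map

type TestName = String
type Fingerprint = String
type Cache = Map.Map TestName Fingerprint

-- | The comparison described above as the bug: two absent values compare
-- equal, so a test with no fingerprint and no cache entry looks unchanged.
isStaleBuggy :: Cache -> TestName -> Maybe Fingerprint -> Bool
isStaleBuggy cache name current = Map.lookup name cache /= current

-- | The fixed rule: no computable fingerprint always means "run the test".
isStale :: Cache -> TestName -> Maybe Fingerprint -> Bool
isStale _ _ Nothing = True
isStale cache name (Just fp) = Map.lookup name cache /= Just fp
```

With an empty cache and a missing fingerprint, isStaleBuggy returns False (the test is skipped) while isStale returns True (the test runs).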
findExprEnd stops at blank lines. The indentation heuristic that
determines where a testCase expression ends treats a blank line as a
terminator. A multi-line do-block test with an internal blank line will have
its body hash computed only up to that blank line; edits after it are invisible
to the cache. This also affects dependency tracking in library functions: if a
function definition contains a blank line, identifiers used after it may not be
followed by the BFS, so changes to those transitive dependencies can go
undetected.
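To see why a blank line hides later edits, consider a deliberately simplified model of the heuristic that keeps only the blank-line rule and ignores indentation. The exprBody function and the sample test source (including encode and decode) are hypothetical.

```haskell
import Data.Char (isSpace)

-- | Simplified model: the expression is taken to end at the first line
-- that is empty or contains only whitespace.
exprBody :: [String] -> [String]
exprBody = takeWhile (not . all isSpace)

-- | A do-block test with an internal blank line.
testSource :: [String]
testSource =
  [ "testCase \"round trip\" $ do"
  , "  let x = encode 42"
  , ""
  , "  decode x @?= 42"
  ]
```

Here exprBody testSource keeps only the first two lines, so changing the final assertion would not alter the body hash. Until this is addressed, avoiding blank lines inside cacheable test bodies sidesteps the problem.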
Top-level helpers in the test module are not tracked. The entire test
module is excluded from the BFS to avoid including test bodies as their own
dependencies. If Main.hs defines a helper used by tests, changing it does
not invalidate those tests.
Multi-line pragmas only partially captured. The pragma-line detector
matches lines beginning with {-#. A pragma written across multiple lines has
its continuation lines omitted from the hash.
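A line-prefix detector of the kind described behaves as in this sketch, where pragmaLines is a hypothetical name and the two module headers differ only on the continuation line:

```haskell
import Data.List (isPrefixOf)

-- | Keep only lines that begin with the pragma opener.
pragmaLines :: [String] -> [String]
pragmaLines = filter ("{-#" `isPrefixOf`)

-- | The same module header before and after adding an extension on the
-- continuation line.
before, after :: [String]
before =
  [ "{-# LANGUAGE GADTs,"
  , "             KindSignatures #-}"
  , "module Expr where"
  ]
after =
  [ "{-# LANGUAGE GADTs,"
  , "             KindSignatures, RankNTypes #-}"
  , "module Expr where"
  ]
```

Both versions yield the same single captured line, so the added extension would not change the hash. Writing one pragma per line keeps every extension visible to the detector.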
False positives (tests run when they don't need to)
occName collision across modules — see Caveats for real-world
projects above.
GeneratedInfo nodes included. The HIE SourcedNodeInfo structure
distinguishes user-written source (SourceInfo) from generated code
(GeneratedInfo; derived instances, TH splices). The implementation currently
treats both identically, so generated bindings pollute the dep map and may
cause unnecessary invalidations.
GHC internals coupling
Internal GHC API. The implementation imports GHC.Iface.Ext.* and
GHC.Types.*, which are not stable public APIs. The specific surface used
here broke between GHC 8.10 and 9.0 (nodeInfo → sourcedNodeInfo, plus the
move from HieTypes to GHC.Iface.Ext.Types) and again at 9.2 → 9.4, where
initNameCache and readHieFile changed. Within 9.4+ the surface has been
stable, but a future release could break it again.
hie_hs_src vs post-CPP spans. For CPP modules, hie_hs_src stores the
raw pre-CPP source while HIE AST spans refer to the post-CPP source. With
#if/#ifdef blocks the line numbers can diverge. The current whole-file
hashing sidesteps this for simple #define cases only.
ValBind node span is undocumented. The code assumes the HIE node
carrying a ValBind identifier has a span covering the full equation. This is
true in GHC 9.8 but is an implementation detail with no documented guarantee.
Architecture
Duplicate test names. Two tests in different groups with the same leaf
name collide in leafMap; only one fingerprint is computed. Since the
staleness fix, the other test now runs unconditionally on every invocation
(rather than being silently cached forever), but it never benefits from
caching.
String-search test location. A test's source position is found by
searching for its quoted name in hie_hs_src. A test named "error" matches
the first occurrence of "error" anywhere in the file.
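The following sketch models that lookup as a first-match search over source lines. The locate function and the sample module are hypothetical, and the real search runs over hie_hs_src rather than a list of strings.

```haskell
import Data.List (findIndex, isInfixOf)

-- | 0-based index of the first line containing the quoted test name.
-- 'show' adds the surrounding double quotes for simple ASCII names.
locate :: String -> [String] -> Maybe Int
locate name = findIndex (show name `isInfixOf`)

-- | A module where the string "error" appears before the test that
-- carries that name.
moduleLines :: [String]
moduleLines =
  [ "failureLabel = \"error\""
  , "tests = testGroup \"all\""
  , "  [ testCase \"error\" $ parse \"\" @?= Nothing"
  ]
```

Here locate "error" moduleLines points at line 0, the unrelated string literal, rather than the testCase on line 2. Giving tests distinctive names makes such mismatches less likely.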
How this was built
This project was developed interactively with Claude. The prompts that produced
it, in order:
(Human's note: I only started tracking the prompts after a few initial
iterations; but hopefully how I started is clear to you; just basically "Can
you write me a nix-style caching mechanism for test function dependencies,
based on HIE files." Credit to @gacafe for
suggesting this approach to AI transparency.)
-
Can you fix the compile-time errors and check that your implementation does
the correct thing — i.e. caches the results of tests whose dependent
functions do not have AST changes. It will work if you can change factorial
and see that the other two tests are CACHED, and not re-evaluated.
-
Can you update the readme now and make sure it is accurate?
-
Can you now try and come up with some very interesting and complex
dependency tree scenarios, and test them? I'm thinking about at least
interesting source code dependencies; but also functions that involve
Template Haskell, CPP, GADTs.
-
What's the fix?
-
Is it at all possible to get the cached output to render as "OK (cached)"
instead of "cached" on a subsequent line?
-
Okay; and you can confirm that this implementation fixes every bug you
observed above?
-
Can you also do a test to check that adding or removing an extension
re-runs either only tests that would be affected, or at least all the tests
in the relevant file?
-
Okay. I would like you now to take an extremely close look, taking the
perspective of a core contributor to the GHC project, and reflect upon any
limitations in this implementation. Take your time, and think it through
from many different perspectives.
-
Can you think of a nice name for this project?
-
Can you rename this project to tasty-cache.
-
Can you update the README now to reflect the current output and state of
the project? Please also take care to document known limitations. Also, can
you provide a section at the end, that lists all the prompts I typed into
Claude in order to get it to this state?
-
Can you make sure that the cache is invalidated for the entire source tree
if a (new) default extension is added in the cabal file. Can you test this?
(Manually; you don't need to write a test for it.)
-
Can you now make this an opt-in ability; i.e. the tests that you want
cached must be wrapped with a certain cacheable function? Then demonstrate
this in action.
-
Cool; can you make sure the documentation is up to date with this
information?
-
Can you fix the warnings from nix fmt?
-
Again, make sure the readme is up to date and shows how to use the
features of this library really cleanly.
-
Are there any issues you think Haskellers will have using this library?
Can you think of anything confusing to either extremely experienced
Haskellers, and complete beginners? Reflect on this situation, and think
about what would need to change, and/or add some explanations to the
Readme.
-
Can you make sure the CHANGELOG is representative of the features actually
present in the first version?
-
Any final changes you'd like to make before we release the first version?
If not, make sure the readme contains the final list of inputs to claude.
-
Can you add a command-line option that will disable caching entirely, for
every test, whether or not it is labelled with cacheable?
-
Please don't require me to set --no-hie-cache=True; just make the flag
called --disable-tasty-cache
-
Can you document this option in the README?
-
Can you perform a careful review of the code? Take the perspective of a
Haskeller who is concerned that this might result in false-negatives; i.e.
not running a test that needs to be re-run. Take some time to convince
yourself that this can never happened; or, add some tests to show when and
how it does happen.
-
Plan looks good; please just also update the README once you're done.
-
Can you reorganise the flake.nix so that it builds a nixpkgs-idiomatic
Haskell package from the cabal file? I should be possible to exclude it in
a nix project as a typical ghc.withPackages ... dependency.
Please also keep the ability to develop the package inside a nix shell
that has cabal, and all the required packages.
Please also update the readme accordingly.
-
Can you have a think about what's required to support different versions
of GHC?
If the HIE format changes; I suggest having a CPP-style setup of
different steps per GHC version; and then depending on which one is
targetted follow that particular subset of the overall logic.
Investigate a few common GHC versions, as well as the latest release, and
formulate a plan for accomodating multiple versions in the one library.
-
Just implement "Floor A"; add the checks into the flake.nix, but don't
add any GitHub actions.
Please do update the README.