heph-sparse-set: Really fast mutable sparse sets

[ bsd3, data-structures, library ] [ Propose Tags ] [ Report a vulnerability ]

Please see the README on GitHub at https://github.com/jtnuttall/heph/tree/main/heph-sparse-set#readme

[Skip to Readme]

Modules

[Index] [Quick Jump]

Data
- SparseSet
  - Generic
    - Data.SparseSet.Generic.Mutable
      - Internal
        Data.SparseSet.Generic.Mutable.Internal.GrowVec
        Data.SparseSet.Generic.Mutable.Internal.MutableSparseArray
  - Data.SparseSet.Mutable
  - Storable
    - Data.SparseSet.Storable.Mutable
  - Unboxed
    - Data.SparseSet.Unboxed.Mutable

Downloads

heph-sparse-set-0.1.0.0.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

jtnuttall

For package maintainers and hackage trustees

edit package information

Candidates

No Candidates

Versions [RSS]	0.1.0.0
Change log	CHANGELOG.md
Dependencies	base (>=4.7 && <5), deepseq (>=1.4 && <1.6), primitive (>=0.7 && <0.10), vector (>=0.12.3.0 && <0.14) [details]
Tested with	ghc ==9.8.4 \|\| ==9.6.5 \|\| ==8.10.7
License	BSD-3-Clause
Copyright	2025 Jeremy Nuttall
Author	Jeremy Nuttall
Maintainer	jeremy@jeremy-nuttall.com
Category	Data Structures
Home page	https://github.com/jtnuttall/heph/tree/main/heph-sparse-set#readme
Bug tracker	https://github.com/jtnuttall/heph/issues
Source repo	head: git clone https://github.com/jtnuttall/heph
Uploaded	by jtnuttall at 2025-06-08T21:46:42Z
Distributions
Downloads	4 total (4 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Your Rating	λ λ λ
Status	Docs uploaded by user Build status unknown [no reports yet]

Readme for heph-sparse-set-0.1.0.0

[back to package description]

heph-sparse-set

A highly-performant, mutable sparse set implementation for Haskell.

This library is part of the forthcoming Hephaestus rendering system, but is sufficiently general purpose for any application that needs its particular behavior.

This library provides a mutable sparse set data structure designed for high-throughput systems where predictable, constant-time operations are critical. It is particularly well-suited for the performance demands of applications like game development (especially in ECS architectures), simulations, and other scenarios managing collections of integer-keyed data.

The design guarantees O(1) amortized complexity for core operations by leveraging direct array indexing and cache-friendly memory layouts.

Features

Algorithmic Guarantees: Provides amortized O(1) complexity for all core operations: insert, delete, and lookup.
Cache-Locality & Efficiency: Unboxed and Storable implementations are built to ensure contiguous memory layout and minimal pointer indirection.
Monad-Agnostic Interface: Operates within any PrimMonad context, enabling its use in both IO and pure ST computations without performance degradation.
Efficient Iteration: Offers iterators (mapM_, ifoldM) that operate directly on the underlying contiguous storage, avoiding the allocation of intermediate lists.
Tested for Correctness: The implementation is validated by a comprehensive test suite.

Design & Implementation

At its core, heph-sparse-set is implemented with three internal vectors to provide its performance characteristics:

A dense vector that stores the component data itself in a tightly packed array. Iteration happens over this array.
An indices vector, also dense, that stores the entity ID for each element in the dense vector.
A sparse vector that maps entity IDs directly to their location in the dense arrays. A lookup is a simple O(1) check in this vector.

When an element is removed, the last element from the dense arrays is swapped into the deleted element's slot (a "swap-and-pop"). This maintains the packed nature of the dense storage and ensures the O(1) time complexity for removals.

Performance Characteristics

The full benchmark suite can be found in the bench directory and run with cabal bench or stack bench.

The following results compare heph-sparse-set against Data.IntMap.Strict for 100,000 elements, using the default GHC garbage collector on a quiet system. The results are highly stable (R² > 0.99) and demonstrate the significant performance advantage of heph-sparse-set for its target workloads.

Operation	`heph-sparse-set`	`Data.IntMap.Strict`	Advantage	Speedup Factor
Get (Existing)	138 μs	3,749 μs	`heph-sparse-set`	~27x
Contains (Existing)	111 μs	3,788 μs	`heph-sparse-set`	~34x
Update (Dense)	168 μs	11,290 μs	`heph-sparse-set`	~67x
Insert (Dense, Asc)	1.86 ms	5.44 ms	`heph-sparse-set`	~2.9x
Remove (Dense)	1.02 ms	1.81 ms	`heph-sparse-set`	~1.8x
Intersection (50%)	157 μs	1,177 μs	`heph-sparse-set`	~7.5x
Mixed Workload	868 μs	10,670 μs	`heph-sparse-set`	~12x
Insert (Sparse, Asc)	77.23 ms	7.67 ms	`Data.IntMap.Strict`	~10x

Analysis

Lookups and Updates: The core strength of heph-sparse-set is its true O(1) complexity for lookups, updates, and containment checks. The ~27x to ~67x speedup is a direct result of simple array indexing versus the O(log n) tree traversal required by IntMap.
Dense Workloads: For densely packed entity IDs, heph-sparse-set is significantly faster across all operations. The ~2.9x speedup for dense insertions highlights the efficiency of amortized O(1) appends to contiguous vectors over the allocations and rebalancing of a tree structure.
Iteration and Cache Performance: The library's advantage in iteration-heavy tasks like Intersection and the Mixed Workload showcases the benefit of its cache-friendly memory layout. Iterating over the internal dense arrays is significantly faster than the pointer chasing required to traverse the nodes of an IntMap.
The Sparse Insertion Trade-off: The table clearly shows the primary trade-off. Data.IntMap.Strict is the superior choice for workloads dominated by sparse, ascending key insertions, where it performs up to 10x faster. This is because each such insert in heph-sparse-set can trigger a costly reallocation of the internal sparse array. Interestingly, insertions in descending sparse order are much faster in heph-sparse-set (~14.6 ms) because the sparse array is allocated once to its maximum required size and then filled, avoiding repeated reallocations.

Usage

Here's a basic usage example with Unboxed sets in IO.

import Data.SparseSet.Unboxed.Mutable qualified as SS

-- An entity in our system is just an Int
type Entity = Int
type Position = (Int, Int)

main :: IO ()
main = do
  -- Create a new sparse set
  positions <- SS.new @Position

  -- Insert some components
  SS.insert positions 10 (5, 5)   -- Entity 10 has position (5, 5)
  SS.insert positions 42 (1, 2)   -- Entity 42 has position (1, 2)
  SS.insert positions 3  (9, -4)

  -- Look up a component
  maybePos <- SS.lookup positions 42
  putStrLn $ "Position of entity 42: " <> show maybePos -- Just (1,2)

  -- Overwrite an existing component
  SS.insert positions 10 (6, 6)

  -- Delete a component
  deletedVal <- SS.delete positions 3
  putStrLn $ "Deleted component for entity 3: " <> show deletedVal -- Just (9,-4)

  -- Check for existence
  has42 <- SS.contains positions 42 -- True
  putStrLn $ "Set contains 42: " <> show has42
  has3  <- SS.contains positions 3  -- False
  putStrLn $ "Set contains 3: " <> show has3

  -- Efficiently iterate over all (entity, component) pairs
  putStrLn "Current members:"
  SS.imapM_ (\(entity, pos) -> print (entity, pos)) positions
  -- Expected output (order may vary):
  -- (10,(6,6))
  -- (42,(1,2))

Considerations

Memory Consumption

In general use, the memory consumption of this implementation will grow relative to the maximum key in the set. It is advisable to manage memory consumption deliberately:

Use the compact method periodically to reduce memory consumption. This is an expensive operation, so a reasonable heuristic would need to be established for your specific system.
Use a generational entity ID, so that entity ID growth can be constrained to a reasonable degree.

Concurrency

This library is intentionally not thread-safe. For concurrent use, thread safety must be provided externally.

Backwards Compatibility

I took some pains to ensure that this library is reasonably backwards-compatible. However, I'm intentionally designing Hephaestus to use GHC2021, so while this library doesn't use GHC2021, it does depend on modern language extensions that constrain its compatibility.

Please note that you may experience performance degradation on GHC < 9.2

Minimum supported GHC: 8.10.4
Minimum supported Stackage snapshot: lts-18.0

Roadmap

Compacted immutable sparse sets with read-only semantics for storage
Atomic operations or wrappers