bloomfilter-blocked: Fast, compact Bloom filters

[ apache, data, library ] [ Propose Tags ] [ Report a vulnerability ]

This library provides Bloom filters. A Bloom filter is a compact but probabilistic set-like data structure, supporting fast insert and membership operations.

Bloom filters are probabilistic in the sense that when querying for set membership, they can (with some probability) falsely report that an element is present when it is not present. On the other hand it will never falsely report that an element is not present when in fact it is present. That is, it can have false positives but not false negatives. The false positive rate (FPR) can be adjusted.

Bloom filters are compact, needing only a few bits per element. For example 10 bits per element is enough for a false positive rate (FPR) of 1%, and 15 bits for 0.1%.

The library includes two implementations of Bloom filters:

classic
in Data.BloomFilter.Classic: a default implementation that is faithful to the classical description of a Bloom filter data structure.
block-structured
in Data.BloomFilter.Blocked: a cache optimised representation that is faster, at the cost of needing a few more bits per element to achieve a target FPR.

Features of the library:

  • Fast. See the benchmark results.

  • Compact. It uses optimal sized bit arrays: no bigger than necessary.

  • Faster still: the block-structured Bloom filters are even faster, at the expense of needing more bits per entry.

  • Supports very large Bloom filters, bigger than 2^32 bits.

  • Simple API for specifying the size of Bloom filters.

  • Support for hash salting, for using Bloom filters with untrusted inputs.

  • Serialisation support with format version tracking.


[Skip to Readme]

library bloomfilter-blocked

library bloomfilter-blocked:xxhash

Modules

[Index] [Quick Jump]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.0.0, 0.1.0.1
Change log CHANGELOG.md
Dependencies base (>=4.16 && <4.22), bloomfilter-blocked, bytestring (>=0.11 && <0.13), deepseq (>=1.4 && <1.6), primitive (>=0.9 && <0.10) [details]
Tested with ghc ==9.2 || ==9.4 || ==9.6 || ==9.8 || ==9.10 || ==9.12
License Apache-2.0[multiple license files]
Copyright (c) 2023-2025 Cardano Development Foundation (c) 2025 Well-Typed LLP
Author Duncan Coutts, Joris Dral, Matthias Heinzel, Wen Kokke
Maintainer duncan@well-typed.com, joris@well-typed.com
Category Data
Source repo head: git clone https://github.com/well-typed/bloomfilter-blocked
this: git clone https://github.com/well-typed/bloomfilter-blocked(tag bloomfilter-blocked-0.1.0.1)
Uploaded by jdral at 2025-11-25T10:47:54Z
Distributions NixOS:0.1.0.0
Downloads 7 total (0 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2025-11-25 [all 1 reports]

Readme for bloomfilter-blocked-0.1.0.1

[back to package description]

bloomfilter-blocked

Hackage Build Haddocks

bloomfilter-blocked is a Haskell library providing multiple fast and efficient implementations of bloom filters. It is a full rewrite of the bloomfilter package, originally authored by Bryan O'Sullivan bos@serpentine.com.

A bloom filter is a space-efficient data structure representing a set that can be probablistically queried for set membership. The set membership query returns no false negatives, but it might return false positives. That is, if an element was added to a bloom filter, then a subsequent query definitely returns True. If an element was not added to a filter, then a subsequent query may still return True if False would be the correct answer. The probabiliy of false positives -- the false positive rate (FPR) -- is configurable.

The library includes two implementations of bloom filters: classic, and blocked.

  • Classic bloom filters, found in the Data.BloomFilter.Classic module: a default implementation that is faithful to the canonical description of a bloom filter data structure.

  • Blocked floom filters, found in the Data.BloomFilter.Blocked module: an implementation that optimises the memory layout of a classic bloom filter for speed (cheaper CPU cache reads), at the cost of a slightly higher FPR for the same amount of assigned memory.

The FPR scales inversely with how much memory is assigned to the filter. It also scales inversely with how many elements are added to the set. The user can configure how much memory is asisgned to a filter, and the user also controls how many elements are added to a set. Each implementation comes with helper functions, like sizeForFPR and sizeForBits, that the user can leverage to configure filters.

Both immutable (Bloom) and mutable (MBloom) bloom filters, including functions to convert between the two, are provided for each implementation. Note however that a (mutable) bloom filter can not be resized once created, and that elements can not be deleted once inserted.

For more information about the library and examples of how to use it, see the Haddock documentation of the different modules.

Usage notes

User should take into account the following:

  • This package is not supported on 32bit systems.

Differences from the bloomfilter package

The library is a full rewrite of the bloomfilter package, originally authored by Bryan O'Sullivan bos@serpentine.com. The main differences are:

  • bloomfilter-blocked supports both classic and blocked bloom filters, whereas bloomfilter only supports the former.
  • bloomfilter-blocked supports bloom filters of arbitrary sizes, whereas bloomfilter limits the sizes to powers of two.
  • bloomfilter-blocked supports sizes up to 2^48 for classic bloom filters and up to 2^41 for blocked bloom filters, instead of 2^32.
  • In bloomfilter-blocked, the Bloom and MBloom types are parameterised over a Hashable type class, instead of having a a -> [Hash] typed field. This separation in bloomfilter-blocked allows clean (de-)serialisation of filters as the hashing scheme is static.
  • bloomfilter-blocked uses XXH3 for hashing instead of Jenkins' lookup3, which bloomfilter uses.
  • The user can configure hash salts for improved security in bloomfilter-blocked, whereas this is not supported in bloomfilter.