Safe Haskell | None |
---|---|
Language | GHC2021 |
AtCoder.Extra.AhoCorasick
Description
Aho–Corasick algorithm is a fast dictionary-matching (multi-pattern matching) algorithm.
Example
>>>
import AtCoder.Extra.AhoCorasick qualified as AC
>>>
import Data.Vector.Unboxed qualified as VU
Pattern strings must be given as V.Vector (VU.Vector Int)
:
>>>
let patterns = V.fromList [VU.fromList [0, 1], VU.fromList [0, 2], VU.fromList [2, 3, 4]]
>>>
let ac = AC.build patterns
>>>
AC.size ac
7
The automaton could be run manually with next
or nextN
:
>>>
AC.nextN ac {- empty node -} 0 (VU.fromList [0, 2, 3])
5
match
returns a vector of (endPos, patternId)
:
>>>
-- [.....) pattern 0
>>>
-- [.......) pattern2
>>>
AC.match ac $ VU.fromList [0, 1, 2, 3, 4]
[(2,0),(5,2)]
If you need a vector of (startPos, patternId)
, you must manually map the result:
>>>
let f (!end, !patId) = (end - VU.length (patterns V.! patId), patId)
>>>
-- [.....) pattern 0
>>>
-- [.......) pattern2
>>>
VU.map f . AC.match ac $ VU.fromList [0, 1, 2, 3, 4]
[(0,0),(2,2)]
Note that duplicate patterns are only counted once with match
.
Since: 1.5.3.0
Synopsis
- data AhoCorasick = AhoCorasick {}
- build :: HasCallStack => Vector (Vector Int) -> AhoCorasick
- size :: HasCallStack => AhoCorasick -> Int
- next :: HasCallStack => AhoCorasick -> Int -> Int -> Int
- nextN :: HasCallStack => AhoCorasick -> Int -> Vector Int -> Int
- match :: HasCallStack => AhoCorasick -> Vector Int -> Vector (Int, Int)
Documentation
data AhoCorasick Source #
Aho–Corasick algorithm data.
Since: 1.5.3.0
Constructors
AhoCorasick | |
Fields
|
Arguments
:: HasCallStack | |
=> Vector (Vector Int) | Pattern strings. |
-> AhoCorasick | Aho–Corasick automaton based on a trie. |
\(O(\sum_i |S_i|)\)
Constraints
- \(|S_i| > 0\)
Since: 1.5.3.0
size :: HasCallStack => AhoCorasick -> Int Source #
\(O(1)\) Returns the number of nodes in the trie.
Since: 1.5.3.0
Arguments
:: HasCallStack | |
=> AhoCorasick | The automaton. |
-> Int | Current node ID (empty node is |
-> Int | Character. |
-> Int | Next node ID. |
\(O(1)\) Retrieves the next node to visit.
Since: 1.5.3.0
Arguments
:: HasCallStack | |
=> AhoCorasick | The automaton. |
-> Int | Current node. |
-> Vector Int | String. |
-> Int | Resulting node. |
match :: HasCallStack => AhoCorasick -> Vector Int -> Vector (Int, Int) Source #
\(O(|T|)\) Runs dictionary matching (multi-pattern matching) in linear time and returns a list
of (endPos, patId)
, where [endPos - patLen, endPos)
corresponds to the interval of original
source slice.
Note that duplicate patterns are counted just once with one of them; if pattern A and B are the same, their appearence is counted as either A or B.
Since: 1.5.3.0