| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
AtCoder.Extra.AhoCorasick
Description
Aho–Corasick algorithm is a fast dictionary-matching (multi-pattern matching) algorithm.
Example
>>>import AtCoder.Extra.AhoCorasick qualified as AC>>>import Data.Vector qualified as V>>>import Data.Vector.Unboxed qualified as VU
Pattern strings must be given as V.Vector (VU.Vector Int):
>>>let patterns = V.fromList [VU.fromList [0, 1], VU.fromList [0, 2], VU.fromList [2, 3, 4]]>>>let ac = AC.build patterns>>>AC.size ac7
The automaton could be run manually with next or nextN:
>>>AC.nextN ac {- empty node -} 0 (VU.fromList [0, 2, 3])5
match returns a vector of (endPos, patternId):
>>>-- [.....) pattern 0>>>-- [.......) pattern2>>>AC.match ac $ VU.fromList [0, 1, 2, 3, 4][(2,0),(5,2)]
If you need a vector of (startPos, patternId), you must manually map the result:
>>>let f (!end, !patId) = (end - VU.length (patterns V.! patId), patId)>>>-- [.....) pattern 0>>>-- [.......) pattern2>>>VU.map f . AC.match ac $ VU.fromList [0, 1, 2, 3, 4][(0,0),(2,2)]
Note that duplicate patterns are only counted once with match.
Since: 1.5.3.0
Synopsis
- data AhoCorasick = AhoCorasick {}
- build :: HasCallStack => Vector (Vector Int) -> AhoCorasick
- size :: HasCallStack => AhoCorasick -> Int
- next :: HasCallStack => AhoCorasick -> Int -> Int -> Int
- nextN :: HasCallStack => AhoCorasick -> Int -> Vector Int -> Int
- match :: HasCallStack => AhoCorasick -> Vector Int -> Vector (Int, Int)
Documentation
data AhoCorasick Source #
Aho–Corasick algorithm data.
Since: 1.5.3.0
Constructors
| AhoCorasick | |
Fields
| |
Arguments
| :: HasCallStack | |
| => Vector (Vector Int) | Pattern strings. |
| -> AhoCorasick | Aho–Corasick automaton based on a trie. |
\(O(\sum_i |S_i|)\) Build an Aho–Corasick automaton.
Constraints
- \(|S_i| > 0\)
Since: 1.5.3.0
size :: HasCallStack => AhoCorasick -> Int Source #
\(O(1)\) Returns the number of nodes in the trie.
Since: 1.5.3.0
Arguments
| :: HasCallStack | |
| => AhoCorasick | The automaton. |
| -> Int | Current node ID (empty node is |
| -> Int | Character. |
| -> Int | Next node ID. |
\(O(1)\) Retrieves the next node to visit.
Since: 1.5.3.0
Arguments
| :: HasCallStack | |
| => AhoCorasick | The automaton. |
| -> Int | Current node. |
| -> Vector Int | String. |
| -> Int | Resulting node. |
match :: HasCallStack => AhoCorasick -> Vector Int -> Vector (Int, Int) Source #
\(O(|T|)\) Runs dictionary matching (multi-pattern matching) in linear time and returns a list
of (endPos, patId), where [endPos - patLen, endPos) corresponds to the interval of original
source slice.
Note that duplicate patterns are counted just once with one of them; if pattern A and B are the same, their appearence is counted as either A or B.
Since: 1.5.3.0