| Copyright | (C) 2026 - Eitan Chatav |
|---|---|
| License | BSD-style (see the file LICENSE) |
| Maintainer | Eitan Chatav <eitan.chatav@gmail.com> |
| Stability | provisional |
| Portability | non-portable |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Control.Lens.Grammar
Description
See Chomsky, On Certain Formal Properties of Grammars
Synopsis
- type RegGrammar token a = forall p. (Lexical token p, Alternator p) => p a a
- type Lexical token p = (forall x y. (x ~ (), y ~ ()) => TerminalSymbol token (p x y), forall x y. (x ~ token, y ~ token) => TokenAlgebra token (p x y)) :: Constraint
- newtype RegString = RegString {}
- regstringG :: RegGrammar Char a -> RegString
- regexGrammar :: Grammar Char RegString
- type Grammar token a = forall p. (Lexical token p, forall x. BackusNaurForm (p x x), Alternator p) => p a a
- newtype RegBnf = RegBnf {}
- regbnfG :: Grammar Char a -> RegBnf
- regbnfGrammar :: Grammar Char RegBnf
- type CtxGrammar token a = forall p. (Lexical token p, forall x. BackusNaurForm (p x x), Alternator p, Filtrator p, Monadic p) => p a a
- printG :: Cons string string token token => (IsList string, Item string ~ token, Categorized token) => (Alternative m, Monad m, Filterable m) => CtxGrammar token a -> a -> m (string -> string)
- parseG :: (Cons string string token token, Snoc string string token token) => (IsList string, Item string ~ token, Categorized token) => (Alternative m, Monad m, Filterable m) => CtxGrammar token a -> string -> m (a, string)
- unparseG :: (Cons string string token token, Snoc string string token token) => (IsList string, Item string ~ token, Categorized token) => (Alternative m, Monad m, Filterable m) => CtxGrammar token a -> a -> string -> m string
- putStringLn :: (IsList string, Item string ~ Char) => string -> IO ()
Regular grammar
type RegGrammar token a = forall p. (Lexical token p, Alternator p) => p a a Source #
A regular grammar may be constructed using
Lexical and Alternator combinators.
Let's see an example using
semantic versioning.
>>> import Numeric.Natural (Natural)
>>> :{
data SemVer = SemVer -- e.g., 2.1.5-rc.1+build.123
  { major :: Natural -- e.g., 1
  , minor :: Natural -- e.g., 2
  , patch :: Natural -- e.g., 3
  , preRelease :: [String] -- e.g., "alpha.1", "rc.2"
  , buildMetadata :: [String] -- e.g., "build.123", "20130313144700"
  } deriving (Eq, Ord, Show, Read)
:}
We'd like to define an optic _SemVer,
corresponding to the constructor pattern SemVer.
Unfortunately, we can't use TemplateHaskell to generate it in GHCi,
which is used to test this documentation.
Normally we would write makeNestedPrisms ''SemVer,
but here is equivalent explicit Haskell code instead.
Since SemVer has a single constructor, _SemVer can be an Iso.
>>> :set -XRecordWildCards
>>> import Control.Lens (Iso', iso)
>>> :{
_SemVer :: Iso' SemVer (Natural, (Natural, (Natural, ([String], [String]))))
_SemVer = iso
  (\SemVer {..} -> (major, (minor, (patch, (preRelease, buildMetadata)))))
  (\(major, (minor, (patch, (preRelease, buildMetadata)))) -> SemVer {..})
:}
Now we can build a RegGrammar for SemVer using the "idiom" style of
Applicative parsing, with a couple of modifications.
>>> :{
semverGrammar :: RegGrammar Char SemVer
semverGrammar = _SemVer
  >? numberG
  >*< terminal "." >* numberG
  >*< terminal "." >* numberG
  >*< option [] (terminal "-" >* identifiersG)
  >*< option [] (terminal "+" >* identifiersG)
  where
    numberG = iso show read >~ someP (asIn @Char DecimalNumber)
    identifiersG = several1 (sepBy (terminal ".")) (someP charG)
    charG = asIn LowercaseLetter
      <|> asIn UppercaseLetter
      <|> asIn DecimalNumber
      <|> token '-'
:}
Instead of using the constructor SemVer with the Functor applicator <$>,
we use the optic _SemVer we defined and the Choice applicator >?;
we could also have used the Profunctor applicator >~ instead,
because _SemVer is an Iso. A few Alternative
combinators like <|> work both Functorially and Profunctorially.
The table below maps each Functorial combinator to its
Profunctorial counterpart, with a short sketch following the table.
| Functorial | Profunctorial |
|---|---|
| SemVer | _SemVer |
| <$> | >? |
| *> | >* |
| <* | *< |
| <*> | >*< |
| <\|> | <\|> |
| option | option |
| choice | choice |
| many | manyP |
| some | someP |
| optional | optionalP |
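For a side-by-side feel, here is a minimal sketch of the same correspondence;
the pairG grammar below is hypothetical, and the Functorial version is only
indicated in a comment (assuming any standard Applicative parser).

>>> :{
-- Functorially one would write: (,) <$> number <* char ',' <*> number,
-- which builds values and so can only parse. Profunctorially, the
-- pairing combinator >*< stands in for (,), so the same grammar
-- both parses and prints. (Hypothetical example.)
pairG :: RegGrammar Char (Natural, Natural)
pairG = numberG >*< terminal "," >* numberG
  where numberG = iso show read >~ someP (asIn @Char DecimalNumber)
:}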
You can generate a RegString from a RegGrammar with regstringG.
>>> putStringLn (regstringG semverGrammar)
\p{Nd}+(.\p{Nd}+(.\p{Nd}+((-((\p{Ll}|\p{Lu}|\p{Nd}|-)+(.(\p{Ll}|\p{Lu}|\p{Nd}|-)+)*))?(\+((\p{Ll}|\p{Lu}|\p{Nd}|-)+(.(\p{Ll}|\p{Lu}|\p{Nd}|-)+)*))?)))
You can also generate parsers and printers.
>>> [parsed | (parsed, "") <- parseG semverGrammar "2.1.5-rc.1+build.123"]
[SemVer {major = 2, minor = 1, patch = 5, preRelease = ["rc","1"], buildMetadata = ["build","123"]}]
Parsing unconses tokens left-to-right, from the beginning of a string.
Unparsing, on the other hand, snocs tokens left-to-right, to the end of a string.
>>> unparseG semverGrammar (SemVer 1 0 0 ["alpha"] []) "SemVer: " :: Maybe String
Just "SemVer: 1.0.0-alpha"
Printing, on the gripping hand, conses tokens right-to-left, to the beginning of a string.
>>>($ " is the SemVer.") <$> printG semverGrammar (SemVer 1 2 3 [] []) :: Maybe StringJust "1.2.3 is the SemVer."
Profunctorial combinators give us correct-by-construction invertible parsers.
New RegGrammar generators can be defined by giving new instances of Lexical and Alternator.
type Lexical token p = (forall x y. (x ~ (), y ~ ()) => TerminalSymbol token (p x y), forall x y. (x ~ token, y ~ token) => TokenAlgebra token (p x y)) :: Constraint Source #
Lexical combinators include:

- terminal symbols from Control.Lens.Grammar.Symbol;
- Tokenized combinators from Control.Lens.Grammar.Token;
- tokenClasses from Control.Lens.Grammar.Boole.
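These three kinds mix freely in a grammar; here is a minimal sketch
(the tagG grammar below is hypothetical) using a terminal symbol,
a token-class test, and a single token.

>>> :{
-- a hash sign, then one or more lowercase letters or hyphens
tagG :: RegGrammar Char String
tagG = terminal "#" >* someP (asIn @Char LowercaseLetter <|> token '-')
:}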
newtype RegString Source #

RegStrings are an embedded domain-specific language
of regular expression strings. Since they are strings,
they have a string-like interface.
>>> let rex = fromString "ab|c" :: RegString
>>> putStringLn rex
ab|c
>>> rex
"ab|c"
RegStrings can be generated from RegGrammars with regstringG.
>>> regstringG (terminal "a" >* terminal "b" <|> terminal "c")
"ab|c"
RegStrings are actually stored as an algebraic datatype, RegEx.
>>> runRegString rex
RegExam (Alternate (Terminal "ab") (Terminal "c"))
RegStrings are similar to regular expression strings in many other
programming languages. We can use them to see if a word and pattern
are Matching.
>>>"ab" =~ rexTrue>>>"c" =~ rexTrue>>>"xyz" =~ rexFalse
Like RegGrammars, RegStrings can use all the Lexical combinators.
Unlike RegGrammars, instead of using Monoidal and Alternator combinators,
RegStrings use Monoid and KleeneStarAlgebra combinators.
>>>terminal "a" <> terminal "b" >|< terminal "c" :: RegString"ab|c">>>mempty :: RegString""
Since RegStrings are a KleeneStarAlgebra,
they support Kleene quantifiers.
>>> starK rex
"(ab|c)*"
>>> plusK rex
"(ab|c)+"
>>> optK rex
"(ab|c)?"
Like other regular expression languages, RegStrings support
character classes.
>>> oneOf "abc" :: RegString
"[abc]"
>>> notOneOf "abc" :: RegString
"[^abc]"
Character classes also express failure, which matches no character or string, as well as the wildcard, which matches any single character.
>>> zeroK :: RegString
"[]"
>>> anyToken :: RegString
"[^]"
Additional forms of character classes test for a character's GeneralCategory.
>>> asIn LowercaseLetter :: RegString
"\\p{Ll}"
>>> notAsIn Control :: RegString
"\\P{Cc}"
KleeneStarAlgebras support alternation >|<,
and the Tokenized combinators are all negatable.
However, we'd like to be able to take the
intersection of character classes as well.
RegStrings can combine characters' tokenClasses
using BooleanAlgebra combinators.
>>> tokenClass (notOneOf "abc" >&&< notOneOf "xyz") :: RegString
"[^abcxyz]"
>>> tokenClass (oneOf "abcxyz" >&&< notOneOf "xyz") :: RegString
"[abc]"
>>> tokenClass (notOneOf "#$%" >&&< notAsIn Control) :: RegString
"[^#$%\\P{Cc}]"
>>> tokenClass (allB notAsIn [MathSymbol, Control]) :: RegString
"\\P{Sm|Cc}"
>>> tokenClass (notB (oneOf "xyz")) :: RegString
"[^xyz]"
Ill-formed RegStrings normalize to failure.
>>> fromString ")(" :: RegString
"[]"
Constructors

| RegString | |
|---|---|
Instances
| IsString RegString Source # | |
Defined in Control.Lens.Grammar Methods fromString :: String -> RegString # | |
| Monoid RegString Source # | |
| Semigroup RegString Source # | |
| IsList RegString Source # | |
| Read RegString Source # | |
| Show RegString Source # | |
| KleeneStarAlgebra RegString Source # | |
| NonTerminalSymbol RegString Source # | |
Defined in Control.Lens.Grammar Methods nonTerminal :: String -> RegString Source # | |
| Eq RegString Source # | |
| Ord RegString Source # | |
| Matching String RegString Source # | |
| TokenAlgebra Char RegString Source # | |
Defined in Control.Lens.Grammar | |
| TerminalSymbol Char RegString Source # | |
| Tokenized Char RegString Source # | |
Defined in Control.Lens.Grammar | |
| type Item RegString Source # | |
Defined in Control.Lens.Grammar | |
regstringG :: RegGrammar Char a -> RegString Source #
regstringG generates a RegString from a regular grammar.
Since context-free Grammars and CtxGrammars aren't necessarily regular,
the type system will prevent regstringG from being applied to them.
regexGrammar :: Grammar Char RegString Source #
regexGrammar is a context-free Grammar for RegStrings.
It can't be a RegGrammar, since RegStrings include parenthesization,
and the language of balanced parentheses
is context-free but not regular.
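For intuition, here is a minimal sketch of a balanced-parentheses (Dyck)
grammar; the Dyck type and its prism below are hypothetical,
not part of this module.

>>> import Control.Lens (Prism', prism')
>>> :{
data Dyck = Done | Nest Dyck Dyck deriving (Eq, Ord, Show, Read)
_Nest :: Prism' Dyck (Dyck, Dyck)
_Nest = prism' (uncurry Nest) (\case Nest x y -> Just (x,y); _ -> Nothing)
dyckGrammar :: Grammar Char Dyck
dyckGrammar = ruleRec "dyck" $ \dyck ->
  option Done (_Nest >? terminal "(" >* dyck *< terminal ")" >*< dyck)
:}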
>>> putStringLn (regbnfG regexGrammar)
{start} = \q{regex}
{alternate} = \q{sequence}(\|\q{sequence})*
{atom} = (\\q\{)\q{char}*\}|\q{char}|\q{char-class}|\(\q{regex}\)
{category} = Ll|Lu|Lt|Lm|Lo|Mn|Mc|Me|Nd|Nl|No|Pc|Pd|Ps|Pe|Pi|Pf|Po|Sm|Sc|Sk|So|Zs|Zl|Zp|Cc|Cf|Cs|Co|Cn
{category-test} = (\\p\{)\q{category}\}|(\\P\{)(\q{category}(\|\q{category})*)\}
{char} = [^\(\)\*\+\?\[\\\]\^\{\|\}\P{Cc}]|\\\q{char-escaped}
{char-any} = \[\^\]
{char-class} = \q{fail}|\q{char-any}|\q{one-of}|\q{not-one-of}|\q{category-test}
{char-control} = NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|DEL|PAD|HOP|BPH|NBH|IND|NEL|SSA|ESA|HTS|HTJ|VTS|PLD|PLU|RI|SS2|SS3|DCS|PU1|PU2|STS|CCH|MW|SPA|EPA|SOS|SGCI|SCI|CSI|ST|OSC|PM|APC
{char-escaped} = [\(\)\*\+\?\[\\\]\^\{\|\}]|\q{char-control}
{expression} = \q{atom}\?|\q{atom}\*|\q{atom}\+|\q{atom}
{fail} = \[\]
{not-one-of} = (\[\^)\q{char}+(\q{category-test}?\])
{one-of} = \[\q{char}+\]
{regex} = \q{alternate}
{sequence} = \q{char}*|\q{expression}*
Context-free grammar
type Grammar token a = forall p. (Lexical token p, forall x. BackusNaurForm (p x x), Alternator p) => p a a Source #
Context-free Grammars add two capabilities to RegGrammars,
coming from the BackusNaurForm interface:

- rule abstraction,
- and general recursion.
regexGrammar and regbnfGrammar are examples of context-free
Grammars. Regular expressions are a form of expression algebra.
Let's see a similar but simpler example,
the algebra of arithmetic expressions of natural numbers.
>>> import Numeric.Natural (Natural)
>>> :{
data Arith
  = Num Natural
  | Add Arith Arith
  | Mul Arith Arith
  deriving stock (Eq, Ord, Show, Read)
:}
Here are Prisms for the constructor patterns.
>>> import Control.Lens (Prism', prism')
>>> :{
_Num :: Prism' Arith Natural
_Num = prism' Num (\case Num n -> Just n; _ -> Nothing)
_Add, _Mul :: Prism' Arith (Arith, Arith)
_Add = prism' (uncurry Add) (\case Add x y -> Just (x,y); _ -> Nothing)
_Mul = prism' (uncurry Mul) (\case Mul x y -> Just (x,y); _ -> Nothing)
:}
Now we can build a Grammar for Arith
by combining "idiom" style with named rules,
and tying the recursive loop
(caused by parenthesization)
with ruleRec.
>>> :{
arithGrammar :: Grammar Char Arith
arithGrammar = ruleRec "arith" sumG
  where
    sumG arith = rule "sum" $
      chain1 Left _Add (sepBy (terminal "+")) (prodG arith)
    prodG arith = rule "product" $
      chain1 Left _Mul (sepBy (terminal "*")) (factorG arith)
    factorG arith = rule "factor" $
      numberG <|> terminal "(" >* arith *< terminal ")"
    numberG = rule "number" $
      _Num . iso show read >? someP (asIn @Char DecimalNumber)
:}
We can generate a RegBnf, printers and parsers from arithGrammar.
>>> putStringLn (regbnfG arithGrammar)
{start} = \q{arith}
{arith} = \q{sum}
{factor} = \q{number}|\(\q{arith}\)
{number} = \p{Nd}+
{product} = \q{factor}(\*\q{factor})*
{sum} = \q{product}(\+\q{product})*
>>>[x | (x,"") <- parseG arithGrammar "1+2*3+4"][Add (Add (Num 1) (Mul (Num 2) (Num 3))) (Num 4)]>>>unparseG arithGrammar (Add (Num 1) (Mul (Num 2) (Num 3))) "" :: Maybe StringJust "1+2*3">>>do pr <- printG arithGrammar (Num 69); return (pr "") :: Maybe StringJust "69"
newtype RegBnf Source #

RegBnfs are an embedded domain-specific language
of Backus-Naur forms extended by regular expression strings.
Like RegStrings, they have a string-like interface.
>>> let bnf = fromString "{start} = foo|bar" :: RegBnf
>>> putStringLn bnf
{start} = foo|bar
>>> bnf
"{start} = foo|bar"
RegBnfs can be generated from context-free Grammars with regbnfG.
>>> :type regbnfG regbnfGrammar
regbnfG regbnfGrammar :: RegBnf
Like RegStrings, RegBnfs can be constructed using
Lexical, Monoid and KleeneStarAlgebra combinators.
But they also support BackusNaurForm rules and ruleRecs.
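For instance, a production can be named directly in the RegBnf EDSL;
a minimal sketch (the digitsBnf definition below is hypothetical):

>>> :{
-- name a production built from token-class and Kleene combinators
digitsBnf :: RegBnf
digitsBnf = rule "digits" (plusK (asIn DecimalNumber))
:}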
Instances
| IsString RegBnf Source # | |
Defined in Control.Lens.Grammar Methods fromString :: String -> RegBnf # | |
| Monoid RegBnf Source # | |
| Semigroup RegBnf Source # | |
| IsList RegBnf Source # | |
| Read RegBnf Source # | |
| Show RegBnf Source # | |
| BackusNaurForm RegBnf Source # | |
| KleeneStarAlgebra RegBnf Source # | |
| NonTerminalSymbol RegBnf Source # | |
Defined in Control.Lens.Grammar Methods nonTerminal :: String -> RegBnf Source # | |
| Eq RegBnf Source # | |
| Ord RegBnf Source # | |
| Matching String RegBnf Source # | |
| TokenAlgebra Char RegBnf Source # | |
Defined in Control.Lens.Grammar | |
| TerminalSymbol Char RegBnf Source # | |
| Tokenized Char RegBnf Source # | |
| type Item RegBnf Source # | |
Defined in Control.Lens.Grammar | |
regbnfG :: Grammar Char a -> RegBnf Source #
regbnfG generates a RegBnf from a context-free Grammar.
Since CtxGrammars aren't necessarily context-free,
the type system will prevent regbnfG from being applied to a CtxGrammar.
It can, however, be applied to a RegGrammar,
since every regular grammar is context-free.
regbnfGrammar :: Grammar Char RegBnf Source #
regbnfGrammar is a context-free Grammar for RegBnfs.
That means that it can generate a self-hosted definition.
>>> putStringLn (regbnfG regbnfGrammar)
{start} = \q{regbnf}
{alternate} = \q{sequence}(\|\q{sequence})*
{atom} = (\\q\{)\q{char}*\}|\q{char}|\q{char-class}|\(\q{regex}\)
{category} = Ll|Lu|Lt|Lm|Lo|Mn|Mc|Me|Nd|Nl|No|Pc|Pd|Ps|Pe|Pi|Pf|Po|Sm|Sc|Sk|So|Zs|Zl|Zp|Cc|Cf|Cs|Co|Cn
{category-test} = (\\p\{)\q{category}\}|(\\P\{)(\q{category}(\|\q{category})*)\}
{char} = [^\(\)\*\+\?\[\\\]\^\{\|\}\P{Cc}]|\\\q{char-escaped}
{char-any} = \[\^\]
{char-class} = \q{fail}|\q{char-any}|\q{one-of}|\q{not-one-of}|\q{category-test}
{char-control} = NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI|DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US|DEL|PAD|HOP|BPH|NBH|IND|NEL|SSA|ESA|HTS|HTJ|VTS|PLD|PLU|RI|SS2|SS3|DCS|PU1|PU2|STS|CCH|MW|SPA|EPA|SOS|SGCI|SCI|CSI|ST|OSC|PM|APC
{char-escaped} = [\(\)\*\+\?\[\\\]\^\{\|\}]|\q{char-control}
{expression} = \q{atom}\?|\q{atom}\*|\q{atom}\+|\q{atom}
{fail} = \[\]
{not-one-of} = (\[\^)\q{char}+(\q{category-test}?\])
{one-of} = \[\q{char}+\]
{regbnf} = (\{start\} = )\q{regex}(\LF\q{rule})*
{regex} = \q{alternate}
{rule} = \{\q{char}*(\} = )\q{regex}
{sequence} = \q{char}*|\q{expression}*
Context-sensitive grammar
type CtxGrammar token a = forall p. (Lexical token p, forall x. BackusNaurForm (p x x), Alternator p, Filtrator p, Monadic p) => p a a Source #
In addition to context-sensitivity via Monadic combinators,
CtxGrammars add general filtration via Filtrator to Grammars.
>>> :{
palindromeG :: CtxGrammar Char String
palindromeG = rule "palindrome" $
  satisfied (\wrd -> reverse wrd == wrd) >?< manyP (anyToken @Char)
:}
The satisfied pattern is used together with the Choice and
Cochoice applicator >?< for general filtration.
For context-sensitivity,
the Monadic interface is used by importing Data.Profunctor.Monadic
qualified and using a "bonding" notation which mixes
"idiom" style with qualified do-notation.
Let's use length-encoded vectors of numbers as an example.
>>> import Numeric.Natural (Natural)
>>> import Control.Lens.Iso (Iso', iso)
>>> :set -XRecordWildCards
>>> :{
data LenVec = LenVec {length :: Natural, vector :: [Natural]}
  deriving (Eq, Ord, Show, Read)
_LenVec :: Iso' LenVec (Natural, [Natural])
_LenVec = iso
  (\LenVec {..} -> (length, vector))
  (\(length, vector) -> LenVec {..})
:}
>>> :set -XQualifiedDo
>>> import qualified Data.Profunctor.Monadic as P
>>> :{
lenvecGrammar :: CtxGrammar Char LenVec
lenvecGrammar = _LenVec >? P.do
  let numberG = iso show read >~ someP (asIn @Char DecimalNumber)
      vectorG n = intercalateP n (sepBy (terminal ",")) numberG
  len <- numberG -- bonds to _LenVec
  terminal ";" -- doesn't bond
  vectorG (fromIntegral len) -- bonds to _LenVec
:}
The qualified do-notation changes the signature of
P.>>=,
so that we must apply the constructor pattern _LenVec
to the do-block with the >? applicator.
Any named bound variable, var <- action,
gets "bonded" to the constructor pattern.
An unbound action does not get bonded to the pattern,
unless it is the last action in the do-block,
which always gets bonded.
An unnamed bound action, _ <- action,
also gets bonded to the pattern,
but being unnamed means it isn't added to the context.
If all bound actions are unnamed, then a CtxGrammar can
be rewritten as a Grammar since it is context-free.
We can't generate a RegBnf since the rules
of a CtxGrammar aren't static, but dynamic and contextual.
We can generate parsers and printers as expected.
>>> [vec | (vec, "") <- parseG lenvecGrammar "3;1,2,3"] :: [LenVec]
[LenVec {length = 3, vector = [1,2,3]}]
>>> [vec | (vec, "") <- parseG lenvecGrammar "0;1,2,3"] :: [LenVec]
[]
>>> [pr "" | pr <- printG lenvecGrammar (LenVec 2 [6,7])] :: [String]
["2;6,7"]
>>> [pr "" | pr <- printG lenvecGrammar (LenVec 200 [100])] :: [String]
[]
>>> [pal | word <- ["racecar", "word"], (pal, "") <- parseG palindromeG word]
["racecar"]
printG Source #

Arguments
| :: Cons string string token token | |
| => (IsList string, Item string ~ token, Categorized token) | |
| => (Alternative m, Monad m, Filterable m) | |
| => CtxGrammar token a | |
| -> a | syntax |
| -> m (string -> string) |
printG generates a printer from a CtxGrammar.
Since both RegGrammars and context-free Grammars are CtxGrammars,
the type system will allow printG to be applied to them.
Running the printer on a syntax value returns a function
that conses tokens at the beginning of an input string,
from right to left.
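Because the printer yields a string-to-string function, printed output
can be prepended to any existing string; a small sketch reusing
semverGrammar from above (the prependSemVer helper is hypothetical):

>>> :{
-- prepend a printed SemVer onto an existing string, if printing succeeds
prependSemVer :: SemVer -> String -> Maybe String
prependSemVer v str = ($ str) <$> printG semverGrammar v
:}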
parseG Source #

Arguments
| :: (Cons string string token token, Snoc string string token token) | |
| => (IsList string, Item string ~ token, Categorized token) | |
| => (Alternative m, Monad m, Filterable m) | |
| => CtxGrammar token a | |
| -> string | input |
| -> m (a, string) |
parseG generates a parser from a CtxGrammar.
Since both RegGrammars and context-free Grammars are CtxGrammars,
the type system will allow parseG to be applied to them.
Running the parser on an input string value unconses
tokens from the beginning of an input string from left to right,
returning a syntax value and the remaining output string.
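The examples above filter for complete parses with a list comprehension;
that idiom can be named as a helper, sketched here (parseAll is
hypothetical, and RankNTypes is needed to pass a CtxGrammar as an argument):

>>> :set -XRankNTypes
>>> :{
-- keep only the parses that consume the entire input
parseAll :: CtxGrammar Char a -> String -> [a]
parseAll g str = [x | (x, "") <- parseG g str]
:}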
unparseG Source #

Arguments
| :: (Cons string string token token, Snoc string string token token) | |
| => (IsList string, Item string ~ token, Categorized token) | |
| => (Alternative m, Monad m, Filterable m) | |
| => CtxGrammar token a | |
| -> a | syntax |
| -> string | input |
| -> m string |
unparseG generates an unparser from a CtxGrammar.
Since both RegGrammars and context-free Grammars are CtxGrammars,
the type system will allow unparseG to be applied to them.
Running the unparser on a syntax value and an input string
snocs tokens at the end of the string, from left to right,
returning the output string.
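Invertibility can be stated as a round-trip property; a sketch
(the roundTrips helper is hypothetical, and assumes the parsing and
unparsing behaviors documented above):

>>> :set -XRankNTypes
>>> :{
-- unparsing a value and parsing the result back should recover it,
-- whenever the value is representable by the grammar at all
roundTrips :: Eq a => CtxGrammar Char a -> a -> Bool
roundTrips g x =
  and [x `elem` [y | (y, "") <- parseG g s] | s <- unparseG g x ""]
:}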