| Copyright | (c) Sebastian Tee 2023 |
|---|---|
| License | MIT |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Hlex
Contents
Synopsis
- type Grammar token = [GrammarRule token]
- data GrammarRule token
- type Lexer token = String -> Either LexException [token]
- data LexException
- hlex :: Grammar token -> Lexer token
Example
Here is an example module for a simple language.
module ExampleLang
( MyToken(..) -- Export the language's tokens and the lexer
, myLexer
) where
import Hlex
data MyToken = Ident String -- String identifier token
| Number Float -- Number token and numeric value
| Assign -- Assignment operator token
deriving(Show)
myGrammar :: Grammar MyToken
myGrammar = [ Error ""[^"]*n" "Can't have a new line in a string" -- Return Exception when a new line occurs in a string
, Tokenize ""[^"]*"" $ Str . init . tail -- Encode string and strip the containing quotes
, JustToken "=" Assign -- "=" Operator becomes the assign token
, Tokenize "[a-zA-Z]+" (match -> Ident match) -- Identifier token with string
, Tokenize "[0-9]+(\.[0-9]+)?" (match -> Number (read match) -- Number token with the parsed numeric value stored as a Float
, Skip "[ \n\r\t]+" -- Skip whitespace
]
myLexer :: Lexer MyToken
myLexer = hlex myGrammar -- hlex turns a Grammar into a Lexer
Here is the lexer being used on a simple program.
>>>lexer "x = 1.2"Right [Ident "x", Assign, Number 1.2]
Here is the lexer being used on an program with a syntax error.
>>>lexer "x = \"a\nb\""Left (MatchedException 1 5 "\"a\n" "Can't have a new line in a string")
The lexer uses Either. Right means the lexer successfully parsed the program to a list of MyTokens.
If Left was returned it would be a LexException.
Types
type Grammar token = [GrammarRule token] Source #
Lexical grammar made up of GrammarRules.
The order is important. The Lexer will apply each GrammarRule rule in the order listed.
data GrammarRule token Source #
These are the individual rules that make up a Grammar.
Takes a POSIX regular expression then converts it to a token or skips it.
Constructors
| Skip | Skips over any matches. |
Fields
| |
| Tokenize | Takes a function that converts the matched string to a token. |
| JustToken | Converts any regular expression matches to a given token. |
Fields
| |
| Error | Returns an error with a message when a match occurs. |
type Lexer token = String -> Either LexException [token] Source #
Converts a string into a list of tokens.
If the string does not follow the Lexer's Grammar a LexException will be returned.
Exceptions
data LexException Source #
Exception thrown when a Lexer encounters an error when lexxing a string.
Constructors
| UnmatchedException | Exception thrown when a substring cannot be matched. |
| MatchedException | Exception thrown when a macth is found on the |
Instances
| Read LexException Source # | |
| Show LexException Source # | |
| Eq LexException Source # | |
Defined in Hlex Methods (==) :: LexException -> LexException -> Bool Source # (/=) :: LexException -> LexException -> Bool Source # | |