
Microformats 2 parser for Haskell! #IndieWeb
Originally created for sweetroll.
- parses items, rels, rel-urls
- resolves relative URLs (with support for the <base> tag), including inside of html for e-* properties
- parses the value-class-pattern, including date and time normalization
- handles malformed HTML (the actual HTML parser is tagstream-conduit)
- can also convert to JF2
- high performance
- extensively tested
Also check out http-link-header because you often need to read links from the Link header!
Usage
See the API docs on Hackage for more info. Here's a quick overview:
{-# LANGUAGE OverloadedStrings #-}
import Data.Microformats2.Parser
import Data.Default
import Network.URI
parseMf2 def $ documentRoot $ parseLBS "<body><p class=h-entry><h1 class=p-name>Yay!</h1></p></body>"
parseMf2 (def { baseUri = parseURI "https://where.i.got/that/page/from/" }) $ documentRoot $ parseLBS "<body><base href=\"base/\"><link rel=micropub href='micropub'><p class=h-entry><h1 class=p-name>Yay!</h1></p></body>"
The def is the default configuration.
The configuration includes:
- htmlMode, an HTML parsing mode (Unsafe | Escape | Sanitize)
- baseUri, the Maybe URI that represents the address you retrieved the HTML from, used for resolving relative addresses. You should set it.
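As a rough sketch of setting both configuration fields at once, the program below overrides htmlMode and baseUri on top of def. It assumes, as the overview above does, that documentRoot and parseLBS are in scope from Data.Microformats2.Parser. The example.com address and the choice of Sanitize are placeholders for illustration, not recommendations.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Default (def)
import Data.Microformats2.Parser
import Network.URI (parseURI)

main :: IO ()
main = print $ parseMf2 settings $ documentRoot $ parseLBS html
  where
    -- Start from the default configuration and override two fields.
    settings = def
      { htmlMode = Sanitize  -- one of Unsafe | Escape | Sanitize
      , baseUri  = parseURI "https://example.com/notes/"  -- placeholder address
      }
    -- The relative href is resolved against baseUri.
    html = "<body><div class=h-entry><a class=u-url href=\"1\">One</a></div></body>"
```

Because parseURI returns a Maybe URI, a malformed address simply yields Nothing, which leaves relative URLs without a base to resolve against.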
parseMf2 will return an Aeson Value structured like canonical microformats2 JSON.
lens-aeson is a good way to navigate it.
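For instance, here is a minimal sketch of pulling every entry name out of such a Value with lens-aeson. To stay self-contained it decodes a hand-written JSON sample shaped like canonical microformats2 JSON instead of calling the parser. The names helper and the sample are hypothetical and not part of this library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens ((^..))
import Data.Aeson (Value, decode)
import Data.Aeson.Lens (key, values, _String)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Hand-written stand-in for what parseMf2 would return.
sample :: BL.ByteString
sample = "{\"items\":[{\"type\":[\"h-entry\"],\"properties\":{\"name\":[\"Yay!\"]}}],\"rels\":{},\"rel-urls\":{}}"

-- Collect every string under items[*].properties.name[*].
names :: Value -> [Text]
names v = v ^.. key "items" . values . key "properties" . key "name" . values . _String

main :: IO ()
main = case decode sample :: Maybe Value of
  Nothing -> putStrLn "invalid JSON"
  Just v  -> print (names v)
```

The traversal skips anything that does not match the expected shape, so items without a name property contribute nothing to the result.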
Development
Use stack to build.
Use ghci to run tests quickly with :test (see the .ghci file).
$ stack build
$ stack test
$ stack ghci
Contributing
Please feel free to submit pull requests!
By participating in this project you agree to follow the Contributor Code of Conduct and to release your contributions under the Unlicense.
The list of contributors is available on GitHub.
License
This is free and unencumbered software released into the public domain.
For more information, please refer to the UNLICENSE file or unlicense.org.