scrappy-core-0.1.0.1: HTML pattern-matching library and high-level concurrent-requests library for web scraping
Safe Haskell: Safe-Inferred
Language: Haskell2010

Scrappy.Find

Documentation

findNaive :: forall s (m :: Type -> Type) u a. Stream s m Char => ParsecT s u m a -> ParsecT s u m (Maybe [a])

This module provides an interface for finding patterns, separated by arbitrary text, in a source that you plan to parse.

findSequential and its numbered variants (findSequential2, findSequential3) are for information-rich elements, such as products, that have multiple fields the user would like to return.

Converts a parsing/scraping pattern to one that returns either Nothing or Just a list of at least one element. The Maybe type is used so that there is a clearer distinction between a failed search and a successful one.
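The conversion described above can be sketched with plain Parsec combinators. This is a minimal illustration, not the library's implementation: findNaiveSketch and example are hypothetical names, and the sketch assumes the pattern consumes input whenever it succeeds.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Maybe (catMaybes)
import Text.Parsec

-- Hypothetical stand-in for findNaive: at each position, try the pattern;
-- if it does not match, skip one character and try again.
-- Assumption: the pattern consumes input when it succeeds.
findNaiveSketch :: Stream s m Char => ParsecT s u m a -> ParsecT s u m (Maybe [a])
findNaiveSketch p = do
  hits <- many (try (Just <$> p) <|> (Nothing <$ anyChar))
  pure $ case catMaybes hits of
    [] -> Nothing -- no match anywhere in the source
    xs -> Just xs -- at least one match

-- Scan a small source for runs of digits.
example :: Either ParseError (Maybe [String])
example = parse (findNaiveSketch (many1 digit)) "" "a12b345"
```

Here example evaluates to Right (Just ["12","345"]), while the same scan over "abc" yields Right Nothing.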

findNaiveIO :: forall (m :: Type -> Type) s a u. (MonadIO m, Stream s m Char, Show a) => ParsecT s u m a -> ParsecT s u m (Maybe [a])

findIO :: forall (m :: Type -> Type) s a u. (MonadIO m, Stream s m Char, Show a) => ParsecT s u m a -> ParsecT s u m [Either ScrapeFail a]

Great for debugging

findSequential :: forall s (m :: Type -> Type) u a. Stream s m Char => [ParsecT s u m a] -> ParsecT s u m [Either ScrapeFail a]

findSequential2 :: forall s (m :: Type -> Type) u a b. Stream s m Char => (ParsecT s u m a, ParsecT s u m b) -> ParsecT s u m (a, b)

findSequential3 :: forall s (m :: Type -> Type) u a b c. Stream s m Char => (ParsecT s u m a, ParsecT s u m b, ParsecT s u m c) -> ParsecT s u m (a, b, c)
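As an illustration of the multi-field use case, the sketch below extracts two fields in order from one source, using only Parsec. It assumes that "sequential" means searching for the first pattern and then searching for the second in the remaining input; the library's actual semantics may differ. The names skipTo, findPairSketch, and productFields are hypothetical.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

-- Skip characters until the pattern matches; fails if it never does.
skipTo :: Stream s m Char => ParsecT s u m a -> ParsecT s u m a
skipTo p = try p <|> (anyChar *> skipTo p)

-- Hypothetical analogue of findSequential2 (an assumption about its
-- behaviour): find the first field, then find the second one after it.
findPairSketch :: Stream s m Char => (ParsecT s u m a, ParsecT s u m b) -> ParsecT s u m (a, b)
findPairSketch (pa, pb) = (,) <$> skipTo pa <*> skipTo pb

-- A made-up "product" record with a name field and a price field.
productFields :: Either ParseError (String, String)
productFields = parse (findPairSketch (name, price)) "" "name: tea; price: 42"
  where
    name = string "name: " *> many1 letter
    price = string "price: " *> many1 digit
```

With this input, productFields evaluates to Right ("tea","42").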

findUntilMatch :: forall s (m :: Type -> Type) u a. Stream s m Char => ParsecT s u m a -> ParsecT s u m a

Like findNaive, except that it stops parsing at the first match it finds in the document.
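A first-match search of this kind could be written as follows. This is a sketch under the assumption that the parser fails when the document contains no match at all, which the result type suggests but the documentation does not state; findFirstSketch is a hypothetical name.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

-- Hypothetical stand-in for findUntilMatch: try the pattern here;
-- otherwise drop one character and retry. Stops at the first match
-- and fails at end of input if nothing matched.
findFirstSketch :: Stream s m Char => ParsecT s u m a -> ParsecT s u m a
findFirstSketch p = try p <|> (anyChar *> findFirstSketch p)

-- Only the first run of digits is returned; "34" is never examined.
firstNumber :: Either ParseError String
firstNumber = parse (findFirstSketch (many1 digit)) "" "ab12c34"
```

Here firstNumber evaluates to Right "12".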

find :: forall s (m :: Type -> Type) u a. Stream s m Char => ParsecT s u m a -> ParsecT s u m [Either ScrapeFail a]

streamEdit :: ParsecT String () Identity a -> (a -> String) -> String -> String

The internal parse should never produce a Left; if it does, that is a bug in this implementation.

findEdit :: forall (m :: Type -> Type) a u. Stream String m Char => (a -> String) -> ParsecT String u m a -> ParsecT String u m String

editFirst :: forall (m :: Type -> Type) a u. Stream String m Char => (a -> String) -> ParsecT String u m a -> ParsecT String u m String

data StreamEditCase

Edit can hold a String because the edited value is always turned back into a String.

Constructors

EOF 
Carry Char 
Edit String 
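To show how a three-case type like this can drive a function with the shape of streamEdit, here is a minimal sketch using only Parsec. EditStep (Stop, Keep, Swap) is a hypothetical mirror of EOF, Carry, and Edit, and streamEditSketch is not the library's code. It assumes the pattern consumes input whenever it succeeds.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec

-- Hypothetical mirror of StreamEditCase: end of input, an untouched
-- character, or the replacement text for one match.
data EditStep = Stop | Keep Char | Swap String

-- Replace every match of the pattern with f applied to its result.
-- Assumption: the pattern consumes input when it succeeds.
streamEditSketch :: ParsecT String () Identity a -> (a -> String) -> String -> String
streamEditSketch p f src =
  case parse go "" src of
    Right out -> out
    -- Unreachable in practice: every step either hits eof or consumes
    -- a character. Falls back to the unedited source just in case.
    Left _ -> src
  where
    step = (Stop <$ eof) <|> try (Swap . f <$> p) <|> (Keep <$> anyChar)
    go = do
      s <- step
      case s of
        Stop -> pure ""
        Keep c -> (c :) <$> go
        Swap e -> (e ++) <$> go
```

For instance, streamEditSketch (many1 digit) (\d -> "<" ++ d ++ ">") "a12b3" evaluates to "a<12>b<3>".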

baseParser :: forall s (m :: Type -> Type) u a. Stream s m Char => ParsecT s u m a -> ParsecT s u m (Either ScrapeFail a)

givesNothing :: forall s (m :: Type -> Type) u a. Stream s m Char => ParsecT s u m (Either ScrapeFail a)

endStream :: forall s (m :: Type -> Type) t u a. (Stream s m t, Show t) => ParsecT s u m (Either ScrapeFail a)

findSomeHTMLNaive :: Stream s Identity Char => Parsec s () a -> s -> Maybe [a]

Returns just the matches, since we rarely care about non-matches.

findAllBetween :: a

Design note for my findAll' function / runParserOnHtml: use Maybe instead of Either to discard the failure case, mapping [] to Nothing,

so that it returns Maybe [a] (either Just [a] or Nothing), which is convenient for modeling, at a high level, the flow from one scrape result to the next.
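The Either-to-Maybe collapse in this design note can be sketched in a few lines of base Haskell. The names collapse and scrapeThen are hypothetical and only illustrate the idea; they are not part of this module.

```haskell
-- Treat both a parse failure and an empty result list as "nothing found".
collapse :: Either e [a] -> Maybe [a]
collapse (Right xs@(_ : _)) = Just xs
collapse _ = Nothing

-- Chain two scrape steps: the second runs only if the first found
-- something, and the whole chain is Nothing if either step came up empty.
scrapeThen :: Either e [a] -> ([a] -> Either e [b]) -> Maybe [b]
scrapeThen r next = collapse r >>= collapse . next
```

Because the result is a Maybe, later steps can be composed with >>= without inspecting error values at each stage.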

I also need to implement a non-zero, non-ending predicate inner function, like nonZeroSep: https://hackage.haskell.org/package/replace-megaparsec-1.4.4.0/docs/src/Replace.Megaparsec.html#sepCap

NOTE: manyTill_ can be replaced with anyTill from Replace.Megaparsec.

buildSequentialElemsParser :: forall s u (m :: Type -> Type) a. ParsecT s u m [a]

Use with a data constructor when parsing into a datatype.