| Copyright | (c) 2025 Tushar Adhatrao |
|---|---|
| License | MIT |
| Maintainer | Tushar Adhatrao <tusharadhatrao@gmail.com> |
| Stability | experimental |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Langchain.DocumentLoader.Core
Description
Implementation of LangChain's document loading abstraction, providing:
- Document representation with content and metadata
- Typeclass for loading/splitting documents from various sources
- Integration with text splitting capabilities
For more information on document loaders in the original LangChain library, see: https://python.langchain.com/docs/concepts/document_loaders/
Example usage:
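```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value (..))
import Data.Map (fromList) -- assumes metadata is a Data.Map keyed by Text
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Langchain.DocumentLoader.Core

-- Create a document
doc :: Document
doc = Document "Sample content" (fromList [("source", String "example.txt")])

-- Hypothetical file loader instance (loadAndSplit omitted for brevity)
data FileLoader = FileLoader FilePath
instance BaseLoader FileLoader where
  load (FileLoader path) = do
    content <- TIO.readFile path -- read as Text, matching pageContent
    return $ Right [Document content (fromList [("source", String (T.pack path))])]
```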
Test case patterns:
```haskell
>>> mempty :: Document
Document {pageContent = "", metadata = fromList []}
```
```haskell
>>> doc1 = Document "Hello" (fromList [("a", Number 1)])
>>> doc2 = Document " World" (fromList [("b", Bool True)])
>>> doc1 <> doc2
Document {pageContent = "Hello World", metadata = fromList [("a", Number 1), ("b", Bool True)]}
```
Document Representation
Document container with content and metadata. Used for storing loaded data and associated metadata like source URLs or page numbers.
Example:
```haskell
>>> Document "Hello World" (fromList [("source", String "example.txt")])
Document {pageContent = "Hello World", metadata = fromList [("source",String "example.txt")]}
```
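The description above mentions metadata such as source URLs or page numbers. As a minimal sketch of how a loader might attach that kind of metadata, the block below tags each page of a multi-page source with its source name and a 1-based page number. It uses a standalone stand-in for Document whose metadata values are Text rather than aeson's Value (an assumption made so the block depends only on the text and containers packages), and pagesToDocuments is a hypothetical helper, not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in for the library's Document (assumption): metadata values are
-- plain Text here instead of aeson's Value.
data Document = Document
  { pageContent :: Text
  , metadata :: Map Text Text
  }
  deriving (Show, Eq)

-- Hypothetical helper: one Document per page, tagged with the source name
-- and a 1-based page number stored as Text.
pagesToDocuments :: Text -> [Text] -> [Document]
pagesToDocuments source pages =
  [ Document page (Map.fromList [("source", source), ("page", T.pack (show n))])
  | (n, page) <- zip [(1 :: Int) ..] pages
  ]
```

Keeping one Document per page, rather than one per file, lets later stages cite the exact page a retrieved chunk came from.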
Constructors
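| Document | Record constructor with the fields pageContent (the document text) and metadata (a map of metadata values) |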
Instances
| Monoid Document | Provides the empty document (mempty), with empty content and empty metadata. |
| Semigroup Document | Combines two documents by concatenating their content and merging their metadata. |
| Show Document | |
| Eq Document | |
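The instance descriptions above do not say how duplicate metadata keys are resolved. A minimal sketch of the two instances, assuming metadata is a Data.Map (which the fromList output suggests) and reusing Map's own (<>), a left-biased union in which the left document's value wins on a key collision; the real instance may resolve collisions differently. The stand-in Document with Text-valued metadata and the joinDocuments helper are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

-- Stand-in for the library's Document (assumption): metadata values are
-- Text rather than aeson's Value.
data Document = Document
  { pageContent :: Text
  , metadata :: Map Text Text
  }
  deriving (Show, Eq)

-- Concatenate the content; merge the metadata with Map's (<>), which is a
-- left-biased union (the left value is kept for duplicate keys).
instance Semigroup Document where
  Document c1 m1 <> Document c2 m2 = Document (c1 <> c2) (m1 <> m2)

-- Empty content and empty metadata, so mempty is an identity for (<>).
instance Monoid Document where
  mempty = Document mempty mempty

-- Hypothetical helper: fold a list of chunks back into one document.
joinDocuments :: [Document] -> Document
joinDocuments = mconcat
```

Because mempty is an identity for (<>), joinDocuments [] returns the empty document instead of failing, which makes mconcat a safe way to reassemble chunks.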
Loading Interface
class BaseLoader m where
Typeclass for document loading implementations. Implementations should define how to:
- Load full documents with load
- Load and split content with loadAndSplit
Example instance for text files:
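```haskell
{-# LANGUAGE FlexibleInstances, OverloadedStrings #-} -- FlexibleInstances: FilePath is [Char]
import Data.Aeson (Value (..))
import Data.Map (fromList)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Langchain.DocumentLoader.Core
import Langchain.TextSplitter.Character (defaultCharacterSplitterOps, splitText) -- module path assumed
instance BaseLoader FilePath where
  load path = do
    content <- TIO.readFile path -- Text, as Document and splitText expect
    return $ Right [Document content (fromList [("source", String (T.pack path))])]
  loadAndSplit path = do
    content <- TIO.readFile path
    return $ Right (splitText defaultCharacterSplitterOps content)
```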
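Because load and loadAndSplit both return Either String, a loader can report failures as values instead of throwing. A minimal sketch of an instance over in-memory text, which is convenient in tests because it touches no files. The class is a local mirror of the interface documented here; InMemoryLoader and its one-chunk-per-line splitting are hypothetical and only stand in for the library's character splitter, and metadata values are Text rather than aeson's Value so the block needs only text and containers.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in Document (assumption: Text-valued metadata).
data Document = Document
  { pageContent :: Text
  , metadata :: Map Text Text
  }
  deriving (Show, Eq)

-- Local mirror of the BaseLoader interface.
class BaseLoader m where
  load :: m -> IO (Either String [Document])
  loadAndSplit :: m -> IO (Either String [Text])

-- Hypothetical loader over (source name, text) pairs held in memory.
newtype InMemoryLoader = InMemoryLoader [(Text, Text)]

instance BaseLoader InMemoryLoader where
  -- An empty source is reported as a Left rather than an empty Right.
  load (InMemoryLoader entries)
    | null entries = pure (Left "InMemoryLoader: no entries")
    | otherwise =
        pure . Right $
          [Document body (Map.singleton "source" name) | (name, body) <- entries]

  -- Stand-in splitter: one chunk per non-blank line of each document.
  loadAndSplit loader = do
    result <- load loader
    pure $ fmap (concatMap (filter (not . T.null . T.strip) . T.lines . pageContent)) result
```

Defining loadAndSplit in terms of load keeps a single code path for reading the source, and fmap lets any Left produced by load pass through unchanged.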
Methods
load :: m -> IO (Either String [Document])
Load all documents from the source.
loadAndSplit :: m -> IO (Either String [Text])
Load all documents from the source and split them into text chunks using a recursive character splitter.
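The method description names a recursive character splitter without showing how one works. As an illustration of the general technique, not this library's implementation, here is a simplified version: it tries separators from coarsest to finest, collects pieces that fit within chunkSize and greedily merges them back with the separator, re-splits any oversized piece with the finer separators, and hard-cuts the text when no separator is left. The names recursiveSplit, splitPieces, mergePieces and defaultSeparators are hypothetical; unlike LangChain's splitters this sketch has no chunk overlap and measures size in characters.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical default: paragraphs, then lines, then words.
defaultSeparators :: [Text]
defaultSeparators = ["\n\n", "\n", " "]

-- Split text into chunks of at most chunkSize characters (chunkSize > 0).
recursiveSplit :: Int -> [Text] -> Text -> [Text]
recursiveSplit chunkSize _ txt
  | T.length txt <= chunkSize = [txt | not (T.null txt)]
-- No separators left: cut the text into fixed-size pieces.
recursiveSplit chunkSize [] txt = T.chunksOf chunkSize txt
recursiveSplit chunkSize (sep : rest) txt
  -- T.splitOn rejects an empty separator, so guard against it.
  | not (T.null sep) && sep `T.isInfixOf` txt =
      splitPieces chunkSize sep rest [] (T.splitOn sep txt)
  | otherwise = recursiveSplit chunkSize rest txt

-- Walk the pieces from one separator. Pieces that fit are collected (in
-- reverse); an oversized piece flushes the collected ones and is split
-- again with the finer separators.
splitPieces :: Int -> Text -> [Text] -> [Text] -> [Text] -> [Text]
splitPieces chunkSize sep _ small [] = mergePieces chunkSize sep (reverse small)
splitPieces chunkSize sep rest small (p : ps)
  | T.length p <= chunkSize = splitPieces chunkSize sep rest (p : small) ps
  | otherwise =
      mergePieces chunkSize sep (reverse small)
        ++ recursiveSplit chunkSize rest p
        ++ splitPieces chunkSize sep rest [] ps

-- Greedily join adjacent non-empty pieces with sep while they still fit.
mergePieces :: Int -> Text -> [Text] -> [Text]
mergePieces chunkSize sep = go . filter (not . T.null)
  where
    go (a : b : more)
      | T.length a + T.length sep + T.length b <= chunkSize = go ((a <> sep <> b) : more)
      | otherwise = a : go (b : more)
    go xs = xs
```

For example, recursiveSplit 12 defaultSeparators "short one\n\nthis paragraph is long" keeps the short paragraph whole and breaks only the long one at spaces, giving ["short one", "this", "paragraph is", "long"].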