langchain-hs-0.0.1.0: Haskell implementation of Langchain
Copyright: (c) 2025 Tushar Adhatrao
License: MIT
Maintainer: Tushar Adhatrao <tusharadhatrao@gmail.com>
Stability: experimental
Safe Haskell: Safe-Inferred
Language: Haskell2010

Langchain.TextSplitter.Character

Description

Character-based text splitting implementation following LangChain's text splitter concepts. Splits text into chunks based on separators and maximum chunk sizes, useful for processing large documents with LLMs.

For more information on text splitting concepts, see the LangChain text splitter documentation.

Example usage:

-- Split text using default settings (100 char chunks, double newline separator)
splitText defaultCharacterSplitterOps "Long document text..."

-- Custom configuration for 500-char chunks with paragraph splitting
customSplit = splitText (CharacterSplitterOps 500 "\n\n")

Configuration

data CharacterSplitterOps

Configuration for character-based text splitting. Contains the maximum chunk size and the separator on which the text is split.

Default values follow LangChain's recommended settings for LLM input preparation.

Constructors

CharacterSplitterOps 

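Fields

  • chunkSize: maximum number of characters per chunk
  • separator: text on which the input is split before chunking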

defaultCharacterSplitterOps :: CharacterSplitterOps

Default splitter configuration

  • 100 character chunks
  • Splits on double newlines ("\n\n")

>>> defaultCharacterSplitterOps
CharacterSplitterOps {chunkSize = 100, separator = "\n\n"}
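
Since the output above is printed in record syntax, a single setting can presumably be overridden with a record update instead of rebuilding the whole value. The sketch below mirrors the type locally so that it compiles on its own; the field types (Int and Text) and the names largeChunks and lineSplit are assumptions made for illustration, not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Local stand-in for the library's record; the field types are assumed.
data CharacterSplitterOps = CharacterSplitterOps
  { chunkSize :: Int
  , separator :: Text
  } deriving (Show, Eq)

-- Same values as the documented default configuration.
defaultCharacterSplitterOps :: CharacterSplitterOps
defaultCharacterSplitterOps =
  CharacterSplitterOps { chunkSize = 100, separator = "\n\n" }

-- Hypothetical variant: larger chunks, default separator kept.
largeChunks :: CharacterSplitterOps
largeChunks = defaultCharacterSplitterOps { chunkSize = 500 }

-- Hypothetical variant: split on single newlines, default chunk size kept.
lineSplit :: CharacterSplitterOps
lineSplit = defaultCharacterSplitterOps { separator = "\n" }
```

With the real module imported, only the last two definitions would be needed.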

Splitting Function

splitText :: CharacterSplitterOps -> Text -> [Text]

Split text into chunks following LangChain's splitting strategy:

  1. Split by separator first
  2. Chunk each segment into the specified size
  3. Preserve semantic boundaries where possible
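
The strategy described above can be approximated with functions from Data.Text. This is a minimal sketch and not the library's actual implementation: the name sketchSplit is hypothetical, trimming whitespace and dropping empty segments are assumptions, and boundary preservation is only attempted in the sense that the separator split runs before any fixed-width chunking.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for splitText, taking the chunk size and
-- separator as plain arguments instead of a configuration record.
sketchSplit :: Int -> Text -> Text -> [Text]
sketchSplit size sep txt = concatMap chunk segments
  where
    -- Step 1: split on the separator (T.splitOn rejects an empty one,
    -- so an empty separator leaves the text in one piece).
    pieces
      | T.null sep = [txt]
      | otherwise  = T.splitOn sep txt
    -- Assumption: trim surrounding whitespace and drop empty segments.
    segments = filter (not . T.null) (map T.strip pieces)
    -- Step 2: cut each segment into pieces of at most `size` characters.
    chunk seg
      | size <= 0 = [seg]
      | otherwise = T.chunksOf size seg
```

Splitting on the separator first means that a segment shorter than the chunk size is returned whole, so paragraph boundaries survive wherever the paragraphs are short enough.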

Examples:

>>> splitText defaultCharacterSplitterOps ""
[]

>>> splitText defaultCharacterSplitterOps "Short text"
["Short text"]
>>> splitText defaultCharacterSplitterOps "Part1\n\nPart2\n\nPart3"
["Part1","Part2","Part3"]
>>> splitText (CharacterSplitterOps 20 "\n\n") "Very long text exceeding chunk size..."
["Very long text excee","ding chunk size..."]