Chunking API
LangChainChunker
LangChainChunker(
    method: Literal["recursive", "character", "token"] = "recursive",
    chunk_size: int = 2048,
    chunk_overlap: int = 256,
    **kwargs: Any
)
Bases: BaseChunker[Document]
Wrapper for LangChain TextSplitter.
Parameters:
- method (Literal['recursive', 'character', 'token'], default: "recursive") – Type of TextSplitter used to perform the chunking.
- chunk_size (int, default: 2048) – Maximum size of a single returned chunk.
- chunk_overlap (int, default: 256) – Overlap in characters between consecutive chunks.
Other Parameters:
- separators (list[str]) – Separators at which the text is split into chunks.
Source code in ai4rag/rag/chunking/langchain_chunker.py
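To illustrate how chunk_size and chunk_overlap relate, here is a minimal sketch that uses a plain fixed-size character window. It is not the LangChainChunker implementation: the actual splitters also take separators (and, for "token", token counts) into account, and the helper name sliding_window_chunks is hypothetical.

```python
def sliding_window_chunks(
    text: str, chunk_size: int = 2048, chunk_overlap: int = 256
) -> list[str]:
    """Split text into character windows of at most chunk_size characters.

    Hypothetical helper for illustration only; consecutive windows share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    # Each new window starts chunk_size - chunk_overlap characters further on.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each window repeats the last character of the previous one, which is the behaviour chunk_overlap=1 asks for in this simplified model.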
Functions
to_dict
Return dictionary that can be used to recreate an instance of the LangChainChunker.
Source code in ai4rag/rag/chunking/langchain_chunker.py
from_dict classmethod
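The to_dict / from_dict pair describes a round trip: the dictionary returned by to_dict should be enough to rebuild an equivalent instance. The sketch below shows that contract with a hypothetical stand-in class, ChunkerSettingsSketch. The dictionary keys are assumed to mirror the constructor parameters; the keys LangChainChunker actually emits are not documented here.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ChunkerSettingsSketch:
    """Hypothetical stand-in for illustration; not the ai4rag class."""

    method: str = "recursive"
    chunk_size: int = 2048
    chunk_overlap: int = 256

    def to_dict(self) -> dict[str, Any]:
        # Assumed layout: one key per constructor parameter.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ChunkerSettingsSketch":
        # Rebuild an instance from the dictionary produced by to_dict.
        return cls(**data)


original = ChunkerSettingsSketch(method="token", chunk_size=512, chunk_overlap=64)
restored = ChunkerSettingsSketch.from_dict(original.to_dict())
print(restored == original)  # True
```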
split_documents
Split a sequence of documents into smaller chunks based on the chunker settings. Each chunk's metadata includes the document_id, sequence_number, and start_index.
Parameters:
- documents (Sequence[Document]) – Sequence of documents that contain content in text format.
Returns:
- list[Document] – List of documents split into smaller chunks.
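The metadata attached to each chunk can be pictured with the following sketch. It is an approximation under stated assumptions, not the library's code: DocSketch is a hypothetical stand-in for Document, splitting uses a plain character window, sequence_number is assumed to start at 0, and document_id is assumed to come from the input metadata with the document's position as a fallback.

```python
from dataclasses import dataclass, field
from typing import Any, Sequence


@dataclass
class DocSketch:
    """Hypothetical stand-in for a Document: text plus a metadata dict."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def split_with_metadata(
    documents: Sequence[DocSketch], chunk_size: int, chunk_overlap: int
) -> list[DocSketch]:
    """Split each document into windows and record where each chunk came from."""
    step = chunk_size - chunk_overlap
    if chunk_size <= 0 or step <= 0:
        raise ValueError("require chunk_size > chunk_overlap >= 0")

    chunks: list[DocSketch] = []
    for doc_index, doc in enumerate(documents):
        # Assumption: reuse an existing document_id, else fall back to position.
        document_id = doc.metadata.get("document_id", str(doc_index))
        text = doc.page_content
        sequence_number = 0  # assumption: numbering restarts at 0 per document
        for start in range(0, len(text), step):
            chunks.append(
                DocSketch(
                    page_content=text[start:start + chunk_size],
                    metadata={
                        **doc.metadata,
                        "document_id": document_id,
                        "sequence_number": sequence_number,
                        "start_index": start,  # character offset in the source text
                    },
                )
            )
            sequence_number += 1
            if start + chunk_size >= len(text):
                break
    return chunks


docs = [DocSketch("abcdefghij", {"document_id": "doc-a"})]
result = split_with_metadata(docs, chunk_size=4, chunk_overlap=1)
for chunk in result:
    print(chunk.page_content, chunk.metadata)
```

With these three fields a chunk can be traced back to its source document and to its position within that document.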