text_splitter

Package containing the text splitter classes.

Modules:

  • base

    Base classes for text splitters.

  • langchain

    Support for LangChain text splitters.

Classes:

LangChainTextSplitter

LangChainTextSplitter(
    splitter_name: str = '',
    chunk_size: int = 500,
    chunk_overlap: int = 100,
)

Bases: TextSplitterBase

Text splitter that delegates chunking to a LangChain text splitter.

Methods:

  • split

    Split text into smaller chunks for processing.

Source code in src/rago/retrieval/text_splitter/base.py
def __init__(
    self,
    splitter_name: str = '',
    chunk_size: int = 500,
    chunk_overlap: int = 100,
) -> None:
    """Initialize the text splitter class."""
    # Falsy arguments ('' or 0) fall back to the class defaults,
    # so chunk_overlap=0 is replaced by default_chunk_overlap.
    self.chunk_size = chunk_size or self.default_chunk_size
    self.chunk_overlap = chunk_overlap or self.default_chunk_overlap
    self.splitter_name = splitter_name or self.default_splitter_name

    # Check the resolved settings, then run the setup hook.
    self._validate()
    self._setup()
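Because `__init__` uses `or`, any falsy argument is replaced by the matching `default_*` attribute. The sketch below reproduces only that logic in a hypothetical stand-in class, `FallbackDefaults`; its default values (`500`, `100`, `'default-splitter'`) are placeholders, not rago's actual defaults.

```python
class FallbackDefaults:
    """Hypothetical stand-in mirroring the fallback logic of __init__."""

    # Placeholder defaults; rago's real default_* values may differ.
    default_chunk_size = 500
    default_chunk_overlap = 100
    default_splitter_name = 'default-splitter'

    def __init__(
        self,
        splitter_name: str = '',
        chunk_size: int = 500,
        chunk_overlap: int = 100,
    ) -> None:
        # `x or default` keeps x unless it is falsy ('' or 0).
        self.chunk_size = chunk_size or self.default_chunk_size
        self.chunk_overlap = chunk_overlap or self.default_chunk_overlap
        self.splitter_name = splitter_name or self.default_splitter_name


splitter = FallbackDefaults(chunk_overlap=0)
print(splitter.chunk_overlap)  # 100: the explicit 0 was replaced
print(splitter.splitter_name)  # default-splitter
```

If zero overlap had to be honored, a `None` parameter default with an `is not None` check would preserve an explicit `0`; that is a design alternative, not what the source does.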

split

split(text: str) -> list[str]

Split text into smaller chunks for processing.

Source code in src/rago/retrieval/text_splitter/langchain.py
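def split(self, text: str) -> list[str]:
    """Split text into smaller chunks for processing."""
    # `self.splitter` is the LangChain splitter class for this instance
    # (assumed to be resolved from `splitter_name` during `_setup`).
    text_splitter = self.splitter(
        chunk_size=self.chunk_size,
        chunk_overlap=self.chunk_overlap,
        length_function=len,  # chunk sizes are measured in characters
        is_separator_regex=True,  # separators are treated as regex patterns
    )
    # `cast` only informs the type checker; the module needs
    # `from typing import List, cast` for this line to run.
    return cast(List[str], text_splitter.split_text(text))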
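To illustrate what `chunk_size` and `chunk_overlap` measure when `length_function=len`, here is a minimal character-based sliding-window chunker. It is a simplified stand-in, not LangChain's algorithm: LangChain splitters such as `RecursiveCharacterTextSplitter` first split on separators and then merge the pieces up to `chunk_size`, so real chunk boundaries will differ. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(
    text: str, chunk_size: int = 500, chunk_overlap: int = 100
) -> list[str]:
    """Hypothetical helper: fixed-width character windows with overlap."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('chunk_overlap must be in [0, chunk_size)')
    # Each new window starts chunk_size - chunk_overlap characters later,
    # so consecutive windows share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(sliding_window_chunks('abcdefghij', chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```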

TextSplitterBase

TextSplitterBase(
    splitter_name: str = '',
    chunk_size: int = 500,
    chunk_overlap: int = 100,
)

Base class for text splitters; subclasses must implement split.

Methods:

  • split

    Split a text into chunks.

Source code in src/rago/retrieval/text_splitter/base.py
def __init__(
    self,
    splitter_name: str = '',
    chunk_size: int = 500,
    chunk_overlap: int = 100,
) -> None:
    """Initialize the text splitter class."""
    # Falsy arguments ('' or 0) fall back to the class defaults,
    # so chunk_overlap=0 is replaced by default_chunk_overlap.
    self.chunk_size = chunk_size or self.default_chunk_size
    self.chunk_overlap = chunk_overlap or self.default_chunk_overlap
    self.splitter_name = splitter_name or self.default_splitter_name

    # Check the resolved settings, then run the setup hook.
    self._validate()
    self._setup()
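The base `__init__` resolves the settings first and then calls `_validate()` followed by `_setup()`, so `_setup` only runs once `_validate` has returned. The hypothetical stand-in below, `HookedSplitterBase`, records that hook order; its validation rule (overlap must be smaller than the chunk size) is an assumption, since the real body of `_validate` is not shown on this page.

```python
class HookedSplitterBase:
    """Hypothetical stand-in for the __init__ flow of TextSplitterBase."""

    def __init__(self, chunk_size: int = 500, chunk_overlap: int = 100) -> None:
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.calls: list[str] = []  # records hook order for the demo
        self._validate()  # runs first, on the resolved settings
        self._setup()  # reached only if _validate did not raise

    def _validate(self) -> None:
        # Assumed rule; the real _validate body is not shown on this page.
        self.calls.append('validate')
        if self.chunk_overlap >= self.chunk_size:
            raise ValueError('chunk_overlap must be smaller than chunk_size')

    def _setup(self) -> None:
        # A subclass could prepare its backend here (hypothetical).
        self.calls.append('setup')


print(HookedSplitterBase().calls)  # ['validate', 'setup']
try:
    HookedSplitterBase(chunk_size=50, chunk_overlap=50)
except ValueError as exc:
    print(exc)  # chunk_overlap must be smaller than chunk_size
```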

split (abstractmethod)

split(text: str) -> Iterable[str]

Split a text into chunks.

Source code in src/rago/retrieval/text_splitter/base.py
@abstractmethod
def split(self, text: str) -> Iterable[str]:
    """Split a text into chunks."""
    return []
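`split` is declared with `@abstractmethod`, so a concrete splitter must override it before it can be instantiated (assuming `TextSplitterBase` derives from `abc.ABC`, which this page does not show). The sketch uses a hypothetical minimal stand-in base, `SplitterBase`, and a fixed-size subclass, `FixedSizeSplitter`; like `LangChainTextSplitter.split`, the override narrows the return type from `Iterable[str]` to `list[str]`.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable


class SplitterBase(ABC):
    """Hypothetical minimal stand-in for TextSplitterBase."""

    def __init__(self, chunk_size: int = 500) -> None:
        self.chunk_size = chunk_size

    @abstractmethod
    def split(self, text: str) -> Iterable[str]:
        """Split a text into chunks."""
        return []


class FixedSizeSplitter(SplitterBase):
    """Hypothetical subclass: consecutive, non-overlapping slices."""

    def split(self, text: str) -> list[str]:
        step = self.chunk_size
        return [text[i:i + step] for i in range(0, len(text), step)]


print(FixedSizeSplitter(chunk_size=3).split('abcdefg'))  # ['abc', 'def', 'g']
try:
    SplitterBase()  # abstract: split is not implemented
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```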