Sentence Splitter

Documentation

  • Class name: LLMSentenceSplitter
  • Category: SALT/Language Toolkit/Parsing
  • Output node: False

The LLMSentenceSplitter node segments text into smaller, manageable chunks based on a specified chunk size and overlap. This is essential when processing large documents, making them more amenable to detailed analysis or further processing steps.

Input types

Required

  • chunk_size
    • The maximum size of each text chunk, measured in tokens by the underlying splitter. It directly controls the granularity of the splitting process.
    • Comfy dtype: INT
    • Python dtype: int
  • chunk_overlap
    • The number of tokens shared between consecutive chunks. This overlap preserves continuity and context across chunk boundaries; it must not exceed chunk_size (see the sketch after this list).
    • Comfy dtype: INT
    • Python dtype: int
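
To make the interaction of these two parameters concrete, here is a minimal, hypothetical sketch (not part of the node's source) that assumes the llama-index package backing this node is installed. It uses a deliberately small chunk_size so the overlap between chunks is easy to inspect:

    from llama_index.core.node_parser import SentenceSplitter

    # Small values chosen purely for illustration.
    splitter = SentenceSplitter(chunk_size=32, chunk_overlap=8)

    text = " ".join(f"This is sentence number {i}." for i in range(20))
    chunks = splitter.split_text(text)

    # Consecutive chunks repeat roughly chunk_overlap tokens of text, so no
    # sentence loses its surrounding context at a chunk boundary.
    for i, chunk in enumerate(chunks):
        print(f"chunk {i}: {chunk!r}")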

Output types

  • llm_sentence_splitter
    • Produces a SentenceSplitter instance configured with the given chunk size and overlap, ready for downstream text segmentation.
    • Comfy dtype: LLM_SENTENCE_SPLITTER
    • Python dtype: SentenceSplitter

Usage tips

  • Infra type: CPU
  • Common nodes: unknown

Source code

# SentenceSplitter is LlamaIndex's sentence-aware text splitter; MENU_NAME and
# SUB_MENU_NAME are the toolkit's menu constants (their import paths are
# assumed here, since the excerpt omits them).
from llama_index.core.node_parser import SentenceSplitter

class LLMSentenceSplitter:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # chunk_overlap must not exceed chunk_size, or SentenceSplitter
                # raises a ValueError at construction time.
                "chunk_size": ("INT", {"min": 8, "max": 2048, "step": 1, "default": 1024}),
                "chunk_overlap": ("INT", {"min": 0, "max": 2048, "step": 1, "default": 20})
            },
        }

    RETURN_TYPES = ("LLM_SENTENCE_SPLITTER",)
    RETURN_NAMES = ("llm_sentence_splitter",)

    FUNCTION = "semantic_nodes"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Parsing"

    def semantic_nodes(self, chunk_size, chunk_overlap):
        # Build and return the configured splitter; downstream nodes consume it.
        splitter = SentenceSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
        )
        return (splitter,)
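
Outside of a graph, the object this node returns behaves like any other LlamaIndex SentenceSplitter. The following sketch shows how a downstream consumer might use it, assuming llama-index is installed; Document and get_nodes_from_documents are standard LlamaIndex APIs, not part of this node:

    from llama_index.core import Document
    from llama_index.core.node_parser import SentenceSplitter

    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

    # get_nodes_from_documents wraps each chunk in a TextNode, which is what
    # indexing and retrieval components typically expect.
    nodes = splitter.get_nodes_from_documents([Document(text="A long body of text...")])
    print(len(nodes), nodes[0].text[:80])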