Skip to content

∞ Setence Splitter Node Creator

Documentation

  • Class name: LLMSentenceSplitterNodeCreator
  • Category: SALT/Language Toolkit/Tools
  • Output node: False

This node is designed to split a given document into smaller, manageable nodes based on sentence boundaries. It utilizes customizable chunk sizes and overlaps to ensure comprehensive coverage and continuity across the document, facilitating further processing or analysis.

Input types

Required

  • document
    • The primary text document to be split into smaller nodes. It serves as the foundational input for the node's operation, dictating the scope and granularity of the splitting process.
    • Comfy dtype: DOCUMENT
    • Python dtype: str

Optional

  • chunk_size
    • Specifies the maximum size of each chunk or node created from the document, allowing for control over the granularity of the splitting process.
    • Comfy dtype: INT
    • Python dtype: int
  • chunk_overlap
    • Determines the amount of overlap between consecutive chunks or nodes, ensuring continuity and context preservation across splits.
    • Comfy dtype: INT
    • Python dtype: int

Output types

  • llm_nodes
    • Comfy dtype: LLM_NODES
    • The output is a collection of smaller, sentence-based nodes derived from the original document, ready for further processing or analysis.
    • Python dtype: tuple

Usage tips

  • Infra type: CPU
  • Common nodes: unknown

Source code

class LLMSentenceSplitterNodeCreator:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "document": ("DOCUMENT",),
            },
            "optional": {
                "chunk_size": ("INT", {"default": 1024, "min": 1}),
                "chunk_overlap": ("INT", {"default": 20, "min": 0}),
            },
        }

    RETURN_TYPES = ("LLM_NODES",)
    RETURN_NAMES = ("llm_nodes",)

    FUNCTION = "create_nodes"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Tools"

    def create_nodes(self, document, chunk_size=1024, chunk_overlap=20):
        node_parser = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        nodes = node_parser.get_nodes_from_documents(document, show_progress=False)        
        return (nodes,)