Vector Store Index

Documentation

  • Class name: LLMVectorStoreIndex
  • Category: SALT/Language Toolkit/Indexing
  • Output node: False

This node builds a vector store index from one or more documents: it splits the text into chunks, embeds each chunk with the embedding model attached to the supplied LLM, and stores the resulting vectors so that downstream nodes can retrieve semantically similar passages efficiently.
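Under the hood this corresponds to a single llama_index call; the following is a minimal standalone sketch of the same operation, assuming llama-index-core is installed and using MockEmbedding as a stand-in for the embedding model a loader node would normally provide:

from llama_index.core import Document, MockEmbedding, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Stand-in for the embedding model carried inside llm_model["embed_model"].
embed_model = MockEmbedding(embed_dim=8)

documents = [Document(text="ComfyUI nodes can index text for later retrieval.")]

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=0)],
)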

Input types

Required

  • llm_model
• Specifies the language model bundle used for indexing. The node reads the attached embedding model from this input to vectorize the text, and raises an error if none is present.
    • Comfy dtype: LLM_MODEL
    • Python dtype: dict
  • document
    • The document or collection of documents to be indexed. This input supplies the text from which the vector representations stored in the index are generated.
    • Comfy dtype: DOCUMENT
    • Python dtype: Sequence[Document] (llama_index document objects)

Optional

  • optional_llm_context
    • An optional context parameter that can be used to provide additional information or settings to the language model during the indexing process.
    • Comfy dtype: LLM_CONTEXT
    • Python dtype: dict
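Taken together, the inputs the node expects look roughly like the following sketch. The "embed_model" key is required by the source code below; the embedding object itself is whatever an upstream loader node produced:

llm_model = {"embed_model": embed_model}           # dict carrying an embedding model
document = [Document(text="First section..."),     # sequence of llama_index Documents
            Document(text="Second section...")]
optional_llm_context = None                        # optional context settings, or None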

Output types

  • llm_index
    • The generated vector store index, used by downstream nodes to retrieve and query the embedded text.
    • Comfy dtype: LLM_INDEX
    • Python dtype: VectorStoreIndex
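Downstream nodes treat this output as an ordinary llama_index VectorStoreIndex, so outside the graph it can be queried directly; a brief sketch, assuming the index built above:

retriever = index.as_retriever(similarity_top_k=2)
for result in retriever.retrieve("What does the document cover?"):
    print(result.score, result.node.get_content()[:80])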

Usage tips

  • Infra type: CPU
  • Common nodes: unknown

Source code

# Names such as VectorStoreIndex, SentenceSplitter, MockTokenizer,
# MENU_NAME, and SUB_MENU_NAME are imported by the surrounding SALT module.
class LLMVectorStoreIndex:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "llm_model": ("LLM_MODEL",),
                "document": ("DOCUMENT",),
            },
            "optional": {
                "optional_llm_context": ("LLM_CONTEXT",),
            },
        }

    RETURN_TYPES = ("LLM_INDEX",)
    RETURN_NAMES = ("llm_index",)

    FUNCTION = "index"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Indexing"

    def index(self, llm_model, document, optional_llm_context=None):
        # The loader node packs the embedding model into the llm_model dict.
        embed_model = llm_model.get("embed_model", None)

        if not embed_model:
            raise ValueError("Unable to determine LLM Embedding Model")

        # Split each document into non-overlapping 1024-token chunks before
        # embedding.
        splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=0)

        # An earlier revision re-wrapped each document and truncated oversized
        # metadata with MockTokenizer(max_tokens=1024, char_per_token=1); that
        # path is currently disabled, and documents are passed through
        # unchanged.

        index = VectorStoreIndex.from_documents(
            document,
            embed_model=embed_model,
            service_context=optional_llm_context,
            transformations=[splitter],
        )

        return (index,)
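For reference, the node can also be exercised outside a ComfyUI graph. A minimal sketch, assuming a llama_index version that still accepts the service_context keyword and reusing the MockEmbedding stand-in from earlier:

node = LLMVectorStoreIndex()
llm_model = {"embed_model": MockEmbedding(embed_dim=8)}
documents = [Document(text="Some text worth indexing.")]
(llm_index,) = node.index(llm_model, documents)    # returns a one-element tuple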