∞ Vector Store Index¶
Documentation¶
- Class name:
LLMVectorStoreIndex
- Category:
SALT/Language Toolkit/Indexing
- Output node:
False
This node creates a vector store index from one or more documents, enabling efficient storage and retrieval of vectorized text. Under the hood it splits each document into 1024-token chunks with no overlap, embeds those chunks with the model's embedding model, and builds a llama_index VectorStoreIndex from the result.
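For orientation, the core of what this node does can be reproduced directly with llama_index. The sketch below is illustrative only and targets the current llama_index core API (the node's own source passes a legacy `service_context` argument, so it tracks an older release); the embedding model and example text are placeholders, not part of this node.

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding  # placeholder embed model

# The node pulls the embedding model out of its llm_model input.
embed_model = OpenAIEmbedding()

documents = [Document(text="Vector stores hold embeddings for similarity search.")]

# Same chunking the node configures: 1024-token chunks, no overlap.
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=0)

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    transformations=[splitter],
)
```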
Input types¶
Required¶
llm_model
- Specifies the language model bundle used for generating vector representations. The node reads its embed_model entry to obtain the embedding model that vectorizes the text (see the input sketch after this list).
- Comfy dtype:
LLM_MODEL
- Python dtype:
dict
document
- The text document or a collection of documents to be indexed. This input is essential for generating the vector representations that will be stored in the index.
- Comfy dtype:
DOCUMENT
- Python dtype:
Sequence[Document]
Optional¶
optional_llm_context
- An optional llama_index context, passed through to VectorStoreIndex.from_documents as service_context, that can supply additional settings to the indexing process.
- Comfy dtype:
LLM_CONTEXT
- Python dtype:
dict
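Judging from the source code, the inputs arrive in the following shapes. A minimal hand-built sketch (the embedding model class is a placeholder; in a real graph these values come from upstream nodes):

```python
from llama_index.core import Document
from llama_index.embeddings.openai import OpenAIEmbedding  # placeholder embed model

# llm_model: a dict whose "embed_model" entry holds the embedding model;
# the node raises ValueError if that key is missing or falsy.
llm_model = {"embed_model": OpenAIEmbedding()}

# document: a sequence of llama_index Document objects.
document = [
    Document(text="First document.", metadata={"source": "a.txt"}),
    Document(text="Second document.", metadata={"source": "b.txt"}),
]
```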
Output types¶
llm_index
- The generated vector store index, which can be used for subsequent retrieval and manipulation of the vectorized text data.
- Comfy dtype:
LLM_INDEX
- Python dtype:
VectorStoreIndex
Usage tips¶
- Infra type:
CPU
- Common nodes: unknown
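The returned llm_index is a plain llama_index VectorStoreIndex, so downstream of the node (or when experimenting outside the graph) the standard retrieval calls apply. A minimal sketch, assuming index was built as above:

```python
# Fetch the most similar chunks without invoking an LLM.
retriever = index.as_retriever(similarity_top_k=3)
for scored_node in retriever.retrieve("What do the documents cover?"):
    print(scored_node.score, scored_node.get_content()[:80])

# Or run a full query; this additionally requires an LLM to be configured.
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the indexed documents."))
```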
Source code¶
```python
class LLMVectorStoreIndex:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "llm_model": ("LLM_MODEL",),
                "document": ("DOCUMENT",),
            },
            "optional": {
                "optional_llm_context": ("LLM_CONTEXT",),
            },
        }

    RETURN_TYPES = ("LLM_INDEX",)
    RETURN_NAMES = ("llm_index",)

    FUNCTION = "index"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Indexing"  # module-level menu constants

    def index(self, llm_model, document, optional_llm_context=None):
        #document = cast(Sequence[Document], document) # This could be why documents are not working correctly

        # Only the embedding model is needed here; the LLM itself is not called.
        embed_model = llm_model.get("embed_model", None)
        if not embed_model:
            raise ValueError("Unable to determine LLM Embedding Model")

        # Split documents into 1024-token chunks with no overlap.
        splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=0)
        tokenizer = MockTokenizer(max_tokens=1024, char_per_token=1)  # unused by the active code path

        #documents = []
        #for doc in document:
        #    logger.info("Document:")
        #    logger.data(doc)
        #    logger.info("\n==================\n")
        #    metadata = {}
        #    text = doc.text
        #    if doc.metadata:
        #        metadata = doc.metadata
        #    token_count = tokenizer.count(metadata)
        #    if token_count > 1024:
        #        metadata = tokenizer.truncate(metadata)
        #    documents.append(Document(text=text, extra_info=metadata))

        index = VectorStoreIndex.from_documents(
            document,
            embed_model=embed_model,
            service_context=optional_llm_context,
            transformations=[splitter],
        )

        return (index,)
```
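For completeness, a rough illustration of driving the node class by hand outside ComfyUI (the llm_model dict and document list are the hand-built stand-ins from the input sketch above):

```python
node = LLMVectorStoreIndex()
(llm_index,) = node.index(llm_model=llm_model, document=document)
```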