∞ Vector Store Index (Adv)
Documentation
- Class name: LLMVectorStoreIndexAdv
- Category: SALT/Language Toolkit/Indexing
- Output node: False
The LLMVectorStoreIndexAdv node builds a vector store index from input documents within the SALT language-toolkit pipeline. It extends the standard vector index node with advanced chunking controls, preset chunk sizes and a configurable overlap, so indexing granularity can be tuned to the retrieval task. The resulting index stores embedded text chunks and serves as the basis for similarity-based retrieval in downstream nodes.
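At its core, the indexing pipeline splits the document into chunks, embeds each chunk, and stores the vectors for similarity lookup at query time. The toy sketch below illustrates that retrieval step with hypothetical hand-written embeddings and plain numpy; it is not the node's implementation, which delegates this work to llama_index.

import numpy as np

# Hypothetical 4-dimensional embeddings for three stored text chunks.
chunk_vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],  # chunk about vector stores
    [0.1, 0.8, 0.3, 0.0],  # chunk about sentence splitting
    [0.2, 0.1, 0.9, 0.1],  # chunk about service contexts
])
query_vector = np.array([0.8, 0.2, 0.1, 0.1])  # embedded user query

# Cosine similarity between the query and every stored chunk.
scores = chunk_vectors @ query_vector / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
)
print("best chunk:", int(scores.argmax()))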
Input types
Required
llm_model
- The language model bundle used for indexing; its embedding model (the "embed_model" entry) generates the vector representations of the text.
- Comfy dtype: LLM_MODEL
- Python dtype: Dict[str, Any]
document
- The document(s) to index; each is split into chunks and embedded (see the sketch after this entry).
- Comfy dtype: DOCUMENT
- Python dtype: List[Document]
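The commented-out loop in the original source iterates doc.text and doc.metadata, which matches llama_index Document objects. Below is a minimal sketch of building such a list by hand; the text and metadata are placeholders, and on older llama_index versions the import is from llama_index rather than llama_index.core.

from llama_index.core import Document

# A one-element document list of the shape this input expects.
docs = [
    Document(
        text="Vector stores index chunked text for similarity search.",
        metadata={"source": "example.txt"},  # placeholder metadata
    )
]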
Optional
chunk_size
- The target size of each text chunk, selected from preset values, allowing customization of indexing granularity.
- Comfy dtype: COMBO[INT]
- Python dtype: int
chunk_overlap
- Determines the overlap between consecutive text chunks, preserving continuity across chunk boundaries.
- Comfy dtype: INT
- Python dtype: int
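To see how chunk_size and chunk_overlap interact, the sketch below runs the same SentenceSplitter the node instantiates over placeholder text; both sizes are measured in tokens.

from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=128, chunk_overlap=16)

text = "Indexing splits long documents into sentence-aware chunks. " * 40
chunks = splitter.split_text(text)

# Consecutive chunks repeat roughly chunk_overlap tokens of context.
print(len(chunks))
print(chunks[0][-60:])
print(chunks[1][:60])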
optional_llm_context
- An optional service context passed through to index construction, allowing custom pipeline settings to influence indexing (see the sketch after this entry).
- Comfy dtype: LLM_CONTEXT
- Python dtype: ServiceContext
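The node forwards this value verbatim as service_context=, which matches the legacy llama_index ServiceContext API (replaced by Settings in newer releases). A hedged sketch of constructing one under that legacy API, with a hypothetical embedding choice:

# Legacy llama_index (pre-0.10 style) only; newer versions use Settings.
from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding  # hypothetical choice

context = ServiceContext.from_defaults(
    embed_model=OpenAIEmbedding(),
    chunk_size=512,
)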
Output types
llm_index
- The result of the indexing process: a queryable vector index over the input document's embedded chunks.
- Comfy dtype: LLM_INDEX
- Python dtype: VectorStoreIndex
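Downstream, the returned index is a live VectorStoreIndex, so the standard llama_index retrieval calls apply. A sketch, where index stands for this node's output:

# `index` is the VectorStoreIndex produced by this node.
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("How is the document chunked?")
for result in results:
    print(result.score, result.node.get_content()[:80])

# With an LLM configured, the index can also answer queries directly:
# query_engine = index.as_query_engine()
# print(query_engine.query("Summarize the document."))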
Usage tips
- Infra type: CPU
- Common nodes: unknown
Source code
class LLMVectorStoreIndexAdv:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "llm_model": ("LLM_MODEL",),
                "document": ("DOCUMENT",),
            },
            "optional": {
                # Preset chunk sizes exposed as a combo widget.
                "chunk_size": ([128, 256, 512, 1024, 2048],),
                "chunk_overlap": ("INT", {"min": 0, "max": 512, "default": 0}),
                "optional_llm_context": ("LLM_CONTEXT",),
            },
        }

    RETURN_TYPES = ("LLM_INDEX",)
    RETURN_NAMES = ("llm_index",)

    FUNCTION = "index"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Indexing"

    def index(self, llm_model, document, chunk_size=1024, chunk_overlap=0, optional_llm_context=None):
        # The LLM_MODEL dict must carry an embedding model for vectorization.
        embed_model = llm_model.get("embed_model", None)
        if not embed_model:
            raise ValueError("Unable to determine LLM Embedding Model")

        # Split documents into (possibly overlapping) sentence-aware chunks.
        splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

        # Build the vector index: each chunk is embedded and stored.
        index = VectorStoreIndex.from_documents(
            document,
            embed_model=embed_model,
            service_context=optional_llm_context,
            transformations=[splitter],
        )
        return (index,)
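For a quick standalone check outside ComfyUI, the index method can be driven directly with a mock embedding model. The snippet below is a sketch: MockEmbedding and the document are stand-ins, the class must be imported from a module where MENU_NAME and SUB_MENU_NAME are defined, and it assumes a llama_index version that still accepts service_context.

from llama_index.core import Document, MockEmbedding

node = LLMVectorStoreIndexAdv()
llm_model = {"embed_model": MockEmbedding(embed_dim=8)}  # stand-in model dict
docs = [Document(text="A short document to index.")]

(index,) = node.index(llm_model, docs, chunk_size=256, chunk_overlap=16)
print(type(index).__name__)  # VectorStoreIndex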