Summary Index
Documentation
- Class name:
LLMSummaryIndex
- Category:
SALT/Language Toolkit/Indexing
- Output node:
False
The LLMSummaryIndex node builds a LlamaIndex SummaryIndex from a list of documents. It copies each input document's text and metadata into a new Document, truncating the metadata first if it exceeds 1024 tokens. It then builds the index using a SentenceSplitter transformation (chunk size 1024, no overlap) and the embedding model taken from the llm_model input. The resulting index can be used to summarize or retrieve information from the documents.
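SentenceSplitter(chunk_size=1024, chunk_overlap=0) packs each document's text into chunks of at most 1024 tokens, preferring to break at sentence boundaries, with no overlap between consecutive chunks. The sketch below illustrates that idea in plain Python. It is a simplification, not LlamaIndex's algorithm: it measures chunk size in characters rather than tokens, detects sentences with a naive regex, and the function name split_into_chunks is hypothetical.

```python
import re
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1024) -> List[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters, with no overlap.

    Hypothetical helper: sizes are counted in characters, not tokens.
    """
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the budget is hard-split into fixed-size pieces.
        pieces = [sentence[i:i + chunk_size] for i in range(0, len(sentence), chunk_size)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= chunk_size:
                current = candidate  # the piece still fits in the current chunk
            else:
                chunks.append(current)  # flush the full chunk; no overlap is carried over
                current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", chunk_size=10))  # ['One. Two.', 'Three.']
```

With the node's budget of 1024, a short document yields a single chunk, while longer text is broken at sentence boundaries where possible.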
Input types
Required
llm_model
- A dictionary describing the language model. The node reads only its embed_model entry, which supplies the embedding model passed to the index, and raises a ValueError if that entry is missing or empty.
- Comfy dtype:
LLM_MODEL
- Python dtype:
Dict[str, Any]
document
- A list of documents to index. Each document's text and metadata are copied into a new Document; metadata that exceeds 1024 tokens is truncated first.
- Comfy dtype:
DOCUMENT
- Python dtype:
List[Document]
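The required inputs imply a simple contract for llm_model: it is a dictionary, and the node looks only at its embed_model entry. The sketch below restates that check on its own. The function name resolve_embed_model and the "llm" key in the example dictionary are hypothetical; only the embed_model key and the error message come from the node's source.

```python
from typing import Any, Dict


def resolve_embed_model(llm_model: Dict[str, Any]) -> Any:
    """Return the embedding model from an LLM_MODEL-style dict (hypothetical helper)."""
    embed_model = llm_model.get("embed_model", None)
    # Missing keys and falsy values (None, "", etc.) are both rejected, as in the node.
    if not embed_model:
        raise ValueError("Unable to determine LLM Embedding Model")
    return embed_model


# Hypothetical LLM_MODEL payload; real SALT loaders may include other keys.
example_llm_model = {"llm": "stub-llm", "embed_model": "stub-embedder"}
print(resolve_embed_model(example_llm_model))  # stub-embedder
```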
Optional
optional_llm_context
- Optional context passed unchanged to SummaryIndex.from_documents as its service_context argument. When it is omitted, None is passed instead.
- Comfy dtype:
LLM_CONTEXT
- Python dtype:
Optional[Dict[str, Any]]
Output types
llm_index
- The summary index created from the documents, ready for use in summarization or information retrieval tasks.
- Comfy dtype:
LLM_INDEX
- Python dtype:
SummaryIndex
Usage tips
- Infra type:
CPU
- Common nodes: unknown
Source code
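# Assumed import paths (llama-index >= 0.10); adjust for the installed version.
from llama_index.core import Document, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter

# MENU_NAME, SUB_MENU_NAME, logger and MockTokenizer are defined elsewhere in the SALT package (not shown).


class LLMSummaryIndex:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "llm_model": ("LLM_MODEL",),
                "document": ("DOCUMENT",),
            },
            "optional": {
                "optional_llm_context": ("LLM_CONTEXT",),
            },
        }

    RETURN_TYPES = ("LLM_INDEX",)
    RETURN_NAMES = ("llm_index",)

    FUNCTION = "index"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Indexing"

    def index(self, llm_model, document, optional_llm_context=None):
        # The embedding model is required; fail early if the LLM_MODEL dict lacks one.
        embed_model = llm_model.get("embed_model", None)
        if not embed_model:
            raise ValueError("Unable to determine LLM Embedding Model")

        # Sentence-aware chunking (1024 tokens, no overlap) applied while building the index.
        splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=0)
        # SALT's mock tokenizer, used here only to bound the size of each document's metadata.
        tokenizer = MockTokenizer(max_tokens=1024, char_per_token=1)

        documents = []
        for doc in document:
            logger.info("Document:")
            logger.data(doc)
            logger.info("\n==================\n")

            # Copy text and metadata, truncating metadata that exceeds the 1024-token budget.
            metadata = {}
            text = doc.text
            if doc.metadata:
                metadata = doc.metadata
                token_count = tokenizer.count(metadata)
                if token_count > 1024:
                    metadata = tokenizer.truncate(metadata)

            documents.append(Document(text=text, extra_info=metadata))

        index = SummaryIndex.from_documents(
            documents,
            embed_model=embed_model,
            service_context=optional_llm_context or None,
            transformations=[splitter],
        )

        return (index,)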
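The metadata step in the loop above relies on SALT's MockTokenizer, whose implementation is not shown here. The following is a minimal sketch of one way such a budget check could work, assuming a character-based tokenizer that counts the JSON-serialized metadata at char_per_token characters per token and truncates by trimming the longest string value. CharBudgetTokenizer and prepare_metadata are hypothetical names, and the real MockTokenizer may count and truncate differently.

```python
import json
from typing import Any, Dict, Optional


class CharBudgetTokenizer:
    """Hypothetical stand-in for SALT's MockTokenizer (assumed behavior)."""

    def __init__(self, max_tokens: int = 1024, char_per_token: int = 1):
        self.max_tokens = max_tokens
        self.char_per_token = char_per_token

    def count(self, metadata: Dict[str, Any]) -> int:
        # Assumption: tokens = characters of the JSON form // char_per_token.
        return len(json.dumps(metadata, default=str)) // self.char_per_token

    def truncate(self, metadata: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: trim the longest string value until the budget fits.
        result = dict(metadata)  # never mutate the caller's dict
        while self.count(result) > self.max_tokens:
            str_keys = [k for k, v in result.items() if isinstance(v, str) and v]
            if not str_keys:
                break  # only non-string values remain; return as-is, possibly over budget
            longest = max(str_keys, key=lambda k: len(result[k]))
            excess_chars = (self.count(result) - self.max_tokens) * self.char_per_token
            result[longest] = result[longest][:-max(1, excess_chars)]
        return result


def prepare_metadata(
    metadata: Optional[Dict[str, Any]],
    tokenizer: CharBudgetTokenizer,
    limit: int = 1024,
) -> Dict[str, Any]:
    """Mirror of the node's loop body: empty metadata passes through, oversized metadata is truncated."""
    if not metadata:
        return {}
    if tokenizer.count(metadata) > limit:
        return tokenizer.truncate(metadata)
    return metadata


tok = CharBudgetTokenizer(max_tokens=1024, char_per_token=1)
trimmed = prepare_metadata({"title": "a" * 2000, "page": 3}, tok)
print(tok.count(trimmed))  # 1024
```

Trimming the longest string first keeps short fields such as page numbers intact; a count produced by a real tokenizer would differ from this character count.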