∞ Image Documents Evaluation¶
Documentation¶
- Class name:
LLMMultiModalImageEvaluation
- Category:
SALT/Language Toolkit/Querying
- Output node:
False
This node is designed to evaluate images based on a given query using a specified LLM model. It processes image paths or documents, loads the images, and sends them to the LLM model along with the query for evaluation, providing a text-based assessment of the images.
Input types¶
Required¶
llm_model
- The LLM model used for evaluating the images. It is crucial for performing the evaluation as it processes the images and the query to generate the evaluation result.
- Comfy dtype:
LLM_MODEL
- Python dtype:
dict
image_documents
- A list of image documents to be evaluated. These documents are processed and evaluated by the LLM model in conjunction with the query.
- Comfy dtype:
DOCUMENT
- Python dtype:
list
llm_message
- The query or message to be evaluated by the LLM model. This message guides the evaluation process of the images.
- Comfy dtype:
LIST
- Python dtype:
list
Optional¶
max_tokens
- The maximum number of tokens to be generated by the LLM model during the evaluation. It limits the length of the evaluation result.
- Comfy dtype:
INT
- Python dtype:
int
Output types¶
response
- Comfy dtype:
STRING
- The evaluation result text from the LLM model, providing a text-based assessment of the images.
- Python dtype:
str
- Comfy dtype:
Usage tips¶
- Infra type:
GPU
- Common nodes: unknown
Source code¶
class LLMMultiModalImageEvaluation:
@classmethod
def INPUT_TYPES(cls):
return {
"required": {
"llm_model": ("LLM_MODEL",),
"image_documents": ("DOCUMENT",),
"llm_message": ("LIST",),
},
"optional": {
"max_tokens": ("INT", {"min": 1, "max": 4096, "default": 1024})
}
}
RETURN_TYPES = ("STRING", )
RETURN_NAMES = ("response", )
FUNCTION = "complete"
CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Querying"
def complete(self, llm_model, image_documents, llm_message, max_tokens=1024):
if not max_tokens or not isinstance(max_tokens, int):
max_tokens = 1024
model = llm_model.get("llm", None)
if not model:
raise ValueError("LLMMultiModalImageEvaluation unable to detect valid model")
prompt = ""
llm_message = sorted(llm_message, key=lambda message: message.role.value)
for msg in llm_message:
if isinstance(msg, ChatMessage) and msg.role == MessageRole.SYSTEM:
if "SYSTEM:" not in prompt:
prompt += "SYSTEM: "
prompt += msg.content + "\n\n"
if isinstance(msg, ChatMessage) and msg.role == MessageRole.USER:
if "USER:" not in prompt:
prompt += "USER: "
prompt += msg.content + "\n\n"
if isinstance(msg, str):
prompt += msg + "\n\n"
response = model.complete(
prompt=prompt,
image_documents=image_documents,
max_tokens=max_tokens
)
return (response.text, )