∞ Image Documents Evaluation as Tool

Documentation

  • Class name: LLMMultiModalImageEvaluationTool
  • Category: SALT/Language Toolkit/Agents/Tools
  • Output node: False

This node wraps a multimodal language model as a tool that an agent can call to evaluate images against a query. When the tool is called, it takes a query and, optionally, a comma-separated string of image file paths. It loads the images from those paths, combines them with any image documents connected to the node, and sends everything to the language model together with the query. The tool returns the text of the model's response.

Input types

Required

  • llm_model
    • The language model that evaluates the images against the query. The node reads the model from the `llm` key of this input and raises an error if the key is missing or empty.
    • Comfy dtype: LLM_MODEL
    • Python dtype: dict
  • name
    • The name under which the tool is exposed. Give each instance a distinct name so that tools can be told apart.
    • Comfy dtype: STRING
    • Python dtype: str
  • description
    • A description of what the tool does and how to call it. The default text tells the caller to ask a question about an image and to describe the desired output format.
    • Comfy dtype: STRING
    • Python dtype: str

Optional

  • image_documents
    • A list of pre-loaded image documents to evaluate. At call time they are combined with any images loaded from the tool's `image_paths` argument. Either this input or `image_paths` must be provided.
    • Comfy dtype: DOCUMENT
    • Python dtype: list
  • max_tokens
    • The maximum number of tokens the language model may generate in its response. Accepts values from 1 to 4096 and defaults to 1024.
    • Comfy dtype: INT
    • Python dtype: int
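
To make the shape of the `llm_model` input concrete, here is a minimal sketch of how the node appears to use it, based on the source listing further down. `StubLLM` is a hypothetical stand-in, not a real backend; the only things taken from the source are the `llm` key and the `complete(prompt=..., image_documents=..., max_tokens=...)` call.

```python
from types import SimpleNamespace


class StubLLM:
    """Hypothetical stand-in for a multimodal model; not a real backend."""

    def complete(self, prompt, image_documents, max_tokens):
        # Mirror the keyword arguments the node passes to `model.complete`.
        text = f"{prompt} ({len(image_documents)} image(s), max_tokens={max_tokens})"
        return SimpleNamespace(text=text)


# The node reads the model from the "llm" key of the `llm_model` input.
llm_model = {"llm": StubLLM()}

model = llm_model.get("llm", None)
response = model.complete(
    prompt="What is shown?",
    image_documents=["doc-a", "doc-b"],  # placeholders for image documents
    max_tokens=1024,
)
print(response.text)
```

A real model object would receive actual image documents rather than placeholder strings.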

Output types

  • evaluator_tool
    • A tool definition that wraps the evaluation function. It is a dictionary holding the tool's name, its description, and the callable that takes a query (plus optional image paths) and returns the model's response.
    • Comfy dtype: TOOL
    • Python dtype: dict
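
As a rough illustration of the output, the sketch below builds a dictionary with the same three keys the node returns and calls the wrapped function. `fake_evaluator` is a hypothetical placeholder for the closure the node creates; how a downstream agent node actually consumes the dictionary is not shown in this page and is assumed here.

```python
def fake_evaluator(query: str, image_paths: str = None) -> str:
    """Hypothetical stand-in for the closure the node builds."""
    return f"evaluated: {query}"


# Same three keys the node places in its TOOL output.
tool = {
    "name": "evaluator",
    "description": "Answers a question about an image.",
    "function": fake_evaluator,
}

# A consumer can look up the callable and invoke it with a query.
result = tool["function"]("Is there a cat in the picture?")
print(tool["name"], "->", result)
```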

Usage tips

  • Infra type: GPU
  • Common nodes: unknown

Source code

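import os
import shutil
import uuid

# SimpleDirectoryReader comes from LlamaIndex; this import path assumes llama-index >= 0.10.
from llama_index.core import SimpleDirectoryReader

# logger, get_full_path, MENU_NAME and SUB_MENU_NAME are defined elsewhere in the package.

class LLMMultiModalImageEvaluationTool: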
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "llm_model": ("LLM_MODEL",),
                "name": ("STRING", {"multiline": False, "dynamicPrompts": False, "placeholder": "evaluator"}),
                "description": ("STRING", {"multiline": True, "dynamicPrompts": False, "default": "A function that allows you to evaluate an image. Ask a question and this function evaluates an image for the answer. Be sure to describe the desired output format."}),
            },
            "optional": {
                "image_documents": ("DOCUMENT",),
                "max_tokens": ("INT", {"min": 1, "max": 4096, "default": 1024})
            }
        }

    RETURN_TYPES = ("TOOL",)
    RETURN_NAMES = ("evaluator_tool",)

    FUNCTION = "return_tool"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Agents/Tools"

    def return_tool(self, llm_model, name, description, image_documents = None, max_tokens=1024):
        def evaluator_tool(query: str, image_paths: str = None) -> str:
            """
            Evaluates images based on a query using the provided LLM model.

            Parameters:
            query (str): The query or question to be evaluated by the LLM model.
            image_paths (str, optional): A comma-separated string of image file paths to be evaluated.

            Returns:
            str: The evaluation result text from the LLM model.

            The function processes the image paths or documents, loads the images,
            and sends them to the LLM model along with the query for evaluation.
            """

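            # Parse the comma-separated paths only when a value was passed;
            # calling .split() on None would raise AttributeError.
            if image_paths:
                logger.info(f"Loading images from paths: {image_paths}")
                image_paths = [path.strip() for path in image_paths.split(",") if path.strip()]
            else:
                image_paths = []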

            if not image_documents and not image_paths:
                raise ValueError("No image paths or image documents were provided! Please provide at least `image_documents` or `image_paths`")

            model = llm_model.get("llm", None)

            if not model:
                raise ValueError("LLMMultiModalImageEvaluationTool unable to detect valid model")

            docs = []
            if image_documents and isinstance(image_documents, list):
                docs.extend(image_documents)
            if image_paths and isinstance(image_paths, list):
                temp_dir = str(uuid.uuid4())
                temp_path = get_full_path(0, temp_dir)

                os.makedirs(temp_path, exist_ok=True)

                for path in image_paths:
                    if os.path.exists(path):
                        logger.info(f"Copying file from {path} to {temp_path}")
                        shutil.copy(path, temp_path)

                reader = SimpleDirectoryReader(
                    input_dir=temp_path,
                    exclude_hidden=True,
                    recursive=False
                )

                img_docs = reader.load_data()
                if img_docs:
                    docs.extend(img_docs)

            response = model.complete(
                prompt=query,
                image_documents=docs,
                max_tokens=max_tokens
            )

            # Return plain text, matching the `-> str` annotation and the docstring.
            return response.text

        evaluator = {"name": name, "description": description, "function": evaluator_tool}

        return (evaluator,)
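
The listing above copies the requested images into a UUID-named directory and never removes it. One possible alternative, sketched here with only the standard library, is to stage the files in a `tempfile.TemporaryDirectory` so cleanup is automatic. `parse_image_paths` and `stage_images` are hypothetical helper names introduced for illustration; they are not part of the node.

```python
import os
import shutil
import tempfile
from typing import List, Optional


def parse_image_paths(image_paths: Optional[str]) -> List[str]:
    """Split a comma-separated string into clean paths; None yields []."""
    if not image_paths:
        return []
    return [p.strip() for p in image_paths.split(",") if p.strip()]


def stage_images(paths: List[str], dest_dir: str) -> List[str]:
    """Copy existing files into dest_dir and return the copied file names."""
    copied = []
    for path in paths:
        if os.path.isfile(path):
            shutil.copy(path, dest_dir)
            copied.append(os.path.basename(path))
    return copied


# Demo with throwaway files; both directories are removed on exit.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as staging:
    sample = os.path.join(src, "a.png")
    with open(sample, "wb") as fh:
        fh.write(b"not a real image")
    paths = parse_image_paths(f" {sample} , , {os.path.join(src, 'missing.png')}")
    staged = stage_images(paths, staging)
    staged_listing = sorted(os.listdir(staging))

print(staged)
```

In the node itself, the reader would have to load the staged files before the `with` block exits, since the directory disappears afterwards.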