LayerUtility: QWenImage2Prompt¶
Documentation¶
- Class name: LayerUtility: QWenImage2Prompt
- Category: 😺dzNodes/LayerUtility/Prompt
- Output node: False
This node wraps the UForm-Gen2 Qwen chat model to generate a text prompt from an input image and a question. The image is converted to a file the chat model can read, and the model returns either a description of the image or an answer to the question.
Input types¶
Required¶
image
- The input image for which the text prompt will be generated. It serves as the visual context for the chat model to interpret and describe.
- Comfy dtype: IMAGE
- Python dtype: torch.Tensor
question
- A question that guides the generation of the text prompt, allowing for more specific or directed output from the chat model. Defaults to "describe this image".
- Comfy dtype: STRING
- Python dtype: str
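
For reference, a ComfyUI IMAGE is a batched float tensor; the shape and value-range conventions below are standard ComfyUI behavior rather than something stated on this page. A minimal sketch of what this node receives and how it prepares the image for the chat model:

    import torch
    from torchvision.transforms import ToPILImage

    # A ComfyUI IMAGE input: a batch of images, shape [B, H, W, C], float32 in [0, 1].
    image = torch.rand(1, 512, 512, 3)

    # The node takes the first image in the batch and converts it to PIL
    # (torchvision expects CHW, hence the permute) before querying the chat model.
    pil_image = ToPILImage()(image[0].permute(2, 0, 1))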
Output types¶
text
- The generated text prompt that describes or answers the question about the input image.
- Comfy dtype: STRING
- Python dtype: str
Usage tips¶
- Infra type: GPU
- Common nodes: unknown
Source code¶
class QWenImage2Prompt:
    def __init__(self):
        # Wrapper around the UForm-Gen2 Qwen chat model, defined elsewhere in this package
        self.chat_model = UformGen2QwenChat()

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "question": ("STRING", {"multiline": False, "default": "describe this image"}),
            },
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("text",)
    FUNCTION = "uform_gen2_qwen_chat"
    CATEGORY = '😺dzNodes/LayerUtility/Prompt'

    def uform_gen2_qwen_chat(self, image, question):
        history = []  # no prior conversation history
        # ComfyUI IMAGE tensors are [B, H, W, C]; take the first image and convert to PIL
        pil_image = ToPILImage()(image[0].permute(2, 0, 1))
        # The chat model reads the image from disk, so save it to a temporary file
        temp_path = files_for_uform_gen2_qwen / "temp.png"
        pil_image.save(temp_path)
        response = self.chat_model.chat_response(question, history, temp_path)
        # The raw response includes the chat template; return only the assistant's reply
        return (response.split("assistant\n", 1)[1], )
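
For orientation, a minimal usage sketch that calls the node class directly, outside a ComfyUI graph. It assumes the UForm-Gen2 Qwen model files are already available to UformGen2QwenChat; the random tensor merely stands in for a real IMAGE input:

    import torch

    node = QWenImage2Prompt()
    image = torch.rand(1, 512, 512, 3)  # stand-in for an IMAGE from a Load Image node
    (text,) = node.uform_gen2_qwen_chat(image, "describe this image")
    print(text)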