Skip to content

CLIPTextEncodeSDXLRefiner

Documentation

  • Class name: CLIPTextEncodeSDXLRefiner
  • Category: advanced/conditioning
  • Output node: False

The CLIPTextEncodeSDXLRefiner node specializes in refining text inputs for image generation by encoding them using a CLIP model, incorporating aesthetic scores and dimensions to enhance the conditioning for image synthesis.

Input types

Required

  • ascore
    • The aesthetic score parameter influences the aesthetic quality of the generated image, allowing for fine-tuning based on aesthetic preferences.
    • Comfy dtype: FLOAT
    • Python dtype: float
  • width
    • Specifies the desired width of the output image, impacting the aspect ratio and detail level of the generated image.
    • Comfy dtype: INT
    • Python dtype: int
  • height
    • Determines the height of the output image, affecting its aspect ratio and the level of detail in the vertical dimension.
    • Comfy dtype: INT
    • Python dtype: int
  • text
    • The text input to be encoded, serving as the primary source of semantic information for image generation.
    • Comfy dtype: STRING
    • Python dtype: str
  • clip
    • A CLIP model instance used for encoding the text input, crucial for interpreting the semantic content.
    • Comfy dtype: CLIP
    • Python dtype: torch.nn.Module

Output types

  • conditioning
    • Comfy dtype: CONDITIONING
    • Provides the encoded text along with aesthetic scores and dimensions, ready for use in further image synthesis processes.
    • Python dtype: list

Usage tips

Source code

class CLIPTextEncodeSDXLRefiner:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "ascore": ("FLOAT", {"default": 6.0, "min": 0.0, "max": 1000.0, "step": 0.01}),
            "width": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
            "height": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
            "text": ("STRING", {"multiline": True, "dynamicPrompts": True}), "clip": ("CLIP", ),
            }}
    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "encode"

    CATEGORY = "advanced/conditioning"

    def encode(self, clip, ascore, width, height, text):
        tokens = clip.tokenize(text)
        cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
        return ([[cond, {"pooled_output": pooled, "aesthetic_score": ascore, "width": width,"height": height}]], )