🔧 SDXL CLIPTextEncode

Documentation

  • Class name: CLIPTextEncodeSDXL+
  • Category: essentials/conditioning
  • Output node: False

This node encodes text with a CLIP model, tailored for the Stable Diffusion XL (SDXL) framework. It processes the prompt to produce a conditioning vector and pooled output for image generation, attaching size-related parameters (width, height, crop and target dimensions) that SDXL uses as additional conditioning.

Input types

Required

  • width
    • Defines the width of the target image in pixels, affecting the aspect ratio and resolution of the generated image.
    • Comfy dtype: INT
    • Python dtype: int
  • height
    • Sets the height of the target image in pixels, impacting the aspect ratio and resolution of the generated image.
    • Comfy dtype: INT
    • Python dtype: int
  • size_cond_factor
    • Multiplier applied to the width and height before they are written into the SDXL size conditioning; it scales the conditioning dimensions only (see the worked example after this list).
    • Comfy dtype: INT
    • Python dtype: int
  • text
    • The text input to be encoded, serving as the basis for generating the conditioning vector and influencing the content of the generated image.
    • Comfy dtype: STRING
    • Python dtype: str
  • clip
    • The CLIP model used for text tokenization and encoding, central to generating the conditioning vectors.
    • Comfy dtype: CLIP
    • Python dtype: torch.nn.Module
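
A quick worked example of how size_cond_factor enters the conditioning (the values are illustrative; the arithmetic mirrors the execute method shown under Source code below):

width, height, size_cond_factor = 1024, 1024, 4  # example input values
cond_width = width * size_cond_factor            # 4096, stored as "width"/"target_width"
cond_height = height * size_cond_factor          # 4096, stored as "height"/"target_height"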

Output types

  • conditioning
    • Comfy dtype: CONDITIONING
    • Outputs the encoded text paired with metadata (pooled output plus width, height, crop and target dimensions) used by SDXL during sampling; see the sketch below.
    • Python dtype: list
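
A CONDITIONING value in ComfyUI is a list of [tensor, metadata] pairs. A rough sketch of what this node returns (tensor shapes are illustrative, not guaranteed):

conditioning = [
    [cond,                                # token embeddings from clip.encode_from_tokens
     {"pooled_output": pooled,            # pooled text embedding
      "width": 4096, "height": 4096,      # input size multiplied by size_cond_factor
      "crop_w": 0, "crop_h": 0,           # always zero in this node
      "target_width": 4096, "target_height": 4096}],
]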

Usage tips

  • Infra type: GPU
  • Common nodes: unknown

Source code

from nodes import MAX_RESOLUTION  # ComfyUI core; used below to cap width/height

class CLIPTextEncodeSDXLSimplified:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "width": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
            "height": ("INT", {"default": 1024.0, "min": 0, "max": MAX_RESOLUTION}),
            "size_cond_factor": ("INT", {"default": 4, "min": 1, "max": 16 }),
            "text": ("STRING", {"multiline": True, "dynamicPrompts": True, "default": ""}),
            "clip": ("CLIP", ),
            }}
    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "execute"
    CATEGORY = "essentials/conditioning"

    def execute(self, clip, width, height, size_cond_factor, text):
        crop_w = 0
        crop_h = 0
        # Only the conditioning metadata is scaled; the node does not touch the latent size.
        width = width*size_cond_factor
        height = height*size_cond_factor
        target_width = width
        target_height = height
        # The same prompt is used for both SDXL text encoders (CLIP-G and CLIP-L).
        text_g = text_l = text

        tokens = clip.tokenize(text_g)
        tokens["l"] = clip.tokenize(text_l)["l"]
        if len(tokens["l"]) != len(tokens["g"]):
            empty = clip.tokenize("")
            while len(tokens["l"]) < len(tokens["g"]):
                tokens["l"] += empty["l"]
            while len(tokens["l"]) > len(tokens["g"]):
                tokens["g"] += empty["g"]
        cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
        return ([[cond, {"pooled_output": pooled, "width": width, "height": height, "crop_w": crop_w, "crop_h": crop_h, "target_width": target_width, "target_height": target_height}]], )
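
A rough usage sketch, assuming a CLIP object has already been loaded (for example via ComfyUI's CheckpointLoaderSimple); inside ComfyUI the node is normally wired through the graph rather than called directly:

# Hypothetical standalone call, for illustration only.
node = CLIPTextEncodeSDXLSimplified()
(conditioning,) = node.execute(
    clip=clip,                      # CLIP object from a loaded SDXL checkpoint
    width=1024, height=1024,
    size_cond_factor=4,             # conditioning size becomes 4096x4096
    text="a photo of an astronaut riding a horse",
)
cond, meta = conditioning[0]
print(meta["target_width"], meta["target_height"])  # 4096 4096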