🔧 SDXL CLIPTextEncode¶
Documentation¶
- Class name: `CLIPTextEncodeSDXL+`
- Category: `essentials/conditioning`
- Output node: `False`
This node encodes text inputs with a CLIP model, tailored for the Stable Diffusion XL (SDXL) framework. It processes the text into a conditioning vector and pooled output optimized for image generation, and attaches size metadata (width, height, crop offsets, and target dimensions, scaled by a conditioning factor) used to fine-tune the generated images.
Input types¶
Required¶
- `width`
    - Defines the width of the target image in pixels, affecting the aspect ratio and resolution of the generated image.
    - Comfy dtype: `INT`
    - Python dtype: `int`
- `height`
    - Sets the height of the target image in pixels, impacting the aspect ratio and resolution of the generated image.
    - Comfy dtype: `INT`
    - Python dtype: `int`
- `size_cond_factor`
    - The factor by which both target dimensions are multiplied before being written into the conditioning metadata, affecting the detail and scale of the generated image (see the worked sketch after this list).
    - Comfy dtype: `INT`
    - Python dtype: `int`
- `text`
    - The text input to be encoded, serving as the basis for the conditioning vector and driving the content of the generated image.
    - Comfy dtype: `STRING`
    - Python dtype: `str`
- `clip`
    - The CLIP model used for text tokenization and encoding, central to generating the conditioning vectors.
    - Comfy dtype: `CLIP`
    - Python dtype: `comfy.sd.CLIP`
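To make `size_cond_factor` concrete, here is a minimal sketch of the dimension math the node performs before encoding; the helper name is illustrative and not part of the node's API:

```python
def scaled_size(width: int, height: int, size_cond_factor: int) -> tuple[int, int]:
    """Illustrative helper: reproduces the scaling done in
    CLIPTextEncodeSDXLSimplified.execute. Both dimensions are multiplied
    by the factor and used for the "width"/"height" and
    "target_width"/"target_height" metadata alike."""
    return width * size_cond_factor, height * size_cond_factor

# With the defaults (1024 x 1024, factor 4) the conditioning metadata
# reports a 4096 x 4096 image, which nudges SDXL toward more detail.
assert scaled_size(1024, 1024, 4) == (4096, 4096)
```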
Output types¶
- `conditioning`
    - Comfy dtype: `CONDITIONING`
    - Outputs the conditioning vector with its associated metadata (pooled output plus width, height, crop offsets, and target dimensions), tailored for SDXL image generation.
    - Python dtype: `list`
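The returned value follows ComfyUI's standard conditioning layout: a list containing one `[tensor, metadata]` pair. The sketch below (an illustrative helper, not part of the node) shows how a consumer can unpack it, based on the `return` statement in the source code:

```python
from typing import Any

def unpack_sdxl_conditioning(node_output: tuple) -> tuple[Any, dict]:
    """Illustrative helper: unpack the CONDITIONING value returned by execute()."""
    (conditioning,) = node_output            # RETURN_TYPES = ("CONDITIONING",)
    cond_tensor, metadata = conditioning[0]  # the single [tensor, dict] pair
    # metadata carries the pooled output plus the SDXL size hints:
    # "pooled_output", "width", "height", "crop_w", "crop_h",
    # "target_width", "target_height"
    return cond_tensor, metadata
```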
Usage tips¶
- Infra type: `GPU`
- Common nodes: unknown
Source code¶
from nodes import MAX_RESOLUTION  # ComfyUI's global resolution cap

class CLIPTextEncodeSDXLSimplified:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "width": ("INT", {"default": 1024, "min": 0, "max": MAX_RESOLUTION}),
            "height": ("INT", {"default": 1024, "min": 0, "max": MAX_RESOLUTION}),
            "size_cond_factor": ("INT", {"default": 4, "min": 1, "max": 16}),
            "text": ("STRING", {"multiline": True, "dynamicPrompts": True, "default": ""}),
            "clip": ("CLIP",),
        }}

    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "execute"
    CATEGORY = "essentials/conditioning"

    def execute(self, clip, width, height, size_cond_factor, text):
        crop_w = 0
        crop_h = 0
        # Scale the advertised image size by the conditioning factor.
        width = width * size_cond_factor
        height = height * size_cond_factor
        target_width = width
        target_height = height

        # The same prompt feeds both SDXL text encoders (CLIP-G and CLIP-L).
        text_g = text_l = text
        tokens = clip.tokenize(text_g)
        tokens["l"] = clip.tokenize(text_l)["l"]
        # Pad the shorter token list with empty-prompt tokens so both
        # encoders see the same number of chunks; with identical g/l
        # prompts the lengths normally match already.
        if len(tokens["l"]) != len(tokens["g"]):
            empty = clip.tokenize("")
            while len(tokens["l"]) < len(tokens["g"]):
                tokens["l"] += empty["l"]
            while len(tokens["l"]) > len(tokens["g"]):
                tokens["g"] += empty["g"]

        cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
        return ([[cond, {"pooled_output": pooled, "width": width, "height": height, "crop_w": crop_w, "crop_h": crop_h, "target_width": target_width, "target_height": target_height}]], )
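For orientation, a minimal usage sketch calling the node directly from Python. It assumes a running ComfyUI environment where `clip` is an already-loaded SDXL `CLIP` object (e.g. the CLIP output of a CheckpointLoaderSimple node); the prompt text is illustrative:

```python
# Assumes `clip` is a loaded ComfyUI CLIP object from an SDXL checkpoint.
node = CLIPTextEncodeSDXLSimplified()

(conditioning,) = node.execute(
    clip=clip,
    width=1024,
    height=1024,
    size_cond_factor=4,
    text="a cinematic photo of a lighthouse at dusk",
)

cond_tensor, metadata = conditioning[0]
print(metadata["width"], metadata["target_width"])  # -> 4096 4096
```

The `conditioning` value can then be wired into a sampler (e.g. as the positive input of a KSampler node) just like the output of the stock CLIPTextEncode node.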