CLIPTextEncode SDXL Plus (JPS)¶

Documentation¶

Class name: CLIPTextEncode SDXL Plus (JPS)
Category: JPS Nodes/Conditioning
Output node: False

This node encodes a positive and a negative text prompt with an SDXL CLIP model and returns one CONDITIONING output for each. The same prompt text is passed to both SDXL token streams (g and l). Each output also carries SDXL size conditioning: width and height are multiplied by res_factor, the scaled values are used for both the size and the target size, and the crop offsets are fixed at 0.

Input types¶

Required¶

width
- Base width in pixels (default 1024, range 0 to 12288). It is multiplied by res_factor and stored in the conditioning as width and target_width.
- Comfy dtype: INT
- Python dtype: int
height
- Base height in pixels (default 1024, range 0 to 12288). It is multiplied by res_factor and stored in the conditioning as height and target_height.
- Comfy dtype: INT
- Python dtype: int
res_factor
- Integer multiplier (default 4, range 1 to 8) applied to both width and height before they are written to the conditioning.
- Comfy dtype: INT
- Python dtype: int
text_pos
- The positive prompt. It is tokenized and encoded to produce cond_pos, which describes what the image should contain.
- Comfy dtype: STRING
- Python dtype: str
text_neg
- The negative prompt. It is tokenized and encoded to produce cond_neg, which describes what the image should avoid.
- Comfy dtype: STRING
- Python dtype: str
clip
- The CLIP model used to tokenize and encode both prompts. The node reads the g and l token streams from it, so an SDXL CLIP is expected.
- Comfy dtype: CLIP
- Python dtype: comfy.sd.CLIP
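
How width, height, and res_factor combine can be sketched in isolation. The helper below is hypothetical and not part of the node; it only mirrors the arithmetic visible in the node's execute method, where both dimensions are multiplied by res_factor and the crop offsets stay at zero.

```python
def sdxl_size_conditioning(width: int, height: int, res_factor: int) -> dict:
    """Hypothetical helper mirroring the size arithmetic in the node's execute()."""
    scaled_width = width * res_factor
    scaled_height = height * res_factor
    return {
        "width": scaled_width,
        "height": scaled_height,
        "crop_w": 0,  # the node never applies a crop offset
        "crop_h": 0,
        "target_width": scaled_width,  # target size equals the scaled size
        "target_height": scaled_height,
    }


meta = sdxl_size_conditioning(1024, 1024, 4)
print(meta["width"], meta["target_height"])  # 4096 4096
```

With the default inputs (1024 x 1024, res_factor 4), the conditioning reports a 4096 x 4096 size, even though no image of that size is created by this node.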

Output types¶

cond_pos
- The positive conditioning: the encoded text_pos embedding paired with a dict holding pooled_output, width, height, crop_w, crop_h, target_width, and target_height.
- Comfy dtype: CONDITIONING
- Python dtype: list of [torch.Tensor, dict] pairs
cond_neg
- The negative conditioning: the encoded text_neg embedding paired with a dict holding pooled_output, width, height, crop_w, crop_h, target_width, and target_height.
- Comfy dtype: CONDITIONING
- Python dtype: list of [torch.Tensor, dict] pairs
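
The shape of each output can be illustrated with stand-in values. In the sketch below, the strings are placeholders for the tensors a real run would produce, and conditioning_metadata is a hypothetical helper, not a ComfyUI function; only the nesting and the dict keys are taken from the node's return statement.

```python
def conditioning_metadata(conditioning: list) -> list:
    """Return the metadata dict of every [embedding, metadata] pair."""
    return [metadata for _embedding, metadata in conditioning]


# Stand-in values: a real output holds torch tensors where the strings are.
cond_pos = [[
    "<token embeddings>",
    {
        "pooled_output": "<pooled embedding>",
        "width": 4096,
        "height": 4096,
        "crop_w": 0,
        "crop_h": 0,
        "target_width": 4096,
        "target_height": 4096,
    },
]]

for metadata in conditioning_metadata(cond_pos):
    print(sorted(metadata))
```

Downstream nodes that consume CONDITIONING receive this list unchanged, so the size values set here travel with the embedding.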

Usage tips¶

Infra type: GPU
Common nodes: unknown

Source code¶

class CLIPTextEncodeSDXL_Plus:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "width": ("INT", {"default": 1024, "min": 0, "max": 12288}),
            "height": ("INT", {"default": 1024, "min": 0, "max": 12288}),
            "res_factor": ("INT", {"default": 4, "min": 1, "max": 8}),
            "text_pos": ("STRING", {"multiline": True, "default": "", "dynamicPrompts": True}),
            "text_neg": ("STRING", {"multiline": True, "default": "", "dynamicPrompts": True}),
            "clip": ("CLIP", ),
            }}
    RETURN_TYPES = ("CONDITIONING","CONDITIONING",)
    RETURN_NAMES = ("cond_pos", "cond_neg",)
    FUNCTION = "execute"
    CATEGORY = "JPS Nodes/Conditioning"

    def execute(self, clip, width, height, res_factor, text_pos, text_neg):
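        # No crop offset is applied.
        crop_w = 0
        crop_h = 0
        # Scale the requested size; the scaled values also serve as the target size.
        width = width*res_factor
        height = height*res_factor
        target_width = width
        target_height = height
        # The same prompt text feeds both SDXL token streams ("g" and "l").
        text_g_pos = text_l_pos = text_pos
        text_g_neg = text_l_neg = text_neg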

        tokens_pos = clip.tokenize(text_g_pos)
        tokens_pos["l"] = clip.tokenize(text_l_pos)["l"]
        if len(tokens_pos["l"]) != len(tokens_pos["g"]):
            empty_pos = clip.tokenize("")
            while len(tokens_pos["l"]) < len(tokens_pos["g"]):
                tokens_pos["l"] += empty_pos["l"]
            while len(tokens_pos["l"]) > len(tokens_pos["g"]):
                tokens_pos["g"] += empty_pos["g"]
        cond_pos, pooled_pos = clip.encode_from_tokens(tokens_pos, return_pooled=True)

        tokens_neg = clip.tokenize(text_g_neg)
        tokens_neg["l"] = clip.tokenize(text_l_neg)["l"]
        if len(tokens_neg["l"]) != len(tokens_neg["g"]):
            empty_neg = clip.tokenize("")
            while len(tokens_neg["l"]) < len(tokens_neg["g"]):
                tokens_neg["l"] += empty_neg["l"]
            while len(tokens_neg["l"]) > len(tokens_neg["g"]):
                tokens_neg["g"] += empty_neg["g"]
        cond_neg, pooled_neg = clip.encode_from_tokens(tokens_neg, return_pooled=True)

        return (
            [[cond_pos, {"pooled_output": pooled_pos, "width": width, "height": height,
                         "crop_w": crop_w, "crop_h": crop_h,
                         "target_width": target_width, "target_height": target_height}]],
            [[cond_neg, {"pooled_output": pooled_neg, "width": width, "height": height,
                         "crop_w": crop_w, "crop_h": crop_h,
                         "target_width": target_width, "target_height": target_height}]],
        )
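
The padding step in execute, which evens out the number of l and g token chunks, can be tried without a CLIP model. The sketch below uses plain lists as stand-ins for token chunks; balance_token_chunks is a hypothetical name, and the chunk contents are made up for illustration. Because this node passes the same text to both streams, the padding branch may rarely run in practice.

```python
def balance_token_chunks(tokens: dict, empty: dict) -> dict:
    """Pad the shorter of tokens["l"] / tokens["g"] with empty-prompt chunks."""
    # Copy the lists so the caller's token dict is not modified.
    balanced = {"l": list(tokens["l"]), "g": list(tokens["g"])}
    while len(balanced["l"]) < len(balanced["g"]):
        balanced["l"] += empty["l"]
    while len(balanced["l"]) > len(balanced["g"]):
        balanced["g"] += empty["g"]
    return balanced


# Stand-in chunks: a real tokenizer returns lists of (token, weight) entries.
tokens = {"l": [["chunk-l-1"]], "g": [["chunk-g-1"], ["chunk-g-2"]]}
empty = {"l": [["empty-l"]], "g": [["empty-g"]]}

balanced = balance_token_chunks(tokens, empty)
print(len(balanced["l"]), len(balanced["g"]))  # 2 2
```

The same helper applies to the positive and the negative prompt, which avoids repeating the two loops for each prompt.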