CLIPTextEncodeSD3

Documentation

  • Class name: CLIPTextEncodeSD3
  • Category: advanced/conditioning
  • Output node: False

The CLIPTextEncodeSD3 node encodes text prompts into conditioning for Stable Diffusion 3 style models. It takes a separate prompt for each of the three text encoders (CLIP-L, CLIP-G, and T5-XXL) and an empty_padding option that controls how empty prompts are handled.

Input types

Required

  • clip
    • The CLIP model instance used to tokenize and encode all three text inputs.
    • Comfy dtype: CLIP
    • Python dtype: CLIP
  • clip_l
    • A multiline string input with dynamic prompts enabled, holding the prompt for the CLIP-L text encoder. It is tokenized into the "l" token stream.
    • Comfy dtype: STRING
    • Python dtype: str
  • clip_g
    • A multiline string input with dynamic prompts enabled, holding the prompt for the CLIP-G text encoder. It is tokenized into the "g" token stream.
    • Comfy dtype: STRING
    • Python dtype: str
  • t5xxl
    • A multiline string input with dynamic prompts enabled, holding the prompt for the T5-XXL text encoder. It is tokenized into the "t5xxl" token stream.
    • Comfy dtype: STRING
    • Python dtype: str
  • empty_padding
    • Controls how empty text inputs are handled. With none, an empty prompt produces an empty token list for its encoder; with empty_prompt, the empty string is tokenized like any other prompt.
    • Comfy dtype: COMBO[STRING]
    • Python dtype: str
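
The effect of empty_padding on a single prompt can be sketched without ComfyUI. In the snippet below, fake_tokenize, tokens_for, and CHUNK_SIZE are hypothetical names invented for illustration; the toy tokenizer only assumes that tokenizing an empty string still yields one padded chunk, which may not match the real clip.tokenize in detail.

```python
# Toy stand-in for clip.tokenize (hypothetical): splits on whitespace and
# packs words into fixed-size padded chunks. By assumption, an empty string
# still yields one fully padded chunk.
CHUNK_SIZE = 3  # illustrative only, not the real token window


def fake_tokenize(text):
    words = text.split()
    chunks = [words[i:i + CHUNK_SIZE] for i in range(0, len(words), CHUNK_SIZE)]
    if not chunks:
        chunks = [[]]
    return [chunk + ["<pad>"] * (CHUNK_SIZE - len(chunk)) for chunk in chunks]


def tokens_for(text, empty_padding):
    """Mirror the node's per-prompt branch for one encoder."""
    if len(text) == 0 and empty_padding == "none":
        return []  # no tokens at all for this encoder
    return fake_tokenize(text)


print(tokens_for("", "none"))          # []
print(tokens_for("", "empty_prompt"))  # [['<pad>', '<pad>', '<pad>']]
print(len(tokens_for("a cat on a mat", "none")))  # 2
```

With none, the encoder for an empty prompt receives nothing; with empty_prompt, it receives a padded empty prompt.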

Output types

  • conditioning
    • The encoded text, structured for conditioning generative models. It is a list with one entry that pairs the conditioning tensor with a dictionary holding the pooled output under the key pooled_output.
    • Comfy dtype: CONDITIONING
    • Python dtype: List[Tuple[torch.Tensor, Dict[str, torch.Tensor]]]
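
As a rough illustration of that layout, the sketch below builds a value of the same shape and unpacks it. The nested lists are placeholders standing in for torch tensors, and the variable names are hypothetical; only the outer structure and the pooled_output key come from the node's source.

```python
# Placeholder values (hypothetical): nested lists stand in for torch tensors.
cond = [[0.0, 0.1], [0.2, 0.3]]  # stands in for the per-token embeddings
pooled = [0.5, 0.6]              # stands in for the pooled embedding

# Same layout as the first element of the tuple returned by encode().
conditioning = [[cond, {"pooled_output": pooled}]]

for cond_entry, extras in conditioning:
    pooled_entry = extras["pooled_output"]
    print(len(cond_entry), len(pooled_entry))  # 2 2
```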

Usage tips

  • Infra type: GPU
  • Common nodes: unknown

Source code

class CLIPTextEncodeSD3:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "clip": ("CLIP", ),
            "clip_l": ("STRING", {"multiline": True, "dynamicPrompts": True}),
            "clip_g": ("STRING", {"multiline": True, "dynamicPrompts": True}),
            "t5xxl": ("STRING", {"multiline": True, "dynamicPrompts": True}),
            "empty_padding": (["none", "empty_prompt"], )
            }}
    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "encode"

    CATEGORY = "advanced/conditioning"

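    def encode(self, clip, clip_l, clip_g, t5xxl, empty_padding):
        # With "none", an empty prompt yields an empty token list for its encoder.
        no_padding = empty_padding == "none"

        # Tokenize the clip_g prompt first; the result is the base token dict.
        tokens = clip.tokenize(clip_g)
        if len(clip_g) == 0 and no_padding:
            tokens["g"] = []

        # Replace the "l" entry with tokens from the clip_l prompt.
        if len(clip_l) == 0 and no_padding:
            tokens["l"] = []
        else:
            tokens["l"] = clip.tokenize(clip_l)["l"]

        # Replace the "t5xxl" entry with tokens from the t5xxl prompt.
        if len(t5xxl) == 0 and no_padding:
            tokens["t5xxl"] = []
        else:
            tokens["t5xxl"] = clip.tokenize(t5xxl)["t5xxl"]

        # Pad the shorter of "l" and "g" with empty-prompt tokens until both have the same length.
        if len(tokens["l"]) != len(tokens["g"]):
            empty = clip.tokenize("")
            while len(tokens["l"]) < len(tokens["g"]):
                tokens["l"] += empty["l"]
            while len(tokens["l"]) > len(tokens["g"]):
                tokens["g"] += empty["g"]
        cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
        return ([[cond, {"pooled_output": pooled}]], )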
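
The length-balancing step in encode can be tried in isolation. The sketch below copies the two while loops into a hypothetical helper, balance_l_and_g, and runs it on hand-written token lists; the chunk contents are made-up placeholders, and the helper assumes the empty-prompt entries are non-empty (otherwise the loops would not terminate).

```python
def balance_l_and_g(tokens, empty):
    """Pad the shorter of tokens["l"] and tokens["g"] with empty-prompt chunks."""
    while len(tokens["l"]) < len(tokens["g"]):
        tokens["l"] += empty["l"]
    while len(tokens["l"]) > len(tokens["g"]):
        tokens["g"] += empty["g"]
    return tokens


# Stand-in for clip.tokenize(""): one padded chunk per encoder (assumption).
empty = {"l": [["<pad>"]], "g": [["<pad>"]]}

# One chunk for "l", three chunks for "g"; "t5xxl" is left untouched.
tokens = {"l": [["short"]], "g": [["a"], ["long"], ["prompt"]], "t5xxl": []}

balance_l_and_g(tokens, empty)
print(len(tokens["l"]), len(tokens["g"]))  # 3 3
```

Only the "l" and "g" streams are equalized; the "t5xxl" stream keeps whatever length its own prompt produced.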