Google Translate CLIP Text Encode Node

Documentation

  • Class name: GoogleTranslateCLIPTextEncodeNode
  • Category: AlekPet Nodes/conditioning
  • Output node: False

This node translates the input text with the Google Translate API and then encodes the result with the supplied CLIP model. It supports automatic source-language detection and an optional manual mode that skips translation entirely, and it returns both the CLIP conditioning (with pooled output) and the translated string for downstream nodes.
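
At its core the node chains a translation step into ComfyUI's standard CLIP text-encoding path. The condensed sketch below mirrors the full source code shown at the bottom of this page; the function name and the translate_fn parameter are illustrative only, standing in for the translation helper defined in the node's module and a CLIP model supplied by an upstream loader node.

def translate_and_encode(clip, text, translate_fn, from_lang="auto", to_lang="en", manual=False):
    # Manual mode skips the translation request and encodes the prompt as-is.
    prompt = text if manual else translate_fn(text, from_lang, to_lang)
    # Standard ComfyUI CLIP text encoding: tokenize, then encode with pooled output.
    tokens = clip.tokenize(prompt)
    cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
    # ComfyUI conditioning is a list of [tensor, options] pairs; the prompt string is returned alongside it.
    return [[cond, {"pooled_output": pooled}]], prompt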

Input types

Required

  • from_translate
    • Specifies the source language for translation or 'auto' for automatic language detection. It plays a crucial role in guiding the translation process.
    • Comfy dtype: COMBO[STRING]
    • Python dtype: Union[str, List[str]]
  • to_translate
    • Defines the target language for the translation, with 'en' (English) as the default. This parameter determines the language into which the text will be translated.
    • Comfy dtype: COMBO[STRING]
    • Python dtype: List[str]
  • manual_translate
    • A boolean flag that, when set to True, bypasses the translation process and uses the original text for CLIP encoding. This allows for optional use of the translation feature.
    • Comfy dtype: COMBO[BOOLEAN]
    • Python dtype: bool
  • text
    • The text to be translated and/or encoded. This is the primary input for the translation and encoding process.
    • Comfy dtype: STRING
    • Python dtype: str
  • clip
    • A CLIP model instance used for encoding the translated text into vectors. This enables the integration of text understanding and image recognition capabilities.
    • Comfy dtype: CLIP
    • Python dtype: CLIP

Output types

  • conditioning
    • Comfy dtype: CONDITIONING
    • The conditioning produced from the translated text, in ComfyUI's standard format (a list of tensor/options pairs), suitable for conditioning downstream models or further analysis; a sketch of this structure follows the list below.
    • Python dtype: List[Tuple[torch.Tensor, Dict[str, torch.Tensor]]]
  • string
    • Comfy dtype: STRING
    • The translated text returned by the node, or the original text unchanged when manual_translate is enabled.
    • Python dtype: str
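
The conditioning output follows ComfyUI's usual convention: a list of (tensor, options) pairs, where the options dictionary carries the pooled output. A minimal sketch of that structure, using randomly generated tensors with illustrative shapes in place of the node's real return value:

import torch

# Illustrative stand-ins for the node's real outputs (shapes are examples only).
cond = torch.randn(1, 77, 768)      # token-level CLIP embeddings
pooled = torch.randn(1, 768)        # pooled CLIP embedding
conditioning = [[cond, {"pooled_output": pooled}]]

# Downstream nodes iterate over the list and read the options dictionary.
for tensor, options in conditioning:
    print(tensor.shape, options["pooled_output"].shape)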

Usage tips

  • Infra type: GPU
  • Common nodes: unknown

Source code

class GoogleTranslateCLIPTextEncodeNode:

    @classmethod
    def INPUT_TYPES(cls):
        # `LANGUAGES` is the language-code map defined elsewhere in the node's module.
        return {
            "required": {
                "from_translate": (
                    ["auto"] + list(LANGUAGES.keys()),
                    {"default": "auto"},
                ),
                "to_translate": (list(LANGUAGES.keys()), {"default": "en"}),
                "manual_translate": ([True, False],),
                "text": ("STRING", {"multiline": True, "placeholder": "Input prompt"}),
                "clip": ("CLIP",),
            }
        }

    RETURN_TYPES = (
        "CONDITIONING",
        "STRING",
    )
    FUNCTION = "translate_text"
    CATEGORY = "AlekPet Nodes/conditioning"

    def translate_text(self, **kwargs):
        from_translate = kwargs.get("from_translate")
        to_translate = kwargs.get("to_translate")
        manual_translate = kwargs.get("manual_translate", False)
        text = kwargs.get("text")
        clip = kwargs.get("clip")

        # Manual mode bypasses the Google Translate request; `translate` is the
        # request helper defined elsewhere in the node's module.
        text_translated = (
            translate(text, from_translate, to_translate)
            if not manual_translate
            else text
        )
        # Standard ComfyUI CLIP text encoding: tokenize, then encode with pooled output.
        tokens = clip.tokenize(text_translated)
        cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
        # Return the conditioning (a list of [tensor, options] pairs) and the translated string.
        return ([[cond, {"pooled_output": pooled}]], text_translated)
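
A minimal sketch of calling the node outside a ComfyUI graph. It assumes manual_translate=True so no Google Translate request is made, and StubCLIP is a hypothetical stand-in exposing only the two methods the node relies on; in a real workflow the CLIP model comes from an upstream loader node.

import torch

class StubCLIP:
    # Hypothetical stand-in for a loaded CLIP model (shapes are examples only).
    def tokenize(self, text):
        return text  # a real CLIP model returns token tensors here

    def encode_from_tokens(self, tokens, return_pooled=False):
        cond = torch.randn(1, 77, 768)
        pooled = torch.randn(1, 768)
        return (cond, pooled) if return_pooled else cond

node = GoogleTranslateCLIPTextEncodeNode()
conditioning, prompt = node.translate_text(
    from_translate="auto",
    to_translate="en",
    manual_translate=True,   # skip the Google Translate call
    text="a red sports car at sunset",
    clip=StubCLIP(),
)
print(prompt)                    # text is passed through unchanged in manual mode
print(conditioning[0][0].shape)  # token-level embeddings from the (stub) CLIP model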