Skip to content

IPAdapter Encoder

Documentation

  • Class name: IPAdapterEncoder
  • Category: ipadapter/embeds
  • Output node: False

The IPAdapterEncoder node is designed to encode images with specific adaptations, leveraging additional parameters such as weight and mask to fine-tune the encoding process. It aims to enhance image processing tasks by integrating clip vision capabilities and custom adaptations.

Input types

Required

  • ipadapter
    • Represents the IPAdapter instance to be used for encoding, determining the specific adaptation techniques applied to the image.
    • Comfy dtype: IPADAPTER
    • Python dtype: CustomIPAdapterType
  • image
    • The image to be encoded, serving as the primary input for the adaptation process.
    • Comfy dtype: IMAGE
    • Python dtype: ImageType
  • weight
    • A weight factor that influences the encoding process, allowing for fine-tuning of the adaptation effects on the image.
    • Comfy dtype: FLOAT
    • Python dtype: float

Optional

  • mask
    • An optional mask that can be applied to the image, enabling selective encoding of certain image regions.
    • Comfy dtype: MASK
    • Python dtype: Optional[ImageType]
  • clip_vision
    • An optional parameter to incorporate clip vision features into the encoding, enhancing the adaptation with vision-based insights.
    • Comfy dtype: CLIP_VISION
    • Python dtype: Optional[ClipVisionType]

Output types

  • pos_embed
    • Comfy dtype: EMBEDS
    • The positive embedding result of the encoding process.
    • Python dtype: EmbeddingType
  • neg_embed
    • Comfy dtype: EMBEDS
    • The negative embedding result of the encoding process.
    • Python dtype: EmbeddingType

Usage tips

  • Infra type: CPU
  • Common nodes:
    • IPAdapterApplyEncoded

Source code

class IPAdapterEncoder:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "ipadapter": ("IPADAPTER",),
            "image": ("IMAGE",),
            "weight": ("FLOAT", { "default": 1.0, "min": -1.0, "max": 3.0, "step": 0.01 }),
            },
            "optional": {
                "mask": ("MASK",),
                "clip_vision": ("CLIP_VISION",),
            }
        }

    RETURN_TYPES = ("EMBEDS", "EMBEDS",)
    RETURN_NAMES = ("pos_embed", "neg_embed",)
    FUNCTION = "encode"
    CATEGORY = "ipadapter/embeds"

    def encode(self, ipadapter, image, weight, mask=None, clip_vision=None):
        if 'ipadapter' in ipadapter:
            ipadapter_model = ipadapter['ipadapter']['model']
            clip_vision = clip_vision if clip_vision is not None else ipadapter['clipvision']['model']
        else:
            ipadapter_model = ipadapter
            clip_vision = clip_vision

        if clip_vision is None:
            raise Exception("Missing CLIPVision model.")

        is_plus = "proj.3.weight" in ipadapter_model["image_proj"] or "latents" in ipadapter_model["image_proj"] or "perceiver_resampler.proj_in.weight" in ipadapter_model["image_proj"]

        # resize and crop the mask to 224x224
        if mask is not None and mask.shape[1:3] != torch.Size([224, 224]):
            mask = mask.unsqueeze(1)
            transforms = T.Compose([
                T.CenterCrop(min(mask.shape[2], mask.shape[3])),
                T.Resize((224, 224), interpolation=T.InterpolationMode.BICUBIC, antialias=True),
            ])
            mask = transforms(mask).squeeze(1)
            #mask = T.Resize((image.shape[1], image.shape[2]), interpolation=T.InterpolationMode.BICUBIC, antialias=True)(mask.unsqueeze(1)).squeeze(1)

        img_cond_embeds = encode_image_masked(clip_vision, image, mask)

        if is_plus:
            img_cond_embeds = img_cond_embeds.penultimate_hidden_states
            img_uncond_embeds = encode_image_masked(clip_vision, torch.zeros([1, 224, 224, 3])).penultimate_hidden_states
        else:
            img_cond_embeds = img_cond_embeds.image_embeds
            img_uncond_embeds = torch.zeros_like(img_cond_embeds)

        if weight != 1:
            img_cond_embeds = img_cond_embeds * weight

        return (img_cond_embeds, img_uncond_embeds, )