IPAdapter Encoder¶
Documentation¶
- Class name:
IPAdapterEncoder
- Category:
ipadapter/embeds
- Output node:
False
The IPAdapterEncoder node encodes an image into a pair of IPAdapter embeddings, a positive (image-conditioning) embedding and a negative (unconditional) one, using a CLIP vision model. An optional mask restricts encoding to selected regions of the image, and a weight factor scales the strength of the positive embedding.
Input types¶
Required¶
ipadapter
- The IPAdapter to encode with, supplied either as a bare model or as a pipeline dict that also bundles a CLIP vision model.
- Comfy dtype:
IPADAPTER
- Python dtype:
dict
image
- The image to be encoded, serving as the primary input for the adaptation process.
- Comfy dtype:
IMAGE
- Python dtype:
torch.Tensor
weight
- A multiplier applied to the positive embedding (default 1.0, range -1.0 to 3.0), controlling how strongly the encoded image conditions the result.
- Comfy dtype:
FLOAT
- Python dtype:
float
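Internally, weight is applied as a plain multiplier on the positive embedding (see the source below). A minimal sketch of that rule, using a dummy embedding tensor whose shape is illustrative rather than taken from the source:

import torch

# A weight of 1.0 leaves the embedding untouched; any other value
# scales it linearly. The shape is a placeholder, not from the source.
pos_embed = torch.rand(1, 257, 1280)
weight = 0.8
if weight != 1:
    pos_embed = pos_embed * weight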
Optional¶
mask
- An optional mask that can be applied to the image, enabling selective encoding of certain image regions.
- Comfy dtype:
MASK
- Python dtype:
Optional[torch.Tensor]
clip_vision
- An optional CLIP vision model used to perform the actual image encoding. If omitted, the model bundled with the ipadapter pipeline is used; an exception is raised when neither is available.
- Comfy dtype:
CLIP_VISION
- Python dtype:
Optional[comfy.clip_vision.ClipVisionModel]
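For reference, a minimal sketch of dummy inputs following the usual ComfyUI tensor conventions (an assumption about the host framework, not stated on this page): IMAGE is a [batch, height, width, channels] float tensor in [0, 1], and MASK is [batch, height, width]:

import torch

image = torch.rand(1, 512, 512, 3)  # one 512x512 RGB image, values in [0, 1]
mask = torch.ones(1, 512, 512)      # full-coverage mask (no region excluded)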
Output types¶
pos_embed
- Comfy dtype:
EMBEDS
- The positive (image-conditioning) embedding produced by the encoding process.
- Python dtype:
torch.Tensor
neg_embed
- Comfy dtype:
EMBEDS
- The negative (unconditional) embedding produced by the encoding process.
- Python dtype:
torch.Tensor
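The negative embedding is not user-controlled: as the source below shows, non-"plus" models get a zero tensor, while "plus" models encode an all-black 224x224 image instead. A minimal sketch of the non-"plus" case with a dummy embedding (the shape is illustrative only):

import torch

pos_embed = torch.rand(1, 1024)          # dummy image_embeds for a non-plus model
neg_embed = torch.zeros_like(pos_embed)  # non-plus negative branch is all zeros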
Usage tips¶
- Infra type:
CPU
- Common nodes:
- IPAdapterApplyEncoded
Source code¶
import torch
import torchvision.transforms as T

# Helper from this extension that runs the CLIP vision encoder with an
# optional mask (import path assumed; it may live elsewhere in some versions).
from .utils import encode_image_masked

class IPAdapterEncoder:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {
            "ipadapter": ("IPADAPTER",),
            "image": ("IMAGE",),
            "weight": ("FLOAT", { "default": 1.0, "min": -1.0, "max": 3.0, "step": 0.01 }),
        },
        "optional": {
            "mask": ("MASK",),
            "clip_vision": ("CLIP_VISION",),
        }}

    RETURN_TYPES = ("EMBEDS", "EMBEDS",)
    RETURN_NAMES = ("pos_embed", "neg_embed",)
    FUNCTION = "encode"
    CATEGORY = "ipadapter/embeds"

    def encode(self, ipadapter, image, weight, mask=None, clip_vision=None):
        if 'ipadapter' in ipadapter:
            # pipeline dict: carries both the IPAdapter and CLIP vision models
            ipadapter_model = ipadapter['ipadapter']['model']
            clip_vision = clip_vision if clip_vision is not None else ipadapter['clipvision']['model']
        else:
            # bare model: CLIP vision must be supplied explicitly
            ipadapter_model = ipadapter

        if clip_vision is None:
            raise Exception("Missing CLIPVision model.")

        # "plus" models are detected by the layout of their image projection weights
        is_plus = (
            "proj.3.weight" in ipadapter_model["image_proj"]
            or "latents" in ipadapter_model["image_proj"]
            or "perceiver_resampler.proj_in.weight" in ipadapter_model["image_proj"]
        )

        # resize and crop the mask to the 224x224 CLIP vision input size
        if mask is not None and mask.shape[1:3] != torch.Size([224, 224]):
            mask = mask.unsqueeze(1)
            transforms = T.Compose([
                T.CenterCrop(min(mask.shape[2], mask.shape[3])),
                T.Resize((224, 224), interpolation=T.InterpolationMode.BICUBIC, antialias=True),
            ])
            mask = transforms(mask).squeeze(1)

        img_cond_embeds = encode_image_masked(clip_vision, image, mask)

        if is_plus:
            # plus models use the penultimate hidden states and encode an
            # all-black image for the negative branch
            img_cond_embeds = img_cond_embeds.penultimate_hidden_states
            img_uncond_embeds = encode_image_masked(clip_vision, torch.zeros([1, 224, 224, 3])).penultimate_hidden_states
        else:
            img_cond_embeds = img_cond_embeds.image_embeds
            img_uncond_embeds = torch.zeros_like(img_cond_embeds)

        if weight != 1:
            img_cond_embeds = img_cond_embeds * weight

        return (img_cond_embeds, img_uncond_embeds, )
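The mask preprocessing above can be exercised in isolation. The following sketch reproduces it with a dummy mask to show the shapes involved:

import torch
import torchvision.transforms as T

mask = torch.rand(1, 480, 640)  # [batch, H, W], as produced by a MASK output
if mask.shape[1:3] != torch.Size([224, 224]):
    mask = mask.unsqueeze(1)    # [B, 1, H, W] so torchvision treats it as an image
    transforms = T.Compose([
        T.CenterCrop(min(mask.shape[2], mask.shape[3])),  # crop to a centered square
        T.Resize((224, 224), interpolation=T.InterpolationMode.BICUBIC, antialias=True),
    ])
    mask = transforms(mask).squeeze(1)  # back to [B, 224, 224]
print(mask.shape)  # torch.Size([1, 224, 224])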