SV3D_Conditioning

Documentation

  • Class name: SV3D_Conditioning
  • Category: conditioning/3d_models
  • Output node: False

The SV3D_Conditioning node prepares the inputs for Stable Video 3D (SV3D) generation: given a single reference image, it produces conditioning for a video that orbits the depicted object. It encodes the image with a CLIP vision model and a VAE, then attaches per-frame camera parameters: a constant elevation angle and an azimuth that sweeps evenly through a full 360° turn over the requested number of frames. It returns positive and negative conditioning plus an empty latent batch sized for the video.

Input types

Required

  • clip_vision
    • The CLIP vision model used to encode init_image into the pooled image embedding that drives the positive conditioning.
    • Comfy dtype: CLIP_VISION
    • Python dtype: comfy.clip_vision.ClipVisionModel
  • init_image
    • The single reference image; it is CLIP-encoded and, after resizing, VAE-encoded to condition every generated frame.
    • Comfy dtype: IMAGE
    • Python dtype: torch.Tensor
  • vae
    • The variational autoencoder used to encode the resized init image into the latent that is concatenated into the conditioning.
    • Comfy dtype: VAE
    • Python dtype: comfy.sd.VAE
  • width
    • Width of the output frames in pixels; the init image is resized to this before VAE encoding (default 576, step 8).
    • Comfy dtype: INT
    • Python dtype: int
  • height
    • Height of the output frames in pixels; the init image is resized to this before VAE encoding (default 576, step 8).
    • Comfy dtype: INT
    • Python dtype: int
  • video_frames
    • Number of frames in the generated orbit video; the 360° azimuth sweep is divided evenly over these frames (default 21; see the sketch after this list).
    • Comfy dtype: INT
    • Python dtype: int
  • elevation
    • Camera elevation angle in degrees, held constant for every frame (range -90.0 to 90.0).
    • Comfy dtype: FLOAT
    • Python dtype: float
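
With the defaults above (width = 576, height = 576, video_frames = 21), the azimuth advances by 360 / (21 - 1) = 18 degrees per frame while the elevation stays fixed. A minimal, dependency-free sketch of that camera schedule, mirroring the angle loop in the source code below:

# Sketch of the per-frame camera schedule that SV3D_Conditioning builds
# (mirrors the angle loop in the source code below; plain Python, no ComfyUI).
video_frames = 21   # default
elevation = 0.0     # default; held constant for every frame

azimuth = 0.0
azimuth_increment = 360 / (max(video_frames, 2) - 1)  # 18.0 for 21 frames

elevations, azimuths = [], []
for _ in range(video_frames):
    elevations.append(elevation)
    azimuths.append(azimuth)
    azimuth += azimuth_increment

print(azimuths[:3], azimuths[-1])  # [0.0, 18.0, 36.0] ... 360.0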

Output types

  • positive
    • The positive conditioning: the pooled CLIP image embedding paired with the VAE latent of the init image and the per-frame elevation/azimuth lists.
    • Comfy dtype: CONDITIONING
    • Python dtype: list (pairs of [torch.Tensor, dict])
  • negative
    • The negative conditioning: a zeroed embedding and zeroed concat latent carrying the same per-frame camera angles, serving as the unconditional branch for classifier-free guidance.
    • Comfy dtype: CONDITIONING
    • Python dtype: list (pairs of [torch.Tensor, dict])
  • latent
    • An empty latent batch with one 4-channel latent per video frame at 1/8 the pixel resolution, ready to be sampled.
    • Comfy dtype: LATENT
    • Python dtype: dict (with a "samples" torch.Tensor)
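
Concretely, the outputs follow ComfyUI's standard container shapes. The sketch below uses dummy tensors in place of the real CLIP embedding and VAE latent (the shapes are illustrative assumptions for a 576x576 input) and mirrors the return statement in the source code:

# Structure of the three outputs, with placeholder tensors standing in for
# the real CLIP embedding and VAE latent (a sketch, not the node itself).
import torch

pooled = torch.zeros(1, 1, 1280)   # pooled CLIP image embedding (example shape)
t = torch.zeros(1, 4, 72, 72)      # VAE latent of a 576x576 init image
elevations = [0.0] * 21
azimuths = [i * 18.0 for i in range(21)]

positive = [[pooled, {"concat_latent_image": t,
                      "elevation": elevations, "azimuth": azimuths}]]
negative = [[torch.zeros_like(pooled), {"concat_latent_image": torch.zeros_like(t),
                                        "elevation": elevations, "azimuth": azimuths}]]
latent = {"samples": torch.zeros(21, 4, 72, 72)}  # one empty latent per frame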

Usage tips

  • Infra type: GPU (runs CLIP vision and VAE encoding)
  • Common nodes: unknown

Source code

import torch

import comfy.utils
import nodes


class SV3D_Conditioning:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": { "clip_vision": ("CLIP_VISION",),
                              "init_image": ("IMAGE",),
                              "vae": ("VAE",),
                              "width": ("INT", {"default": 576, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 8}),
                              "height": ("INT", {"default": 576, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 8}),
                              "video_frames": ("INT", {"default": 21, "min": 1, "max": 4096}),
                              "elevation": ("FLOAT", {"default": 0.0, "min": -90.0, "max": 90.0, "step": 0.1, "round": False}),
                             }}
    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
    RETURN_NAMES = ("positive", "negative", "latent")

    FUNCTION = "encode"

    CATEGORY = "conditioning/3d_models"

    def encode(self, clip_vision, init_image, vae, width, height, video_frames, elevation):
        # Pooled CLIP vision embedding of the init image.
        output = clip_vision.encode_image(init_image)
        pooled = output.image_embeds.unsqueeze(0)
        # Resize to the target resolution, drop any alpha channel, and VAE-encode.
        pixels = comfy.utils.common_upscale(init_image.movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
        encode_pixels = pixels[:, :, :, :3]
        t = vae.encode(encode_pixels)

        # Constant elevation; azimuth sweeps a full 360 degree orbit across the frames.
        azimuth = 0
        azimuth_increment = 360 / (max(video_frames, 2) - 1)

        elevations = []
        azimuths = []
        for i in range(video_frames):
            elevations.append(elevation)
            azimuths.append(azimuth)
            azimuth += azimuth_increment

        # ComfyUI conditioning format: [embedding, extra-options dict] pairs.
        positive = [[pooled, {"concat_latent_image": t, "elevation": elevations, "azimuth": azimuths}]]
        # The negative branch zeroes both the embedding and the concat latent.
        negative = [[torch.zeros_like(pooled), {"concat_latent_image": torch.zeros_like(t), "elevation": elevations, "azimuth": azimuths}]]
        # Empty latent batch: one 4-channel latent per frame at 1/8 spatial resolution.
        latent = torch.zeros([video_frames, 4, height // 8, width // 8])
        return (positive, negative, {"samples": latent})
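
A minimal sketch of invoking the node directly from Python inside a ComfyUI environment. CLIPVisionLoader and VAELoader are real ComfyUI loader nodes, but the model file names here are placeholders, and the random image merely stands in for a real init image:

# Hypothetical direct invocation; assumes a ComfyUI checkout on sys.path and
# the named model files present in the usual models/ folders (placeholders).
import torch
import nodes

clip_vision = nodes.CLIPVisionLoader().load_clip("clip_vision_model.safetensors")[0]
vae = nodes.VAELoader().load_vae("sv3d_vae.safetensors")[0]
init_image = torch.rand(1, 576, 576, 3)  # ComfyUI IMAGE layout: [batch, H, W, C], 0..1

positive, negative, latent = SV3D_Conditioning().encode(
    clip_vision, init_image, vae,
    width=576, height=576, video_frames=21, elevation=0.0)
print(latent["samples"].shape)  # torch.Size([21, 4, 72, 72])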