Modelscope t2v
Documentation
- Class name:
Modelscopet2v
- Category:
cspnodes/modelscope
- Output node:
False
The Modelscopet2v node generates a short sequence of video frames from a text prompt. It wraps a diffusers text-to-video pipeline (by default the cerspense/zeroscope_v2_576w checkpoint) and returns the generated frames as a single IMAGE batch that downstream nodes can consume.
Input types
Required
prompt
- The 'prompt' parameter is the text description of the desired output. It guides generation and directly shapes the content and style of the resulting frames.
- Comfy dtype:
STRING
- Python dtype:
str
negative_prompt
- The 'negative_prompt' parameter provides textual input that describes undesired aspects of the visual output. It helps in refining the generated images by steering the model away from unwanted characteristics, thus fine-tuning the final visual representation.
- Comfy dtype:
STRING
- Python dtype:
str
model_path
- Specifies the model checkpoint used for generation, given as a local path or Hugging Face repository id (default: cerspense/zeroscope_v2_576w). Choosing a different checkpoint changes the style and capabilities of the output.
- Comfy dtype:
STRING
- Python dtype:
str
num_inference_steps
- Determines the number of steps the model will take during the inference process. A higher number of steps can lead to more detailed and coherent visual outputs.
- Comfy dtype:
INT
- Python dtype:
int
guidance_scale
- Controls the degree to which the model adheres to the prompt. A higher guidance scale can result in images that more closely match the provided description.
- Comfy dtype:
FLOAT
- Python dtype:
float
seed
- Sets the random seed for generating visual content, ensuring reproducibility of results.
- Comfy dtype:
INT
- Python dtype:
int
width
- Specifies the width of the generated visual content in pixels.
- Comfy dtype:
INT
- Python dtype:
int
height
- Specifies the height of the generated visual content in pixels.
- Comfy dtype:
INT
- Python dtype:
int
num_frames
- Determines how many frames are generated, which defines the length of the output video. An example assembling all of the inputs above follows this list.
- Comfy dtype:
INT
- Python dtype:
int
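To make the inputs above concrete, here is a minimal sketch that calls the node's entry function directly from Python with the documented defaults. In normal use these values are wired in through a ComfyUI graph; the import path and prompt strings below are only illustrative assumptions.

# Hypothetical direct invocation outside a ComfyUI graph.
# from cspnodes import Modelscopet2v  # import path is an assumption
node = Modelscopet2v()
(frames,) = node.generate_video_frames(
    prompt="a red panda climbing a snowy pine tree",
    negative_prompt="blurry, watermark, low quality",
    model_path="cerspense/zeroscope_v2_576w",  # default checkpoint
    num_inference_steps=25,
    guidance_scale=9.0,
    seed=42,
    width=576,
    height=320,
    num_frames=24,
)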
Output types
image
- The generated video frames, returned as a single IMAGE batch. Each frame is one image in the batch, so downstream nodes can treat the output as an ordinary image sequence.
- Comfy dtype:
IMAGE
- Python dtype:
torch.Tensor
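With the default width, height, and num_frames, the returned tensor should have one entry per frame in (num_frames, height, width, channels) layout, with values scaled to the 0-1 range by the source code below. A quick sanity check, assuming the frames tensor from the sketch above, might look like this (exact shapes can vary with the installed diffusers version):

print(frames.shape)   # expected: torch.Size([24, 320, 576, 3])
print(frames.dtype)   # a float tensor
print(frames.min().item(), frames.max().item())  # values normalized to the 0-1 range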
Usage tips
- Infra type:
GPU
- Common nodes: unknown
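The pipeline is loaded in float16 with model CPU offload, so a CUDA-capable GPU is expected at inference time. A minimal pre-flight check, not part of the node itself, could be:

import torch

# Rough check before queuing a generation; float16 inference expects a CUDA device.
if not torch.cuda.is_available():
    raise RuntimeError("Modelscopet2v expects a CUDA GPU for float16 inference.")
print(torch.cuda.get_device_name(0))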
Source code
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

class Modelscopet2v:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {}),
                "negative_prompt": ("STRING", {"default": None}),
                "model_path": ("STRING", {"default": "cerspense/zeroscope_v2_576w"}),
                "num_inference_steps": ("INT", {"default": 25}),
                "guidance_scale": ("FLOAT", {"default": 9.0}),
                "seed": ("INT", {"default": 42}),
                "width": ("INT", {"default": 576}),
                "height": ("INT", {"default": 320}),
                "num_frames": ("INT", {"default": 24}),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "generate_video_frames"
    CATEGORY = "cspnodes/modelscope"

    def generate_video_frames(self, prompt, model_path, num_inference_steps, height, width, num_frames, guidance_scale, negative_prompt, seed):
        # Set up the generator for deterministic results if a seed is provided
        generator = torch.Generator()
        if seed is not None:
            generator.manual_seed(seed)

        # Load the text-to-video pipeline in float16 and offload weights to the CPU between steps
        pipe = DiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
        pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
        pipe.enable_model_cpu_offload()

        # Run the pipeline; .frames holds the generated video frames
        video_frames = pipe(
            prompt,
            num_inference_steps=num_inference_steps,
            height=height,
            width=width,
            num_frames=num_frames,
            guidance_scale=guidance_scale,
            negative_prompt=negative_prompt,
            generator=generator,
        ).frames

        # Ensure video_frames is a PyTorch tensor
        if not isinstance(video_frames, torch.Tensor):
            video_frames = torch.tensor(video_frames, dtype=torch.float32)

        # Normalize values to the 0-1 range if they came back as 0-255
        if video_frames.max() > 1.0:
            video_frames = video_frames / 255.0

        # Drop the batch dimension; the expected layout is (num_frames, height, width, channels)
        video_frames = video_frames.squeeze(0)

        # Move the tensor to the CPU before handing it back to ComfyUI
        video_frames = video_frames.to('cpu')
        return (video_frames,)
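Outside of ComfyUI, the returned frames can be written to disk directly. One way to do that, assuming the (num_frames, height, width, channels) float tensor produced above and purely illustrative file names, is to convert each frame to 8-bit and save it with Pillow:

import numpy as np
from PIL import Image

# Assuming `frames` is the float tensor in the 0-1 range returned by generate_video_frames.
frames_u8 = (frames.numpy() * 255).astype(np.uint8)
for i, frame in enumerate(frames_u8):
    Image.fromarray(frame).save(f"frame_{i:03d}.png")

The per-frame PNGs can then be assembled into a video with any external tool; diffusers also provides an export_to_video helper, though its expected frame format has varied between versions.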