CreateAudioMask (Deprecated)
Documentation
- Class name:
CreateAudioMask
- Category:
KJNodes/deprecated
- Output node:
False
The CreateAudioMask node generates a batch of images from an audio file. It computes the magnitude spectrogram of the audio and, for each frame, draws a filled white circle centered on a black background. The circle's radius is proportional to the mean magnitude of that spectrogram frame, so the circle grows and shrinks with audio intensity.
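The radius arithmetic described above can be sketched in plain Python. This is a minimal illustration that mirrors the formula in the node's source (image height times mean magnitude, truncated to an integer, then multiplied by the scale); the function name `circle_box` and the sample magnitudes are hypothetical.

```python
from statistics import fmean


def circle_box(frame_magnitudes, width, height, scale):
    """Hypothetical helper: bounding box of the circle drawn for one frame."""
    # Radius: image height times the mean magnitude, truncated, then scaled.
    radius = int(height * fmean(frame_magnitudes)) * scale
    # The circle is centered on the image.
    cx, cy = width // 2, height // 2
    return (cx - radius, cy - radius, cx + radius, cy + radius)


# Made-up magnitudes for a single spectrogram column.
box = circle_box([0.25, 0.5, 0.75], width=256, height=256, scale=0.5)
print(box)
```

Because the mean magnitude is not normalized, a loud frame can yield a radius larger than the image, in which case the whole frame is filled.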
Input types
Required
invert
- A boolean flag that, when true, inverts the output so that each frame shows a black circle on a white background.
- Comfy dtype:
BOOLEAN
- Python dtype:
bool
frames
- The number of spectrogram frames to render, starting from the first. This sets the batch size of the output.
- Comfy dtype:
INT
- Python dtype:
int
scale
- A multiplier applied to the circle radius. Larger values draw larger circles for the same audio intensity.
- Comfy dtype:
FLOAT
- Python dtype:
float
audio_path
- The path to the audio file whose spectrogram drives the circles. The default value, audio.wav, is resolved relative to the node pack's script directory.
- Comfy dtype:
STRING
- Python dtype:
str
width
- The width of each generated image, in pixels.
- Comfy dtype:
INT
- Python dtype:
int
height
- The height of each generated image, in pixels. The circle radius is computed relative to this value.
- Comfy dtype:
INT
- Python dtype:
int
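To relate the frames input to audio time, the sketch below assumes the librosa defaults that apply when the node calls `librosa.load` and `librosa.stft` without extra arguments: a 22050 Hz sample rate, a hop length of 512 samples, and centered frames. The helper names are hypothetical.

```python
SAMPLE_RATE = 22050  # librosa.load default when sr is not given
HOP_LENGTH = 512     # librosa.stft default: n_fft (2048) // 4


def frame_time_seconds(frame_index):
    """Approximate time of a spectrogram frame's center, in seconds."""
    return frame_index * HOP_LENGTH / SAMPLE_RATE


def available_frames(num_samples):
    """Number of spectrogram columns for a signal, assuming center=True."""
    return 1 + num_samples // HOP_LENGTH


# The default of 16 frames covers only the first fraction of a second.
print(round(frame_time_seconds(16), 3))
# One second of audio at the default sample rate.
print(available_frames(SAMPLE_RATE))
```

Under these assumptions each frame advances by roughly 23 ms, so the maximum of 255 frames spans about six seconds of audio.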
Output types
image
- The generated visual masks as tensors, suitable for dynamic visual representations that correspond to audio intensity.
- Comfy dtype:
IMAGE
- Python dtype:
torch.Tensor
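The layout of the output can be illustrated with a small stand-in array. This sketch uses NumPy instead of torch to stay lightweight, and the batch contents are made up. It assumes the shape that the node's source builds: one RGB image per frame, float32 values in [0, 1], stacked as [batch, height, width, channels].

```python
import numpy as np

# Stand-in batch: 4 frames of 32x48 RGB, float32 in [0, 1].
frames, height, width = 4, 32, 48
batch = np.zeros((frames, height, width, 3), dtype=np.float32)
batch[:, 8:24, 16:32, :] = 1.0  # white block standing in for the drawn circle

# Inversion flips black and white across the whole batch.
inverted = 1.0 - batch

# All three channels are identical, so one channel works as a single-channel mask.
mask = batch[..., 0]

print(batch.shape, mask.shape)
```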
Usage tips
- Infra type:
GPU
- Common nodes: unknown
Source code
import os

import numpy as np
import torch
from PIL import Image, ImageDraw

# Defined at module level in the original file; assumed here to be this file's directory.
script_directory = os.path.dirname(os.path.abspath(__file__))


class CreateAudioMask:
    def __init__(self):
        # librosa is an optional dependency, imported lazily when the node is created.
        try:
            import librosa
            self.librosa = librosa
        except ImportError:
            print("Can not import librosa. Install it with 'pip install librosa'")

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "createaudiomask"
    CATEGORY = "KJNodes/deprecated"

    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "invert": ("BOOLEAN", {"default": False}),
                "frames": ("INT", {"default": 16, "min": 1, "max": 255, "step": 1}),
                "scale": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 2.0, "step": 0.01}),
                "audio_path": ("STRING", {"default": "audio.wav"}),
                "width": ("INT", {"default": 256, "min": 16, "max": 4096, "step": 1}),
                "height": ("INT", {"default": 256, "min": 16, "max": 4096, "step": 1}),
            },
        }

    def createaudiomask(self, frames, width, height, invert, audio_path, scale):
        # Define the number of images in the batch
        batch_size = frames
        out = []
        if audio_path == "audio.wav":  # I don't know why relative path won't work otherwise...
            audio_path = os.path.join(script_directory, audio_path)
        audio, sr = self.librosa.load(audio_path)
        # Magnitude spectrogram: rows are frequency bins, columns are time frames.
        spectrogram = np.abs(self.librosa.stft(audio))

        for i in range(batch_size):
            image = Image.new("RGB", (width, height), "black")
            draw = ImageDraw.Draw(image)
            frame = spectrogram[:, i]
            # Radius follows the mean magnitude of this frame, then the user scale.
            circle_radius = int(height * np.mean(frame))
            circle_radius *= scale
            circle_center = (width // 2, height // 2)  # Calculate the center of the image

            draw.ellipse([(circle_center[0] - circle_radius, circle_center[1] - circle_radius),
                          (circle_center[0] + circle_radius, circle_center[1] + circle_radius)],
                         fill='white')

            # Convert to a float32 tensor in [0, 1] with a leading batch dimension.
            image = np.array(image).astype(np.float32) / 255.0
            image = torch.from_numpy(image)[None,]
            out.append(image)

        batch = torch.cat(out, dim=0)
        if invert:
            batch = 1.0 - batch
        # RETURN_TYPES declares a single IMAGE output, so return exactly one tensor.
        return (batch,)
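The loop above indexes `spectrogram[:, i]` for every requested frame and uses the raw mean magnitude, so it can fail when the audio has fewer spectrogram columns than `frames`, and it can draw circles larger than the image. The sketch below shows one possible variation, not the node's behavior: it truncates to the frames that exist and normalizes each frame's mean by the loudest frame. The helper name `normalized_radii` and the sample values are hypothetical.

```python
def normalized_radii(frame_means, frames, height, scale):
    """Hypothetical helper: per-frame radii normalized by the loudest frame.

    With scale <= 1 the radius never exceeds half the image height.
    """
    # Use only the frames that actually exist in the spectrogram.
    usable = frame_means[:frames]
    peak = max(usable, default=0.0)
    if peak <= 0.0:
        # Silent or empty input: draw nothing.
        return [0.0] * len(usable)
    return [(height / 2) * (m / peak) * scale for m in usable]


# Made-up per-frame mean magnitudes.
print(normalized_radii([1.0, 2.0, 4.0], frames=2, height=256, scale=0.5))
```

Normalizing by the peak trades absolute loudness for a predictable size range, which may or may not suit a given workflow.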