Human4D Image2SMPL

Documentation

  • Class name: Human4D_Img2SMPL
  • Category: MotionDiff
  • Output node: False

The Human4D_Img2SMPL node transforms 2D human images into 3D representations using the SMPL body model. It runs a person detector on each frame, crops the detected figures, and applies HMR 2.0 (Human Mesh Recovery) to each crop, producing per-subject SMPL mesh vertices, camera translations, and 2D keypoints for motion analysis and visualization.
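
As a rough usage sketch, the node class can also be driven directly in Python with the defaults shown in the source code below. This is not part of the node pack: load_human4d_model is a hypothetical stand-in for whichever loader node produces a HUMAN4D_MODEL.

import torch

node = Human4D_Img2SMPL()
frames = torch.rand(16, 512, 512, 3)  # Comfy IMAGE: [frames, H, W, C], floats in [0, 1]
(smpl_multiple_subjects,) = node.sample(
    human4d_model=load_human4d_model(),  # hypothetical loader
    image=frames,
    det_confidence_thresh=0.25,  # defaults from INPUT_TYPES below
    det_iou_thresh=0.7,
    det_batch_size=10,
    hmr_batch_size=8,
)
verts_frames, meta = smpl_multiple_subjects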

Input types

Required

  • human4d_model
    • The human4d_model input bundles the person detector, the HMR 2.0 network, and the model configuration the node uses to detect humans in images and generate their 3D SMPL representations.
    • Comfy dtype: HUMAN4D_MODEL
    • Python dtype: SimpleNamespace
  • image
    • The input image batch containing the human figures to reconstruct. Comfy IMAGE tensors are float batches of shape [frames, height, width, channels] with values in [0, 1]; the node converts them internally for detection and pose estimation.
    • Comfy dtype: IMAGE
    • Python dtype: torch.Tensor
  • det_confidence_thresh
    • The confidence threshold for human detection. Detections scoring below this value are discarded, so only high-confidence human figures are passed on for 3D modeling.
    • Comfy dtype: FLOAT
    • Python dtype: float
  • det_iou_thresh
    • The Intersection over Union (IoU) threshold for human detection, used by non-maximum suppression to discard overlapping bounding boxes and improve detection precision (see the sketch after this list).
    • Comfy dtype: FLOAT
    • Python dtype: float
  • det_batch_size
    • The batch size for the detection pass. It controls how many frames are sent to the detector at once, trading throughput against memory usage.
    • Comfy dtype: INT
    • Python dtype: int
  • hmr_batch_size
    • The batch size for the HMR (Human Mesh Recovery) pass. It determines how many detected figures are reconstructed simultaneously, trading speed against memory usage.
    • Comfy dtype: INT
    • Python dtype: int

Optional

  • opt_scorehmr_refiner
    • An optional ScoreHMR refiner for the HMR predictions. Note that in the current source this path is not implemented: passing a value raises NotImplementedError.
    • Comfy dtype: SCORE_HMR_MODEL
    • Python dtype: Optional[Callable]
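
To make the two detection thresholds above concrete, here is a minimal sketch (illustration only, not code from the node) of the standard IoU computation that non-maximum suppression relies on: boxes scoring below det_confidence_thresh are dropped first, then any surviving box whose IoU with a higher-confidence box exceeds det_iou_thresh is suppressed.

def iou_xyxy(a, b):
    # Intersection over union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two heavily overlapping person boxes: IoU ~0.82, above the default
# det_iou_thresh of 0.7, so the lower-confidence box would be suppressed.
print(iou_xyxy([10, 10, 110, 210], [20, 10, 120, 210]))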

Output types

  • smpl_multiple_subjects
    • A 3D representation of every detected human subject per frame: a list of per-frame vertex tensors (None for frames with no detections) plus a metadata dict carrying the mesh faces, camera translations, frame dimensions, focal length, and an OpenPose rendering callback, as sketched below.
    • Comfy dtype: SMPL_MULTIPLE_SUBJECTS
    • Python dtype: Tuple[List[Optional[torch.Tensor]], Dict]
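
A minimal sketch of consuming this output, assuming the trimesh package is available and that frames without detections hold None (per the source code below):

import trimesh

verts_frames, meta = smpl_multiple_subjects
for frame_idx, verts in enumerate(verts_frames):
    if verts is None:
        continue  # no humans detected in this frame
    for subj_idx, subject_verts in enumerate(verts):  # [num_verts, 3] each
        mesh = trimesh.Trimesh(vertices=subject_verts.numpy(), faces=meta["faces"])
        mesh.export(f"frame{frame_idx:04d}_subj{subj_idx}.obj")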

Usage tips

  • Infra type: GPU
  • Common nodes: unknown

Source code

# Standard imports used by the class; helpers such as ViTDetDataset, recursive_to,
# get_torch_device, cam_crop_to_full, vertices_to_trimesh, Rotation2xyz,
# render_openpose and smpl_models_dict are imported by the node pack from ComfyUI,
# MotionDiff's own modules, and its 4D-Humans (HMR 2.0) dependency.
from functools import partial

import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

import comfy.utils

class Human4D_Img2SMPL:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "human4d_model": ("HUMAN4D_MODEL", ),
                "image": ("IMAGE",),
                "det_confidence_thresh": ("FLOAT", {"min": 0.1, "max": 1, "step": 0.05, "default": 0.25}),
                "det_iou_thresh": ("FLOAT", {"min": 0.1, "max": 1, "step": 0.05, "default": 0.7}),
                "det_batch_size": ("INT", {"min": 1, "max": 20, "default": 10}),
                "hmr_batch_size": ("INT", {"min": 1, "max": 20, "default": 8})
            },
            "optional": {
                "opt_scorehmr_refiner": ("SCORE_HMR_MODEL", )
            }
        }

    RETURN_TYPES = ("SMPL_MULTIPLE_SUBJECTS", )
    FUNCTION = "sample"
    CATEGORY = "MotionDiff"

    def get_boxes(self, detector, image, batch_size, **kwargs):
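        # Batch frames through the person detector (COCO class 0 = person);
        # extra kwargs such as conf and iou are forwarded to detector.predict.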
        boxes_images = []
        for img_batch in tqdm(DataLoader(image, shuffle=False, batch_size=batch_size, num_workers=0)):
            det_results = detector.predict([img.numpy() for img in img_batch], classes=[0], **kwargs)
            boxes_images.extend([det_result.boxes.xyxy.cpu().numpy() for det_result in det_results])
        return boxes_images

    def sample(self, human4d_model, image, det_confidence_thresh, det_iou_thresh, det_batch_size, hmr_batch_size, opt_scorehmr_refiner=None):
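        # Pipeline: detect people in every frame, crop each detection, run
        # HMR 2.0 on the crops, then place the predicted meshes in full-image space.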
        models = human4d_model
        if opt_scorehmr_refiner is not None:
            raise NotImplementedError()
        # Comfy IMAGE tensors are float in [0, 1]; scale to uint8 for the detector
        image = (image * 255.).to(torch.uint8)
        boxes_images = self.get_boxes(models.detector, image, conf=det_confidence_thresh, iou=det_iou_thresh, batch_size=det_batch_size)
        verts_frames = []
        cam_t_frames = []
        kps_2d_frames = []
        pbar = comfy.utils.ProgressBar(len(image))
        for img_pt, boxes in tqdm(zip(image, boxes_images)):
            img_cv2 = img_pt.numpy()[:, :, ::-1].copy()

            # Run HMR2.0 on all detected humans
            dataset = ViTDetDataset(models.model_cfg, img_cv2, boxes)
            dataloader = torch.utils.data.DataLoader(dataset, batch_size=hmr_batch_size, shuffle=False, num_workers=0)
            _all_verts = []
            _all_cam_t = []
            _all_kps_2d = []

            for batch in dataloader:
                batch = recursive_to(batch, get_torch_device())
                if models.fp16:
                    batch = recursive_to(batch, torch.float16)
                with torch.no_grad():
                    out = models.human4d(batch)

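                # Convert the weak-perspective camera predicted in crop space
                # into a translation in the full image, using the crop geometry.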
                pred_cam = out['pred_cam']
                box_center = batch["box_center"].float()
                box_size = batch["box_size"].float()
                img_size = batch["img_size"].float()
                scaled_focal_length = models.model_cfg.EXTRA.FOCAL_LENGTH / models.model_cfg.MODEL.IMAGE_SIZE * img_size.max()
                pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu()

                batch_size = batch['img'].shape[0]
                for n in range(batch_size):
                    verts = out['pred_vertices'][n].detach().cpu() #Shape [num_verts, 3]
                    cam_t = pred_cam_t_full[n] # Shape [3]
                    kps_2d = out['pred_keypoints_2d'][n].detach().cpu() #Shape [44, 3]
                    # vertices_to_trimesh bakes the camera translation into the mesh
                    verts = torch.from_numpy(vertices_to_trimesh(verts, cam_t.unsqueeze(0)).vertices)
                    _all_verts.append(verts)
                    _all_cam_t.append(cam_t)
                    _all_kps_2d.append(kps_2d)

            if len(_all_verts):
                verts_frames.append(
                    torch.stack(_all_verts) #Shape [num_subjects, num_verts, 3]
                )
                cam_t_frames.append(
                    torch.stack(_all_cam_t) #Shape [num_subjects, 3]
                )
                kps_2d_frames.append(
                    torch.stack(_all_kps_2d) #Shape [num_subjects, 44, 3]
                )
            else:
                verts_frames.append(None)
                cam_t_frames.append(None)
                kps_2d_frames.append(None)
            pbar.update(1)
        # verts_frames: List of [num_subjects, num_verts, 3]
        # kps_2d_frames: List of [num_subjects, 44, 3]
        rot2xyz = Rotation2xyz(device="cpu", smpl_model_path=smpl_models_dict["SMPL_NEUTRAL.pkl"])
        faces = rot2xyz.smpl_model.faces
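        # NOTE: img_size and scaled_focal_length below come from the last HMR batch,
        # so the return assumes at least one human was detected in the input frames.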

        return ((
            verts_frames, 
            {"faces": faces, "normalized_to_vertices": True, 'cam': cam_t_frames, 
            "frame_width": int(img_size[0, 0].item()), "frame_height": int(img_size[0, 1].item()), 
            "focal_length": scaled_focal_length, 
            "render_openpose": partial(render_openpose, kps_2d_frames, boxes_images, int(img_size[0, 0].item()), int(img_size[0, 1].item()))}
            # In Comfy, IMAGE is a batched Tensor so all frames always share the same size
        ), )