[Inference.Core] DWPose Estimator
Documentation
- Class name:
Inference_Core_DWPreprocessor
- Category:
ControlNet Preprocessors/Faces and Poses Estimators
- Output node:
False
The Inference_Core_DWPreprocessor node generates pose-estimation control images with DWPose. It detects people in the input image with a bounding-box detector, estimates body, hand, and face keypoints with the selected DWPose model, and returns both a rendered pose image and the raw keypoints.
Input types
Required
image
- The input image (or batch of images) in which people are detected and poses estimated.
- Comfy dtype:
IMAGE
- Python dtype:
torch.Tensor
Optional
detect_hand
- Enables or disables hand keypoint detection ('enable' or 'disable', default 'enable').
- Comfy dtype:
COMBO[STRING]
- Python dtype:
bool
detect_body
- Enables or disables body keypoint detection ('enable' or 'disable', default 'enable').
- Comfy dtype:
COMBO[STRING]
- Python dtype:
bool
detect_face
- Enables or disables face keypoint detection ('enable' or 'disable', default 'enable').
- Comfy dtype:
COMBO[STRING]
- Python dtype:
bool
resolution
- The resolution at which detection runs and at which the pose image is rendered (default 512).
- Comfy dtype:
INT
- Python dtype:
int
bbox_detector
- Selects the detection model used to find person bounding boxes before keypoint estimation (default 'yolox_l.onnx'); the chosen file is fetched from the matching Hugging Face repository on first use.
- Comfy dtype:
COMBO[STRING]
- Python dtype:
str
pose_estimator
- Selects the DWPose model used for keypoint estimation (default 'dw-ll_ucoco_384_bs5.torchscript.pt'); TorchScript variants are loaded onto the device reported by ComfyUI's model management. An example configuration covering all of these inputs follows this list.
- Comfy dtype:
COMBO[STRING]
- Python dtype:
str
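For concreteness, below is a minimal sketch of how these inputs might be filled in when the node is addressed through ComfyUI's API-format workflow JSON, written here as a Python dict. The node id "1" in the image link is a placeholder for an upstream image source, not part of this node's definition.

import json

dwpose_node = {
    "class_type": "Inference_Core_DWPreprocessor",
    "inputs": {
        # Placeholder link: output slot 0 of a hypothetical upstream node "1".
        "image": ["1", 0],
        "detect_hand": "enable",
        "detect_body": "enable",
        "detect_face": "enable",
        "resolution": 512,
        "bbox_detector": "yolox_l.onnx",
        "pose_estimator": "dw-ll_ucoco_384_bs5.torchscript.pt",
    },
}
print(json.dumps(dwpose_node, indent=2))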
Output types
image
- The 'image' output is the detected poses rendered as an OpenPose-style skeleton image.
- Comfy dtype:
IMAGE
- Python dtype:
torch.Tensor
pose_keypoint
- The 'pose_keypoint' output delivers the estimated poses as OpenPose-style keypoint dictionaries, one per input frame (a sketch of the layout follows).
- Comfy dtype:
POSE_KEYPOINT
- Python dtype:
List[dict]
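The layout below is a minimal sketch of one entry in the 'pose_keypoint' list, assuming the OpenPose JSON convention that DWPose follows; the key names and dummy coordinate values are illustrative, not guaranteed by this page.

# One entry of the 'pose_keypoint' output, assuming the OpenPose-style
# layout; coordinates are flattened (x, y, confidence) triples.
example_pose_entry = {
    "canvas_width": 512,
    "canvas_height": 512,
    "people": [
        {
            "pose_keypoints_2d": [256.0, 96.0, 1.0, 248.0, 160.0, 0.92],  # two keypoints shown
            "face_keypoints_2d": [],
            "hand_left_keypoints_2d": [],
            "hand_right_keypoints_2d": [],
        }
    ],
}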
Usage tips
- Infra type:
GPU
- Common nodes: unknown
Source code
# Imports as used by this node in comfyui_controlnet_aux (module paths may
# differ slightly between versions of the package).
import json
import comfy.model_management as model_management
from ..utils import common_annotator_call, create_node_input_types, DWPOSE_MODEL_NAME
from controlnet_aux.dwpose import DwposeDetector

class DWPose_Preprocessor:
    @classmethod
    def INPUT_TYPES(s):
        # Standard image/resolution inputs plus the enable/disable toggles.
        input_types = create_node_input_types(
            detect_hand=(["enable", "disable"], {"default": "enable"}),
            detect_body=(["enable", "disable"], {"default": "enable"}),
            detect_face=(["enable", "disable"], {"default": "enable"})
        )
        input_types["optional"] = {
            **input_types["optional"],
            "bbox_detector": (
                ["yolox_l.torchscript.pt", "yolox_l.onnx", "yolo_nas_l_fp16.onnx", "yolo_nas_m_fp16.onnx", "yolo_nas_s_fp16.onnx"],
                {"default": "yolox_l.onnx"}
            ),
            "pose_estimator": (["dw-ll_ucoco_384_bs5.torchscript.pt", "dw-ll_ucoco_384.onnx", "dw-ll_ucoco.onnx"], {"default": "dw-ll_ucoco_384_bs5.torchscript.pt"})
        }
        return input_types

    RETURN_TYPES = ("IMAGE", "POSE_KEYPOINT")
    FUNCTION = "estimate_pose"
    CATEGORY = "ControlNet Preprocessors/Faces and Poses Estimators"

    def estimate_pose(self, image, detect_hand, detect_body, detect_face, resolution=512, bbox_detector="yolox_l.onnx", pose_estimator="dw-ll_ucoco_384.onnx", **kwargs):
        # Map the selected detector filename to the Hugging Face repo hosting it.
        if bbox_detector == "yolox_l.onnx":
            yolo_repo = DWPOSE_MODEL_NAME
        elif "yolox" in bbox_detector:
            yolo_repo = "hr16/yolox-onnx"
        elif "yolo_nas" in bbox_detector:
            yolo_repo = "hr16/yolo-nas-fp16"
        else:
            raise NotImplementedError(f"Download mechanism for {bbox_detector}")

        # Likewise for the pose-estimator weights.
        if pose_estimator == "dw-ll_ucoco_384.onnx":
            pose_repo = DWPOSE_MODEL_NAME
        elif pose_estimator.endswith(".onnx"):
            pose_repo = "hr16/UnJIT-DWPose"
        elif pose_estimator.endswith(".torchscript.pt"):
            pose_repo = "hr16/DWPose-TorchScript-BatchSize5"
        else:
            raise NotImplementedError(f"Download mechanism for {pose_estimator}")

        model = DwposeDetector.from_pretrained(
            pose_repo,
            yolo_repo,
            det_filename=bbox_detector, pose_filename=pose_estimator,
            torchscript_device=model_management.get_torch_device()
        )

        # Convert the enable/disable combo strings to booleans.
        detect_hand = detect_hand == "enable"
        detect_body = detect_body == "enable"
        detect_face = detect_face == "enable"

        # Collect one OpenPose-style keypoint dict per processed frame.
        self.openpose_dicts = []
        def func(image, **kwargs):
            pose_img, openpose_dict = model(image, **kwargs)
            self.openpose_dicts.append(openpose_dict)
            return pose_img

        out = common_annotator_call(func, image, include_hand=detect_hand, include_face=detect_face, include_body=detect_body, image_and_json=True, resolution=resolution)
        del model
        return {
            'ui': { "openpose_json": [json.dumps(self.openpose_dicts, indent=4)] },
            "result": (out, self.openpose_dicts)
        }
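As a usage sketch, the snippet below drives the class directly from Python. It assumes a working ComfyUI environment with comfyui_controlnet_aux installed (the first call downloads the selected detector and pose weights), so treat it as illustrative rather than a supported API.

import torch

node = DWPose_Preprocessor()
# ComfyUI IMAGE tensors are (batch, height, width, channels) floats in [0, 1];
# a random image stands in for a real photo here.
image = torch.rand(1, 512, 512, 3)
result = node.estimate_pose(
    image,
    detect_hand="enable",
    detect_body="enable",
    detect_face="enable",
    resolution=512,
)
pose_images, pose_dicts = result["result"]  # rendered pose batch, keypoint dicts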