Lecture

Pose Estimation

08 Dec 2022 • Richard Kuo

This lecture covers pose estimation applications, body pose, head pose, hand pose, and object pose.

Pose Estimation Applications

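Fitness mirrors
AI fitness coach
Riding the home-fitness wave, the fitness startup Peloton sells its flagship products (exercise bikes and treadmills) bundled with online classes, turns its fitness instructors into "internet celebrities", and carefully produces workout videos, successfully building a fan economy.
Equine therapy
Pose-controlled Lights
Fall detection

Production-line SOP
Taking 雅文塑膠 (a plastics manufacturer) as an example, the movements of its production-line operators are concentrated in the upper body, mainly the head, neck, shoulders, arms, and hands. To meet this need, Beseye_alpha replicated a successful AI model development case from a large Japanese manufacturing plant and, through repeated requirement discussions with the customer, on-site workstation planning, and training on data from real-site tests, developed a "body movement analysis" model that effectively achieves the goal of reducing computation.
Other applications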

Body Pose

Ref. A 2019 Guide to Human Pose Estimation

BodyPix - Person Segmentation in the Browser

Code: tfjs-models/body-pix
pip install tf_bodypix
Live Demo

OpenPose

Paper: arxiv.org/abs/1812.08008
Code: CMU-Perceptual-Computing-Lab/openpose
Ref. A Guide to OpenPose in 2021

PoseNet

PoseNet is built to run on lightweight devices such as browsers and mobile devices, whereas
OpenPose is much more accurate and is meant to run on GPU-powered systems. You can see the performance benchmarks below.

Paper: arxiv.org/abs/1505.07427
Code: rwightman/posenet-pytorch

Pose Recognition

Use pose keypoints as a dataset to train a DNN classifier
Code: burningion/dab-and-tpose-controlled-lights
IPYNB: pose-control-lights
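
Before keypoints are used as training data, it often helps to remove the effect of where the person stands and how large they appear in the frame. The following is a minimal sketch under that assumption; `normalize_keypoints` is a hypothetical helper and the normalization (shift to the bounding-box corner, scale by its larger side) is an illustrative choice, not necessarily what the linked repository does.

```python
def normalize_keypoints(keypoints):
    """Translate and scale (x, y) keypoints so the pose fits in a unit box.

    Hypothetical helper: the result no longer depends on the person's
    position in the frame or on their apparent size.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    min_x, min_y = min(xs), min(ys)
    # Scale by the larger bounding-box side so the aspect ratio is preserved.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in keypoints]


# The same pose seen at two positions and sizes yields the same features.
pose_a = [(100, 200), (120, 260), (140, 320)]
pose_b = [(300, 50), (310, 80), (320, 110)]
features_a = normalize_keypoints(pose_a)
features_b = normalize_keypoints(pose_b)
```

The normalized list can then be flattened and fed to the classifier in place of raw pixel coordinates.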

MMPose

Code: open-mmlab
Model Zoo

DensePose RCNN

Paper: arxiv.org/abs/1802.00434
Code: facebookresearch/DensePose

Region-based DensePose architecture

Multi-task cascaded architectures

Multi-Person Part Segmentation

Paper: arxiv.org/abs/1907.05193
Code: kevinlin311tw/CDCL-human-part-segmentation

Head Pose

Head Pose Estimation

Code: yinguobing/head-pose-estimation

Hopenet

Paper: Fine-Grained Head Pose Estimation Without Keypoints

Code: deep-head-pose

Code: hopenet
Blog: HOPE-Net : A Machine Learning Model for Estimating Face Orientation
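
Hopenet treats each Euler angle (yaw, pitch, roll) as a classification over angle bins and recovers a continuous angle as the expected value over the bin probabilities. The sketch below illustrates that decoding step only. It assumes 66 bins of 3 degrees with an offset of 99 degrees, which should be checked against the deep-head-pose code; the function names are hypothetical.

```python
import math

NUM_BINS = 66      # assumed number of angle bins
BIN_WIDTH = 3.0    # assumed width of each bin in degrees
OFFSET = 99.0      # assumed shift so that bin 33 maps to 0 degrees


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(logits):
    """Turn per-bin logits into one continuous angle in degrees."""
    probs = softmax(logits)
    expected_bin = sum(i * p for i, p in enumerate(probs))
    return expected_bin * BIN_WIDTH - OFFSET


# A network output that strongly favors bin 43 decodes to about 30 degrees.
logits = [0.0] * NUM_BINS
logits[43] = 20.0
yaw = expected_angle(logits)
```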

VTuber

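The total number of VTubers has passed 16,000, growing steadily with 3,000 added in one year. According to a report by the Japanese data research and analytics company User Local, the number of VTubers recorded in its latest User Local VTuber ranking has officially exceeded 16,000.

1st: Gawr Gura (がうるぐらサメちゃん), Gawr Gura Ch. hololive-EN

2nd: キズナアイ, A.I.Channel

3rd: Mori Calliope (森カリオペ), Mori Calliope Ch.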

VTuber-Unity = Head-Pose-Estimation + Face-Alignment + GazeTracking

VRoid Studio

VTuber_Unity

OpenVtuber

Hand Pose

Hand Pose Estimation papers

FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Paper: arxiv.org/abs/2008.08324
Code: facebookresearch/frankmocap

A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

Paper: arxiv.org/abs/2109.11399

Towards unconstrained joint hand-object reconstruction from RGB videos

Paper: arxiv.org/abs/2108.07044
Code: hassony2/homan

Fast Monocular Hand Pose Estimation on Embedded Systems

Paper: arxiv.org/abs/2102.07067

Recent Advances in 3D Object and Hand Pose Estimation

Paper: arxiv.org/abs/2006.05927

Object Pose

Benchmark for 6D Object Pose

Core datasets: LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V.

Other datasets: LM, RU-APC, IC-MI, TYO-L.
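
A 6D object pose is commonly expressed as a 3D rotation plus a 3D translation. As a rough illustration, the sketch below applies such a pose to a few model points and measures the average distance between the points placed by a ground-truth pose and by an estimated pose. This is an ADD-style error as used in some 6D pose papers; the benchmark defines its own error functions, so treat this only as an illustration. All names and numbers are hypothetical.

```python
import math


def transform(points, rotation, translation):
    """Apply x' = R x + t to each 3D point (R is a 3x3 nested list)."""
    out = []
    for p in points:
        out.append(tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        ))
    return out


def rotation_z(degrees):
    """Rotation matrix for a turn of `degrees` around the z axis."""
    a = math.radians(degrees)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


def average_distance(points, pose_gt, pose_est):
    """Mean distance between model points under two (R, t) poses."""
    gt = transform(points, *pose_gt)
    est = transform(points, *pose_est)
    return sum(math.dist(a, b) for a, b in zip(gt, est)) / len(points)


# Three model points in meters; the estimate is off by 1 cm along y.
model_points = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
pose_gt = (rotation_z(0), (0.0, 0.0, 1.0))
pose_est = (rotation_z(0), (0.0, 0.01, 1.0))
error = average_distance(model_points, pose_gt, pose_est)
```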

Real-Time Seamless Single Shot 6D Object Pose Prediction (YOLO-6D)

Paper: arxiv.org/abs/1711.08848
Code: microsoft/singleshotpose

PoseCNN

Paper: arxiv.org/abs/1711.00199
Code: yuxng/PoseCNN

DeepIM

Paper: arxiv.org/abs/1804.00175
Code: liyi14/mx-DeepIM

Segmentation-driven Pose

Paper: arxiv.org/abs/1812.02541
Code: cvlab-epfl/segmentation-driven-pose

DPOD

Paper: arxiv.org/abs/1902.11020
Code: yashs97/DPOD

HO-3D_v3 Dataset

Paper: arxiv.org/abs/2107.00887
Github: shreyashampali/ho3d
HO-3D is a dataset with 3D pose annotations for hands and objects under severe mutual occlusion.

Exercises of Pose Estimation

BodyPix

Kaggle: rkuo2000/BodyPix

PoseNet

Kaggle: rkuo2000/posenet-pytorch
Kaggle: rkuo2000/posenet-human-pose

MMPose

Kaggle: rkuo2000/MMPose

2D Human Pose

2D Human Whole-Body

2D Hand Pose

2D Face Keypoints

3D Human Pose

2D Pose Tracking

2D Animal Pose

3D Hand Pose

WebCam Effect

OpenPose

Kaggle: rkuo2000/openpose-pytorch

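Pose Recognition

Project steps:

Build a photo dataset of body poses (for example: 5 poses, 20 pictures of each pose)
Use MMPose to detect the body keypoints of each pose in the photos (MMPose converts each pose into 16 keypoints (x, y))
Generate the pose keypoint dataset with x_train.append(pose_keypoints) ( x_train.shape = (20x5, 16, 2), y_train.shape = (20x5, 1) )
Build and train a DNN model, then download the model file pose_dnn.h5 to the PC
On the PC, write a server program with camera input that loads pose_dnn.h5 to recognize poses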
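
The dataset-building step can be sketched as follows, assuming the keypoints of every photo have already been extracted. `fake_keypoints` is a hypothetical stand-in for the MMPose output, used here only so the example runs on its own.

```python
import random

NUM_POSES = 5          # number of pose classes
PHOTOS_PER_POSE = 20   # pictures taken of each pose
NUM_KEYPOINTS = 16     # keypoints per person


def fake_keypoints(rng):
    """Hypothetical stand-in for the 16 (x, y) keypoints of one photo."""
    return [(rng.uniform(0, 640), rng.uniform(0, 480)) for _ in range(NUM_KEYPOINTS)]


rng = random.Random(0)
x_train, y_train = [], []
for pose_id in range(NUM_POSES):
    for _ in range(PHOTOS_PER_POSE):
        pose_keypoints = fake_keypoints(rng)
        x_train.append(pose_keypoints)   # one sample of shape (16, 2)
        y_train.append([pose_id])        # integer class label, shape (1,)

# Nested-list equivalents of x_train.shape and y_train.shape.
shape_x = (len(x_train), len(x_train[0]), len(x_train[0][0]))
shape_y = (len(y_train), len(y_train[0]))
```

Both lists would normally be converted with `np.array(...)` before being passed to the model.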

Sample code for building and training the model (PC or Kaggle)

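from tensorflow.keras import layers, models

input_shape = (16, 2)  # 16 keypoints, each with (x, y)
num_classes = 5

inputs = layers.Input(shape=input_shape)
x = layers.Flatten()(inputs)  # (16, 2) -> (32,), so each sample yields one prediction vector
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
model = models.Model(inputs=inputs, outputs=outputs)

# y_train holds integer class ids with shape (N, 1), so use the sparse loss
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=1, epochs=20, validation_data=(x_test, y_test))
models.save_model(model, 'pose_dnn.h5')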
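
The training call uses x_test and y_test, which the project steps do not create. One way to obtain them is to hold out part of the collected samples. The sketch below is an assumption rather than the author's code; `split_dataset` is a hypothetical helper and the 20% test fraction is arbitrary.

```python
import random


def split_dataset(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for validation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(samples, train_idx), pick(labels, train_idx),
            pick(samples, test_idx), pick(labels, test_idx))


# Dummy data: 100 samples of 16 (x, y) keypoints, 20 samples per class.
samples = [[(float(i), float(i))] * 16 for i in range(100)]
labels = [[i // 20] for i in range(100)]
x_train, y_train, x_test, y_test = split_dataset(samples, labels)
```

Shuffling before the split matters here because the samples are stored class by class.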

Sample code for the pose recognition server (PC with camera)

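import cv2
import numpy as np
from tensorflow.keras import models
from mmdet.apis import inference_detector
from mmpose.apis import inference_top_down_pose_model, process_mmdet_results

# det_model and args are assumed to be set up beforehand, as in the MMPose top-down demo
model = models.load_model('models/pose_dnn.h5')
labels = ['stand', 'raise-right-arm', 'raise-left-arm', 'cross arms','both-arms-left']

cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # stop when the camera returns no frame
        break
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    mmdet_results = inference_detector(det_model, image)  # detect people and produce bounding boxes
    person_results = process_mmdet_results(mmdet_results, args.det_cat_id)  # keep the person bounding boxes
    pose_results, returned_outputs = inference_top_down_pose_model(...)  # estimate the pose keypoints
    if not pose_results:  # nobody detected in this frame
        continue

    # Use the first person's keypoints; assuming MMPose 0.x, each row is (x, y, score), so keep (x, y)
    x_test = np.array(pose_results[0]['keypoints'])[:, :2].reshape(1, 16, 2)
    preds = model.predict(x_test)  # classify the pose
    maxindex = int(np.argmax(preds))
    txt = labels[maxindex]
    print(txt)

cap.release()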
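
Taking the argmax of every frame always prints some label, even when the classifier is unsure. A small sketch of decoding the softmax output with a confidence threshold is shown below; `decode_prediction`, the 'unknown' label, and the 0.6 threshold are hypothetical choices for illustration.

```python
LABELS = ['stand', 'raise-right-arm', 'raise-left-arm', 'cross arms', 'both-arms-left']


def decode_prediction(probabilities, labels=LABELS, min_confidence=0.6):
    """Return the most likely label, or 'unknown' if the model is unsure."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < min_confidence:
        return 'unknown'
    return labels[best]


confident = decode_prediction([0.05, 0.85, 0.04, 0.03, 0.03])  # 'raise-right-arm'
unsure = decode_prediction([0.25, 0.30, 0.20, 0.15, 0.10])     # 'unknown'
```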

Head Pose Estimation

Kaggle: rkuo2000/head-pose-estimation

VTuber-Unity

Head-Pose-Estimation + Face-Alignment + GazeTracking

Build-up Steps:

Create a character: VRoid Studio
Synchronize the face: VTuber_Unity
Take video: OBS Studio
Post-processing:
- Auto-subtitle: Autosub
- Auto-subtitle in live stream: Unity_live_caption
- Encode the subtitle into video: 小丸工具箱
Upload: YouTube
[Optional] Install CUDA & CuDNN to enable GPU acceleration
To Run
$ git clone https://github.com/kwea123/VTuber_Unity
$ python demo.py --debug --cpu

OpenVtuber

Build-up Steps:

Clone the GitHub repo
$ git clone https://github.com/1996scarlet/OpenVtuber
$ cd OpenVtuber
$ pip3 install -r requirements.txt
Install node.js for Windows
Run the Socket-IO server
$ cd NodeServer
$ npm install express socket.io
$ node index.js
Open a browser at http://127.0.0.1:6789/kizuna
Run the PythonClient with a webcam
$ cd ../PythonClient
$ python3 vtuber_link_start.py

Hand Pose

Dataset: InterHand2.6M

Download the pre-trained InterNet from here
Put the model in the demo folder
Go to the demo folder and edit bbox in here
Run python demo.py --gpu 0 --test_epoch 20
You should see result_2D.jpg and a 3D viewer.

Camera positions visualization demo

cd tool/camera_visualize
Run python camera_visualize.py

This site was last updated December 22, 2022.
