Lecture

Image Segmentation

20 Oct 2022 • Richard Kuo

Image segmentation includes image matting, semantic segmentation, human part segmentation, instance segmentation, video object segmentation, and panoptic segmentation.

Image Segmentation Survey

Paper: Image Segmentation Using Deep Learning: A Survey

Paper: Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey

Image Matting

Image Matting is the process of accurately estimating the foreground object in images and videos.
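Matting is commonly formulated through the compositing equation I = αF + (1 − α)B, where α is the per-pixel foreground opacity that a matting model estimates. The sketch below is not a matting model; it only shows how an already-estimated alpha matte is used to place a foreground over a new background. The function name `composite` and the toy pixel values are assumptions made for illustration.

```python
def composite(foreground, background, alpha):
    """Blend two grayscale images pixel by pixel: I = a*F + (1 - a)*B."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


# Toy 2x2 example (hypothetical values):
# alpha 1.0 keeps the foreground pixel, alpha 0.0 keeps the background pixel.
fg = [[200.0, 200.0], [200.0, 200.0]]
bg = [[0.0, 100.0], [0.0, 100.0]]
alpha = [[1.0, 0.5], [0.0, 0.25]]

result = composite(fg, bg, alpha)
print(result)
```

Fractional alpha values matter most along hair and other soft edges, which is where matting differs from a hard binary segmentation mask.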

BiSeNetV2 model

Four segmentation areas are covered: semantic segmentation, interactive segmentation, panoptic segmentation, and image matting, with applications in autonomous driving, medical segmentation, remote sensing, quality inspection, and other scenarios.

Semantic Image Matting

Paper: arxiv.org/abs/2104.08201
Code: nowsyn/SIM

Semantic Segmentation

FCN - Fully Convolutional Networks

Paper: Fully Convolutional Networks for Semantic Segmentation
Code: https://github.com/hayoung-kim/tf-semantic-segmentation-FCN-VGG16
Blog: Introduction to FCN for Semantic Segmentation

Figures: FCN Architecture; FCN-8 Architecture; Conv & DeConv

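The figure above shows the fusion combinations given by the authors in the paper. FCN-32 in the first row is the network that upsamples the conv7 layer directly by 32x. FCN-16 upsamples conv7 by 2x, combines the result with pool4, and then upsamples by 16x, and so on.

The results produced by these networks are shown in the figure below. The more feature maps at different scales are taken into account, the finer the final prediction map becomes and the closer it gets to the ground truth.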
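The fusion order described above can be sketched in a few lines. This is a minimal illustration, assuming plain 2D score maps stored as nested lists and nearest-neighbor upsampling in place of the learned transposed convolutions used in the paper; the function names (`upsample`, `add`, `fcn32`, `fcn16`, `fcn8`) and the 64x64 input size are hypothetical.

```python
def upsample(fmap, factor):
    """Nearest-neighbor upsampling of a 2D map by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def add(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fcn32(conv7):
    # Upsample the coarsest map straight back to input resolution.
    return upsample(conv7, 32)


def fcn16(conv7, pool4):
    # 2x upsample conv7, fuse with pool4, then 16x upsample.
    return upsample(add(upsample(conv7, 2), pool4), 16)


def fcn8(conv7, pool4, pool3):
    # Repeat the fusion once more with pool3 before the final 8x upsample.
    fused = add(upsample(conv7, 2), pool4)
    return upsample(add(upsample(fused, 2), pool3), 8)


# For a 64x64 input: conv7 is 2x2 (1/32), pool4 is 4x4 (1/16), pool3 is 8x8 (1/8).
conv7 = [[1.0, 2.0], [3.0, 4.0]]
pool4 = [[0.0] * 4 for _ in range(4)]
pool3 = [[0.0] * 8 for _ in range(8)]

for name, out in [("FCN-32", fcn32(conv7)),
                  ("FCN-16", fcn16(conv7, pool4)),
                  ("FCN-8", fcn8(conv7, pool4, pool3))]:
    print(name, len(out), "x", len(out[0]))
```

All three variants return a map at the input resolution; they differ only in how much finer-scale information is added before the last upsampling step.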

U-Net

Paper: arxiv.org/abs/1505.04597
Code: U-Net Keras

3D U-Net

Paper: arxiv.org/abs/1606.06650

Brain Tumor Segmentation

Dataset: Brain Tumor Segmentation(BraTS2020)
Code: https://www.kaggle.com/polomarco/brats20-3dunet-3dautoencoder

3D MRI BraTS using AutoEncoder

Paper: 3D MRI brain tumor segmentation using autoencoder regularization

BraTS with 3D U-Net

Paper: Brain tumor segmentation with self-ensembled, deeply-supervised 3D U-net neural networks: a BraTS 2020 challenge solution

Kvasir SEG

Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection.
Dataset: kvasir-seg.zip (1000 images and masks)

HyperKvasir

The Largest Gastrointestinal Dataset.
Dataset: hyper-kvasir.zip

PraNet

Paper: PraNet: Parallel Reverse Attention Network for Polyp Segmentation
Code: DengPingFan/PraNet

TGANet

Paper: TGANet: Text-guided attention for improved polyp segmentation
Code: nikhilroxtomar/TGANet

HarDNet-MSEG: An Efficient and Accurate Neural Network for Colorectal Polyp Segmentation

Paper: HarDNet-MSEG: A Simple Encoder-Decoder Polyp Segmentation Neural Network that Achieves over 0.9 Mean Dice and 86 FPS
Code: james128333/HarDNet-MSEG
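The "Mean Dice" in the paper title refers to the Dice coefficient, 2|P ∩ T| / (|P| + |T|), averaged over test images. The sketch below shows the standard definition on binary masks stored as nested 0/1 lists. It is an illustration only: the function names and the `eps` smoothing term are assumptions, and the paper's exact evaluation protocol (thresholding, averaging) may differ.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2*|P & T| / (|P| + |T|) for two binary masks of equal size."""
    intersection = sum(
        p * t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def mean_dice(preds, targets):
    """Average the per-image Dice scores over a set of mask pairs."""
    scores = [dice_coefficient(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Hypothetical toy masks: the prediction covers two pixels, the ground truth one.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(dice_coefficient(pred, target))
print(mean_dice([pred, target], [target, target]))
```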

SegNet - A Deep Convolutional Encoder-Decoder Architecture

Paper: arxiv.org/abs/1511.00561
Code: github.com/yassouali/pytorch_segmentation

PSPNet - Pyramid Scene Parsing Network

Paper: arxiv.org/abs/1612.01105
Code: github.com/hszhao/semseg (PSPNet, PSANet in PyTorch)

DeepLab V3+

Paper: arxiv.org/abs/1802.02611
Code: github.com/bonlime/keras-deeplab-v3-plus

Semantic Segmentation on MIT ADE20K

Code: github.com/CSAILVision/semantic-segmentation-pytorch
Dataset: MIT ADE20K
Models: PSPNet, UPerNet, HRNet

Semantic Segmentation on PyTorch

Code: Tramac/awesome-semantic-segmentation-pytorch
Datasets: Pascal VOC, CityScapes, ADE20K, MSCOCO
Models:

Human Part Segmentation

https://paperswithcode.com/task/human-part-segmentation

Look Into Person Challenge 2020 [LIP]

LIP is the largest single-person human parsing dataset, with 50000+ images. It focuses on complicated real-world scenarios. LIP has 20 labels: ‘Background’, ‘Hat’, ‘Hair’, ‘Glove’, ‘Sunglasses’, ‘Upper-clothes’, ‘Dress’, ‘Coat’, ‘Socks’, ‘Pants’, ‘Jumpsuits’, ‘Scarf’, ‘Skirt’, ‘Face’, ‘Left-arm’, ‘Right-arm’, ‘Left-leg’, ‘Right-leg’, ‘Left-shoe’, ‘Right-shoe’.
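A parsing model outputs one integer label id per pixel, so a lookup table is usually the first thing needed when working with LIP predictions. The sketch below builds one from the label list above and counts pixels per label. It assumes that the ids 0 to 19 follow the order in which the labels are listed; the names `LIP_LABELS`, `NAME_TO_ID`, and `label_histogram` are hypothetical.

```python
from collections import Counter

# Label names copied from the list above.
# ASSUMPTION: integer ids 0..19 follow this listed order.
LIP_LABELS = [
    "Background", "Hat", "Hair", "Glove", "Sunglasses",
    "Upper-clothes", "Dress", "Coat", "Socks", "Pants",
    "Jumpsuits", "Scarf", "Skirt", "Face", "Left-arm",
    "Right-arm", "Left-leg", "Right-leg", "Left-shoe", "Right-shoe",
]
NAME_TO_ID = {name: i for i, name in enumerate(LIP_LABELS)}


def label_histogram(mask):
    """Count pixels per label name in a 2D map of integer label ids."""
    counts = Counter(label_id for row in mask for label_id in row)
    return {LIP_LABELS[i]: n for i, n in sorted(counts.items())}


# Hypothetical 2x2 prediction map.
toy_mask = [[0, 2], [13, 13]]
print(label_histogram(toy_mask))
```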

HumanParsing-Dataset [ATR] (password: kjgk)

Paper: Human Parsing with Contextualized Convolutional Neural Network

ATR is a large single-person human parsing dataset with 17000+ images. It focuses on fashion AI. ATR has 18 labels: ‘Background’, ‘Hat’, ‘Hair’, ‘Sunglasses’, ‘Upper-clothes’, ‘Skirt’, ‘Pants’, ‘Dress’, ‘Belt’, ‘Left-shoe’, ‘Right-shoe’, ‘Face’, ‘Left-leg’, ‘Right-leg’, ‘Left-arm’, ‘Right-arm’, ‘Bag’, ‘Scarf’.

PASCAL-Part Dataset [PASCAL]

Pascal Person Part is a small single-person human parsing dataset with 3000+ images. It focuses on body part segmentation. Pascal Person Part has 7 labels: ‘Background’, ‘Head’, ‘Torso’, ‘Upper Arms’, ‘Lower Arms’, ‘Upper Legs’, ‘Lower Legs’.

Self Correction Human Parsing

Blog: HumanPartSegmentation : A Machine Learning Model for Segmenting Human Parts
Paper: arxiv.org/abs/1910.09777
Code: PeikeLi/Self-Correction-Human-Parsing

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

Paper: arxiv.org/abs/1907.05193
Code: kevinlin311tw/CDCL-human-part-segmentation

Instance Segmentation

A Survey on Instance Segmentation

Paper: arxiv.org/abs/2007.00047

Mask R-CNN

Paper: arxiv.org/abs/1703.06870
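Blog: Understanding How Mask R-CNN Works

Mask R-CNN is a two-stage architecture. The first stage scans the image and generates proposals (regions likely to contain an object). The second stage classifies the proposals and generates bounding boxes and masks.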

TensorMask - A Foundation for Dense Object Segmentation

Paper: arxiv.org/abs/1903.12174
Code: TensorMask in Detectron2

PointRend

Paper: PointRend: Image Segmentation as Rendering
Blog: Facebook PointRend: Rendering Image Segmentation
Code: Detectron2 PointRend

YOLACT - Real-Time Instance Segmentation

Paper: arxiv.org/abs/1904.02689
Paper: YOLACT++: Better Real-time Instance Segmentation
Code: https://github.com/dbolya/yolact
Kaggle: https://www.kaggle.com/rkuo2000/yolact

INSTA YOLO

Paper: arxiv.org/abs/2102.06777

3D Classification & Segmentation

ModelNet - 3D CAD models for objects

PointNet

Paper: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Code: charlesq34/pointnet
Dataset: ModelNet40.zip
Blog: PointNet or The First Neural Network to Handle Directly 3D Point Clouds

Object Part Segmentation Results

PointNet++

Paper: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Code: charlesq34/pointnet2

PCPNet

Paper: PCPNET: Learning Local Shape Properties from Raw Point Clouds

Code: paulguerrero/pcpnet
python eval_pcpnet.py --indir "path/to/dataset" --dataset "dataset.txt" --models "/path/to/model/model_name"

PointCleanNet

Paper: PointCleanNet: Learning to Denoise and Remove Outliers from Dense Point Clouds
Code: mrakotosaon/pointcleannet

Meta-SeL

Paper: Meta-SeL: 3D-model ShapeNet Core Classification using Meta-Semantic Learning
Code: faridghm/Meta-SeL
Dataset: ShapeNetCore

ShapeNetCore covers 55 common object categories with about 51,300 unique 3D models, including the 12 object categories of PASCAL 3D+.

Video Object Datasets

DAVIS - Densely Annotated VIdeo Segmentation

DAVIS dataset

DAVIS 2017
!wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip
!unzip -q DAVIS-2017-trainval-480p.zip
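Once the archive is unpacked, each video sequence can be paired with its annotation masks. The sketch below assumes the archive extracts to a `DAVIS/` folder with frames under `JPEGImages/480p/<sequence>/*.jpg` and masks under `Annotations/480p/<sequence>/*.png`; check the extracted folder before relying on this layout. The helper name `list_davis_sequences` is hypothetical.

```python
from pathlib import Path


def list_davis_sequences(root, resolution="480p"):
    """Return {sequence_name: (frame_paths, mask_paths)} for a DAVIS-style folder.

    ASSUMED layout: <root>/JPEGImages/<resolution>/<sequence>/*.jpg
                    <root>/Annotations/<resolution>/<sequence>/*.png
    """
    root = Path(root)
    frames_dir = root / "JPEGImages" / resolution
    masks_dir = root / "Annotations" / resolution
    if not frames_dir.is_dir():
        return {}  # nothing unpacked at this location

    sequences = {}
    for seq_dir in sorted(p for p in frames_dir.iterdir() if p.is_dir()):
        frames = sorted(seq_dir.glob("*.jpg"))
        masks = sorted((masks_dir / seq_dir.name).glob("*.png"))
        sequences[seq_dir.name] = (frames, masks)
    return sequences


# Print one line per sequence: name, number of frames, number of masks.
for name, (frames, masks) in list_davis_sequences("DAVIS").items():
    print(name, len(frames), len(masks))
```

Sorting the file lists keeps frames and masks aligned by index as long as both use the same zero-padded numbering.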

YTVOS - YouTube Video Object Segmentation

Video Object Segmentation

4000+ high-resolution YouTube videos
90+ semantic categories
7800+ unique objects
190k+ high-quality manual annotations
340+ minutes duration

YTVOS 2019
YTVOS 2018

train.zip
train_all_frames.zip
valid.zip
valid_all_frames.zip
test.zip
test_all_frames.zip

YTVIS - YouTube Video Instance Segmentation

Video Instance Segmentation

2021 version
3,859 high-resolution YouTube videos, 2,985 training videos, 421 validation videos and 453 test videos.
An improved 40-category label set
8,171 unique video instances
232k high-quality manual annotations