Lecture

Generative Adversarial Networks

27 Oct 2022 • Richard Kuo

This introduction includes Style Transfer, VAE, DALL-E, GAN, Image Inpainting, DeepFaceDrawing, Toonify, PoseGAN, Pose2Pose, Virtual Try On, Face Swap, Talking Head, VQ-VAE, Music Seperation, Deep Singer, Voice Conversion.

Style Transfer

DeepDream

Nerual Style Transfer

Paper: A Neural Algorithm of Artistic Style
Code: ProGamerGov/neural-style-pt
Tutorial: Neural Transfer using PyTorch

Fast Style Transfer

Paper:A Neural Algorithm of Artistic Style & Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Code: lengstrom/fast-style-transfer

Variational AutoEncoder

VAE

Blog: VAE(Variational AutoEncoder) 實作
Paper: Auto-Encoding Variational Bayes
Code: rkuo2000/fashionmnist-vae

Arbitrary Style Transfer

Paper: Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Code: elleryqueenhomels/arbitrary_style_transfer

zi2zi

Blog: zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks
Paper: Generating Handwritten Chinese Characters using CycleGAN
Code: kaonashi-tyc/zi2zi

DALL.E

Blog: https://openai.com/blog/dall-e/
DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.
Paper: Zero-Shot Text-to-Image Generation
Code: openai/DALL-E

DALL.E-2

News: An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy.
DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.
Blog: How DALL-E 2 Actually Works
“a bowl of soup that is a portal to another dimension as digital art”.
Paper: Hierarchical Text-Conditional Image Generation with CLIP Latents

Stable Diffusion

Dataset: LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS
5,85 billion CLIP-filtered image-text pairs
Stable Diffusion is a latent text-to-image diffusion model.
Demo: Stable Diffusion Image Generator
Image Modification with Stable Diffusion
- Input:
- Output:

Paper: High-Resolution Image Synthesis with Latent Diffusion Models
Code: https://github.com/CompVis/latent-diffusion

Paper: Semi-Parametric Neural Image Synthesis

GAN

Blog: A Beginner’s Guide to Generative Adversarial Networks (GANs)
Paper: Generative Adversarial Networks
G是生成的神經網路，它接收一個隨機的噪訊z，通過這個噪訊生成圖片，為G(z)
D是辨别的神經網路，辨别一張圖片夠不夠真實。它的輸入參數是x，x代表一張圖片，輸出D(x)代表x為真實圖片的機率

class GAN():
    def __init__(self):
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)

        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
            optimizer=optimizer,
            metrics=['accuracy'])

        # Build and compile the generator
        self.generator = self.build_generator()
        self.generator.compile(loss='binary_crossentropy', optimizer=optimizer)

        # The generator takes noise as input and generated imgs
        z = Input(shape=(100,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The valid takes generated images as input and determines validity
        valid = self.discriminator(img)

        # The combined model  (stacked generator and discriminator) takes
        # noise as input => generates images => determines validity
        self.combined = Model(z, valid)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)

    def build_generator(self):

        noise_shape = (100,)

        model = Sequential()

        model.add(Dense(256, input_shape=noise_shape))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        model.summary()

        noise = Input(shape=noise_shape)
        img = model(noise)

        return Model(noise, img)

    def build_discriminator(self):

        img_shape = (self.img_rows, self.img_cols, self.channels)

        model = Sequential()

        model.add(Flatten(input_shape=img_shape))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(256))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(1, activation='sigmoid'))
        model.summary()

        img = Input(shape=img_shape)
        validity = model(img)

        return Model(img, validity)

DCGAN - Deep Convolutional Generative Adversarial Network

Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Code: carpedm20/DCGAN-tensorflow

Generator

DCGAN Tutorial

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d( ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d( ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d( ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

DCGAN_LSGAN_WGAN_WGAN-GP_SNGAN_RSGAN_BEGAN_ACGAN_PGGAN_pix2pix_BigGAN

Tutorial: CycleGAN

# Generator G translates X -> Y
# Generator F translates Y -> X.
fake_y = generator_g(real_x, training=True)
cycled_x = generator_f(fake_y, training=True)

fake_x = generator_f(real_y, training=True)
cycled_y = generator_g(fake_x, training=True)

# same_x and same_y are used for identity loss.
same_x = generator_f(real_x, training=True)
same_y = generator_g(real_y, training=True)

disc_real_x = discriminator_x(real_x, training=True)
disc_real_y = discriminator_y(real_y, training=True)

disc_fake_x = discriminator_x(fake_x, training=True)
disc_fake_y = discriminator_y(fake_y, training=True)

# calculate the loss
gen_g_loss = generator_loss(disc_fake_y)
gen_f_loss = generator_loss(disc_fake_x)

total_cycle_loss = calc_cycle_loss(real_x, cycled_x) + calc_cycle_loss(real_y,cycled_y)

# Total generator loss = adversarial loss + cycle loss
total_gen_g_loss = gen_g_loss + total_cycle_loss + identity_loss(real_y, same_y)
total_gen_f_loss = gen_f_loss + total_cycle_loss + identity_loss(real_x, same_x)

disc_x_loss = discriminator_loss(disc_real_x, disc_fake_x)
disc_y_loss = discriminator_loss(disc_real_y, disc_fake_y)

pix2pixHD

Paper: High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Code: NVIDIA/pix2pixHD

vid2vid

Paper: Video-to-Video Synthesis
Code: NVIDIA/vid2vid

Recycle-GAN

Paper: Recycle-GAN: Unsupervised Video Retargeting
Code: aayushbansal/Recycle-GAN

StarGAN

Paper: StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Code: yunjey/stargan

Glow

Blog: https://openai.com/blog/glow/
Paper: Glow: Generative Flow with Invertible 1x1 Convolutions
Code: openai/glow

StyleGAN

Paper: A Style-Based Generator Architecture for Generative Adversarial Networks
Code: NVlabs/stylegan

StyleGAN 2

Blog: Understanding the StyleGAN and StyleGAN2 Architecture
Paper: A Style-Based Generator Architecture for Generative Adversarial Networks
Code: hNVlabs/stylegan2-ada-pytorch

StyleGAN2-ADA

Paper: Training Generative Adversarial Networks with Limited Data
Code: hNVlabs/stylegan2-ada-pytorch

StyleGAN2 Distillation

Paper: StyleGAN2 Distillation for Feed-forward Image Manipulation
Code: EvgenyKashin/stylegan2-distillation

Toonify

Blog: StyleGAN network blending

Paper: Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains
Code: justinpinkney/toonify

Face Sketch Synthesis

Paper: Face Sketch Synthesis via Semantic-Driven Generative Adversarial Network

pix2style2pix

Code: eladrich/pixel2style2pixel

Face Datasets

Celeb-A HQ Dataset

tensorflow datasets

Flickr-Faces-HQ Dataset (FFHQ)

MetFaces dataset

Animal Faces-HQ dataset (AFHQ)

Animal Faces-HQ (AFHQ), consisting of 15,000 high-quality images at 512×512 resolution
The dataset includes three domains of cat, dog, and wildlife, each providing about 5000 images.

Ukiyo-e Faces

Cartoon Faces

GANimation

Blog: GANimation: Anatomically-aware Facial Animation from a Single Image
Paper: GANimation: Anatomically-aware Facial Animation from a Single Image
Code: albertpumarola/GANimation

Sefa

Blog: https://genforce.github.io/sefa/
Paper: Closed-Form Factorization of Latent Semantics in GANs
Code: genforce/sefa

Pose	Mouth	Eye

AniGAN

Blog: 博士後小姐姐把「二次元老婆生成器」升級了：這一次可以指定畫風
Paper: AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation

CartoonGAN

Blog: Generate Anime using CartoonGAN and Tensorflow 2.0
Paper: CartoonGAN
Code: mnicnc404/CartoonGan-tensorflow

Cartoon-GAN

Paper: Generative Adversarial Networks for photo to Hayao Miyazaki style cartoons
Code: FilipAndersson245/cartoon-gan

White-box Cartoonization

Paper: White-Box Cartoonization Using An Extended GAN Framework
Code: SystemErrorWang/White-box-Cartoonization

Code: White-box facial image cartoonizaiton

GAN Compression

Paper: GAN Compression: Efficient Architectures for Interactive Conditional GANs
Code: mit-han-lab/gan-compression

SRGAN

Paper: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Code: tensorlayer/srgan

SingleHDR

Paper: Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
Code: alex04072000/SingleHDR

Image Inpainting

High-Resolution Image Inpainting

Blog: Review: High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis
Paper: High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis
Code: leehomyc/Faster-High-Res-Neural-Inpainting

Image Inpainting for Irregular Holes

Blog: Image Inpainting for Irregular Holes Using Partial Convolutions
Paper: Image Inpainting for Irregular Holes Using Partial Convolutions
Code: NVIDIA/partialconv

DeepFill V2

Blog: A Practical Generative Deep Image Inpainting Approach
Paper: Free-Form Image Inpainting with Gated Convolution
Code: JiahuiYu/generative_inpainting

EdgeConnect

Paper: EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
Code: knazeri/edge-connect

Deep Flow-Guided Video Inpainting

Paper: Deep Flow-Guided Video Inpainting
Code: nbei/Deep-Flow-Guided-Video-Inpainting

Flow-edge Guided Video Completion

Paper: Flow-edge Guided Video Completion
Code: vt-vl-lab/FGVC

GauGAN

Paper: Semantic Image Synthesis with Spatially-Adaptive Normalization
Code: NVlabs/SPADE

DeepFaceDrawing

Paper: Deep Generation of Face Images from Sketches
Code: franknb/Drawing-to-Face

Pose-guided Person Image Generation

Paper: Pose Guided Person Image Generation
Code: charliememory/Pose-Guided-Person-Image-Generation

PoseGAN

Paper: Deformable GANs for Pose-based Human Image Generation
Code: AliaksandrSiarohin/pose-gan

Everybody Dance Now

Blog: https://carolineec.github.io/everybody_dance_now/
Paper: Everybody Dance Now
Code: carolineec/EverybodyDanceNow

Virtual Try On (VTON)

VITON

Paper: VITON: An Image-based Virtual Try-on Network

MG-VTON

Paper: Towards Multi-pose Guided Virtual Try-on Network

Poly-GAN

Paper: Poly-GAN: Multi-Conditioned GAN for Fashion Synthesis

ACGPN

Paper: Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content
Code: switchablenorms/DeepFashion_Try_On

SMIS

Blog: https://seanseattle.github.io/SMIS/
Paper: Semantically Multi-modal Image Synthesis
Code: Seanseattle/SMIS

CP-VTON+

Paper: CP-VTON+: Clothing Shape and Texture Preserving Image-Based Virtual Try-On
Code: minar09/cp-vton-plus

O-VITON

Paper: Image Based Virtual Try-on Network from Unpaired Data
Code: trinanjan12/Image-Based-Virtual-Try-on-Network-from-Unpaired-Data

Inference Flow

Training Flow

PF-AFN

Paper: Parser-Free Virtual Try-on via Distilling Appearance Flows
Code: geyuying/PF-AFN

CIT

Paper: Cloth Interactive Transformer for Virtual Try-On
Code: Amazingren/CIT

pix2surf

Paper: Learning to Transfer Texture from Clothing Images to 3D Humans
Code: polobymulberry/pix2surf

TailorNet

Paper: TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style
Code: chaitanya100100/TailorNet

3D Avatar

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Blog: PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
Paper: PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
Code: facebookresearch/pifuhd

gDNA: Towards Generative Detailed Neural Avatars

Blog: PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
Paper: gDNA: Towards Generative Detailed Neural Avatars

To generate diverse 3D humans, we build an implicit multi-subject articulated model. We model clothed human shapes and detailed surface normals in a pose-independent canonical space via a neural implicit surface representation, conditioned on latent codes.

Phorhum

Paper: Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing

NeRF

NeRF:Representing Scenes as Neural Radiance Fields for View Synthesis

Paper: arxiv.org/abs/2003.08934 Code: bmild/nerf Colab: tiny_nerf Kaggle: rkuo2000/tiny-nerf

The algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis.

NeRF Inference: Probabilistic Approach
Faster Inference: Efficient Ray-Tracing + Image Decomposition

Method Render	time	Speedup
NeRF	56185 ms	–
NeRF + ESS + ERT	788 ms	71
KiloNeRF	22 ms	2548

Sparse Voxel Grid and Octrees: Spatial Sparsity
Neural Sparse Voxel Fields proposed learn a sparse voxel grid in a progressive manner that increases the resolution of the voxel grid at a time to not just such represent explicit geomety but also to learn the implicit features per non-empty voxel.
PlenOctree also uses the octree structure to speed up the geometry queries and store the view-dependent color representation on the leaves.
KiloNeRF proposes to decompose this large deep NeRF into a set of small shallow NeRFs that capture only a small portion of the space.
Baking Neural Radiance Fields (SNeRG) proposes to decompose an image into the diffuse color and specularity so that the inference network handles a very simple task.

Point-NeRF

Paper: Point-NeRF: Point-based Neural Radiance Fields
Code: Xharlie/pointnerf
Point-NeRF uses neural 3D point clouds, with associated neural features, to model a radiance field.