AIGC model

Modeling techniques for AIGC

Taking face generation as an example, we introduce representative generative models and highlight their differences. The generation process aims to estimate the probability distribution of the face data \(p(x)\). By introducing a latent variable \(z\) with prior \(p(z)\), we can express \(p(x)\) through the conditional likelihood \(p(x|z)\) by marginalizing over \(z\) (the law of total probability):

\[p(x) = \int p(x|z)p(z) dz\]
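The integral above rarely has a closed form for real data, but it can be checked numerically on a toy case. The sketch below is illustrative only and assumes a one-dimensional model that is not part of the text: \(z \sim \mathcal{N}(0, 1)\) and \(x|z \sim \mathcal{N}(z, \sigma^2)\), for which the marginal is known to be \(\mathcal{N}(0, 1 + \sigma^2)\). The function names and the value of `noise_std` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def marginal_mc(x, noise_std=0.5, num_samples=100_000, seed=0):
    """Monte Carlo estimate of p(x) = E_{z ~ p(z)}[p(x|z)] for the toy model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)                 # sample z from the prior p(z)
        total += normal_pdf(x, z, noise_std)    # evaluate the likelihood p(x|z)
    return total / num_samples


def marginal_exact(x, noise_std=0.5):
    """Closed-form marginal of the toy model: N(0, 1 + noise_std^2)."""
    return normal_pdf(x, 0.0, math.sqrt(1.0 + noise_std ** 2))
```

Calling `marginal_mc(0.3)` and `marginal_exact(0.3)` should give close values. For high-dimensional data such as faces, this naive estimator becomes impractical, which is one motivation for the models below.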

An Autoencoder (AE) models \(p(x)\), where \(x\) is a face. Because it places no explicit prior on its latent code, we cannot control the generated face.

A Variational Autoencoder (VAE) models \(p(x|z)\), where \(z\) is a continuous latent variable (e.g., expression).
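To make the continuous latent variable concrete, here is a minimal sketch of two ingredients commonly used when training a VAE: the reparameterization trick for sampling \(z\), and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The text does not specify these details, so the scalar latent, the Gaussian posterior, and the function names are assumptions for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so z ~ N(mu, sigma^2)."""
    std = math.exp(0.5 * log_var)
    eps = rng.gauss(0.0, 1.0)
    return mu + std * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a scalar latent, in closed form."""
    return 0.5 * (mu ** 2 + math.exp(log_var) - 1.0 - log_var)
```

In a full model, `mu` and `log_var` would be produced by an encoder network from \(x\), and the KL term would be combined with a reconstruction term based on \(p(x|z)\).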

A Vector Quantized Variational Autoencoder (VQ-VAE) models \(p(x|z)\), where \(z\) is a discrete latent variable (e.g., gender).
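The discrete latent in a VQ-VAE comes from snapping a continuous encoder output to its nearest entry in a learned codebook. The sketch below shows only that lookup step, assuming squared Euclidean distance and a tiny hand-written codebook; the values and the function name `quantize` are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # The discrete latent z is the index of the closest code vector.
    index = min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))
    return index, codebook[index]


# Toy codebook with three 2-D codes (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
index, code = quantize([0.9, 0.2], codebook)
```

Here the encoder output `[0.9, 0.2]` is closest to the second code, so the latent is the index `1`. Training the codebook itself is outside the scope of this sketch.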

An autoregressive model factorizes the joint distribution \(p(x_1, x_2, \ldots, x_T)\), where \(x_1, x_2, \ldots, x_T\) are the pixels of the face, into a product of conditionals, one pixel at a time.
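By the chain rule of probability, the joint distribution can be written as \(p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p(x_t | x_1, \ldots, x_{t-1})\). The sketch below illustrates this on binary pixels with a hand-written conditional that stands in for a learned network. The rule inside `conditional_p_one` is a hypothetical choice that happens to look only at the previous pixel; a real model would condition on all earlier pixels.

```python
import math
import random


def conditional_p_one(prev_pixels):
    """Hypothetical p(x_t = 1 | x_<t): tends to repeat the previous pixel."""
    if not prev_pixels:
        return 0.5
    return 0.9 if prev_pixels[-1] == 1 else 0.1


def log_prob(pixels):
    """log p(x_1, ..., x_T) as a sum of per-pixel conditional log-probabilities."""
    total = 0.0
    for t, x_t in enumerate(pixels):
        p_one = conditional_p_one(pixels[:t])
        total += math.log(p_one if x_t == 1 else 1.0 - p_one)
    return total


def sample(length, rng):
    """Generate pixels one at a time, each conditioned on those already drawn."""
    pixels = []
    for _ in range(length):
        p_one = conditional_p_one(pixels)
        pixels.append(1 if rng.random() < p_one else 0)
    return pixels
```

For example, `sample(8, random.Random(0))` draws an 8-pixel sequence, and `log_prob` scores any sequence under the same model. Because every conditional is normalized, the probabilities of all sequences of a given length sum to one.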

Generative Adversarial Networks (GANs) employ a discriminator \(D(x)\) to distinguish the real data \(x\) from the generated data \(G(z)\), while the generator \(G\) tries to fool the discriminator.
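This two-player game is usually written as a minimax objective. The form below is the standard one from the original GAN formulation, given here as a reference point rather than as the only option. The symbol \(p_{\text{data}}\) for the real data distribution is notation introduced for this formula, and \(p(z)\) is the latent prior.

\[\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]\]

The discriminator maximizes the expression by assigning high \(D(x)\) to real samples and low \(D(G(z))\) to generated ones, while the generator minimizes it by pushing \(D(G(z))\) toward 1.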

Diffusion Models WIP

Neural Radiance Fields (NeRF) WIP

3D Gaussian Splatting (3DGS) WIP

Generative Video Models WIP

Foundation model development

Developing a large-scale foundation model is challenging because of the high computational cost and the need for large-scale data. In this section, we introduce the key components of a foundation model and show how NeMo can accelerate the development process.

NeMo is a scalable, high-performance generative AI framework developed by NVIDIA that provides a set of tools for building large-scale foundation models (LLM, MLLM, and TTS).

https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/nemo-llm-mm-stack.png

NeMo Overview

As shown in the figure, the lifecycle of foundation model development includes the following steps:

Tutorial notebooks are listed here.

References

  1. Kaiming He. “6.S978 Deep Generative Models” (MIT EECS, Fall 2024).