Cheatsheet of Latex Code for Most Popular Machine Learning Equations

rockingdingo 2024-08-25 23:05 #GAN #VAE #KL-Divergence #Wasserstein #Mahalanobis

Navigation

This post collects the LaTeX code for popular machine learning equations, organized in two sections. The first section covers distance measures, including KL-Divergence, JS-Divergence, Wasserstein Distance (Optimal Transport), Maximum Mean Discrepancy (MMD), and Mahalanobis Distance. The second section covers generative models: Generative Adversarial Networks (GAN), Variational AutoEncoder (VAE), and Diffusion Models (DDPM).

1. Distance Measure

1.1 Kullback-Leibler Divergence(KL-Divergence)

1.2 Jensen-Shannon Divergence(JS-Divergence)

1.3 Wasserstein Distance(Optimal Transport)

1.4 Maximum Mean Discrepancy(MMD)

1.5 Mahalanobis Distance

2. Generative Models

2.1 Generative Adversarial Networks(GAN)

2.2 Variational AutoEncoder(VAE)

2.3 Diffusion Models(DDPM)

Distance Measure

Kullback-Leibler Divergence(KL-Divergence)

Equation

$$KL(P||Q)=\sum_{x}P(x)\log(\frac{P(x)}{Q(x)})$$

Latex Code

KL(P||Q)=\sum_{x}P(x)\log(\frac{P(x)}{Q(x)})

Explanation
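
The sum above can be evaluated directly for discrete distributions. The following is a minimal sketch, assuming P and Q are given as Python lists of probabilities over the same outcomes; the function name `kl_divergence` and the example numbers are illustrative and not part of the original post.

```python
import math

def kl_divergence(p, q):
    """KL(P||Q) for two discrete distributions given as lists of probabilities."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # terms with P(x) = 0 contribute 0 by convention
        if q_x == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_x * math.log(p_x / q_x)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # KL(P||Q)
print(kl_divergence(q, p))  # KL(Q||P), generally a different value
```

The two printed values differ, which illustrates that KL-Divergence is not symmetric in its arguments.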

Jensen-Shannon Divergence(JS-Divergence)

Equation

$JS(P||Q)=\frac{1}{2}KL(P||\frac{(P+Q)}{2})+\frac{1}{2}KL(Q||\frac{(P+Q)}{2})$

Latex Code

        JS(P||Q)=\frac{1}{2}KL(P||\frac{(P+Q)}{2})+\frac{1}{2}KL(Q||\frac{(P+Q)}{2})

Explanation
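
JS-Divergence averages two KL terms, each measured against the mixture (P+Q)/2. Below is a small sketch under the same assumption as before (discrete distributions as lists of probabilities); the helper names `kl` and `js_divergence` are illustrative.

```python
import math

def kl(p, q):
    """KL(P||Q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """JS(P||Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M) with M = (P + Q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(js_divergence(p, q))
print(js_divergence(q, p))  # same value: JS is symmetric
```

Because the mixture M is positive wherever P or Q is, the divergence stays finite even for distributions with disjoint support, where it reaches its maximum of log 2 (natural logarithm).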

Wasserstein Distance(Optimal Transport)

Equation

$W_{p}(P,Q)=(\inf_{J \in J(P,Q)} \int{||x-y||^{p}dJ(x,y)})^\frac{1}{p}$

Latex Code

        W_{p}(P,Q)=(\inf_{J \in J(P,Q)} \int{||x-y||^{p}dJ(x,y)})^\frac{1}{p}

Explanation
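
The infimum over joint distributions is hard to compute in general, but in one dimension with two equally sized samples the optimal coupling simply matches sorted values. The sketch below covers only that special case; the function name `wasserstein_1d` is illustrative, and this is not a general optimal transport solver.

```python
def wasserstein_1d(xs, ys, p=1):
    """W_p between two 1-D empirical distributions with equally many samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    # In 1-D the optimal coupling pairs the i-th smallest x with the i-th smallest y.
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return mean_cost ** (1.0 / p)

xs = [0.0, 1.0, 3.0]
ys = [5.0, 6.0, 8.0]  # xs shifted by 5
print(wasserstein_1d(xs, ys, p=1))
print(wasserstein_1d(xs, ys, p=2))
```

Shifting every sample by the same constant moves all mass by that constant, so both calls return the size of the shift.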

Maximum Mean Discrepancy(MMD)

Equation

$\textup{MMD}(\mathbb{F},X,Y):=\sup_{f \in \mathbb{F}}(\frac{1}{m}\sum_{i=1}^{m}f(x_{i}) - \frac{1}{n}\sum_{j=1}^{n}f(y_{j}))$

Latex Code

  
        \textup{MMD}(\mathbb{F},X,Y):=\sup_{f \in \mathbb{F}}(\frac{1}{m}\sum_{i=1}^{m}f(x_{i}) - \frac{1}{n}\sum_{j=1}^{n}f(y_{j}))

Explanation
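
The supremum over the function class cannot be evaluated by brute force. When the function class is taken to be the unit ball of a reproducing kernel Hilbert space, the squared MMD can be written with kernel evaluations only. The sketch below computes the biased empirical estimate for 1-D samples with a Gaussian (RBF) kernel; the kernel choice, the `bandwidth` parameter, and the function names are assumptions made for illustration.

```python
import math

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 * mean k(x,y)."""
    m, n = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in xs) / (m * m)
    k_yy = sum(rbf_kernel(a, b, bandwidth) for a in ys for b in ys) / (n * n)
    k_xy = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy

near = [0.0, 0.1, 0.2]
far = [5.0, 5.1, 5.2]
print(mmd_squared(near, near))  # identical samples
print(mmd_squared(near, far))   # well separated samples
```

Identical samples give an estimate of zero, while well separated samples give a clearly positive value.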

Mahalanobis Distance

Equation

$D_{M}(x,y)=\sqrt{(x-y)^{T}\Sigma^{-1}(x-y)}$

Latex Code
```
  
        D_{M}(x,y)=\sqrt{(x-y)^{T}\Sigma^{-1}(x-y)}
        
```
Explanation

Mahalanobis Distance measures the distance between a point $x$ and a point $y$ (commonly the mean of a distribution), scaled by the inverse covariance matrix $\Sigma^{-1}$ of that distribution. See https://www.sciencedirect.com/topics/engineering/mahalanobis-distance for more details.
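
To make the role of $\Sigma^{-1}$ concrete, here is a minimal sketch restricted to two dimensions, where the inverse of the covariance matrix can be written out by hand. The function name `mahalanobis_2d` and the example covariance values are illustrative assumptions.

```python
import math

def mahalanobis_2d(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)) for 2-D points and a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - y[0], x[1] - y[1]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(quad)

identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # variance 4 along the first axis
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))   # Euclidean distance
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched))  # 2 units = 1 std dev
```

With the identity covariance the result equals the Euclidean distance; with a larger variance along one axis, the same displacement along that axis counts for less.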

Generative Models

Generative Adversarial Networks(GAN)

Equation

$\min_{G} \max_{D} V(D,G)=\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z \sim p_{z}(z)}[\log(1-D(G(z)))]$

Latex Code
```
        \min_{G} \max_{D} V(D,G)=\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z \sim p_{z}(z)}[\log(1-D(G(z)))]
        
```
Explanation

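The LaTeX code for the GAN objective is shown above: the discriminator $D$ maximizes $V(D,G)$ while the generator $G$ minimizes it. See the paper Generative Adversarial Networks for more details.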
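
The two expectations in $V(D,G)$ can be approximated by sample averages. The sketch below is a toy Monte Carlo estimate, not a training loop: the discriminators, the generator, and the sampling distributions are made-up stand-ins chosen only to exercise the formula.

```python
import math
import random

def value_function(discriminator, generator, real_samples, noise_samples):
    """Monte Carlo estimate of E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - discriminator(generator(z)))
                    for z in noise_samples) / len(noise_samples)
    return real_term + fake_term

rng = random.Random(0)
real_samples = [rng.gauss(2.0, 1.0) for _ in range(1000)]   # toy "data"
noise_samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # toy noise z

generator = lambda z: z - 2.0                      # toy G: fakes centered at -2
chance_d = lambda x: 0.5                           # D that cannot tell real from fake
sigmoid_d = lambda x: 1.0 / (1.0 + math.exp(-x))   # D that separates the two

v_chance = value_function(chance_d, generator, real_samples, noise_samples)
v_sigmoid = value_function(sigmoid_d, generator, real_samples, noise_samples)
print(v_chance, v_sigmoid)
```

A discriminator that always outputs 0.5 gives $V = -\log 4$, and a discriminator that actually separates real from generated samples achieves a larger value, which is what the inner maximization over $D$ rewards.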

Variational AutoEncoder(VAE)

Estimating the Log-likelihood and Posterior

Equation

$\log p_{\theta}(x)=\mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x)] \\=\mathbb{E}_{q_{\phi}(z|x)}[\log \frac{p_{\theta}(x,z)}{p_{\theta}(z|x)}] \\=\mathbb{E}_{q_{\phi}(z|x)}[\log [\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)} \times \frac{q_{\phi}(z|x)}{p_{\theta}(z|x)}]] \\=\mathbb{E}_{q_{\phi}(z|x)}[\log [\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)} ]] +D_{KL}(q_{\phi}(z|x) || p_{\theta}(z|x))$

Latex Code

        \log p_{\theta}(x)=\mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x)] \\
        =\mathbb{E}_{q_{\phi}(z|x)}[\log \frac{p_{\theta}(x,z)}{p_{\theta}(z|x)}] \\
        =\mathbb{E}_{q_{\phi}(z|x)}[\log [\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)} \times \frac{q_{\phi}(z|x)}{p_{\theta}(z|x)}]] \\
        =\mathbb{E}_{q_{\phi}(z|x)}[\log [\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)} ]] +D_{KL}(q_{\phi}(z|x) || p_{\theta}(z|x))\\

Explanation

Evidence Lower Bound

Equation

$\mathbb{L}_{\theta,\phi}(\mathbf{x})=\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}[\log p_{\theta}(\mathbf{x},\mathbf{z})-\log q_{\phi}(\mathbf{z}|\mathbf{x}) ]$

Latex Code

            \mathbb{L}_{\theta,\phi}(\mathbf{x})=\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}[\log p_{\theta}(\mathbf{x},\mathbf{z})-\log q_{\phi}(\mathbf{z}|\mathbf{x}) ]

Explanation
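
The evidence lower bound is the first term of the log-likelihood decomposition in the previous subsection, so $\log p_{\theta}(x)$ equals the ELBO plus the KL term, and the ELBO never exceeds $\log p_{\theta}(x)$ because the KL term is non-negative. The sketch below checks this numerically for a binary latent variable at one fixed $x$; all probabilities are made-up numbers for illustration.

```python
import math

# Made-up values for a fixed observation x and a latent z in {0, 1}.
joint = [0.3, 0.1]  # p_theta(x, z=0), p_theta(x, z=1)
q = [0.6, 0.4]      # q_phi(z=0 | x), q_phi(z=1 | x)

evidence = sum(joint)                       # p_theta(x)
posterior = [j / evidence for j in joint]   # p_theta(z | x)

# ELBO = E_q[log p_theta(x, z) - log q_phi(z | x)]
elbo = sum(qz * (math.log(jz) - math.log(qz)) for qz, jz in zip(q, joint))
# KL(q_phi(z | x) || p_theta(z | x))
kl_term = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

print(math.log(evidence), elbo + kl_term)  # the two values agree
```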

Reparameterization trick

Equation

$z = \mu + \epsilon \cdot \sigma$

Latex Code

            z = \mu + \epsilon \cdot \sigma

Explanation

The LaTeX code for the VAE equations is shown above. See the paper Auto-Encoding Variational Bayes for more details.
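
The reparameterization trick draws the noise $\epsilon$ from a standard normal distribution and then shifts and scales it, so the randomness no longer depends on $\mu$ and $\sigma$. A minimal scalar sketch follows; the function name `sample_z` and the example values are illustrative.

```python
import random

def sample_z(mu, sigma, rng):
    """Draw z = mu + epsilon * sigma with epsilon ~ N(0, 1)."""
    epsilon = rng.gauss(0.0, 1.0)  # noise does not depend on mu or sigma
    return mu + epsilon * sigma

rng = random.Random(0)
samples = [sample_z(2.0, 0.5, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(mean, std)  # close to mu = 2.0 and sigma = 0.5
```

In an autodiff framework, writing the sample this way is what allows gradients to flow back to $\mu$ and $\sigma$.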

Diffusion Models(DDPM)

Explanation

See the paper Denoising Diffusion Probabilistic Models for more details. See also the reference blog post https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

1.1 Forward Process

Equation

$q(x_{t}|x_{t-1})=\mathcal{N}(x_{t};\sqrt{1-\beta_{t}}x_{t-1},\beta_{t}I) \\q(x_{1:T}|x_{0})=\prod_{t=1}^{T}q(x_{t}|x_{t-1})$

Latex Code

            q(x_{t}|x_{t-1})=\mathcal{N}(x_{t};\sqrt{1-\beta_{t}}x_{t-1},\beta_{t}I) \\q(x_{1:T}|x_{0})=\prod_{t=1}^{T}q(x_{t}|x_{t-1})

1.2 Forward Process Reparameterization Trick

Equation

$x_{t}=\sqrt{\alpha_{t}}x_{t-1}+\sqrt{1-\alpha_{t}} \epsilon_{t-1}\\=\sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{1-\alpha_{t}\alpha_{t-1}} \bar{\epsilon}_{t-2}\\=\text{...}\\=\sqrt{\bar{\alpha}_{t}}x_{0}+\sqrt{1-\bar{\alpha}_{t}}\epsilon \\\alpha_{t}=1-\beta_{t}, \bar{\alpha}_{t}=\prod_{i=1}^{t}\alpha_{i}$

Latex Code

            x_{t}=\sqrt{\alpha_{t}}x_{t-1}+\sqrt{1-\alpha_{t}} \epsilon_{t-1}\\=\sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{1-\alpha_{t}\alpha_{t-1}} \bar{\epsilon}_{t-2}\\=\text{...}\\=\sqrt{\bar{\alpha}_{t}}x_{0}+\sqrt{1-\bar{\alpha}_{t}}\epsilon \\\alpha_{t}=1-\beta_{t}, \bar{\alpha}_{t}=\prod_{i=1}^{t}\alpha_{i}
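
The closed form in the last line means $x_{t}$ can be sampled from $x_{0}$ in one step, without simulating the intermediate steps. The sketch below assumes a linear schedule for $\beta_{t}$ from 1e-4 to 0.02 over 1000 steps (the setting reported in the DDPM paper, not stated in this post); the function names are illustrative.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule beta_1..beta_T; requires num_steps >= 2."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * i for i in range(num_steps)]

def alpha_bar(betas, t):
    """Cumulative product of alpha_i = 1 - beta_i for i = 1..t."""
    product = 1.0
    for beta in betas[:t]:
        product *= 1.0 - beta
    return product

def sample_xt(x0, t, betas, rng):
    """x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon."""
    a_bar = alpha_bar(betas, t)
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

betas = linear_beta_schedule(1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
print(alpha_bar(betas, 1), alpha_bar(betas, 1000))
print(sample_xt(x0, 10, betas, rng))    # still close to x0
print(sample_xt(x0, 1000, betas, rng))  # almost pure noise
```

As $t$ grows, $\bar{\alpha}_{t}$ shrinks toward zero, so the signal from $x_{0}$ fades and $x_{t}$ approaches a sample from a standard normal distribution.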

1.3 Reverse Process

$p_\theta(\mathbf{x}_{0:T}) = p(\mathbf{x}_T) \prod^T_{t=1} p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t) \\ p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t))$

Latex Code

            p_\theta(\mathbf{x}_{0:T}) = p(\mathbf{x}_T) \prod^T_{t=1} p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t) \\
            p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t))

1.4 Reverse Process Variational Lower Bound

$\begin{aligned} - \log p_\theta(\mathbf{x}_0) &\leq - \log p_\theta(\mathbf{x}_0) + D_\text{KL}(q(\mathbf{x}_{1:T}\vert\mathbf{x}_0) \| p_\theta(\mathbf{x}_{1:T}\vert\mathbf{x}_0) ) \\ &= -\log p_\theta(\mathbf{x}_0) + \mathbb{E}_{\mathbf{x}_{1:T}\sim q(\mathbf{x}_{1:T} \vert \mathbf{x}_0)} \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T}) / p_\theta(\mathbf{x}_0)} \Big] \\ &= -\log p_\theta(\mathbf{x}_0) + \mathbb{E}_q \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} + \log p_\theta(\mathbf{x}_0) \Big] \\ &= \mathbb{E}_q \Big[ \log \frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \\ \text{Let }L_\text{VLB} &= \mathbb{E}_{q(\mathbf{x}_{0:T})} \Big[ \log \frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \geq - \mathbb{E}_{q(\mathbf{x}_0)} \log p_\theta(\mathbf{x}_0) \end{aligned}$

Latex Code

            \begin{aligned}
            - \log p_\theta(\mathbf{x}_0) 
            &\leq - \log p_\theta(\mathbf{x}_0) + D_\text{KL}(q(\mathbf{x}_{1:T}\vert\mathbf{x}_0) \| p_\theta(\mathbf{x}_{1:T}\vert\mathbf{x}_0) ) \\
            &= -\log p_\theta(\mathbf{x}_0) + \mathbb{E}_{\mathbf{x}_{1:T}\sim q(\mathbf{x}_{1:T} \vert \mathbf{x}_0)} \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T}) / p_\theta(\mathbf{x}_0)} \Big] \\
            &= -\log p_\theta(\mathbf{x}_0) + \mathbb{E}_q \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} + \log p_\theta(\mathbf{x}_0) \Big] \\
            &= \mathbb{E}_q \Big[ \log \frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \\
            \text{Let }L_\text{VLB} 
            &= \mathbb{E}_{q(\mathbf{x}_{0:T})} \Big[ \log \frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \geq - \mathbb{E}_{q(\mathbf{x}_0)} \log p_\theta(\mathbf{x}_0)
            \end{aligned}

1.5 Reverse Process Variational Lower Bound Decomposition Multiple KL-Divergence

$$\begin{aligned}L_\text{VLB} &= \mathbb{E}_{q(\mathbf{x}_{0:T})} \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \\&= \mathbb{E}_q \Big[ \log\frac{\prod_{t=1}^T q(\mathbf{x}_t\vert\mathbf{x}_{t-1})}{ p_\theta(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t) } \Big] \\&= \mathbb{E}_q [\underbrace{D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T))}_{L_T} + \sum_{t=2}^T \underbrace{D_\text{KL}(q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t))}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)}_{L_0} ]\end{aligned}$$

Latex Code

            \begin{aligned}L_\text{VLB} &= \mathbb{E}_{q(\mathbf{x}_{0:T})} \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \\&= \mathbb{E}_q \Big[ \log\frac{\prod_{t=1}^T q(\mathbf{x}_t\vert\mathbf{x}_{t-1})}{ p_\theta(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t) } \Big] \\&= \mathbb{E}_q [\underbrace{D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T))}_{L_T} + \sum_{t=2}^T \underbrace{D_\text{KL}(q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t))}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)}_{L_0} ]\end{aligned}

1.6 Reverse Process Variational Lower Bound Loss Function

$\begin{aligned} L_\text{VLB} &= L_T + L_{T-1} + \dots + L_0 \\ \text{where } L_T &= D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T)) \\ L_t &= D_\text{KL}(q(\mathbf{x}_t \vert \mathbf{x}_{t+1}, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_t \vert\mathbf{x}_{t+1})) \text{ for }1 \leq t \leq T-1 \\ L_0 &= - \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1) \end{aligned}$

Latex Code

            \begin{aligned}
            L_\text{VLB} &= L_T + L_{T-1} + \dots + L_0 \\
            \text{where } L_T &= D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T)) \\
            L_t &= D_\text{KL}(q(\mathbf{x}_t \vert \mathbf{x}_{t+1}, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_t \vert\mathbf{x}_{t+1})) \text{ for }1 \leq t \leq T-1 \\
            L_0 &= - \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)
            \end{aligned}

