## Cheatsheet of LaTeX Code for Transfer Learning Equations

Cheatsheet of Latex Code for Most Popular Transfer Learning Equations

In this blog, we summarize the LaTeX code for the most fundamental equations of transfer learning (TL). Unlike multi-task learning, transfer learning models aim to achieve the best performance on the target domain (minimal target-domain test error), not on the source domain. Typical transfer learning methods include domain adaptation (DA), feature sub-space alignment, and others. In this post, we discuss TL equations in more detail, covering sub-areas such as domain adaptation, H-divergence, and Domain-Adversarial Neural Networks (DANN), as a quick reference for your research.

• #### 1.1 H-Divergence

##### Latex Code
            d_{\mathcal{H}}(\mathcal{D},\mathcal{D}^{'})=2\sup_{h \in \mathcal{H}}|\Pr_{\mathcal{D}}[I(h)]-\Pr_{\mathcal{D}^{'}}[I(h)]|

##### Explanation

The H-divergence is defined as the supremum, over all hypotheses h in the hypothesis class H, of the difference between the probabilities that the two distributions assign to the event I(h). Given a domain X with two data distributions D and D^{'} over X, I(h) denotes the characteristic set (indicator set) of h on X: x belongs to I(h) exactly when h(x) = 1. You can find more details on domain adaptation and H-divergence in the paper by Shai Ben-David et al., A theory of learning from different domains.
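As a toy illustration, the supremum in the definition can be evaluated exactly when the hypothesis class is small and finite. The sketch below (hypothetical names; not the estimator from the paper) uses 1-D threshold classifiers h_t(x) = 1 if x > t as the class H, and computes the empirical H-divergence between two samples:

```python
import random

def empirical_h_divergence(xs, xt, thresholds):
    # Empirical H-divergence for the finite class of 1-D threshold
    # hypotheses h_t(x) = 1 if x > t, else 0 (an illustrative toy class).
    # d_H = 2 * sup_h | Pr_D[I(h)] - Pr_D'[I(h)] |
    best = 0.0
    for t in thresholds:
        p_s = sum(x > t for x in xs) / len(xs)   # Pr_D[I(h_t)]
        p_t = sum(x > t for x in xt) / len(xt)   # Pr_D'[I(h_t)]
        best = max(best, abs(p_s - p_t))
    return 2.0 * best

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]  # source-domain sample
xt = [random.gauss(2.0, 1.0) for _ in range(2000)]  # shifted target-domain sample
thresholds = [-3 + 0.1 * i for i in range(81)]
d = empirical_h_divergence(xs, xt, thresholds)
```

Identical samples give a divergence of 0, while the two-sigma mean shift above drives the divergence well above 1 (the maximum possible value is 2), matching the intuition that more separable domains are further apart in H-divergence.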

• #### 1.2 Bound on Target Domain Error

##### Latex Code
            \epsilon_{T}(h) \le \hat{\epsilon}_{S}(h) + \sqrt{\frac{4}{m}(d \log \frac{2em}{d} + \log \frac{4}{\delta })} + d_{\mathcal{H}}(\tilde{\mathcal{D}}_{S}, \tilde{\mathcal{D}}_{T}) + \lambda \\
\lambda = \lambda_{S} + \lambda_{T}

##### Explanation

Let me explain this equation in more detail. The domain adaptation literature shows that the test error on the target domain \epsilon_{T}(h) is bounded by the sum of: 1. the empirical estimate of the training error on the source domain \hat{\epsilon}_{S}(h); 2. a complexity term depending on the VC dimension d, the source sample size m, the base of the natural logarithm e, and the confidence parameter \delta; 3. the divergence between the source and target distributions d_{\mathcal{H}}(\tilde{\mathcal{D}}_{S}, \tilde{\mathcal{D}}_{T}). The term \lambda = \lambda_{S} + \lambda_{T} is a fixed quantity, where \lambda_{S} and \lambda_{T} denote the errors of the ideal joint hypothesis on the source and target domains respectively. From this bound we can see that if the source distribution Ds and target distribution Dt are similar (their divergence is small), the target-domain error is also controlled; this is why a model trained on the source domain can perform well on a similarly distributed target domain. You can find more details in the NIPS 2006 paper by Shai Ben-David et al., Analysis of Representations for Domain Adaptation.
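To make the bound concrete, the sketch below (hypothetical function names) evaluates the complexity term and the full right-hand side numerically for user-supplied values of the source error, sample size, VC dimension, confidence, divergence, and \lambda:

```python
import math

def vc_complexity_term(m, d, delta):
    # sqrt( (4/m) * ( d * log(2*e*m/d) + log(4/delta) ) )
    return math.sqrt((4.0 / m) * (d * math.log(2 * math.e * m / d)
                                  + math.log(4.0 / delta)))

def target_error_bound(source_err, m, d, delta, h_div, lam):
    # epsilon_T(h) <= eps_S_hat(h) + complexity term + d_H(Ds, Dt) + lambda
    return source_err + vc_complexity_term(m, d, delta) + h_div + lam

# Example: source error 5%, 10k source samples, VC dim 50,
# confidence delta = 0.05, H-divergence 0.1, lambda = 0.02.
bound = target_error_bound(0.05, 10_000, 50, 0.05, 0.1, 0.02)
```

Increasing the source sample size m shrinks the complexity term, while a larger H-divergence loosens the bound, which mirrors the qualitative discussion above.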

• #### 1.3 Domain-Adversarial Neural Networks(DANN)

##### Latex Code
            \min [\frac{1}{m}\sum^{m}_{i=1}\mathcal{L}(f(\textbf{x}^{s}_{i}),y_{i})+\lambda \max(-\frac{1}{m}\sum^{m}_{i=1}\mathcal{L}^{d}(o(\textbf{x}^{s}_{i}),1)-\frac{1}{m^{'}}\sum^{m^{'}}_{i=1}\mathcal{L}^{d}(o(\textbf{x}^{t}_{i}),0))]

##### Explanation

In this formulation of Domain-Adversarial Neural Networks (DANN), the authors add a domain adaptation regularizer to the original source-domain loss. The regularizer is derived from the H-divergence between the two representation distributions h(X_{S}) and h(X_{T}). The adversarial objective encourages representations under which the domain classifier cannot distinguish whether a data point belongs to the source domain S or the target domain T. The function o(\cdot) is the domain regressor, which outputs a domain prediction given the representation of input x. You can find more details in the paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks.
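As a minimal sketch (hypothetical names; it evaluates the bracketed objective for given predictions rather than running the full adversarial training loop with gradient reversal), the DANN objective above can be computed as follows. Source points carry domain label 1 and target points label 0, matching the two \mathcal{L}^{d} terms:

```python
import math

def log_loss(p, y):
    # Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def dann_objective(cls_preds, labels, dom_preds_src, dom_preds_tgt, lam):
    # First term: average source-domain classification loss L(f(x_i^s), y_i).
    m = len(cls_preds)
    cls_loss = sum(log_loss(p, y) for p, y in zip(cls_preds, labels)) / m
    # Domain regularizer: the (negated) domain losses, with domain label 1
    # for source points and 0 for target points.
    dom_src = sum(log_loss(p, 1) for p in dom_preds_src) / len(dom_preds_src)
    dom_tgt = sum(log_loss(p, 0) for p in dom_preds_tgt) / len(dom_preds_tgt)
    return cls_loss + lam * (-dom_src - dom_tgt)
```

When the domain regressor is maximally confused (all domain predictions equal 0.5), the regularizer contributes its most negative value, -2 log 2 per unit of \lambda, which is exactly the state the feature extractor is driven toward.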