Multi-Gate Mixture-of-Experts (MMoE)
Tags: #machine-learning #multi-task

Equation
$$g^{k}(x)=\text{softmax}(W_{gk}x) \\ f^{k}(x)=\sum^{N}_{i=1}g^{k}(x)_{i}f_{i}(x) \\ y_{k}=h^{k}(f^{k}(x))$$

Latex Code
g^{k}(x)=\text{softmax}(W_{gk}x) \\ f^{k}(x)=\sum^{N}_{i=1}g^{k}(x)_{i}f_{i}(x) \\ y_{k}=h^{k}(f^{k}(x))
Introduction
Explanation
The Multi-Gate Mixture-of-Experts (MMoE) model was first introduced in the KDD 2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. The model introduces an MMoE layer that models the relationships among K tasks using N shared expert networks. Assume the input feature x has dimension D, and that there are K output tasks and N expert networks. The gating network for task k computes g^{k}(x), an N-dimensional vector of softmax-normalized weights over the experts, where W_{gk} is a trainable matrix in \mathbb{R}^{N \times D}. Then f^{k}(x) is the weighted sum of the N expert outputs for task k: f_{i}(x) is the output of the i-th expert, and f^{k}(x) is the representation of the k-th task, obtained as the gate-weighted sum over all N experts. Finally, a task-specific tower network h^{k} maps f^{k}(x) to the prediction y_{k}.
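The three equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the expert networks here are single tanh layers and the towers h^{k} are linear maps (the paper uses ReLU MLPs), and all shapes (D, N, H, K) are toy values chosen for the example.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x, expert_weights, gate_weights, tower_weights):
    """One MMoE forward pass for a single input x of dimension D.

    expert_weights: list of N matrices (H x D) -> expert outputs f_i(x)
    gate_weights:   list of K matrices (N x D) -> per-task gates g^k(x)
    tower_weights:  list of K matrices (1 x H) -> task towers h^k
    """
    # f_i(x) for all experts, stacked into shape (N, H)
    experts = np.stack([np.tanh(W @ x) for W in expert_weights])
    outputs = []
    for W_gk, W_hk in zip(gate_weights, tower_weights):
        g = softmax(W_gk @ x)   # g^k(x): (N,) weights summing to 1
        f_k = g @ experts       # f^k(x) = sum_i g^k(x)_i f_i(x): (H,)
        outputs.append(W_hk @ f_k)  # y_k = h^k(f^k(x))
    return outputs

# toy shapes: D=4 input, N=3 experts, H=5 hidden units, K=2 tasks
rng = np.random.default_rng(0)
D, N, H, K = 4, 3, 5, 2
x = rng.normal(size=D)
experts_W = [rng.normal(size=(H, D)) for _ in range(N)]
gates_W = [rng.normal(size=(N, D)) for _ in range(K)]
towers_W = [rng.normal(size=(1, H)) for _ in range(K)]
ys = mmoe_forward(x, experts_W, gates_W, towers_W)
print(len(ys))  # one scalar prediction per task -> prints 2
```

Note how each task gets its own gate but all tasks share the same pool of experts; this is what distinguishes MMoE from a single-gate mixture of experts, and it lets loosely related tasks learn different expert-mixing patterns.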
Related Documents
 See the paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts (Ma et al., KDD 2018) for details.