Mixture of Experts

Concept

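Neural network architecture that routes each input to a small subset of expert subnetworks, so only a fraction of the parameters is active per input and compute per input is lower than in a dense model of the same total size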
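
As a rough illustration of activating only a subset of parameters per input, the sketch below implements one common variant, top-k gating, in plain Python. Everything in it is an assumption made for illustration rather than a reference implementation: the names `moe_forward`, `make_expert`, and `gate_weights`, the use of a single linear map per expert, and the choice of k=2 out of 8 experts.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(dim, rng):
    # Hypothetical expert: one linear map stored as a dim x dim weight matrix.
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

def apply_expert(weights, x):
    # Matrix-vector product: this is the only place expert parameters are used.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def moe_forward(x, gate_weights, experts, k=2):
    # Router: one score per expert (a linear gate, assumed for this sketch).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate over the selected experts only.
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = apply_expert(experts[i], x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top

rng = random.Random(0)
dim, num_experts = 4, 8
experts = [make_expert(dim, rng) for _ in range(num_experts)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

x = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(x, gate_weights, experts, k=2)
```

Here only 2 of the 8 expert weight matrices are multiplied against `x`, which is where the compute saving comes from; the router itself is small by comparison. Production systems typically add load-balancing terms and batch the routing, which this sketch leaves out.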
