Mixture of Experts
Concept
Neural network architecture that activates only a subset of its parameters for each input, reducing the compute needed per input
1 story
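A minimal sketch of the idea, assuming a top-k gating scheme (one common way to choose which parameters are active; the page itself does not say which routing rule is meant). The names `moe_forward`, `gate_weights`, and `experts` are hypothetical, and the toy experts are plain scaling functions rather than trained networks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Route input x to the k highest-scoring experts and mix their outputs.

    Assumption: the gate is a linear layer (one weight row per expert),
    and mixing weights are a softmax over the selected scores only.
    """
    # One gate score per expert: dot product of x with that expert's gate row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Indices of the k experts with the highest scores.
    selected = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    mix = softmax([scores[i] for i in selected])

    out = [0.0] * len(x)
    for weight, idx in zip(mix, selected):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, selected


# Toy setup (hypothetical): four experts that each scale the input.
calls = []


def make_expert(scale):
    def expert(x):
        calls.append(scale)  # record which experts actually ran
        return [scale * xi for xi in x]
    return expert


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, selected = moe_forward([2.0, 1.0], gate_weights, experts, k=2)
```

With `k=2` and four experts, only two expert functions run for this input, which is where the per-input compute saving comes from; the other two experts' parameters exist but are never touched.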