Cluster-driven Expert Pruning (C-Prune) is a framework for compressing Mixture-of-Experts (MoE) large language models by removing redundant experts, both within individual layers and across layers. It works in two stages: experts are first clustered layer by layer, and the resulting clusters are then pruned globally. The authors report that C-Prune reduces model size while outperforming existing pruning methods, based on experiments across several MoE models and benchmarks.
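The two-stage idea (cluster experts within each layer, then prune clusters across all layers) can be illustrated with a minimal sketch. The summary does not say how experts are compared, how clusters are formed, or how clusters are ranked, so everything below is an assumption made for illustration: greedy clustering on cosine similarity of flattened expert weights, a cluster score equal to summed routing frequency, and the hypothetical names `cluster_layer`, `prune_clusters_globally`, `threshold`, and `keep_ratio`. It is not the authors' implementation.

```python
import numpy as np


def cluster_layer(experts, threshold=0.9):
    """Greedily group one layer's experts by cosine similarity (assumed metric).

    experts: array of shape (n_experts, dim), one flattened weight vector each.
    Returns a list of clusters, each a list of expert indices.
    """
    normed = experts / np.linalg.norm(experts, axis=1, keepdims=True)
    clusters = []
    for i in range(len(normed)):
        for members in clusters:
            # Compare the expert against the normalized centroid of the cluster.
            centroid = normed[members].mean(axis=0)
            centroid = centroid / np.linalg.norm(centroid)
            if normed[i] @ centroid >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append([i])
    return clusters


def prune_clusters_globally(layer_clusters, layer_usage, keep_ratio=0.5):
    """Rank every cluster from every layer together and keep the top fraction.

    layer_usage[l][e] is a hypothetical routing frequency for expert e in
    layer l; a cluster's score is the sum over its members (assumed criterion).
    Returns the set of (layer_index, cluster_index) pairs that are kept.
    """
    scored = []
    for layer, clusters in enumerate(layer_clusters):
        for c_idx, members in enumerate(clusters):
            score = sum(layer_usage[layer][e] for e in members)
            scored.append((score, layer, c_idx))
    scored.sort(reverse=True)
    n_keep = max(1, int(round(keep_ratio * len(scored))))
    return {(layer, c_idx) for _, layer, c_idx in scored[:n_keep]}


# Toy example: layer 0 holds two near-duplicate pairs, layer 1 two distinct experts.
layer0 = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]])
layer1 = np.array([[1.0, 0.0], [0.0, 1.0]])
layer_clusters = [cluster_layer(layer0), cluster_layer(layer1)]
usage = [[5, 5, 1, 1], [3, 0]]
kept = prune_clusters_globally(layer_clusters, usage, keep_ratio=0.5)
```

Because clusters from all layers compete in a single ranking, this sketch can leave a layer with no surviving cluster; a practical variant would need a safeguard such as always retaining at least one cluster per layer.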
mixture-of-experts
model-pruning
large-language-models
expert-clustering
computational-efficiency