Cluster-driven Expert Pruning (C-Prune) is a framework that improves the efficiency of Mixture-of-Experts (MoE) large language models by addressing expert redundancy both within and across layers. It first clusters experts layer-wise, then prunes whole clusters globally, reducing model size while outperforming existing pruning methods. Extensive experiments across multiple MoE models and benchmarks validate its effectiveness.
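The two-stage procedure lends itself to a compact illustration. Below is a minimal Python sketch, assuming expert weights are available as flattened per-layer matrices; the KMeans clustering, the mean-weight-norm importance proxy, and the `prune_ratio` knob are illustrative stand-ins, not the paper's actual similarity metric or scoring function.

```python
import numpy as np
from sklearn.cluster import KMeans

def c_prune_masks(expert_weights, n_clusters, prune_ratio, seed=0):
    """Sketch of cluster-then-prune for MoE experts.

    expert_weights: list over layers; each entry has shape
                    (num_experts, dim) holding flattened expert parameters.
    Returns one boolean keep-mask per layer.
    """
    labels_per_layer, cluster_scores = [], []

    # Stage 1: layer-wise clustering of experts by weight similarity.
    for layer, W in enumerate(expert_weights):
        # L2-normalize so Euclidean KMeans roughly tracks cosine similarity.
        Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-8)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit(Wn).labels_
        labels_per_layer.append(labels)
        for c in range(n_clusters):
            # Hypothetical importance proxy: mean parameter norm of the
            # cluster (the paper's actual scoring function may differ).
            score = float(np.linalg.norm(W[labels == c], axis=1).mean())
            cluster_scores.append(((layer, c), score))

    # Stage 2: global pruning -- drop the lowest-scoring clusters
    # across *all* layers, not independently per layer.
    cluster_scores.sort(key=lambda item: item[1])
    n_prune = int(len(cluster_scores) * prune_ratio)
    pruned = {key for key, _ in cluster_scores[:n_prune]}

    return [np.array([(layer, c) not in pruned for c in labels])
            for layer, labels in enumerate(labels_per_layer)]
```

Ranking clusters globally rather than pruning a fixed number of experts per layer is what lets the method exploit redundancy across layers, not just within a single layer.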
C3PO introduces a test-time approach for optimizing expert pathways in Mixture-of-Experts (MoE) large language models, improving accuracy by 7-15% through collaborative re-weighting of core experts in critical layers. Using surrogate objectives derived from successful neighboring samples, C3PO enables MoE models with fewer active parameters to outperform larger counterparts, and it consistently beats existing test-time learning techniques across benchmarks.
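A rough sketch of the test-time loop follows, assuming a PyTorch model that exposes a `model_forward(x, gate_scale)` hook which rescales router outputs in the critical layers; the hook name, the Adam optimizer, and the step/learning-rate settings are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def c3po_reweight(model_forward, num_layers, num_experts,
                  neighbors, steps=8, lr=0.1):
    """Sketch of test-time expert re-weighting.

    model_forward(x, gate_scale) -> logits of shape (batch, num_classes),
    where gate_scale (num_layers, num_experts) multiplicatively rescales
    the router outputs in the critical layers (assumed hook, not the
    paper's exact interface).
    neighbors: list of (x, y) reference samples the model already answers
    correctly -- they serve as the surrogate objective for the test sample.
    """
    gate_scale = torch.ones(num_layers, num_experts, requires_grad=True)
    opt = torch.optim.Adam([gate_scale], lr=lr)

    for _ in range(steps):
        # Surrogate loss: keep the neighbors correct under the new mixing.
        loss = torch.stack([F.cross_entropy(model_forward(x, gate_scale), y)
                            for x, y in neighbors]).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The optimized weights are then applied when running the test sample.
    return gate_scale.detach()
```

Because only the small `gate_scale` tensor is optimized while all model parameters stay frozen, the per-sample adaptation cost remains low relative to any form of fine-tuning.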