Model Serving, GPU Clusters, Inference Optimization, MLOps
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
arxiv.org·1d
Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
arxiv.org·1d