Model Quantization, Inference Optimization, GGUF Format, Privacy-preserving AI
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference
arxiv.org·1d
Better Language Model-Based Judging Reward Modeling through Scaling Comprehension Boundaries
arxiv.org·9h
Certificates and Witnesses for Multi-objective ω-regular Queries in Markov Decision Processes
arxiv.org·9h
Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization
arxiv.org·9h