🤖 LLM Inference - codenm.no2 · Scour

MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference 🧠LLM

arxiv.org·3d·

alexziskind1/llm-inference-calculator 🧠LLM

github.com·1d·

What is inference engineering? Deepdive ✍️Prompt Engineering

newsletter.pragmaticengineer.com·2d·

What if AI doesn’t need more RAM but better math? 💬LLMs

adlrocha.substack.com·4d·Substack·

Fast and Accurate Probing of In-Training LLMs' Downstream Performances 🧠LLM

arxiv.org·19h·

Pure C implementation of the TurboQuant paper (ICLR 2026) for KV cache compression in LLM inference. ⚡Assembly Language

github.com·1d·r/LocalLLaMA·

Executing as You Generate: Hiding Execution Latency in LLM Code Generation ⚙️Compilers

arxiv.org·19h·

G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs 💬LLMs

arxiv.org·19h·

SharpAI/SwiftLM: ⚡ Native Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, + iOS iPhone app. 💻KVM

github.com·1d·Hacker News·

m0at/rvllm: rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement. 🧠LLM

github.com·5d·Hacker News·

Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference 🧠LLM

arxiv.org·1d·

ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference 🧠LLM

arxiv.org·2d·

TAMI-MPC: Trusted Acceleration of Minimal-Interaction MPC for Efficient Nonlinear Inference ✍️Prompt Engineering

arxiv.org·6d·

Efficient Inference of Large Vision Language Models 👁️Multimodal AI

arxiv.org·2d·

Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference 💬LLMs

arxiv.org·3d·

ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing 🧠LLM

arxiv.org·2d·

Compiling Code LLMs into Lightweight Executables ⚙️Compilers

arxiv.org·1d·

Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3 ⚙️Program Synthesis

arxiv.org·2d·

Multiple-Prediction-Powered Inference 🎲Bayesian Inference

arxiv.org·2d·

M-MiniGPT4: Multilingual VLLM Alignment via Translated Data 🧠LLM

arxiv.org·1d·
