🔮 Speculative Decoding - teslartifex · Scour

Skill evals: what's changed since promptfoo ✍️Prompt Engineering

Draft-OPD: On-Policy Distillation for Speculative Draft Models 🤖LLM Inference

Hackers Find That Inaudible Sounds Hidden in Podcasts or Random Videos Can Hijack Your AI Voice Chatbot 📡Edge AI

futurism.com·5d

PhaseTransfer: A transfer learning framework for efficient phase diagram mapping 🤖LLM Inference

Misc. bug: Meta backend `ggml_context` pool exhaustion with `--split-mode tensor` · Issue #22404 💾NVMe

github.com·4d·r/LocalLLaMA

Hidden audio commands expose a new weak point in voice AI 📡Edge AI

startupfortune.com·5d

RTP-LLM: High-Performance Alibaba LLM Inference Engine 🤖LLM Inference

Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding 🤖LLM Inference

arxiv.org·2d·Hacker News

LongLive 2.0: NVFP4 is intended to make long AI videos faster and more memory-efficient 🤖LLM Inference

igorslab.de·3d

Micro-Expert-Router: Running Mixtral-Class MoE Models on NVMe SSDs Without a GPU 🤖LLM Inference

github.com·2d·DEV, Hacker News

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs 🧠LLMs

Knowledge: You can just build your own AI feed to keep up, without the noise 📡Edge AI

github.com·6d·Hacker News

Auditing Training Data in Generative Music Models via Black-Box Membership Inference 🤖LLM Inference

Beyond the Target: From Imitation to Collaboration in Speculative Decoding 🤖LLM Inference

Learning to Adapt SFT Data for Better Reasoning Generalization 🤖LLM Inference

microsoft/SkillOpt: SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts. ⚙️Automation

DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution 📡Edge AI

ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU 🤖LLM Inference

Once-For-All: A Train-Once and Select-Anytime Framework for Multimodal Instruction Tuning ⚡Flash Attention

Building Reliable AI Coding Workflows Using Modular AI Agent Optimization 🎯AI Agents

techcommunity.microsoft.com·22h
