🤖 AI Inference - nmarshall · Scour

Technology solutions targeting the performance of gen-AI inference in resource constrained platforms 🏗️AI Infrastructure

arxiv.org·1d

Introducing dotLLM - Building an LLM Inference Engine in C# 🏗️AI Infrastructure

kokosa.dev·14h·Hacker News

Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI 📱Edge AI

walsenburgtech.com·3d·Hacker News

Turning idle household RTX 3090s into a batch AI inference network: looking for testers 🏗️AI Infrastructure

solvyr.com·2d·r/selfhosted

Redefining AI Inference With New Silicon Architecture ⚡Hardware Acceleration

semiengineering.com·5d

LLM inference engine written ground-up natively in C#/.NET 🏗️AI Infrastructure

dotllm.dev·13h·Hacker News

Google Enhances AI Inference Control for Enterprises 🏗️AI Infrastructure

pub.towardsai.net·6d

Compare TEE-Based AI Providers 🏗️AI Infrastructure

confidentialinference.net·6d·Hacker News

The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of… 🏗️AI Infrastructure

medium.com·5d

Reasoning as Data: Representation-Computation Unity and Its Implementation in a Domain-Algebraic Inference Engine ⚙️Alloy

arxiv.org·1d

Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees 💻Local LLMs

arxiv.org·1d

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows 🏗️AI Infrastructure

arxiv.org·1d

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference 🏗️AI Infrastructure

arxiv.org·1d

Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC 🏗️AI Infrastructure

arxiv.org·5d

Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale 🏗️AI Infrastructure

arxiv.org·6d

QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch 📱Edge AI

arxiv.org·5d

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization 📱Edge AI

arxiv.org·6d

Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin 👥Digital Twins

arxiv.org·5d

From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference ⚡Hardware Acceleration

arxiv.org·5d

Neural Computers 🧠Neuromorphic Hardware

arxiv.org·6d·Hacker News