Scour — 🏠 Local LLM Deployment
Model Optimization, GPU Acceleration, Inference, Privacy
Scoured 46,674 posts in 24.0 ms
Unraveling GPU Inference Costs for Fine-tuned Open-source Models vs. Closed Platforms
💰 Inference Cost · mlops.community · 1d
Small Model Forensics
⚡ LLM Optimization · blog.0xmmo.co · 1h · Hacker News
https://www.together.ai/blog/accelerate-inference-large-scale-workloads
⚡ LLM Optimization · together.ai · 23h
Tracing tokens through Llama 3.1 8B inference on H100s
🤖 LLM · krithik.xyz · 5d · Hacker News
Gemma 4: The Next Frontier in Open-Source AI for Developers
🤖 GenAI · dev.to · 6h · DEV
Show HN: Sipsa Inference – lossless serving at 50% off
⚡ LLM Optimization · sipsalabs.com · 2d · Hacker News
Understanding KV Cache in LLMs and How It Affects Inference
⚡ LLM Optimization · pub.towardsai.net · 5d
Building Blocks for Foundation Model Training and Inference on AWS
🚀 Model Releases · huggingface.co · 2d
What Inference-Platform Benchmark Posts Leave Out
📊 AI Performance Profiling · dev.to · 19h · DEV
In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference
⚡ LLM Optimization · pytorch.org · 1d · Hacker News
Tiny company steals AMD's thunder and challenges Nvidia with old-tech PCIe AI accelerator that runs 700B LLMs locally, sipping just 240W thanks to decade-old DD...
📊 AI Performance Profiling · techradar.com · 3d · Hacker News
Company behind GLiNER model releases open-source model for running LLM guardrails
🤖 LLM · pioneer.ai · 1d · Hacker News
Your GPU Is Lying to You About Its Capacity
🤖 AI News · hackernoon.com · 3d
The Inference Shift
💰 Inference Cost · stratechery.com · 2d · Hacker News
Local LLMs vs. Cloud AI APIs: Which One Should Developers Use For Real Projects?
🏆 LLM Benchmarking · dev.to · 2d · DEV
https://www.together.ai/blog/flexgen-high-throughput-generative-inference-of-large-language-models-with-a-single-gpu
🤖 GenAI · together.ai · 23h
Building a Fully Offline AI Coding Assistant with Gemma 4
💻 Codex · dev.to · 6d · DEV
OpenModels: Explore LLM Models and Inference Providers
🔌 MCP · dev.to · 2d · DEV
Physics-based adaptation slashes edge LLM energy
⚡ LLM Optimization · dev.to · 6d · DEV
Exploring LLMs Speed Benchmarks
⚡ LLM Optimization · mlops.community · 1d