💬 LLMs - simiasherextra · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🔌Embedded Systems Code

github.com · Hacker News, r/LLM

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

✨Neural Radiance Fields

everylocalai.com · DEV

Should LLM Agents Decide in Social Simulations? Comparing Finite-State and LLM-Based Decision Policies

🛡️AI Safety Academic

Orchestrate your LLM pipeline. Locally

🧠AI Research

llmforge.app · Hacker News

WWDC 2026: Foundation Models (& Anarlog)

🏳️‍🌈LGBT Tech

skushagra.com

What Ollama Reveals About Local AI, Agents, and Open Models

🛡️AI Safety Blog

odsc.medium.com

Improved performance and model support with GGUF

🔌Embedded Systems Blog

Intelligent inference scheduling with llm-d on Red Hat AI

🔌Embedded Systems

developers.redhat.com

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

🔌Embedded Systems Blog

bric.pe.kr · DEV

Generative AI in the Real World: Agentic Systems Fundamentals with Maarten Grootendorst

🌐AGI Audio

Comprehensive evaluation of LLM capabilities for interpretation and analysis of genome-scale metabolic models in metabolic engineering

🌐AGI Academic

local llm on laptop 780M GPU using llama + gemma 4 qat

🔌Embedded Systems Blog

alper.bearblog.dev

6. Air-Gapped Claude Code - The Claude Code SRE Handbook

🔌Embedded Systems

har-ki.github.io · Hacker News

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

🔓Open Source

xda-developers.com

Why LLMs (still) lack taste

🛡️AI Safety

beyondtheprior.com · Hacker News

Google's new open-weights model brings image-generation tricks to AI text generation

✨Neural Radiance Fields News

theregister.com

How we fight GPU scarcity without compromise

🔌Embedded Systems Blog

equixly.com · Hacker News

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?

✨Neural Radiance Fields Discussion

news.ycombinator.com · Hacker News

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🔌Embedded Systems Blog

adambien.blog

Why Your LLM Gets Dumber With More Context

🛡️AI Safety

siliconopera.com
