🧠 Large Language Models (LLMs) - pleto · Scour

Markov Chains: The Grandparents of LLMs

✨Model optimizations in LLMs

dmanco.dev · Hacker News

Show HN: In-browser real LLM token counter and cost estimation

💬Prompt optimizations for LLM serving

holaclaw.ai · Hacker News

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?

🤖Agents using LLMs Discussion

news.ycombinator.com · Hacker News

NVIDIA A100 vs RTX 4090 for AI Workloads: The Cost Per FLOP Reality

⚙️AI Infrastructure Automation Blog

fitservers.com

Generative AI in the Real World: Agentic Systems Fundamentals with Maarten Grootendorst

🔍Retrieval-augmented generation Audio

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

🔍Retrieval-augmented generation

venturebeat.com

Google open-sources speedy DiffusionGemma text diffusion model

🔍Retrieval-augmented generation

siliconangle.com

LLM are universal simulators

✨Model optimizations in LLMs

invertedpassion.com · Hacker News

Google's new open-weights model brings image-generation tricks to AI text generation

📊AI Performance Profiling News

theregister.com

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🔢Quantization of LLMs Blog

adambien.blog

local llm on laptop 780M GPU using llama + gemma 4 qat

🔢Quantization of LLMs Blog

alper.bearblog.dev

New comment by alroma90 in "Ask HN: Who wants to be hired? (June 2026)"

🤖Agents using LLMs Discussion

news.ycombinator.com · Hacker News

Don't let the LLM speak, just probe it (8 minute read)

💬Prompt optimizations for LLM serving Blog

blog.j11y.io · Hacker News

How J.A.R.V.I.S. Became the Smartest Mind on Earth — What is an LLM?

💬Prompt optimizations for LLM serving Blog

AI context windows: Why context quality beats context size

🔍Retrieval-augmented generation Blog

[NEW MODEL] SupraLabs just released Supra1.5-50M Base (Experimental)!

🔧Systems-level optimizations for LLM serving

huggingface.co · r/LocalLLaMA

langchain-ai/langchain langchain-core==1.4.6

🔍Retrieval-augmented generation Code


Report: GKE Inference Gateway delivers up to 92% faster AI responses

🔧Systems-level optimizations for LLM serving Blog

cloud.google.com · Hacker News

Tokenminning: Because Tokenmaxxing Is a Bad Idea

💬Prompt optimizations for LLM serving

tokenminning.com · Hacker News

AI The Truly Environmentally Friendly Way

⚙️AI Infrastructure Automation
