🧠 Large Language Models (LLMs) - pleto · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🔧Systems-level optimizations for LLM serving Code

github.com · Hacker News, r/LLM

How LLMs are Actually Trained

✨Model optimizations in LLMs News Blog

blog.algomaster.io

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

✨Model optimizations in LLMs Academic

Making a Vintage LLM from Scratch

💬Prompt optimizations for LLM serving

crlf.link · Hacker News

LangChain vs LlamaIndex 2026: Response Time on 10 RAG Tasks

🔍Retrieval-augmented generation Blog Discussion

How ChatGPT Actually Works (Beginner Friendly)

🤖Agents using LLMs Blog

LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents

🔍Retrieval-augmented generation Blog

towardsai.net

Orchestrate your LLM pipeline. Locally

✨Model optimizations in LLMs

llmforge.app · Hacker News

Context windows in AI: why every token is a budget decision

🔍Retrieval-augmented generation Blog

Why Your LLM Gets Dumber With More Context

🔍Retrieval-augmented generation

siliconopera.com

Philosophy

🔍Retrieval-augmented generation Reference

docs.langchain.com

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🔢Quantization of LLMs Blog

adambien.blog

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

✨Model optimizations in LLMs

xda-developers.com

LLM Routing: From Strategy Selection to Production Architecture

📊AI Performance Profiling Blog

DiffusionGemma: Discrete diffusion in a large language model

🔧Systems-level optimizations for LLM serving

idlemachines.co.uk · Hacker News

Research Proposal: Decoupled RISC-LLM Architectures via Circadian Synaptic Consolidation

🔍Retrieval-augmented generation

aermia.com · Hacker News

Prompt Caching Explained: The AI Concept That Can Save Millions of Tokens

🔍Retrieval-augmented generation Blog

sweta-nit.medium.com

My Notes on the Progression from Context to Prompt to Harness engineering in making GPT LLMs Useful: (TUESDAY) MAMLMs

🔍Retrieval-augmented generation News Blog

braddelong.substack.com

LLM Cheat Sheet

🔍Retrieval-augmented generation Blog

drkpxl.bearblog.dev

Why LLMs (still) lack taste

💬Prompt optimizations for LLM serving

beyondtheprior.com · Hacker News
