Large Language Models (LLMs)

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🔧Systems-level optimizations for LLM serving  Content type: Code
github.com··Hacker News, r/LLM

How LLMs are Actually Trained

 Model optimizations in LLMs  Content type: News  Content type: Blog
blog.algomaster.io·

Making a Vintage LLM from Scratch

 💬Prompt optimizations for LLM serving
crlf.link··Hacker News

Orchestrate your LLM pipeline. Locally

 Model optimizations in LLMs
llmforge.app··Hacker News

Should LLM Agents Decide in Social Simulations? Comparing Finite-State and LLM-Based Decision Policies

 🤖Agents using LLMs  Content type: Academic
arxiv.org·

How ChatGPT Actually Works (Beginner Friendly)

 🤖Agents using LLMs  Content type: Blog
medium.com·

LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents

 🔍Retrieval-augmented generation  Content type: Blog
towardsai.net·

Why Your LLM Gets Dumber With More Context

 🔍Retrieval-augmented generation
siliconopera.com·

LangChain vs LlamaIndex 2026: Response Time on 10 RAG Tasks

 🔍Retrieval-augmented generation  Content type: Blog  Content type: Discussion
tildalice.io·

Two old GPUs I salvaged are doing more AI work than a brand new $2000 card, and I won't be upgrading anytime soon

 Model optimizations in LLMs
xda-developers.com·

Context windows in AI: why every token is a budget decision

 🔍Retrieval-augmented generation  Content type: Blog
redis.io·

Philosophy

 🔍Retrieval-augmented generation  Content type: Reference
docs.langchain.com·

Prompt Caching Explained: The AI Concept That Can Save Millions of Tokens

 🔍Retrieval-augmented generation  Content type: Blog
sweta-nit.medium.com·

lightmetal: GPU LLM Inference From a Single Java 25 JAR

 🔢Quantization of LLMs  Content type: Blog
adambien.blog·

Research Proposal: Decoupled RISC-LLM Architectures via Circadian Synaptic Consolidation

 🔍Retrieval-augmented generation
aermia.com··Hacker News

LLM Cheat Sheet

 🔍Retrieval-augmented generation  Content type: Blog
drkpxl.bearblog.dev·

LLM Routing: From Strategy Selection to Production Architecture

 📊AI Performance Profiling  Content type: Blog
blog.n8n.io·

Show HN: In-browser real LLM token counter and cost estimation

 💬Prompt optimizations for LLM serving
holaclaw.ai··Hacker News

My Notes on the Progression from Context to Prompt to Harness engineering in making GPT LLMs Useful: (TUESDAY) MAMLMs

 🔍Retrieval-augmented generation  Content type: News  Content type: Blog
