💬 LLMs
Specific
large language models, GPT, transformers, prompt engineering
A handy llama-server launcher with easy model and configuration customisation
🧊 Apache Iceberg · Content type: Code · github.com · 3 days ago · r/LocalLLaMA

Here's a llama.cpp CLI Command builder.
🧊 Apache Iceberg · llamabuilding.com · 2 days ago · r/LocalLLaMA

RATrain: A Resource-Aware Training Runtime for Large Language Models on Bandwidth-Constrained Heterogeneous Supercomputing Platforms
⚡ Apache Spark · Content type: Academic · arxiv.org · 23 hours ago

What Is Generative AI?
🧠 AI Engineering · Content type: Academic · excelsior.edu · 6 days ago

Melanie Mitchell: What We Get Wrong About AI
🤖 Machine Learning · yalereview.org · 2 days ago · Substack, Hacker News

How J.A.R.V.I.S. Became the Smartest Mind on Earth — What is an LLM?
🧠 AI Engineering · Content type: Blog · medium.com · 3 days ago

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent
🔁 MLOps · Content type: Blog · dnhkng.github.io · 2 days ago

Claude vs GPT-4: Which AI API Is Better for Developers? (2026)
🧠 AI Engineering · kalyna.pro · 5 days ago · DEV

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss
🔗 gRPC · Content type: Code · github.com · 2 days ago · r/LocalLLaMA

Alignment Defends LLMs from Property Inference Attacks
🔁 MLOps · Content type: Academic · arxiv.org · 23 hours ago

Why Shrinking an AI Model Often Makes It More Useful
🧠 AI Engineering · siliconopera.com · 3 days ago

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
🧠 AI Engineering · Content type: Blog · dnhkng.github.io · 3 days ago

I built an open-source persistent memory layer for AI coding agents
🔗 gRPC · Content type: Code · github.com · 1 day ago · r/GithubCopilot

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs
🔁 MLOps · Content type: Academic · arxiv.org · 23 hours ago

AI Agents Running Businesses: Andon Labs on Project Vend
🧠 AI Engineering · startuphub.ai · 6 days ago

I finally built the central AI hub I've been wanting, and Open WebUI made it stupidly simple
🧠 AI Engineering · xda-developers.com · 3 days ago

ashp15205/guardian-runtime: A zero-latency, local-first runtime firewall for LLMs. Intercept every prompt and response locally to stop data leaks and runaway token costs.
🧠 AI Engineering · Content type: Code · github.com · 1 day ago · Hacker News

Deep Learning Weekly: Issue 458
🤖 Machine Learning · deeplearningweekly.com · 6 days ago

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real
🤖 Machine Learning · Content type: Academic · arxiv.org · 23 hours ago

Context Engineering vs. Prompt Engineering: Why Your AI Agent Gets Dumber the Longer It Runs
🧠 AI Engineering · Content type: Blog · medium.com · 5 days ago
