Scour
🤖 AI Engineering
AI infrastructure, model serving, inference, MLOps
Scoured 219 posts in 7.8 ms
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
⚙️ Hardware Architecture · Content type: Blog · tilert.ai · 4 days ago · Hacker News · Cited by 2 articles
NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety
🎮 GPU Programming · Content type: Blog · fitservers.com · 4 days ago
Defense Against Prompt Inversion Attacks: An Information-Theoretic Approach for LLM Collaborative Inference
🧠 LLM Research · Content type: Academic · arxiv.org · 2 days ago
Why are cached input tokens cheaper with AI services?
🎙️ Speech AI · xeiaso.net · 1 day ago
Azure OpenAI Architecture: The Decisions That Actually Matter (Part 2)
🌐 Distributed Systems · techcommunity.microsoft.com · 5 days ago
🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)
🔧 Backend Dev · golangprojects.com · 2 days ago
146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb
🔧 Backend Dev · Content type: Blog · adambien.blog · 3 days ago
Valkey: Unlocked Seattle: The Best Systems Let You Sleep At Night
🔧 Backend Dev · Content type: Blog · valkey.io · 2 days ago
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
🎮 GPU Programming · phoronix.com · 2 days ago · r/artificial · Cited by 1 article
Issue #390 - The ML Engineer 🤖
🔧 Backend Dev · Content type: News, Blog · machinelearning.substack.com · 6 days ago · Substack
Agentic AI Architecture: How CockroachDB Supports Memory, Context, and Control
🌐 Distributed Systems · Content type: Blog · cockroachlabs.com · 2 days ago
The Bill Arrives: How to Manage Agentic AI Costs at Scale
🧠 LLM Research · Content type: Blog · cockroachlabs.com · 3 days ago
Ask HN: Is software engineering still a good career choice for new students?
🔧 Backend Dev · Content type: Discussion · news.ycombinator.com · 3 days ago · Hacker News
4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave
🎮 GPU Programming · Content type: Blog · sabareesh.com · 1 day ago · Hacker News, r/LocalLLaMA
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
🔮 Multimodal AI · local-llm.utop.workers.dev · 6 days ago · Hacker News · Cited by 1 article
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
🎮 GPU Programming · uccl-project.github.io · 2 days ago · Hacker News
Predicting the World Cup Winner: Live Coding with Hopswor...
⚙️ Systems Programming · hopsworks.ai · 2 days ago · Hacker News
Intro — Sehastrajit
🧠 LLM Research · Content type: Blog · medium.com · 4 days ago
MiniPIC: Flexible Position-Independent Caching in <100LOC
🗄️ Database Internals · Content type: Academic · arxiv.org · 1 day ago
vicharak-in/Gati: Gati Accelerates Your CNN Algorithms!
⚙️ Hardware Architecture · Content type: Code · github.com · 1 day ago · Hacker News