🤖 AI Engineering - daemsc · Scour

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

⚙️Hardware Architecture Blog

tilert.ai··Hacker News·Cited by 2 articles

NVIDIA RTX Pro 6000 Blackwell: 96GB GDDR7 and the End of VRAM Anxiety

🎮GPU Programming Blog

fitservers.com·

Defense Against Prompt Inversion Attacks: An Information-Theoretic Approach for LLM Collaborative Inference

🧠LLM Research Academic

Why are cached input tokens cheaper with AI services?

🎙️Speech AI

Azure OpenAI Architecture: The Decisions That Actually Matter (Part 2)

🌐Distributed Systems

techcommunity.microsoft.com·

🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)

🔧Backend Dev

golangprojects.com·

146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb

🔧Backend Dev Blog

adambien.blog·

Valkey: Unlocked Seattle: The Best Systems Let You Sleep At Night

🔧Backend Dev Blog

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🎮GPU Programming

phoronix.com··r/artificial·Cited by 1 article

Issue #390 - The ML Engineer 🤖

🔧Backend Dev News Blog

machinelearning.substack.com··Substack

Agentic AI Architecture: How CockroachDB Supports Memory, Context, and Control

🌐Distributed Systems Blog

cockroachlabs.com·

The Bill Arrives: How to Manage Agentic AI Costs at Scale

🧠LLM Research Blog

cockroachlabs.com·

Ask HN: Is software engineering still a good career choice for new students?

🔧Backend Dev Discussion

news.ycombinator.com··Hacker News

4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave

🎮GPU Programming Blog

sabareesh.com··Hacker News, r/LocalLLaMA

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🔮Multimodal AI

local-llm.utop.workers.dev··Hacker News·Cited by 1 article

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

🎮GPU Programming

uccl-project.github.io··Hacker News

Predicting the World Cup Winner: Live Coding with Hopswor...

⚙️Systems Programming

hopsworks.ai··Hacker News

Intro — Sehastrajit

🧠LLM Research Blog

MiniPIC: Flexible Position-Independent Caching in <100LOC

🗄️Database Internals Academic

vicharak-in/Gati: Gati Accelerates Your CNN Algorithms!

⚙️Hardware Architecture Code

github.com··Hacker News
