🔗 NCCL - miterion · Scour

AI in Multiple GPUs: Understanding the Host and Device Paradigm

towardsdatascience.com·8h

⏱️CUDA Events

BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization

arxiv.org·16h

⚡ONNX Runtime

NVIDIA DGX Spark Powers Big Projects in Higher Education

blogs.nvidia.com·6h

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

arxiv.org·1d

🎯Tensor Cores

harishsg993010/tiny-NPU: opensource NPU for LLM inference (this run gpt2)

github.com·1h·Discuss: r/LocalLLaMA

⚡ONNX Runtime

Two Ways to Move Tensors Without Stopping: Inside vLLM's Async GPU Transfer Patterns

dev.to·23h·Discuss: DEV

🌊CUDA Streams

AI, GPU, And HPC Data Centers: The Infrastructure Behind Modern AI

semiengineering.com·13h

⏱️CUDA Events

Recursive Language Models: Stop Stuffing the Context Window

nlp.elvissaravia.com·59m

⚡ONNX Runtime

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization

machinelearning.apple.com·2d

⏱️CUDA Events

Show HN: 20+ Claude Code agents coordinating on real work (open source)

github.com·4h·Discuss: Hacker News

🤖AI Coding Tools

WaveSpeedAI Launches "Desktop": The Ultimate Workflow for Power Users Running Daily AI Models

prnewswire.com·4h

🤖AI Coding Tools

Faster AI Training Unlocked With New System For Massive Language Models

quantumzeitgeist.com·3d

🎯Tensor Cores

LAI #114: The Real Work of Production AI

pub.towardsai.net·6h

🤖AI Coding Tools

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

together.ai·21h

⚡ONNX Runtime

OpenAI deploys Cerebras chips for 15x faster code generation in first major move beyond Nvidia

venturebeat.com·3h

Show HN: Solving Sudoku reasoning via Energy Geometric models

davisgeometric.com·11h·Discuss: Hacker News

CodeSpeak: Software Engineering with AI

codespeak.dev·30m·Discuss: Lobsters, Hacker News

🤖AI Coding Tools

Coding Agents Meet Distributed Reality

jhellerstein.github.io·3h·Discuss: Hacker News

🤖AI Coding Tools

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·12h

🎯Tensor Cores

Cisco unveils new AI networking chip, taking on Broadcom and Nvidia

finance.yahoo.com·21h

📊CUDA Graphs
