🎯 Tensor Cores - miterion · Scour

Exploiting the Structure in Tensor Decompositions for Matrix Multiplication

arxiv.org·1d

🔀Operator Fusion

Running Machine Learning on Arduino Nano

hackster.io·2h

🏎️TensorRT

Breaking the Tractability Barrier: A Generic Low-Level Solver for NP-Hard Instances (N=63) on Commodity 64-Bit Silicon

zenodo.org·6h·

Discuss: r/programming

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·1d

⚡Flash Attention

anulum/sc-neurocore: Verified Rust-based Neuromorphic Compiler. 512x Real-Time Speed. Bit-True FPGA Equivalence. (AGPLv3 / Commercial)

github.com·1d·

Discuss: Hacker News

📊Profiling Tools

Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A

arxiv.org·1d

🌊CUDA Streams

batteryphil/Primal-Discrete-LLM-Training: Memory: Zero-Shadow Training (training without FP16 master weights). Math: Prime-Grid LUT (better precision-per-bit than standard INT4). Stability: Vote Buffering (making gradient accumulation work for discrete weights).

github.com·12h·

Discuss: Hacker News

📊Gradient Accumulation

OpenAI introduces GPT‑5.3‑Codex‑Spark, an ultra-fast coding model powered by Cerebras

neowin.net·8h

⚡Flash Attention

Faster AI Training Unlocked With New System For Massive Language Models

quantumzeitgeist.com·3d

Porting an INT8 VHDL CNN from Intel Agilex 3 to Lattice Certus-NX

news.ycombinator.com·23h·

Discuss: Hacker News

A C implementation of the inference pipeline for Mistral AI’s Voxtral Realtime 4B model

blog.adafruit.com·19h

🏎️TensorRT

Gemini 3 Deep Think: A Complete Guide to Google's Most Advanced Reasoning Mode (2026)

curateclick.com·13h·

Discuss: Hacker News

⚡ONNX Runtime

Building an Embedding API with Rust, Arm, and EmbeddingGemma on AWS Lambda

sobolev.substack.com·1h·

Discuss: Substack

news.smol.ai·1d

🤖AI Coding Tools

Scaling LLM Post-Training at Netflix

netflixtechblog.com·4h

🏎️TensorRT

IMC: Free-Space Optical Neural Network With High Clockrate (Berkeley, USC, TU Berlin)

semiengineering.com·8h

⚡Flash Attention

Zero State Architecture deep dive

news.ycombinator.com·19h·

Discuss: Hacker News

🧠CPU Architecture

BalatroBench Benchmarks Large Language Models Playing Balatro

balatrobench.com·1h·

Discuss: Hacker News

⚡ONNX Runtime

Two Ways to Move Tensors Without Stopping: Inside vLLM's Async GPU Transfer Patterns

dev.to·1d·

Discuss: DEV

🌊CUDA Streams

Why CPUs sit at the center of AI infrastructure: Five takeaways from Futurum’s latest report

newsroom.arm.com·16h

⏱️CUDA Events
