🎯 Tensor Cores - miterion · Scour

Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A

arxiv.org·2d

🌊CUDA Streams

A C implementation of the inference pipeline for the Mistral AI’s Voxtral Realtime 4B model

blog.adafruit.com·1d

🏎️TensorRT

anulum/sc-neurocore: Verified Rust-based Neuromorphic Compiler. 512x Real-Time Speed. Bit-True FPGA Equivalence. (AGPLv3 / Commercial)

github.com·2d·Discuss: Hacker News

📊Profiling Tools

BalatroBench Benchmarks Large Language Models Playing Balatro

balatrobench.com·1d·Discuss: Hacker News

⚡ONNX Runtime

Faster AI Training Unlocked With New System For Massive Language Models

quantumzeitgeist.com·5d

CUDA Shared Memory Bank Conflict-Free Vectorized Access

leimao.github.io·1d

🎛️CUDA Optimization

Porting an INT8 VHDL CNN from Intel Agilex 3 to Lattice Certus-NX

news.ycombinator.com·2d·Discuss: Hacker News

FHE for SIMD Arithmetic Logic Units with Amortized $O(1)$ Bootstrapping per Ciphertext

eprint.iacr.org·2d

LLMs struggle to verbalize their internal reasoning

lesswrong.com·35m

📊Gradient Accumulation

Distributed Training Across Mixed GPUs: Solving the Heterogeneous Fleet Problem

shardpool.aurora-sentient.net·4h·Discuss: DEV

Gemini 3 Deep Think: A Complete Guide to Google's Most Advanced Reasoning Mode (2026)

curateclick.com·1d·Discuss: Hacker News

⚡ONNX Runtime

RAM-Net: Expressive Linear Attention with Selectively Addressable Memory

arxiv.org·1d

⚡Flash Attention

news.smol.ai·2d

🤖AI Coding Tools

IMC: Free-Space Optical Neural Network With High Clockrate (Berkeley, USC, TU Berlin)

semiengineering.com·1d

⚡Flash Attention

Why CPUs sit at the center of AI infrastructure: Five takeaways from Futurum’s latest report

newsroom.arm.com·1d

⏱️CUDA Events

Ultrafast visual perception beyond human capabilities enabled by motion analysis using synaptic transistors

nature.com·20h·Discuss: r/compsci

⚡Flash Attention

Building an Embedding API with Rust, Arm, and EmbeddingGemma on AWS Lambda

sobolev.substack.com·1d·Discuss: Substack

How Andrej Karpathy Built a Working Transformer in 243 Lines of Code

analyticsvidhya.com·2d

📜TorchScript

OpenAI introduces GPT‑5.3‑Codex‑Spark, an ultra-fast coding model powered by Cerebras

neowin.net·1d

⚡Flash Attention

An AI processor by any other name

jonpeddie.com·20h
