⚡ Hardware Acceleration - surajkadapa · Scour

The copy_if Speedup That Wasn't About copy_if, Or AVX-512

hftuniversity.com · Substack

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🖥️GPU Computing Academic

Niobium Opens Developer Partner Program for The Fog, the First IaaS Purpose-Built for Fully Homomorphic Encryption

🔐Cryptography

RISC-V Summit Europe 2026: Industry and Academia Unite in Bologna to Advance Open Hardware

🔲CPU Architecture News

WOS: a Rust ARM64 kernel from scratch with MMU and GICv2 working

🖥️Operating Systems Code

github.com · Hacker News

Unsloth Gemma 4 QAT

🖥️GPU Computing

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

🔢Tensor Cores Academic

arxiv.org · Hacker News

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

🖥️GPU Computing Academic

Communication Strategy Selection for Multi-GPU 3D FDTD with Convolutional Perfectly Matched Boundary Layers

🖥️GPU Computing Academic

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

🖥️GPU Computing Academic

Graph Traversal on Tensor Cores: A BFS Framework for Modern GPUs

🔢Tensor Cores Academic

On GPU Implementation for Multi-Precision Integer Division

🖥️GPU Computing Academic
