🔢 cuBLAS - miterion · Scour

CUDA Shared Memory Bank Conflict-Free Vectorized Access

leimao.github.io·18h

🎛️CUDA Optimization

Lambda admissible subspaces of self adjoint matrices

arxiv.org·21h

🔗Kernel Fusion

AI in Multiple GPUs: Point-to-Point and Collective Operations

towardsdatascience.com·13h

Breaking the Tractability Barrier: A Generic Low-Level Solver for NP-Hard Instances (N=63) on Commodity 64-Bit Silicon

zenodo.org·19h·Discuss: r/programming

latazadehomero/cornell-marginalia: A lightweight Obsidian plugin designed for students who use the Cornell Note-taking System.

github.com·3h

On the Block-Diagonalization and Multiplicative Equivalence of Quaternion $Z$-Block Circulant Matrices with their Applications

arxiv.org·21h

🔀Operator Fusion

A RISC-V vector extension primer

blog.adafruit.com·1d

Turn Any Photo Into a 3D Asset: A Guide to Meshy v6 Image-to-3D

hackernoon.com·1d

Custom Kernels for All from Codex and Claude

huggingface.co·1d·Discuss: Hacker News

🎯GPU Kernels

ianbarber.blog·1d·Discuss: Hacker News

🔲Loop Tiling

OpenAI introduces GPT‑5.3‑Codex‑Spark, an ultra-fast coding model powered by Cerebras

neowin.net·21h

⚡Flash Attention

antirez/iris.c: Flux 2 image generation model pure C inference

github.com·10h

🏎️TensorRT

dev.to·10h·Discuss: DEV

🔄SIMD Programming

Zvec: SQLite-like simplicity in an embedded vector database (By Alibaba)

zvec.org·1d·Discuss: Hacker News

From Chunks to Connections: The Intuitive Guide to Graph RAG

pub.towardsai.net·21h

Building an Embedding API with Rust, Arm, and EmbeddingGemma on AWS Lambda

sobolev.substack.com·15h·Discuss: Substack

The Fourth Wave of Computing

lucibrowser.com·16h·Discuss: Hacker News

Atomistic, but non-complete lattices

dominiczypen.wordpress.com·16h

CPU cloth simulation performance comparable to GPU SotA

sig25ddmpd.github.io·2d·Discuss: Hacker News

Show HN: Solving Sudoku reasoning via Energy Geometric models

davisgeometric.com·1d·Discuss: Hacker News
