Hardware Acceleration

Recent LLVM hash table improvements

 🏗️LLVM  Content type: Blog
maskray.me · Hacker News, r/cpp

🥇Top AI Papers of the Week

 🎮Reinforcement Learning  Content type: News
nlp.elvissaravia.com

Symbolica 2.0: programmable symbols, JIT evaluators, and type-erased callbacks in Rust

 🖥️GPU Programming

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

 🐧Open Source  Content type: Academic
arxiv.org

Less-relevant results

Capabilities using Plain Traits

 🎯AI Agents  Content type: Blog
nadrieril.github.io

maziyarpanahi/openmed: open-source healthcare ai

 🤖AI  Content type: Code
github.com

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 🖥️GPU Programming  Content type: Academic
arxiv.org

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

 🤖AI  Content type: Academic
arxiv.org · Hacker News

Advanced Vector Extensions 512 Acceleration of LSH and LEA-GCM

 🔐Cryptography
eprint.iacr.org

CFRNet: Cycle-Consistent Fixed-Point Training for Real-Time Blind Face Restoration on Consumer Embedded NPUs

 👁️Computer Vision  Content type: Academic
arxiv.org

Rayforce

 ⚙️Algorithms  Content type: Code

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

 🖥️GPU Programming  Content type: Academic
arxiv.org

Modeling, Optimizing and Exploring Multi-Die FPGA Routing Architectures

 💾Computer Architecture  Content type: Academic
arxiv.org

Communication Strategy Selection for Multi-GPU 3D FDTD with Convolutional Perfectly Matched Boundary Layers

 🖥️GPU Programming  Content type: Academic
arxiv.org

MailoHLS: Multi-Adapter Structure-Aware Learning for Pareto-Driven HLS Pragma Optimization

 🎯Fine-Tuning  Content type: Academic
arxiv.org

Does anyone know what PCIe mode was used for these benchmarks?

 💬LLMs  Content type: Code
github.com · r/LocalLLaMA

Coset Ensemble Decoder for Quantum Error Correction with Algorithm-Hardware Co-Design

 ⚛️Quantum Computing  Content type: Academic
arxiv.org

GoodQ02/goodq4all: Local-first multimodal epistemic memory for scene-level video, audio, and text intelligence.

 🔍Information Retrieval  Content type: Code
github.com · Hacker News

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

 🖥️GPU Programming  Content type: Academic
arxiv.org

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

 💬LLMs  Content type: Code
github.com · r/LocalLLaMA
