🔄 SIMD Programming - miterion · Scour

Lallapallooza/citor: Header-only C++20 thread pool with sub-microsecond dispatch, decentralized work-stealing, and per-CCD arenas. ⏱️CUDA Events

github.com·22h·Hacker News

Less-relevant results

How We Improved Agentic Search 🤖AI Coding Tools

entire.io·6d·Hacker News

V1.6: Ch4 maps, Animations and Save File Overhaul ⚡torch.compile

ezioeagle.itch.io·4d

michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models 🔄ONNX

github.com·1d·Hacker News

Compiling the Trap 🔍Type Checkers

Neoclassical C++: segmented iterators revisited (1) ⚡CUDA Programming Patterns

boostedcpp.net·3d·Hacker News

llama : MTP clean-up by ggerganov · Pull Request #23269 ✂️CUTLASS

github.com·2d·r/LocalLLaMA

Whisper.cpp vs Faster-Whisper: Why Speed Tests Lie 📊Profiling Tools

tildalice.io·4d

128-Bit Computing 🎯Tensor Cores

en.wikipedia.org·5d·Hacker News

Userfrom1995/benchd: BenchD is a browser-based CPU benchmark that runs fully on the client. ⏱️Benchmarking

github.com·39m·Hacker News

NoNaeAbC/std_simd: I played around with std::simd ✂️CUTLASS

github.com·6d·Hacker News

Show HN: GPT-2 inference in pure C#, 0 bytes allocated per token ⚡ONNX Runtime

github.com·3d·Hacker News

DioxusLabs/betlang: A tiny (50kb) programming language detection model - Like guesslang, but smaller 🔍Type Checkers

github.com·1d·Hacker News

pyxll/excel-gpt: Minimal GPT model implemented in Excel 📜TorchScript

github.com·7h·Hacker News

Show HN: IResearch – C++ search that beat Lucene and Tantivy on their benchmark 🔍Type Checkers

github.com·1d·Hacker News

I'm Building a Multi-Target Compiler Backend from Scratch 🏗️Build Optimization

·3d·DEV

[project] holt: an experimental Rust metadata index built around persistent ART blobs, WAL, and checkpointing 📊Profiling Tools

github.com·2d·r/rust

xxxn3m3s1sxxx/ATLAS-TQ1_0: TQ1.0 ternary inference engine for BitNet b1.58 on CPU. Pack + run Falcon3-1B/3B/7B/10B, no GPU needed. ✂️CUTLASS

github.com·3d·Hacker News

RedToasty/llama.cpp_qts: Fixing --split-mode tensor, with different KV cache quantization types. 🏎️TensorRT

github.com·4d·r/LocalLLaMA

I've updated my glorified Llama fork (LLM Inference Server) for P40's to utilise MTP + TurboQuant + DFlash 🔄ONNX

github.com·5d·r/LocalLLaMA
