Why vLLM is the best choice for AI inference today
developers.redhat.com · 3d
ONNX
Inference Acceleration from the Ground Up
semiwiki.com · 4d
Tensor Cores
Ubuntu Blog: Why we brought hardware-optimized GenAI inference to Ubuntu
ubuntu.com · 3d
ONNX Runtime
A generative dual-input model based on architectural computational optimization and multi-attention mechanism for remaining useful life prediction
sciencedirect.com · 8h
Attention Kernels
Prediction: AMD Will Be Worth More Than Broadcom by 2030
fool.com · 6h
Nsight
NVIDIA and Samsung working even closer together, new semiconductor AI factory has 50,000+ GPUs
tweaktown.com · 21h
Nsight
Platform generated AI slop at scale
markjgsmith.com · 1h
AI Coding Tools
TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
Loop Tiling
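The claim above is about reusing a session's prefill state: instead of recomputing attention keys and values when a conversation resumes, the cache is parked in host RAM and copied back to the GPU. A minimal sketch of that idea, assuming PyTorch and a KV cache laid out as a per-layer list of (key, value) tensors; the layout, function names, and use of pinned memory are illustrative assumptions, not any specific framework's API:

import torch

def offload_kv_cache(kv_cache):
    # Park each (key, value) pair in pinned host RAM so resuming the
    # session is a memcpy back to the GPU rather than a full prefill.
    cpu_cache = []
    for k, v in kv_cache:
        k_cpu = torch.empty(k.shape, dtype=k.dtype, device="cpu", pin_memory=True)
        v_cpu = torch.empty(v.shape, dtype=v.dtype, device="cpu", pin_memory=True)
        k_cpu.copy_(k, non_blocking=True)
        v_cpu.copy_(v, non_blocking=True)
        cpu_cache.append((k_cpu, v_cpu))
    torch.cuda.synchronize()  # wait for the async device-to-host copies
    return cpu_cache

def restore_kv_cache(cpu_cache, device="cuda"):
    # Copy the parked tensors back when the session becomes active again.
    return [(k.to(device, non_blocking=True), v.to(device, non_blocking=True))
            for k, v in cpu_cache]

Whether this beats recomputation depends on prompt length and PCIe bandwidth; the ~10x figure is the post's own observation, not something this sketch reproduces.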
It turns out WDDM driver mode is making our RAM-to-GPU transfers much slower compared to TCC or MCDM mode. Has anyone figured out how to bypass the NVIDIA software ...
CUDA Events
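One way to sanity-check the driver-mode effect described above is to time host-to-device copies with CUDA events, which is what the tag refers to. A minimal sketch, assuming PyTorch with a CUDA device; the buffer size and the pinned/pageable comparison are illustrative choices, not measurements from the post:

import torch

def h2d_bandwidth_gbs(size_mb=256, pinned=True):
    # Time a single host-to-device copy with CUDA events and return GB/s.
    n = size_mb * 1024 * 1024
    src = torch.empty(n, dtype=torch.uint8, pin_memory=pinned)
    dst = torch.empty(n, dtype=torch.uint8, device="cuda")

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    dst.copy_(src, non_blocking=True)
    end.record()
    torch.cuda.synchronize()  # make sure both events have completed

    ms = start.elapsed_time(end)
    return (size_mb / 1024) / (ms / 1000.0)

if __name__ == "__main__":
    print(f"pinned:   {h2d_bandwidth_gbs(pinned=True):.1f} GB/s")
    print(f"pageable: {h2d_bandwidth_gbs(pinned=False):.1f} GB/s")

Running the same script under WDDM and under TCC/MCDM (where the card supports switching) would show whether the slowdown reproduces on a given setup.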
Finetuning Open-source models with Opus, Sonnet 4.5 and Haiku 4.5
Model Quantization