Inference Optimization, VRAM Calculation, Performance Tuning, Resource Management

Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It
hackernoon.com·19h
LLM Optimization
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io·23h·
Discuss: Hacker News
LLM Optimization
Relation-Aware Bayesian Optimization of DBMS Configurations Guided by Affinity Scores
arxiv.org·20h
LLM Optimization
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·21h·
Discuss: r/LLM
LLM Optimization
Enhanced Richardson Extrapolation via Adaptive Kernel Regression and Uncertainty Quantification
dev.to·11h·
Discuss: DEV
LLM Optimization
A Thesis and Playbook for Edge AI
ondeviceguy.substack.com·14h·
Discuss: Substack
LLM Optimization
How fast can an LLM go?
fergusfinn.com·4d·
Discuss: Hacker News
LLM Optimization
A hitchhiker's guide to CUDA programming
seanzhang.me·4d·
Discuss: Hacker News
LLM Optimization
Unlocking AI Potential: Squeezing Giant Models into Tiny Spaces
dev.to·1d·
Discuss: DEV
LLM Optimization
Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU Procurement Guide
bentoml.com·3d·
Discuss: Hacker News
LLM Optimization
[R] We were wrong about SNNs. The bottleneck isn't binary/sparsity, it's frequency.
reddit.com·15h
LLM Optimization
Kimi Linear: An Expressive, Efficient Attention Architecture
arxiviq.substack.com·2d·
Discuss: Substack
LLM Optimization
Writing an LLM from scratch, part 26 – evaluating the fine-tuned model
gilesthomas.com·5h·
Discuss: Hacker News
LLM Optimization
From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers
towardsdatascience.com·1d
LLM Optimization
How to access and use Minimax M2 API
dev.to·19h·
Discuss: DEV
LLM Optimization
Scaling Coding-Agent RL to 32x H100s. 160% Improvement on Stanford's TBench
github.com·12h
LLM Optimization
Machine Scheduler in LLVM – Part II
myhsu.xyz·1d
LLM Optimization
Essential Things to Know Before Upgrading Your Computer Memory
buysellram.com·8h·
Discuss: Hacker News
🗄️SQLite
Small Vs. Large Language Models
semiengineering.com·16h·
Discuss: Hacker News, r/LLM
LLM Optimization
We found embedding indexing bottleneck in the least expected place: JSON parsing
nixiesearch.substack.com·8h·
Discuss: Substack
LLM Optimization