🏎️ TensorRT - miterion · Scour

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

developer.nvidia.com·4d·

Discuss: Hacker News

⚡ONNX Runtime

MING: An Automated CNN-to-Edge MLIR HLS framework

arxiv.org·14h

The 5 Distributed Training Methods: How to Train Models Too Large for One GPU

pub.towardsai.net·4h

BetaZero V2: A Diffusion Model for Setting Boulder Problems

evmojo37.substack.com·20h·

Discuss: Substack

📊Gradient Accumulation

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

machinelearning.apple.com·19h

🎓Model Distillation

Running Machine Learning on Arduino Nano

hackster.io·9h

🎯Tensor Cores

antirez/iris.c: Flux 2 image generation model pure C inference

github.com·4h

📉Model Quantization

Scaling LLM Post-Training at Netflix

netflixtechblog.com·11h

📊Gradient Accumulation

Visual Introduction to PyTorch

0byte.io·6h·

Discuss: Hacker News

Latent Generative Solvers for Generalizable Long-Term Physics Simulation

arxiv.org·14h

⚡ONNX Runtime

Building a Production ML Inference Stack with KServe, vLLM, and Karmada

dev.to·16h·

Discuss: DEV

A C implementation of the inference pipeline for the Mistral AI’s Voxtral Realtime 4B model

blog.adafruit.com·1d

🎯Tensor Cores

Presentation: Building Embedding Models for Large-Scale Real-World Applications

infoq.com·3h

🎓Model Distillation

Leading Inference Providers Cut AI Costs by up to 10x With Open Source Models on NVIDIA Blackwell

blogs.nvidia.com·1d

⚡ONNX Runtime

Deterministic Inference with EigenAI

deterministicinference.com·2d

⚡ONNX Runtime

Show HN: A header-only C++ benchmark for predictive models on raw binary streams

github.com·1d·

Discuss: Hacker News

⚡ONNX Runtime

EyesOff: Why Some Models Quantize Better Than Others

ym2132.github.io·1d·

Discuss: Hacker News

📉Model Quantization

Choosing AI libraries for React is easier once you stop treating them all the same

puckeditor.com·9h·

Discuss: r/reactjs

🤖AI Coding Tools

Ai’s Inner Workings Revealed By Model Trained On One Billion Data Points

quantumzeitgeist.com·1d

📊Gradient Accumulation

Building an Embedding API with Rust, Arm, and EmbeddingGemma on AWS Lambda

sobolev.substack.com·8h·

Discuss: Substack
