⚡ ONNX Runtime - miterion · Scour

tracefinity/tracefinity: Generate custom gridfinity bins with AI, from photos of your tools 🔍Nsight

github.com·17h

The Inference Bottleneck: Architecting Kubernetes Autoscaling for Production LLMs 🚀MLOps

cloudnativenow.com·6d

Less-relevant results

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips ⏱️CUDA Events

supercomputing-system-ai-lab.github.io·2d·Hacker News

https://www.together.ai/blog/coding-agent-benchmarks ⚡Flash Attention

together.ai·6d

Eliminate LLM Cold starts: Load models up to 6x Faster with Azure Blob Storage and Run:AI Model Streamer ⏱️CUDA Events

devblogs.microsoft.com·2d

What's in a GGUF, besides the weights - and what's still missing? 🔄ONNX

nobodywho.ooo·6d·Hacker News, r/LocalLLaMA

ezzy1630/Argyph: Local-first MCP server giving AI coding agents fast, structured, and semantic context over any codebase. Zero config, zero cloud, full context. 💡LSP

github.com·2d·r/artificial, r/mcp

AWS nabs white hot gen AI media creation startup fal, becoming its preferred cloud provider 🤖AI Coding Tools

venturebeat.com·1d

What can a local model do for you in early May 2026? 🔄ONNX

manichord.com·2d·Hacker News

Distributed Stochastic Graph Algorithms 🌐Distributed Computing

Embedding Tiny Language Models in Flink SQL 🔍Type Checkers

dalelane.co.uk·1d

Show HN: GPT-2 inference in pure C#, 0 bytes allocated per token 🔄ONNX

github.com·3d·Hacker News

I ran this bulky LLM on an SBC cluster, and it's the most unhinged setup I've ever built 🚀MLOps

xda-developers.com·6d

Software 3.0 💡LSP

dsebastien.net·3d

Unleashing the Power of ONNX for Speedier SBERT Inference 🔄ONNX

towardsai.net·2d

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference 🔄ONNX

together.ai·6d

The Professor Who Wanted a Robot 🎓Model Distillation

cs.utexas.edu·6d

codexstar69/pi-listen: Hold-to-talk voice input for Pi CLI — Deepgram streaming STT with live transcription, voice commands, and cross-platform hold detection 🎯Tensor Cores

github.com·2d·Hacker News

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend 🔄ONNX

huggingface.co·3d

Qwen’s MTP test puts local AI back in startup math 🔄ONNX

startupfortune.com·6d
