Model Deployment, Cross-framework, Inference Engine, Optimization

Deploy an LLM inference service on OpenShift AI
developers.redhat.com·13h
🚀MLOps
Relation-Aware Bayesian Optimization of DBMS Configurations Guided by Affinity Scores
arxiv.org·15h
🔗Kernel Fusion
LangChain vs LangGraph: A Beginner’s Guide to Building Smarter AI Workflows
hackernoon.com·4h
🤖AI Coding Tools
Automated Anomaly Detection & Root Cause Analysis in Complex System Simulations via Adaptive Bayesian Networks
dev.to·1d·
Discuss: DEV
🔄ONNX
Fast, Scalable LDA in C++ with Stochastic Variational Inference
github.com·4h·
Discuss: r/cpp
🏎️TensorRT
We found embedding indexing bottleneck in the least expected place: JSON parsing
nixiesearch.substack.com·3h·
Discuss: Substack
🐕Ruff
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·16h·
Discuss: r/LLM
👁️Attention Optimization
A Thesis and Playbook for Edge AI
ondeviceguy.substack.com·9h·
Discuss: Substack
🔄ONNX
Beating XLoader at Speed: Generative AI as a Force Multiplier for Reverse Engineering
research.checkpoint.com·6h
🐕Ruff
A Practitioner's Guide to Kolmogorov-Arnold Networks
arxiviq.substack.com·1d·
Discuss: Substack
📉Model Quantization
How to access and use Minimax M2 API
dev.to·14h·
Discuss: DEV
🚀MLOps
What I learned building Python notebooks to run any AI model (LLM, Vision, Audio) — across CPU, GPU, and NPU
reddit.com·1h·
Discuss: r/programming
🏎️TensorRT
Open Sourcing Kubetorch
run.house·4h·
Discuss: Hacker News
🚀MLOps
TinyML is the most impressive piece of software you can run on any ESP32
xda-developers.com·3d
🔄ONNX
Scaling Coding-Agent RL to 32x H100s. 160% Improvement on Stanford's TBench
github.com·7h
🤖AI Coding Tools
Synthesized Generative Modeling via Graph-Constrained Semantic Embedding
dev.to·1d·
Discuss: DEV
🎓Model Distillation
Can-t stop till you get enough
cant.bearblog.dev·1d·
Discuss: Hacker News
📜TorchScript
How to Use Multimodal AI Models With Docker Model Runner
docker.com·6h
🔄ONNX
I made a tensor runtime & inference framework in C (good for learning how inference works)
github.com·18h
📜TorchScript