⚙ LLMOps - olusola.akinsulere

Less-relevant results

🚀MLOps Cocoanetics·

Responses Bug in LM Studio

🚀MLOps medium.com

Don’t Use Ollama for Local LLMs

🚀MLOps pyimagesearch.com·

RAG Observability with Langfuse, vLLM, and FAISS

🚀MLOps medium.com

vLLM, Function Calling, and World Models explained

🦀Rust arxiv.org·

Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing

🚀MLOps mstar.stanford.edu·

M* (M-Star): A Modular, Extensible, Serving System for Multimodal Models

Discussed on Hacker News

🤖AI hackster.io·

Offline AI Voice Assistant on Raspberry Pi 4 with Gemma

🐍Python Anyscale blog posts·

High Performance Distributed Inference with Ray Serve LLM

Covered by Google Cloud Blog

Discussed on Hacker News

🤖AI lemmy.world·

Wrote up a full guide for running AI locally on Windows (LM Studio + Ollama + Open WebUI)

🚀MLOps nazarboyko.com·

Running Local LLMs With Ollama For Private Development

Discussed on DEV

🚀MLOps abhishek.it·

Running GLM-5.2 5x faster at 500tps with limitation

Discussed on Hacker News

🤖AI Red Hat Developer Blog·

llama.cpp vs. vLLM: Choosing the right local LLM inference engine

Covers 7 stories including GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen...

🐍Python pypi.org·

Show HN: Subagent-fleet – AI coding subagents across local Ollama machines

Discussed on Hacker News

🦀Rust langchain.com·

A self-improving agent loop (Sponsor)

Covered by tldr.tech, Steve Sun

🚀MLOps teachmecoolstuff.com·

Fine Tuning a Tiny Local LLM to Categorize Questions

Discussed on Hacker News

🚀MLOps vimal-dwarampudi.medium.com·

LLMOps: Operationalizing Large Language Models in Production

🐍Python youtube.com Video·

How to Build a High-Performance RAG Pipeline with Ollama, Python and TypeScript

Discussed on DEV

🤖AI GitHub·

Show HN: Alloy – a PyTorch backend and inference engine for Apple Silicon

Discussed on Hacker News

fix(ollama): preserve configured API during discovery (#93729)

The Complete Guide to Deploying DeepSeek R1 on a Dedicated Server
