🧠 Local llm - akapaka · Scour

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

🧠 LLM Inference · Code

github.com··Hacker News

Shrivastava-Aditya/boolean-algebra-engine: Deterministic boolean algebra engine — evaluates expressions, detects contradictions, audits logic rules. MCP server, NL layer, REST API, CLI, Streamlit UI.

🔌 Model Context Protocol · Code

github.com··Hacker News, r/LLM

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss.

🧠 LLM Inference · Code

github.com··r/LocalLLaMA

Show HN: SteelWorks, a free-first autonomous business OS

🏠 Self-Hosting

therealmacsteel.github.io··Hacker News

Less-relevant results

How to Measure Time To First Token (TTFT) in AI Systems

🧠 LLM Inference

qainsights.com··Hacker News

chipmates/agoracosmica: A Living Library You Can Talk To. Open-source educational platform with 30 historical figures from philosophy, science, art, mysticism, and activism. Stories, dialogues, AI conversation, multi-figure councils. Nonprofit, BYOK, self-hostable, no behavioral tracking.

🏠 Self-Hosting · Code

github.com··Hacker News

TanStack AI: Your MCP, your way

🔌 Model Context Protocol · Blog

tanstack.com··Hacker News

raeudigerRaeffi/riddlerun: An open source agentic end2end testing tool for your webpages

🏠 Self-Hosting · Code

github.com··Hacker News, r/OpenAI

harshuljain13/llm-inference-at-scale: A practitioner handbook for production LLM serving.

🧠 LLM Inference · Code

github.com··Hacker News

TjWheeler/deep-memory: A GraphRAG implementation with a Vocabulary system to optimise AI integration

🔌 Model Context Protocol · Code

github.com··Hacker News

Kodiqa-Solutions/Kodiqa-agent: 🧠 One agent. Every model. Zero limits. — Open-source AI coding agent that runs anywhere. 7 providers, 69 commands, local or cloud. Your terminal, your rules.

🔌 Model Context Protocol · Code

github.com··Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖 Qwen · Code

github.com··Hacker News

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102

🧠 LLM Inference · Discussion · Code

github.com··r/LocalLLaMA

RubyLLM 1.16: concurrent tool execution, Rails-style instrumentation, and more

👁️ Observability · Code

github.com··Hacker News

rennf93/roboco: RoboCo: 20-agent software company with role-gated lifecycle. Self-hosted. AGPL.

🏠 Self-Hosting · Code

github.com··Hacker News

Show HN: CLI for scoring OpenAPI for LLM legibility

🧠 LLM Inference · Code

github.com··Hacker News

ninoxAI/nightwatch: Open-source, local-first, read-only AI SRE: clusters alert storms, investigates root cause over your live systems, proposes human-gated fixes.

📊 Prometheus · Code

github.com··Hacker News

patriceckhart/zot: Yet another coding agent harness, lightweight and written in go.

🏠 Self-Hosting · Code

github.com··Hacker News

Alradyin/wallie-V2: AI VTuber / streamer framework with real-time vision, personality engine, and lip-synced avatar — built for Twitch, YouTube, and Kick.

📊 Prometheus · Code

github.com··Hacker News

model: Granite4 Vision by gabe-l-hart · Pull Request #23545 · ggml-org/llama.cpp

🧠 LLM Inference · Code

github.com··r/LocalLLaMA

No more posts from akapaka's subscribed feeds.
