🏠 Local LLMs - kudolink · Scour

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/macOS/Linux with full GPU capability

🤗Open Source AI Code

github.com··Hacker News

Less-relevant results

Arconia for Spring Boot: DevEx, Observability, Multitenancy, GenAI, Cloud Native

☁️Cloud Computing Code

arconia.io··Hacker News

NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain

🤗Open Source AI Blog

huggingface.co·

OPRD: On-Policy Representation Distillation

🧠Transformers Academic

arxiv.org··Hacker News

Tired of GitHub Trending being GitHub-only, so we made a multi-forge version (GitLab and Codeberg included)

gitgem.org··Hacker News, r/opensource

Riemann-bench | Surge AI

🤗Open Source AI

surgehq.ai··Hacker News

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

🟢NVIDIA Blog

lucebox.com··Hacker News

mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp

🤗Open Source AI Code

github.com··r/LocalLLaMA

Apple rebuilt its on-device AI stack at WWDC 2026

🟢NVIDIA Blog

ziraph.com··Hacker News

How to Train Your Goblin

✍️Prompt Engineering

goblins.mchen.workers.dev··Hacker News

Logits as a new monitor for evaluation awareness

✍️Prompt Engineering

lesswrong.com··Hacker News

Show HN: One API Key for 45 AI Models – Pay per Token, OpenAI Compatible

🚀Product Launches

modelhub-api.com··Hacker News

The Anatomy of a Learning Stall

🤖AI Coding Blog

tagide.com··Lobsters, Hacker News

john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

🤗Open Source AI Code

github.com··Hacker News

How to Become an AWS AI Architect: The Honest Roadmap, the Projects, and Landing the Job

☁️Cloud Computing

hackernoon.com·

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

🟢NVIDIA Code

github.com··r/LocalLLaMA

The OnlyFans Economy of American AI

🎵Vibe Coding Blog

leoveanu.com··Hacker News

Hacker News Trends: Search Hacker News super fast with Redis

hackernewstrends.com··Hacker News

Show HN: Zerostack, an open coding agent optimized for memory footprint

✍️Prompt Engineering

gi-dellav.github.io··Hacker News

lutusp/photo_database_webpage_generator: A photo database searchable webpage generator

🤗Open Source AI Code

github.com··Hacker News
