Local LLMs

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

 🤗Open Source AI  Content type: Code
github.com · Hacker News
Less-relevant results

Arconia for Spring Boot: DevEx, Observability, Multitenancy, GenAI, Cloud Native

 ☁️Cloud Computing  Content type: Code
arconia.io · Hacker News

NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain

 🤗Open Source AI  Content type: Blog
huggingface.co

OPRD: On-Policy Representation Distillation

 🧠Transformers  Content type: Academic
arxiv.org · Hacker News

Tired of GitHub Trending being GitHub-only, so we made a multi-forge version (GitLab and Codeberg included)

 👥CrewAI

Riemann-bench | Surge AI

 🤗Open Source AI
surgehq.ai · Hacker News

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

 🟢NVIDIA  Content type: Blog
lucebox.com · Hacker News

mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp

 🤗Open Source AI  Content type: Code
github.com · r/LocalLLaMA

Apple rebuilt its on-device AI stack at WWDC 2026

 🟢NVIDIA  Content type: Blog
ziraph.com · Hacker News

Logits as a new monitor for evaluation awareness

 ✍️Prompt Engineering
lesswrong.com · Hacker News

Show HN: One API Key for 45 AI Models – Pay per Token, OpenAI Compatible

 🚀Product Launches

The Anatomy of a Learning Stall

 🤖AI Coding  Content type: Blog

john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

 🤗Open Source AI  Content type: Code
github.com · Hacker News

How to Become an AWS AI Architect, The Honest Roadmap, the Projects, and Landing the Job

 ☁️Cloud Computing
hackernoon.com

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

 🟢NVIDIA  Content type: Code
github.com · r/LocalLLaMA

The OnlyFans Economy of American AI

 🎵Vibe Coding  Content type: Blog
leoveanu.com · Hacker News

Hacker News Trends: Search Hacker News super fast with Redis

 ⚙️DevOps

Show HN: Zerostack, an open coding agent optimized for memory footprint

 ✍️Prompt Engineering

lutusp/photo_database_webpage_generator: A photo database searchable webpage generator

 🤗Open Source AI  Content type: Code
github.com · Hacker News
