AI

Llama, qwen, OpenAI, Claude, Anthropic, GPUs, Ollama, Local LLMs


zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

 🇨🇳Chinese AI  Content type: Code
github.com··Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

 🏠Self-Hosting

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

 🇨🇳Chinese AI  Content type: Academic
arxiv.org·

Token4Token — pay-per-token inference on Gnosis + Swarm

 🤖LLM

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 🤖LLM

A system programmer’s guide to LLM inference

 💬NLP  Content type: Blog

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

 🇨🇳Chinese AI  Content type: Code
github.com··Hacker News

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

 🪝eBPF

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🦉Qwen

DeepSeek enters the fight for token volume, Anthropic continues to dominate spend

 🇨🇳Chinese AI  Content type: Blog
vercel.com··Hacker News

raeudigerRaeffi/riddlerun: An open source agentic end2end testing tool for your webpages

 🐳Docker  Content type: Code
github.com··Hacker News, r/OpenAI

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

 LLMs  Content type: Academic
arxiv.org·

Anthropic tops AI Arena rankings as it files for IPO

 🎭Claude  Content type: News  Content type: Blog

patriceckhart/zot: Yet another coding agent harness, lightweight and written in Go.

 🔌Claude Plugins  Content type: Code
github.com··Hacker News

How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions

 🇨🇳Chinese AI  Content type: Academic
arxiv.org·

Large companies can add a local LLM filter layer to considerably reduce their AI costs

 💬NLP

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🧠Machine Learning

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

 📱Edge Computing  Content type: Academic
arxiv.org·

Nvidia Nemotron 3 Ultra

 LLMs

Shrivastava-Aditya/boolean-algebra-engine: Deterministic boolean algebra engine — evaluates expressions, detects contradictions, audits logic rules. MCP server, NL layer, REST API, CLI, Streamlit UI.

 🔌Claude Plugins  Content type: Code
github.com··Hacker News, r/LLM
