🦉 Qwen - hop1.ng.1357 · Scour

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

🤖AI Code

github.com · Hacker News

A system programmer’s guide to LLM inference

🤖AI Blog

blog.xiangpeng.systems · Hacker News

LLM-Based Code Documentation Generation and Multi-Judge Evaluation

🤖LLM Academic

Integrate on-device AI models into your app using Core AI - WWDC26 - Videos

developer.apple.com · Hacker News

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

local-llm.utop.workers.dev · Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖AI Code

github.com · Hacker News

Humans and LLMs share a mental disorder: Fugue Lock

vwwwv.org · Hacker News

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

🕵️AI Agents Academic

Show HN: Audit any AI/data pairing with Veritrooper

veritrooper.com · Hacker News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🤖AI News

digg.com · Hacker News

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents

🤖AI Academic

john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

🤖AI Code

github.com · Hacker News

From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

🖼️Multimodal AI Academic

MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models

🤖AI Academic

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

🤖AI Academic

BUDDY: BUdget-Driven DYnamic Depth Routing for Adaptive Large Language Model Inference

🤖AI Academic

From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model

🤖AI Academic

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

⚡Edge AI Academic

FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location

⚡Edge AI Academic

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

🤖AI Academic
