💬 LLMs - kingbeemo · Scour

What Are Tokens in LLMs?

🤖AI Blog

bearisland.dev··Hacker News

Self-hosted remote access for Ollama without complicated setup

oab.arc-i.co.uk··r/selfhosted

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

deemwar-products.github.io··Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖AI Code

github.com··Hacker News

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

huggingface.co··Hacker News

Location: Göttingen, Germany Remote: Yes (preferred; hybrid also fine) Willing t...

💻Software Dev Discussion

news.ycombinator.com··Hacker News

Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings

🔐Cybersecurity

posts.inthecyber.com·

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

🤖AI News Blog

developer.nvidia.com·

LLM Inference Engineering Room — Part 3: The Orchestration Layer

🤖AI Blog

vimal-dwarampudi.medium.com·

Melanie Mitchell: What We Get Wrong About AI

yalereview.org··Substack, Hacker News

Making Local LLM Go Brrr

seanpedersen.github.io·

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

🤖AI Academic

Running Ollama on a 15W CPU sounded ridiculous until I got it working with decent results

xda-developers.com·

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🤖AI Blog

adambien.blog·

What's in the Box? A Field Guide to AI Models

🤖AI Blog

iankduncan.com·

Running LLM Inference on Kubernetes: What It Actually Takes

🤖AI Blog

fairwinds.com·

Token4Token — pay-per-token inference on Gnosis + Swarm

t4t.eth.link··Hacker News

The Rise of Agentic AI: What Every Engineer Should Learn

🤖AI Blog

How attackers are gaining access to LLM inference

🤖AI Blog

Report: GKE Inference Gateway delivers up to 92% faster AI responses

🤖AI Blog

cloud.google.com··Hacker News
