🤖 LLMs - tianfg · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Code

github.com · Hacker News, r/LLM

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

🤖AI Blog

cswithsanjay.blogspot.com

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

🤖AI Academic

From Chatbot Hallucinations to Deterministic Agents: Forcing Local LLMs to Run Production-Grade…

🤖AI Blog

Intelligent inference scheduling with llm-d on Red Hat AI

developers.redhat.com

A reporting checklist for large language models in behavioural science

🤖AI Academic

WhatLLM.org: Compare LLMs by Benchmarks, Price & Speed

🤖AI Discussion Reference

Introducing LLM as a Judge: Scaling search relevance evaluation with AI

🤖AI Blog

opensearch.org

DiffusionGemma: 4x Faster Text Generation

🤖AI News Blog 19

blog.google · Hacker News, r/LocalLLaMA, r/singularity · Cited by 21 articles

12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI

🤖AI Blog

Comprehensive evaluation of LLM capabilities for interpretation and analysis of genome-scale metabolic models in metabolic engineering

🤖AI Academic

How LLMs are Actually Trained

🤖AI News Blog

blog.algomaster.io

Why Transformer Models Get Costlier as Context Grows

siliconopera.com

Report: GKE Inference Gateway delivers up to 92% faster AI responses

🤖AI Blog

cloud.google.com · Hacker News · Cited by 1 article

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

🎮Game Engines

everylocalai.com · DEV

How ChatGPT Actually Works (Beginner Friendly)

🤖AI Blog

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

📊Formal Methods

zozo123.github.io · Hacker News

What's in the Box? A Field Guide to AI Models

🤖AI Blog

iankduncan.com

Run ChatGPT, Claude, Gemini and Perplexity Side-by-Side

aiverdict.github.io · Hacker News

6. Air-Gapped Claude Code - The Claude Code SRE Handbook

har-ki.github.io · Hacker News
