🧠 LLMs - JandirWong · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🍎Apple Code

github.com··Hacker News

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

🖥️Retro Computing

everylocalai.com··DEV

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

🤨AI Criticism Academic

What Ollama Reveals About Local AI, Agents, and Open Models

🤨AI Criticism Blog

odsc.medium.com·

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🍎Apple Blog

adambien.blog·

Using Scikit-LLM with Open-Source LLMs

machinelearningmastery.com·

Fine-tuning Large Language Models (LLMs) using PEFT

🤨AI Criticism Blog

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

⚙️Systems Programming

zozo123.github.io··Hacker News

Why LLMs (still) lack taste

🤨AI Criticism

beyondtheprior.com··Hacker News

Running LLM Inference on Kubernetes: What It Actually Takes

🤨AI Criticism Blog

fairwinds.com·

LLM Routing: From Strategy Selection to Production Architecture

🕸️Networking Blog

Fixing a stuck Ollama runner and building a GPU watchdog

patrickmccanna.net··Hacker News

How to Build a Deterministic RAG Testing Tool — and Use LLM as an Advisor, Not a Judge

🤨AI Criticism Blog

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why

🍎Apple News Tutorial

RAG Pipeline Explained: From Query to Answer, Step by Step

🖥️Retro Computing Blog

How we fight GPU scarcity without compromise

🔐Cybersecurity Blog

equixly.com··Hacker News

LLMs Are Brilliant. But They Can Be Fooled.

🤨AI Criticism Blog

LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents

🤨AI Criticism Blog

towardsai.net·

Improved performance and model support with GGUF

🍎Apple Blog

Timing Trick Cuts Energy Used in LLM Training by Up to 14 Percent

⚙️Systems Programming News

spectrum.ieee.org··Hacker News
