🧠 LLMs - JandirWong · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🍎Apple Code

github.com · Hacker News

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

🖥️Retro Computing

everylocalai.com · DEV

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

🤨AI Criticism Academic

What Ollama Reveals About Local AI, Agents, and Open Models

🤨AI Criticism Blog

odsc.medium.com

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🍎Apple Blog

adambien.blog

Using Scikit-LLM with Open-Source LLMs

machinelearningmastery.com

How Large Language Models Are Creating New Security Challenges

🤨AI Criticism Blog

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

⚙️Systems Programming

zozo123.github.io · Hacker News

Why LLMs (still) lack taste

🤨AI Criticism

beyondtheprior.com · Hacker News

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

⚙️Systems Programming

uccl-project.github.io · Hacker News

Running LLM Inference on Kubernetes: What It Actually Takes

🤨AI Criticism Blog

fairwinds.com

Fixing a stuck Ollama runner and building a GPU watchdog

patrickmccanna.net · Hacker News

Fine-tuning Large Language Models (LLMs) using PEFT

🤨AI Criticism Blog

LLM Routing: From Strategy Selection to Production Architecture

🕸️Networking Blog

RAG Pipeline Explained: From Query to Answer, Step by Step

🖥️Retro Computing Blog

How we fight GPU scarcity without compromise

🔐Cybersecurity Blog

equixly.com · Hacker News

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why

🍎Apple News Tutorial

LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents

🤨AI Criticism Blog

towardsai.net

How to Build a Deterministic RAG Testing Tool — and Use LLM as an Advisor, Not a Judge

🤨AI Criticism Blog

LLMs Are Brilliant. But They Can Be Fooled.

🤨AI Criticism Blog
