🤖 LLMs - jyunzhang

🔍RAG Blog

dev.to··DEV

RAG-Based Testing Series — Part 1: What Is RAG & Why Your Old Testing Playbook Won't Work Here

🔍RAG

linkedin.com··DEV

Measuring Embedding Drift: Why Hybrid Search Saves Stale Models.

🤖AI

pub.towardsai.net

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

🤖AI Academic

arxiv.org·

Claude vs GPT-4: Which AI API Is Better for Developers? (2026)

🎭Anthropic Claude

kalyna.pro··DEV

Initial impressions of Claude Fable 5

🎭Anthropic Claude

simonwillison.net··Hacker News

A handy llama-server launcher with easy model and configuration customisation

📝NLP Code

github.com··r/LocalLLaMA

What Are Tokens in LLMs?

📝NLP Blog

bearisland.dev··Hacker News

LangChain Series #2: Models Explained — LLMs, Chat Models, and Embeddings with Practical…

🤖Transformers

pub.towardsai.net

rag-explained-how-it-works

🔍RAG Blog

dev.to··DEV

TA-RAG: Tone-Aware Retrieval-Augmented Generation for Peer-Support Health Communication

🔍RAG Academic

arxiv.org·

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

🦙Ollama Code

github.com··Hacker News

The complete guide to claude code configuration file

🎭Anthropic Claude Blog

dev.to··DEV

What is Agentic RAG? Building Multi-Agent Agentic RAG Systems

🔍RAG

pub.towardsai.net

shoo99/paper-rag: A private, fully-local RAG over your own PDFs: BGE-M3 + embedded Qdrant + a local LLM via Ollama. ~150 lines, nothing leaves your machine.

🔍RAG Code

github.com··DEV

TrustMargin: Training-Free Arbitration between Parametric Memory and Retrieved Evidence in Large Language Models

📝NLP Academic

arxiv.org·

LLM Inference Handbook 2026

🏗️Systems Design

pub.towardsai.net

Open-LLM-VTuber Review: Offline AI Companion with Live2D

🦙Ollama Blog

dev.to··DEV

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖Machine Learning Code

github.com··Hacker News

Using Scikit-LLM with Open-Source LLMs

Classical RAG vs Agentic RAG: a practical decision guide
