📊 LLM Evals - alanxu.80 · Scour

Beyond English benchmarks: clinical LLM evaluation in Brazilian Portuguese

🧠LLMs Academic

Less-relevant results

Claude Fable 5 is Here — Anthropic's Most Powerful Public Model Yet

🏗️Agent Design Patterns Blog

Flaws in the LLM Automation Narrative

🧠LLMs Academic

Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models

🧠LLMs Academic

One AI Vendor Is a Single Point of Failure. Treat It Like One.

💾Agent Memory Blog

Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling

🧠LLMs Academic

Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity

⚙️MLOps Academic

Detect AI Agent Hallucinations: Zero-Shot Methods

🤖AI Agents Blog

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

🧠LLMs Academic

RAG Explained: How It Works

🔍RAG Blog

Voting Protocols as Coordination Mechanisms for Role-Constrained Multi-Agent Tutoring Systems

🎼Agent Orchestration Academic

The Search Engine Renaissance: How Apache Lucene and Elasticsearch Are Reclaiming the AI-Native Future

🔍RAG Blog

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

🧠LLMs Academic

Gemma 4 makes on-device multimodal AI good enough to ship

🔐AI Security Blog

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

⚙️MLOps Academic

Cline + LM Studio 2026: complete setup guide, the 32k context trap, and which coding models actually hold up

🌐Open Source AI Blog

Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning

✍️Prompt Engineering Academic

Prompt Engineering Is Systems Design, Not a User Skill

🧠LLMs Blog

When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

⚙️MLOps Academic

I Built an Adversarial Eval Framework and Attacked 5 LLMs — Every Single One Failed

🌐Open Source AI Blog
