tamaulipas's Feed

RouteBalance: Fused Model Routing and Load Balancing for Heterogeneous LLM Serving

Heterogeneous LLM serving stacks split scheduling into two layers that optimize in isolation: model routers pick a model from quality and cost signals while ignoring instance load, and serving load balancers optimize queues while ignoring quality. We present RouteBalance, a serving-aware scheduling layer that fuses both into a single online assignment over concrete model instances, jointly trading off quality, latency, and cost. A batched in-pro...
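To make "a single online assignment over concrete model instances" more tangible, here is a minimal Python sketch. It is not RouteBalance's algorithm (the abstract is truncated before describing it); it is a hypothetical greedy scorer that assumes each instance exposes a per-model quality estimate, a per-request cost, and a constant service time, and that expected latency grows linearly with the instance's queue. The names `Instance`, `assign`, `alpha`, and `beta` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    model: str
    quality: float       # hypothetical offline quality estimate for the model
    cost: float          # hypothetical cost per request
    service_time: float  # hypothetical seconds per request on this instance
    queue_len: int = 0   # requests already queued on this instance


def expected_latency(inst: Instance) -> float:
    # Assumes FIFO with a constant service time: wait for the queue, then be served.
    return (inst.queue_len + 1) * inst.service_time


def score(inst: Instance, alpha: float, beta: float) -> float:
    # Higher is better: quality minus weighted latency and cost penalties.
    return inst.quality - alpha * expected_latency(inst) - beta * inst.cost


def assign(requests, instances, alpha=0.1, beta=0.5):
    """Greedily send each request to the best-scoring concrete instance."""
    plan = []
    for req in requests:
        best = max(instances, key=lambda inst: score(inst, alpha, beta))
        best.queue_len += 1  # later requests see the added load
        plan.append((req, best.name))
    return plan


def make_fleet():
    # Two replicas of a hypothetical large model and one of a small model.
    return [
        Instance("large-a", "large", quality=0.9, cost=0.02, service_time=2.0),
        Instance("large-b", "large", quality=0.9, cost=0.02, service_time=2.0),
        Instance("small-a", "small", quality=0.7, cost=0.002, service_time=0.5),
    ]


if __name__ == "__main__":
    print(assign(["r1", "r2", "r3"], make_fleet()))
    # [('r1', 'large-a'), ('r2', 'large-b'), ('r3', 'small-a')]
```

Once both large replicas hold a request, the third one spills over to the small model. A router that ignored load (here, `alpha=0`) would keep sending everything to `large-a`, which is the kind of isolation the abstract describes.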

📡Observability medium.com

AI Observability — Logging, Tracing, and Debugging ML Systems in Production

Issue #29

⚙️AI Engineering medium.com

Building a Retrieval-Augmented Generation (RAG) Chatbot using LangChain and OpenAI

A beginner-friendly guide to building AI-powered document question answering systems

🛠️MLOps medium.com

Building a Production-Style MLOps Platform from Scratch

🗄️Databases kargarisaac.medium.com

Post-Training a 0.8B SQL Agent with Off-Policy Soft-Label Distillation

✍️Prompt Engineering arXiv

Beyond Templates: Revisiting Zero-Shot Remote Sensing through Meta-Prompting

Vision-language models (VLMs) have sparked growing interest in zero-shot Earth Observation (EO) downstream tasks, with further gains enabled by remote-sensing-adapted models. We examine this setting across 17 VLM variants and 12 remote sensing (RS) datasets under Meta-Prompting for Visual Recognition (MPVR), and show that zero-shot performance remains highly sensitive to textual design choices, from the meta-prompts used to guide the LLM in ge...

🚀High Performance arXiv

GARIP: A Running-Average Moving Reference for Last-Iterate Self-Play in Two-Player Zero-Sum Games

Self-play with naive gradient ascent cycles in two-player zero-sum games: the last iterate orbits the equilibrium. Modern methods restore last-iterate convergence by regularizing toward a reference policy: MMD toward a fixed one (reaching only the regularized equilibrium), R-NaD toward a periodic snapshot (the engine of DeepNash). We study GARIP, which anchors to the running average, and isolate what the choice of reference controls. Our central result is...
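The contrast the GARIP abstract draws (fixed reference, periodic snapshot, or running average) can be illustrated on matching pennies. This sketch assumes a simple L2 pull toward the reference with strength `tau` and projected gradient steps on each player's probability of playing heads. MMD and R-NaD use entropy/KL-style regularization instead, and the abstract does not give GARIP's exact update, so treat this as an illustration of "anchor to the running average" rather than the paper's method. Setting `tau=0` recovers the naive gradient-ascent dynamics the abstract describes.

```python
def clip01(v):
    """Project a probability back onto [0, 1]."""
    return min(1.0, max(0.0, v))


def self_play(steps=2000, eta=0.01, tau=1.0, x0=0.9, y0=0.2):
    """Simultaneous projected gradient play on matching pennies.

    x, y: probability that player 1 / player 2 plays "heads".
    Player 1's payoff is u1 = (2x - 1)(2y - 1); player 2 receives -u1.
    Each step is pulled toward the running average of all iterates so far
    with strength tau (an assumed L2 penalty; tau = 0 is plain ascent).
    """
    x, y = x0, y0
    x_ref, y_ref = x, y  # running averages of the iterates seen so far
    xs, ys = [x], [y]
    for _ in range(steps):
        gx = 2.0 * (2.0 * y - 1.0)   # du1/dx
        gy = -2.0 * (2.0 * x - 1.0)  # d(-u1)/dy
        # Tuple assignment so both players update from the same old (x, y).
        x, y = (
            clip01(x + eta * (gx - tau * (x - x_ref))),
            clip01(y + eta * (gy - tau * (y - y_ref))),
        )
        xs.append(x)
        ys.append(y)
        n = len(xs)
        x_ref += (x - x_ref) / n  # incremental mean update
        y_ref += (y - y_ref) / n
    return xs, ys, (x_ref, y_ref)


if __name__ == "__main__":
    xs, ys, ref = self_play()
    print("last iterate:", xs[-1], ys[-1], "reference:", ref)
```

Swapping the running average for a constant (MMD-style) or a snapshot refreshed every few hundred steps (R-NaD-style) is a one-line change to how `x_ref` and `y_ref` are updated.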

📚RAG medium.com

How Vector Search Actually Works: IVF and HNSW

IVF’s cluster-and-probe vs. HNSW’s small-world graph: recall, memory, build time, and how each lands in pgvector and FAISS.
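A toy inverted-file (IVF) index shows the cluster-and-probe idea: k-means partitions the vectors into `nlist` lists, and a query scans only the `nprobe` lists whose centroids are closest. This is a pure-Python sketch for intuition, not how FAISS or pgvector implement it. The parameter names borrow FAISS's `nlist`/`nprobe` terminology (pgvector's ivfflat calls them `lists` and `ivfflat.probes`).

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sqdist(v, centroids[c]))


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means, used here as IVF's coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(v, centroids)].append(v)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFIndex:
    def __init__(self, vectors, nlist, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist, seed=seed)
        # Inverted lists: centroid id -> ids of the vectors assigned to it.
        self.lists = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[nearest(v, self.centroids)].append(i)

    def search(self, query, k, nprobe):
        """Scan only the nprobe lists whose centroids are nearest the query."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sqdist(query, self.centroids[c]))
        candidates = [i for c in order[:nprobe] for i in self.lists[c]]
        candidates.sort(key=lambda i: sqdist(query, self.vectors[i]))
        return candidates[:k]


def brute_force(vectors, query, k):
    """Exact k-nearest neighbours, for comparison."""
    return sorted(range(len(vectors)), key=lambda i: sqdist(query, vectors[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    index = IVFIndex(data, nlist=16)
    query = [0.5] * 8
    print("IVF (nprobe=2):", index.search(query, k=5, nprobe=2))
    print("exact:         ", brute_force(data, query, k=5))
```

Probing every list reproduces the exact brute-force result; lowering `nprobe` scans fewer candidates at the risk of missing true neighbours that landed in an unprobed list, which is the recall trade-off the teaser refers to.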

⚙️AI Engineering medium.com

This Tiny Python Package Runs a Local LLM With One pip install — and It Honestly Feels Like Magic

No Ollama. No API keys. No native installs. Just pip install freeaiagent and suddenly any app you write has a local AI brain it can call…

🔌API Design arXiv

UnBias-Plus: Detect, Explain, and Rewrite Bias

Bias in natural language remains a persistent challenge in both human-written and AI-generated content, affecting domains such as journalism, education, and AI research. Most existing detection methods identify only the presence of bias, with limited support for granular detection, interpretable explanations, neutral rewriting, and openly available trained models. We present UnBias-Plus, an open-source toolkit unifying (1) segment-level multi-...

📡Observability Towards AI

Designing AI Platforms That Scale: A Practical Blueprint

🧠LLMs medium.com

Fictional Framing as a Prompt Injection Vector: A Reproducibility Study on GPT-4o and Claude

Most prompt injection demos rely on something that looks adversarial on the page: “ignore previous instructions,” role-override tricks…

📐CS Fundamentals arXiv

Stationary Robust Mean-Field Games under Model Mismatches

Deploying multi-agent reinforcement learning (MARL) in the real world is often limited by model mismatches between the training simulators and the true environment, which could be further amplified through strategic interactions and result in severe performance degradation upon deployment. Distributional robustness offers a principled response by optimizing policies against worst-case transition models drawn from an uncertainty set, but standard...

🛠️MLOps medium.com

MLOps Pillar #1: How to Structure Data Workflows for Scalable Machine Learning

Why strong data workflows, reusable features, and traceable lineage are the foundation of scalable ML systems

🗄️Databases medium.com

One Database, Every Layer of Your AI App: Cosmos DB, Azure SQL, and HorizonDB in Practice

A practical look at how Azure Cosmos DB, Azure SQL Database, and the new Azure HorizonDB are bringing AI directly into the database engine.

✍️Prompt Engineering arXiv

Reinforcement learning to improve large language model-based automated code compliance systems

Large language model (LLM)-based approaches for automated code compliance (ACC) of building regulations are prone to generating incorrect and hallucinated computer-processable rules. This paper introduces P4IR, a two-stage framework that uses supervised fine-tuning (SFT) to instill domain knowledge in an LLM, followed by Group Relative Policy Optimization (GRPO) to improve the accuracy of the generated intermediate representations in the form ...
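GRPO's core step replaces a learned value baseline with a group-relative one: several completions are sampled for the same prompt, and each reward is normalized by the group's mean and standard deviation. A minimal sketch of that normalization follows. The binary reward (1 if a generated rule passes a checker) is a hypothetical stand-in, since the abstract does not say how P4IR scores its intermediate representations.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for G completions sampled from the same prompt.

    Each reward is centred on the group mean and scaled by the group's
    standard deviation, so no learned value function is needed. eps keeps
    the division finite when every completion got the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 if a generated rule passed a checker, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their siblings get positive advantages and the rest negative ones, and a group where every sample scored the same contributes no gradient signal at all.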

📚RAG medium.com

What Is Metadata Filtering in Vector Search?

Most AI teams obsess over embeddings and models. The smartest ones obsess over retrieval precision.

⚙️AI Engineering medium.com

How to Run Powerful AI Locally: A Beginner-Friendly Guide to LangChain + Ollama

If you’ve ever played around with building AI apps, you’ve probably noticed a massive catch: the cloud tax. Every time your application…

📐CS Fundamentals arXiv

Multi-cancer detection using a computationally efficient CNN with transfer learning

This study introduces a computationally efficient convolutional neural network (CNN) architecture enhanced with transfer learning for multi-cancer detection using biomedical images. The proposed lightweight CNN model is designed to reduce computational complexity while maintaining high classification performance, making it suitable for deployment in resource-constrained environments. We evaluate this approach on three distinct tumor datasets c...

🛠️MLOps mayursurani.medium.com

MLflow Architecture Deep Dive: Understanding the Four Core Components

The Business Problem