Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolatio...
When network links were slow, cloud and distributed database systems could rely on generic kernel abstractions and treat network communication as a black box. With today's fast cloud networks, this approach breaks down: database performance becomes limited by the CPU overhead of the kernel TCP stack. Replacing TCP with user-space UDP can reduce this overhead, but it requires reimplementing essential guarantees, such as reliability and ordering. ...
Circulant interconnection networks provide symmetric addressing, compact generator descriptions, and uniform local connectivity. This paper maps a degree--redundancy landscape for a fault-tolerant two-hop primitive in directed circulants: given $n$ nodes and degree budget $m$, how large can the worst-case shared-relay multiplicity $R(n,m)$ be? A node is a shared relay for an ordered terminal pair if it has outgoing links to both terminals; an $f...
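The shared-relay definition in the item above can be made concrete with a small sketch. It assumes a directed circulant in which node `w` has an outgoing link to `(w + s) % n` for each generator `s`, and it only counts shared relays for one fixed generator set. The function names are hypothetical, and because the abstract is truncated, this does not claim to reproduce the paper's exact definition of $R(n,m)$.

```python
def shared_relay_counts(n, gens):
    """Count shared relays per terminal offset in a directed circulant.

    Assumption: node w links to (w + s) % n for every s in gens, and no
    generator is 0 mod n (no self-loops). By rotational symmetry, the
    ordered pair (a, a + d) has as many shared relays as the pair (0, d),
    so only offsets d = 1 .. n-1 need to be checked.
    """
    gens = {s % n for s in gens}
    if 0 in gens:
        raise ValueError("generators must be nonzero mod n")
    counts = {}
    for d in range(1, n):
        # w is a shared relay for (0, d) if it links to both 0 and d.
        counts[d] = sum(
            1 for w in range(n)
            if (0 - w) % n in gens and (d - w) % n in gens
        )
    return counts


def worst_case_shared_relays(n, gens):
    """Smallest shared-relay count over all ordered pairs of distinct terminals."""
    return min(shared_relay_counts(n, gens).values())
```

For example, with `n = 7` and generators `{1, 2, 4}`, every ordered pair of distinct terminals has exactly one shared relay, while a single generator gives none.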
We introduce c-Lattice Aggregation, a fault-tolerant reconstruction problem for distributed verification under crash and Byzantine failures. In our setting, n asynchronous processes supervise a concurrent execution I: each process holds a local sample, and must collaboratively reconstruct I from partial, potentially overlapping observations. A protocol solves c-Lattice Aggregation if at least c correct processes output the complete execution I, ...
Current large language model (LLM) inference systems universally deploy ultra-large-scale models using a combination of Tensor Parallelism (TP) and Pipeline Parallelism (PP). However, existing systems treat the model parallelism topology as a static configuration that cannot be flexibly adjusted at runtime. This rigid design creates a fundamental contradiction with the dynamically changing inference workloads in real-world scenarios. State-of-th...
Graph-based retrieval at billion-node scale requires jointly solving three tightly coupled problems -- graph construction, representation learning, and real-time serving -- yet existing work addresses each in isolation. We present RankGraph-2, a framework deployed at Meta that co-designs all three lifecycle stages for similarity-based retrieval (U2U2I and U2I2I), where each stage's requirements shape the others. Serving requires a co-learned clu...
In this post, we show you how to use Amazon CloudWatch Database Insights for lock analysis in Amazon Aurora PostgreSQL. You learn how to enable the feature, interpret lock tree visualizations, resolve common lock-related issues, and maintain optimal database performance. This lock tree analysis feature also applies to Amazon RDS for PostgreSQL.
Multimodal document retrieval -- selecting the most relevant multimodal document from a large corpus to answer a natural language query -- plays an essential role in Retrieval-Augmented Generation (RAG) systems. State-of-the-art methods represent each document and query with multiple token-level embeddings and use late interaction to achieve high effectiveness. However, such multi-vector representations incur substantial memory overhead during retri...
Vector databases typically manage metadata as flat scalar attributes, which limits their ability to express hierarchical directory semantics commonly used to organize code repositories, enterprise documents, and agent memories. As a result, directory-scoped retrieval and structural updates are often implemented as application-layer workarounds, making recursive scope resolution expensive and directory maintenance difficult to keep consistent. Th...
In this post, we show you how to use EXPLAIN plans to diagnose and improve query performance in Amazon Aurora DSQL. We introduce a three-layer filter model as a practical framework for understanding where your predicates are evaluated, and walk through the architecture differences that make Aurora DSQL plans unique, the anatomy of an EXPLAIN output, access method selection, and a step-by-step query improvement workflow.
Existing Decentralised Identifier (DID) methods require coordination, an agreed global order of operations, to update a DID document: blockchain-anchored methods incur fees and latency; lightweight peer methods (did:key, did:peer) offer no update mechanism; and Sidetree methods still require blockchain ordering for finality. We present did:crdt, a DID method that targets W3C DID Core and removes the need for coordination entirely: there is no le...
Exploratory question answering (EQA) over data lakes requires an LLM agent to discover relevant sources, analyze retrieved data, and adapt its actions based on intermediate results. End-to-end accuracy alone cannot distinguish failures in search, planning, data analysis, or the agent's Action Policy: its decisions about what to do next and when to submit an answer. We present SANA (Search Agent Navigation Ablation framework), a diagnostic ablati...
Temporal online analytical processing (OLAP) analyzes past states of data whose values change over time. Such histories are naturally stored as interval histories, in which each row records the period during which a value remained valid. Because temporal analyses typically arrive in infrequent, intermittent bursts, serverless execution that launches functions only at query time offers a cost advantage over always-on clusters. Splitting a computa...
Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on...
Learned indexes improve query performance by adapting search structures to data and workload distributions. Although many learned indexes have been proposed, their trade-offs remain insufficiently understood for spatial range queries, where performance depends not only on model accuracy but also on data and query skew, layout granularity, selectivity, and storage behavior. In this work, we perform an experimental study of learned indexes for spa...
Linear probing is one of the simplest and most space-efficient approaches to hash table design, and is widely used in sequential settings due to its compact memory layout. However, designing a concurrent linear-probing hash table with strong liveness guarantees has proved difficult, and only a handful of such algorithms have been proposed, all of which either restrict concurrency or rely on large per-entry metadata, thereby compromising space ef...
Serving Mixture-of-Experts (MoE) large language models (LLMs) is challenging because dynamic request workloads interact with sparse expert routing, creating both data-parallel (DP) engine imbalance and expert-level hotspots. Existing LLM serving systems typically make these decisions in isolation: frontend schedulers route requests using coarse request counters, while backend expert balancers rely mainly on aggregate expert activation counts. Th...
Determining whether one concurrent operation completed before another began is a fundamental prerequisite for reasoning about the correctness of concurrent systems. We formalize this challenge as the Causal Observability Problem (COP): assign timestamps to the observable boundary events of a concurrent execution, invocations and responses, that faithfully reflect real-time operation order. A solution is complete if it never misses a genuine prec...
Add fuzzy string matching to MySQL with VillageSQL. Learn to use trigrams for typos, Levenshtein distance for spell correction, and phonetic matching.
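As a rough illustration of two of the techniques named in the item above, the following sketch computes a trigram similarity and a Levenshtein distance in plain Python. The function names and the padding convention (two leading spaces, one trailing) are assumptions chosen for illustration. This is not VillageSQL's API, and phonetic matching is not covered.

```python
def trigrams(s):
    """Return the set of 3-character windows of a lowercased, padded string.

    The padding (two leading spaces, one trailing) is an illustrative
    choice so that word boundaries contribute their own trigrams.
    """
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, in the range [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

Trigram similarity tolerates small typos because most windows still overlap, while Levenshtein distance gives an exact edit count that suits ranking spell-correction candidates.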
Efficient query optimization is crucial for relational database systems, especially for optimizing join orders in complex queries. This work introduces a hybrid approach that integrates Eliminating Cartesian Products (ECP) with splitting the QUBO search space (SQSS) to reduce the size of the QUBO problem, minimizing binary variables and constraints. This improves the performance of the quantum algorithm while lowering hardware requirements. We e...