gelfayoumi's Feed

Robust and Automated Reconfiguration of Byzantine Wide-Area Replication

Distributed systems handle adversarial nodes through redundancy, which imposes a significant performance overhead. In blockchain systems, Byzantine fault-tolerant state-machine replication (BFT-SMR) is the replicated service that totally orders client transactions before execution. While prior research has primarily focused on designing novel consensus algorithms with improved performance, recent studies have shown that further gains can be achi... Read more ›

🏗️System Design arxiv.org·

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

Generative recommendation is an emerging paradigm that has shown promise in industrial recommendation systems, aiming to predict users' next interactions from their historical behaviors. At the core of generative recommendation lies item tokenization, which bridges item semantics and recommendation models. However, existing methods often struggle to effectively organize and inject complex user-behavioral and item-semantic contexts into recommend... Read more ›

📄Research Papers arxiv.org·

Multi-Orientation Edge-Minimum Repair for Non-Redundant Fault-Tolerant Broadcasting in Dense Eisenstein--Jacobi Networks

Dense Eisenstein--Jacobi (EJ) networks are degree-six algebraic interconnection networks whose finite quotient geometry is naturally represented by a hexagonal axial-coordinate ball. This paper studies non-redundant one-to-all broadcast repair in the dense EJ network generated by $\alpha=(t+1)+t\omega$, where $t$ is the network diameter. We propose EJ-MOEM, a multi-orientation edge-minimum repair method that evaluates a constant-size family of h... Read more ›

☁️Cloud Infrastructure arxiv.org·

CoAgent: Concurrency Control for Multi-Agent Systems

Multi-agent LLM systems -- coding agents, devops agents, document agents -- now routinely run several agents in parallel against the same git tree, Kubernetes cluster, or document. As soon as two of them mutate shared state, they enter the regime classical concurrency control has studied for decades, but classical mechanisms fit LLM agents poorly. A single agent transaction spans minutes of inference, read sets are broad and opaque rather than s... Read more ›

🔄Concurrency arxiv.org·

Empowering Student Debugging in Parallel Programming with Execution Traces and Large Language Models

Concurrent programming is a core component of Computer Science curricula, yet remains notoriously difficult for students to master due to its inherent complexity and the nondeterministic nature of concurrency bugs such as deadlocks and race conditions. In this work, we present ParaView, an educational tool designed to help students understand, debug, and correct concurrency issues in parallel programs written in C/C++. ParaView provides transpar... Read more ›

🔄Eventual Consistency arxiv.org·

Towards Distributed Inference of LLMs on a P2P Network

Prefix caching can reduce LLM inference latency by reusing KV caches across requests with shared prompts, but cluster-scale reuse is challenging because caches are partitioned across nodes. We propose a decentralized, prefix-cache-aware routing scheme for peer-to-peer LLM serving. Each node maintains a local radix tree of its own cached prefixes and asynchronously refreshed estimates of peer caches using periodic anti-entropy. Requests are route... Read more ›

🧠Query Planners arxiv.org·

Filtered ANN as a Phase Transition: When Selectivity-Estimation Error Causes Plan Regret

A filtered approximate-nearest-neighbor (ANN) query returns the k nearest vectors among those satisfying an attribute predicate P of selectivity s. The best execution strategy -- pre-filter, post-filter, or in-filter -- changes with s, so a system must estimate s and choose. We model this as an argmax over a landscape with phases (regions where each strategy wins) separated by boundaries, and show that selectivity-estimation error produces plan ... Read more ›
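
To make the pre-filter versus post-filter trade-off concrete, here is a minimal sketch that uses a brute-force scan as a stand-in for an ANN index. It is not the paper's model: the function names, the `overfetch` parameter, and the toy data are assumptions for illustration, and the in-filter strategy is not shown.

```python
import heapq
import math

# Each point is (vector, attribute); the predicate tests the attribute.

def pre_filter(points, query, predicate, k):
    """Apply the predicate first, then rank only the survivors."""
    candidates = [p for p in points if predicate(p[1])]
    return heapq.nsmallest(k, candidates, key=lambda p: math.dist(p[0], query))

def post_filter(points, query, predicate, k, overfetch):
    """Rank everything, keep k * overfetch nearest, then apply the predicate.

    `overfetch` is a hypothetical knob: how many extra neighbours to fetch
    to compensate for the ones the predicate will discard.
    """
    nearest = heapq.nsmallest(k * overfetch, points,
                              key=lambda p: math.dist(p[0], query))
    return [p for p in nearest if predicate(p[1])][:k]

# Toy data: 1000 one-dimensional points; the attribute is the integer index.
points = [((float(i),), i) for i in range(1000)]
query = (500.5,)

def selective(attr):      # selectivity s = 0.01
    return attr % 100 == 0

def permissive(attr):     # selectivity s = 1.0
    return True

low_pre = pre_filter(points, query, selective, k=5)
low_post = post_filter(points, query, selective, k=5, overfetch=4)
high_pre = pre_filter(points, query, permissive, k=5)
high_post = post_filter(points, query, permissive, k=5, overfetch=4)
```

With the selective predicate, the post-filter path returns fewer than `k` results because most of the fetched neighbours fail the predicate, while the pre-filter path still returns `k`. With the permissive predicate the two agree, which is why the better choice depends on the estimate of s.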

⚙️Backend Development villagesql.com·

Fuzzy String Search for MySQL

Add fuzzy string matching to MySQL with VillageSQL. Learn to use trigrams for typos, Levenshtein distance for spell correction, and phonetic matching. Read more ›
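
As a rough illustration of two of the techniques named above, the sketch below computes trigram similarity and Levenshtein distance in plain Python. It does not use or reproduce VillageSQL's functions; the names `trigrams`, `trigram_similarity`, and `levenshtein`, and the padding convention, are assumptions made for this example.

```python
def trigrams(s: str) -> set:
    """Return the set of 3-character substrings of a padded, lowercased string.

    Padding with two leading spaces and one trailing space is an assumed
    convention here; it lets word boundaries contribute trigrams too.
    """
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets, in the range [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute, free if equal
            ))
        prev = cur
    return prev[-1]

typo_score = trigram_similarity("postgres", "postgers")
unrelated_score = trigram_similarity("postgres", "mysql")
edit_distance = levenshtein("kitten", "sitting")
```

Trigram similarity tolerates transposed or missing letters because most three-character windows survive a small typo. Levenshtein distance gives an exact edit count, which suits ranking spell-correction candidates.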

🏗️System Design arxiv.org·

Simulation-Based Performance Evaluation of Sharded Blockchain Architectures

Public blockchains continue to struggle with scalability because improving throughput is not as simple as increasing block size or reducing block interval. Larger blocks increase validation and transmission cost, while shorter intervals raise the likelihood of propagation delays, forks, and stale blocks. These limits motivate sharding, where transaction processing is divided across multiple parallel shard groups. In this work, we present a confi... Read more ›

📄Research Papers arxiv.org·

REMOP: REmote-Memory-aware OPerator Optimization

Remote and disaggregated memory tiers expand the effective memory capacity of analytical database engines, but they also reshape the cost structure of out-of-memory query processing. When an operator spills beyond local DRAM, moving pages to remote memory incurs both data-transfer time and a fixed round-trip latency per transfer. Classical operator analyses and buffer-allocation heuristics primarily target disk spilling by minimizing total I/O v... Read more ›
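
The two cost terms mentioned above (data-transfer time plus a fixed round-trip latency per transfer) can be sketched as a simple additive model. This is not REMOP's cost model: the function name, the assumption that transfers are serialized with no overlap, and all numeric values below are hypothetical.

```python
import math

def spill_transfer_time(total_bytes, bytes_per_transfer,
                        bandwidth_bytes_per_s, round_trip_s):
    """Estimated time to move `total_bytes` to remote memory.

    Assumes serialized transfers: each one pays a fixed round trip, and the
    payload moves at the given bandwidth. Both assumptions are illustrative.
    """
    n_transfers = math.ceil(total_bytes / bytes_per_transfer)
    latency_term = n_transfers * round_trip_s
    bandwidth_term = total_bytes / bandwidth_bytes_per_s
    return latency_term + bandwidth_term

# Hypothetical setting: spill 1 GiB over a 10 GB/s link with a 5 microsecond
# round trip, first page by page (4 KiB), then in 1 MiB batches.
GIB = 2 ** 30
page_sized = spill_transfer_time(GIB, 4 * 2 ** 10, 10e9, 5e-6)
batched = spill_transfer_time(GIB, 2 ** 20, 10e9, 5e-6)
```

Total bytes moved are identical in both cases, so an analysis that minimizes I/O volume alone cannot tell them apart. Under this model the page-sized variant is slower purely because it pays the round trip many more times.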

🔄Concurrency arxiv.org·

Pulse: Training Acceleration for Large Diffusion Models with Automatic Pipeline Parallelism

Diffusion models are now a dominant approach for high-fidelity image and video generation, yet scaling their training across GPU clusters remains challenging. Unlike transformer-only architectures, diffusion backbones commonly adopt UNet-style encoder-decoder structures with heterogeneous layers and long-range skip connections. Under conventional pipeline parallelism, these non-local dependencies force large skip activations and their gradients ... Read more ›

☁️Cloud Infrastructure arxiv.org·

ShuntServe: Cost-Efficient LLM Serving on Heterogeneous Spot GPU Clusters

As large language model (LLM) services become widely adopted, the cost of GPU resources for serving these models in cloud environments has emerged as a critical concern. Spot instances offer up to 90% cost savings over on-demand instances, but their frequent interruptions and limited availability pose significant challenges for continuous LLM serving. GPU spot instances, in particular, exhibit lower and more volatile availability than CPU-based ... Read more ›

⚙️Backend Development aws.amazon.com·

PostgreSQL 18 on Amazon Aurora and Amazon RDS: Security, monitoring, and developer enhancements

In Part 1 of this series, we explored the performance enhancements in PostgreSQL 18, including skip scan optimization, enhanced EXPLAIN output, automatic self-join removal, and vacuum/autovacuum improvements. In this second part, we focus on security, monitoring, developer productivity, and logical replication enhancements that improve operational efficiency and the overall developer experience. Read more ›

🏗️System Design arxiv.org·

AoiZora: Topology-Aware Auto-Parallel Optimization for Inference of Diffusion Transformers

Video diffusion has quickly grown into a key generative serving workload, yet producing each clip demands many denoising iterations over large spatio-temporal latents, which puts low-latency inference out of reach on a single device. A denoising step is therefore typically distributed across multiple accelerators, and TPU sub-slices have become an attractive and practical fabric for doing so. Current auto-parallel systems, however, search almost... Read more ›

🔄Concurrency arxiv.org·

Tangram: Hiding GPU Heterogeneity for Efficient LLM Parallelization

The scale of LLM training jobs requires parallelization planning over large GPU clusters. Due to different GPU types and interconnects added over time, these GPU clusters are increasingly heterogeneous. Automatic LLM parallelizers can search for parallelization plans but face an exploding search space with heterogeneous GPUs. To make search tractable in heterogeneous GPU clusters, parallelizers often omit types of parallelism (e.g., expert paral... Read more ›

☁️Cloud Infrastructure arxiv.org·

Incentives and Evidence in Learned Service Orchestration

Reinforcement learning for service orchestration has been the subject of sustained research for over a decade, yet it is not used in production at scale. The usual explanation is that learned controllers degrade under delayed and noisy telemetry, workload shifts, and uncontrolled tenants. We test whether existing evidence supports that explanation. We evaluate three highly influential RL-based orchestration systems spanning resource allocation, ... Read more ›

📄Research Papers arxiv.org·

Which Sections of a Research Paper Best Reveal Its Research Methods? Evidence from Library and Information Science

Research methods are essential carriers of knowledge contribution in academic papers. Automatic multi-label classification of research methods can support knowledge services such as method retrieval, review generation, and research intelligence analysis. While existing studies primarily rely on titles and abstracts, abstracts often provide only limited methodological information, whereas utilizing full-text content faces challenges related to ex... Read more ›

⚙️Backend Development LWN.net featured content·

Security updates for Wednesday

Security updates have been issued by AlmaLinux (hplip, kernel, kernel-rt, libpng12, libpng15, libxml2, libxslt, mysql:8.0, mysql:8.4, opencryptoki, openssl, postfix, postgresql:15, rsync, and webkit2gtk3), Debian (asterisk, atril, gsasl, and libreoffice), Fedora (ack, bird, chromium, firefox, ldns, librabbitmq, nextcloud, nss, openslide, perl-Protocol-HTTP2, tig, vorbis-tools, and xen), Mageia (coturn, log4cxx, and python-tornado), SUSE (389-ds, buildah, container-suseconnect, distribution, e... Read more ›

🏗️System Design arxiv.org·

Retrieval-as-a-Service: A System-Oriented Analysis of Industrial Retrieval Pipelines in Web Systems

Retrieval systems have become a foundational infrastructure component in modern Web services, supporting applications such as content recommendation, advertising targeting, and API discovery. In large-scale industrial environments, retrieval is increasingly deployed as an independent service layer, commonly referred to as Retrieval-as-a-Service (RaaS). This paper presents a system-oriented survey of industrial retrieval pipelines, focusing on ar... Read more ›

🔄Concurrency arxiv.org·

When the Next Step Is Not One Step: Distribution-Aware Execution Modeling for Concurrent Go Programs

Training a model to predict the next step in a concurrent program is harder than it looks: two runs of the same program from the same trace prefix can produce different next events, both valid, because the scheduler is nondeterministic. A model trained against a single label is learning to guess one outcome of a random process. We turn this around and use the nondeterminism as a training signal. We run each program many times, aggregate the obse... Read more ›