linearizable's Feed

Rescaling MLM-Head for Neural Sparse Retrieval

Learned sparse retrieval (LSR) models such as SPLADE have traditionally used BERT-style masked language models as backbone encoders. A natural expectation is that replacing BERT with stronger pretrained encoders should improve retrieval effectiveness. However, we find that under standard SPLADE training recipes, backbones with large MLM-head L2 norms can suffer performance degradation and even training collapse... Read more ›

🔗RDMA LWN.net featured content

Single-hop block replication with RMR and BRMR

How can cloud providers efficiently supply durable virtual block devices? Remote Direct Memory Access (RDMA) provides a way for servers in a cluster to share chunks of memory, but there still needs to be a protocol that operates on top of RDMA to provide the guarantees expected of a block device. The kernel's RDMA transport library (RTRS) provides a way to send messages via RDMA. Two modules build on it: Reliable Multicast over RTRS (RMR) and Block device over RMR (BRMR). These modules, which I am working on with... Read more ›

📇Database Indexes aws.amazon.com·

PostgreSQL 18 on Amazon Aurora and Amazon RDS: Performance enhancements

This is Part 1 of a two-part series covering the key features in PostgreSQL 18. In this post, we focus on performance enhancements: skip scan optimization for multicolumn indexes, enhanced EXPLAIN output, automatic removal of unnecessary self-joins, and several vacuum and autovacuum improvements that help keep your database running efficiently. Read more ›

📦WASM Frank DENIS Blog·

Faster signatures for WebAssembly

I just released ed25519-wasm, a small Rust crate for Ed25519 signatures in WebAssembly. The static library can also be directly linked in applications w... Read more ›

⚡SIMD Vectorization cr.yp.to blog·

EuroQCI feedback

The European Commission has a survey requesting feedback regarding EuroQCI, Europe's sky-high investment in "quantum communication infrastructure". Read more ›

⚙️Database Internals percona.com·

The Failover Brownout: Rethinking High Availability in MySQL Group Replication

It is time to talk again about flow control and Group Replication, this time with a special eye on the use of Group Replication in the Kubernetes context. In this article we will dig a bit into how it works and what the various side effects are. The problem: recently I was refining the … Read more ›

🎯Vector Search arxiv.org·

Policy-aware Vector Search: A Vision for Fine Grained Access Control in Vector Databases

Vector databases are increasingly used in security-sensitive contexts with Retrieval Augmented Generation and organizational AI pipelines; however, their security capabilities remain limited. Specifically, Fine-grained Access Control (FGAC), which is required to ensure that data access adheres to user-specific policies, is not fully supported in modern vector databases. Unlike relational databases, vector databases combine structured and unstructu... Read more ›

🌸Bloom Filters Cryptology ePrint Archive·

Efficient Private Set Intersection and Searchable Encryption using Homomorphic Bloom Filters

Existing encrypted search and private set intersection (PSI) protocols struggle to reconcile post-quantum security with practical efficiency, often leaking search and access patterns or requiring prohibitively deep fully homomorphic encryption (FHE) circuits. We address these limitations by introducing a new Homomorphic Bloom Filters (HBF) framework, a quantum-resilient framework that embeds length-$m$ Bloom filters directly into the plaintext space of an RLWE-based FHE scheme, enabling shall... Read more ›

🧮Constraint Solvers queue.acm.org·

In Code They Think; In Proof We Trust

AI agents that use tools can be hijacked by prompt injection to exfiltrate sensitive data. Runtime defenses such as model alignment, output scanning, and content classifiers are fundamentally reactive: By the time they detect an attack, irreversible actions may already have been taken, and sophisticated encodings such as steganography, encryption, and chunking can evade any content-based check. We propose a preemptive alternative: Constrain the agent to express its plan as a Kotlin script, th... Read more ›

💾Storage Replication percona.com·

Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency

When building high-availability MySQL environments, the choice between MySQL Group Replication (GR) and Percona XtraDB Cluster (PXC) often comes down to how they handle the eternal database dilemma: data consistency versus performance. While both provide “synchronous-like” replication, they approach the problem of stale reads differently: reading data that has been committed on one node but not … Read more ›

📡Low-Level Networking arxiv.org·

SAC: Disaggregated KV Cache System for Sparse Attention LLMs with CXL

The scaling of LLMs toward long-context inference has shifted the primary serving system bottleneck from computation to memory capacity. Traditional solutions for dense attention models rely on RDMA-based disaggregated memory pools, which perform coarse-grained fetching of the entire prefix KV cache from remote storage to local memory before decoding. However, this approach is fundamentally inefficient for emerging sparse attention models. While... Read more ›

🔄Replication arxiv.org·

The Bi-Channel Networking Paradigm for Database Systems in the Cloud

When network links were slow, cloud and distributed database systems could rely on generic kernel abstractions and treat network communication as a black box. With today's fast cloud networks, this approach breaks down: database performance becomes limited by the CPU overhead of the kernel TCP stack. Replacing TCP with user-space UDP can reduce this overhead, but it requires reimplementing essential guarantees, such as reliability and ordering. ... Read more ›

📦In-process Databases GitHub·

databow

Description: CLI to query any ADBC database. What we like: Run SQL against any ADBC database, e.g., DuckDB, BigQuery, Postgres, SQLite. Syntax highlights the query. Output in a table, CSV, JSON, or Apache Arrow. Supports non-interactive queries from the CLI rather than using the TUI. What we dislike: Requires an ADBC driver to exist, which means you can’t query some popular databases like MySQL, ClickHouse, etc. Read more ›

📇Database Indexes villagesql.com·

Fuzzy String Search for MySQL

Add fuzzy string matching to MySQL with VillageSQL. Learn to use trigrams for typos, Levenshtein distance for spell correction, and phonetic matching. Read more ›
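
As a rough illustration of two of the techniques the teaser names, here is a minimal Python sketch of trigram similarity and Levenshtein distance. It is not VillageSQL's implementation or API: the function names, the space padding, and the use of Jaccard similarity over trigram sets are assumptions chosen for the example.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of 3-character substrings of a padded, lowercased string.

    The two-space prefix and one-space suffix are an assumed padding scheme,
    used so that short strings and word boundaries still yield trigrams.
    """
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets, in the range [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]
```

Trigram overlap tolerates typos anywhere in a string and is cheap to index, while edit distance gives an exact count of the corrections needed, which is why the two are often paired: trigrams to shortlist candidates, Levenshtein to rank them.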

📊Columnar Execution arxiv.org·

The Sheaf Laplacian: A Topological Framework for Data Fusion and Consensus in Distributed Sensing Networks

We argue here that traditional network models, which are overwhelmingly based on the mathematical construct of a simple graph, are fundamentally insufficient for capturing the complexity of modern distributed systems. Such systems are characterized by heterogeneous agents with diverse capabilities, high-dimensional and multi-modal data streams, and intricate, context-dependent relationships that cannot be adequately described by a simple connect... Read more ›

🔐MVCC aws.amazon.com·

Deep dive into Amazon Aurora PostgreSQL lock analysis with CloudWatch Database Insights

In this post, we show you how to use Amazon CloudWatch Database Insights for lock analysis in Amazon Aurora PostgreSQL. You learn how to enable the feature, interpret lock tree visualizations, resolve common lock-related issues, and maintain optimal database performance. This lock tree analysis feature also applies to Amazon RDS for PostgreSQL. Read more ›

🔍Information Retrieval arxiv.org·

ADORE: Iterative Query Expansion with Retrieval-Grounded Relevance Feedback

LLM-based query expansion improves retrieval by enriching the original query with additional context. Yet most methods remain generation-driven, producing plausible pseudo-documents or expansions without checking how the target corpus responds. This can introduce retrieval drift, amplify misleading vocabulary, or miss terms that distinguish relevant from non-relevant documents. We argue that effective expansion requires retrieval-grounded feedba... Read more ›

🔀Join Algorithms arxiv.org·

REMOP: REmote-Memory-aware OPerator Optimization

Remote and disaggregated memory tiers expand the effective memory capacity of analytical database engines, but they also reshape the cost structure of out-of-memory query processing. When an operator spills beyond local DRAM, moving pages to remote memory incurs both data-transfer time and a fixed round-trip latency per transfer. Classical operator analyses and buffer-allocation heuristics primarily target disk spilling by minimizing total I/O v... Read more ›

🔀CRDTs arxiv.org·

A Composable CRDT Layer for Byzantine-Resilient Deterministic Reconstruction

Conflict-free Replicated Data Types (CRDTs) ensure Strong Eventual Consistency without coordination, but typically assume benign participants and rely on validation or exclusion to handle Byzantine behavior. We address this problem through deterministic state reconstruction: rather than deciding which updates are admissible, all accepted updates are incorporated, while only a subset contributes to the reconstructed state. We instantiate this app... Read more ›

🤝Distributed Consensus Cryptology ePrint Archive·

Gatling: Rapid-Fire Consensus from Parallel Composition

Consensus protocols form the core of blockchains and other replicated state machines, ensuring that all correct nodes process the same totally ordered log of input transactions. In fault-free executions, performance is driven by the good-case transaction latency -- the time between a transaction becoming known to all nodes and its confirmation by the consensus protocol -- which depends on both how frequently proposals are made and, once made, how quickly they are confirmed. While prior work... Read more ›