def4ultx's Feed

Feeds to Scour
SubscribedAll
Scoured 42 posts in 13.0 ms
Vector databases are increasingly used in security sensitive contexts with Retrieval Augmented Generation and organizational AI pipelines; however, their security capabilities remain limited. Specifically, Fine-grained Access Control (FGAC) which is required to ensure that data access adheres to user-specific policies is not fully supported in modern vector databases. Unlike relational databases, vector databases combine structured and unstructu... Read more ›
Feeds
🌲LSM TreesRocksDB·
RocksDB has historically been known for poor performance when tombstones accumulate. This has become a common problem within Meta, and the community has raised it as well. Here, we introduce an optimization that attempts to convert contiguous tombstones into a range tombstone during scans. As a result, instead of skipping through N tombstones, we only need to skip through a single range tombstone. Background: point tombstones and range tombstones RocksDB is an LSM-tree, so a delete does not e... Read more ›
Feeds
🐧LinuxLWN.net·
Security updates have been issued by Debian (ffmpeg), Fedora (erlang, ffmpeg, prometheus, python-scrapy, python3-docs, python3.14, thorvg, tigervnc, and vips), Mageia (mumble and sslh), Oracle (389-ds:1.4, dracut, firefox, hplip, kernel, openssh, postgresql:15, redis:6, and uek-kernel), Red Hat (delve, gvisor-tap-vsock, nginx, nginx:1.24, nginx:1.26, osbuild-composer, podman, rhc, skopeo, and yggdrasil), SUSE (containerized-data-importer, graphite2, kernel, libarchive, openssh, openssh-askpas... Read more ›
Feeds
This is Part 1 of a two-part series covering the key features in PostgreSQL 18. In this post, we focus on performance enhancements: skip scan optimization for multicolumn indexes, enhanced EXPLAIN output, automatic removal of unnecessary self-joins, and several vacuum and autovacuum improvements that help keep your database running efficiently. Read more ›
Feeds
The lines between transactional systems, analytical systems, hybrid systems, and shared storage architectures are getting blurry. This post proposes a small taxonomy for describing the different ways systems, workloads, storage tiers, visibility, and durable copies relate to each other. OLTP, OLAP, HTAP, and now LTAP. We can think of the first two as two types of workload which have specialized query engines and storage systems to support them. OLTP such as the RDBMS like Postgres and MySQL u... Read more ›
Feeds
Much like everyone else who wants to engage as little as possible with "here are some effective threats to make the computer do what you want" while still learning about exciting new computer things, I have been learning how to use Jujutsu. I recently had my epiphany insight where it all clicked and I understood the mental model (which I will not share, because of theory). Something this has me reflecting on is how "reconcile these two divergent histories" is a surprisingly common operation i... Read more ›
Feeds
Hash tables complete the insertion, lookup, and deletion of a single key in constant time on average, and they are widely used in databases, key-value stores, and network systems. In the Internet of Things (IoT), the number of devices and the volume of sensed data keep growing, so the hash tables that store or index these data consume more and more memory. When a single server runs out of memory, the system can place part of the data in the memo... Read more ›
Feeds
Recently, I how common polymorphic associations actually are in relational databases — a performance-hostile pattern built around a discriminated foreign key that ORMs (Rails, Django, Hibernate), CRM platforms (Salesforce), and 1C generate automatically. The front page of a typical online store, or the activity feed of a CRM, is built by exactly this kind of query: a base table is LEFT JOIN-ed to every possible subtype through a (type, id) pair of columns.That earlier article answered the que... Read more ›
Discussed on Substack
Feeds
I wanted to know if WebAssembly runtimes are getting faster. Read more ›
Discussed on Hacker News and Lobsters
Feeds
How can cloud providers efficiently supply durable virtual block devices? Remote Direct Memory Access (RDMA) provides a way for servers in a cluster to share chunks of memory, but there still needs to be a protocol that operates on top of RDMA to provide the guarantees expected of a block device. The kernel's RDMA transport library (RTRS) provides a way to send messages via RDMA. I : Reliable Multicast over RTRS (RMR) and Block device over RMR (BRMR). These modules, which I am working on with... Read more ›
Feeds
In the previous article we looked at how the kernel gives every process its own private view of memory. But memory is only half of what a process needs to actually run. The other half is the CPU itself — and there are only so many CPUs in a machine, while there are usually hundreds or thousands of things that want to run on them. So somebody has to decide, constantly, who gets a CPU and for how long. That somebody is the scheduler. Every few milliseconds, on every core, the kernel asks itself... Read more ›
Feeds
🗄️DatabasesarXiv·
Text-to-SQL enables users to query databases using natural language by generating executable SQL queries. Recent methods have increasingly adopted Large Language Models based reinforcement learning (RL) to leverage execution feedback for training. However, existing RL methods assign uniform query-level rewards to all clauses in a SQL query, treating correct and incorrect clauses equally. This coarse-grained reward design leads to insufficient ... Read more ›
Feeds
Sign up or login to customize your feed and get personalized topic recommendations
💻Computer SciencearXiv·
Conversational product search assistants offer a more expressive, natural, and interactive alternative to traditional keyword-based product search. With limited screen space, showing only a few items increases the need for precise preference elicitation, which can prolong conversations, leading to user frustration and session abandonment. Conversely, rushing to recommend items without a clear understanding of preferences risks poor matches and a... Read more ›
Feeds
Efficient genomic k-mer indexing depends on approximate membership query (AMQ) structures that must deliver high throughput, low false-positive rates (FPR), and modest memory footprints. The Super Bloom filter (SBF) is attractive for this scenario because minimizer-guided sharding and the Findere scheme exploit the redundancy of overlapping k-mers. However, those same features cause high per-k-mer compute cost, severe register pressure, and irre... Read more ›
Feeds
🧩MLIRarXiv·
With the rapid expansion of massive multilingual corpora, Multilingual Information Retrieval (MLIR) has emerged as a critical technology for global information access. MLIR enables users to retrieve semantically relevant documents from multilingual text collections using a single-language query. However, recent multilingual dense retrieval models often exhibit a strong preference for documents in the same language as the query. This leads to sev... Read more ›
Feeds
🐘PostgreSQLAWS·
In Part 1 of this series, we explored the performance enhancements in PostgreSQL 18, including skip scan optimization, enhanced EXPLAIN output, automatic self-join removal, and vacuum/autovacuum improvements. In this second part, we focus on security, monitoring, developer productivity, and logical replication enhancements that improve operational efficiency and the overall developer experience. Read more ›
Feeds
This paper presents a new lock, SemanticLock, based on the conflict graph between operations. We can consider it a generalization of a read-write lock where conflicts exist between write operations and all other operations. We demonstrate the effectiveness of our lock in two applications. In the first, we design a toy data structure: an array supporting point queries and different range queries. In the second, potentially of greater interest, we... Read more ›
Feeds
Cross-Chart Retrieval-Augmented Generation (RAG) is critical for complex multi-modal analytical tasks in scientific, business, and political domains. However, existing benchmarks either focus on tables, which are well-structured and textualized, or generate cross-chart questions by simply extracting key points, which often induces lexical overlap between queries and evidence and yields logically inconsistent reasoning chains. To address this, we... Read more ›
Feeds
📈PerformancearXiv·
Remote and disaggregated memory tiers expand the effective memory capacity of analytical database engines, but they also reshape the cost structure of out-of-memory query processing. When an operator spills beyond local DRAM, moving pages to remote memory incurs both data-transfer time and a fixed round-trip latency per transfer. Classical operator analyses and buffer-allocation heuristics primarily target disk spilling by minimizing total I/O v... Read more ›
Feeds
Approximate agreement tasks on graphs are discrete relaxations of consensus, where each process in a distributed system is given as input a vertex on a graph $G$, and processes have to output vertices that lie on a clique of $G$ contained in the convex hull of the input vertices. Although such tasks have been widely studied in a variety of models, graph classes and notions of convexity, it remains largely open for which classes of graphs these p... Read more ›
Feeds

Keyboard Shortcuts

Navigation

Next / previous post
j/k
Open post
oorEnter
Preview post
v

Post Actions

Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Save / unsave
s

Recommendations

Add interest / feed
Enter
Not interested
x

Go to

Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Discover
gb
Search
/

General

Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help