Table of Contents: RAG Observability with Langfuse, vLLM, and FAISS; Introduction to Production-Grade RAG and LLM Observability; RAG Observability Architecture with Langfuse, vLLM, and FAISS; Project Setup; Building a Langfuse-Traced Retriever with FAISS; Building a Traced LLM Wrapper for vLLM… Read more ›
Rust has made safe systems programming practical on the CPU, but writing custom GPU kernels in Rust still forces programmers outside the language's ownership guarantees. We present cuTile Rust, a tile-based system for safe, idiomatic GPU kernel authoring in Rust. cuTile Rust extends Rust's ownership discipline to tile-based GPU kernels: mutable outputs are split into disjoint pieces, kernel launches preserve the host-side ownership contract, and... Read more ›
The Simple Version: When a system has too many moving parts that need to stay in sync, adding more parts often makes failures more likely, not less. Sometimes the most reliable architecture is a smaller one. The Counterintuitive Math of Reliability: Reliability in distributed systems is multiplicative, not additive. If you have three servers that each run with 99% uptime, the chance that all three are simultaneously available isn’t 99%. It’s roughly 97%. Add a fourth server into a chain where a... Read more ›
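The multiplication behind the 97% figure can be checked with a minimal Python sketch. It assumes component failures are independent and that the system needs every component up at once; the function name `series_availability` is hypothetical and not taken from the linked article.

```python
import math


def series_availability(uptimes):
    """Probability that all components are up at once.

    Assumes failures are independent, which real systems only approximate.
    """
    return math.prod(uptimes)


# Three components at 99% uptime each, all required at the same time.
three = series_availability([0.99] * 3)
# Adding a fourth required component lowers the combined figure further.
four = series_availability([0.99] * 4)

print(f"{three:.4f}")  # 0.9703
print(f"{four:.4f}")   # 0.9606
```

Under these assumptions, each extra required component multiplies the combined availability by a number below 1, so the total can only go down.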
The Musk Duck: For starters, it's a really cool duck. The Musk Duck, or Biziura lobata, is a large aquatic duck found across southern Australia. They get their... Read more ›
Probabilistic cardinality estimators (HyperLogLog), similarity sketches (MinHash), and frequency estimators (Count-Min Sketch) are fundamental approximate data structures that each target one primary problem. We present DynamicDemiLog (DDL), a sketch that unifies cardinality estimation, set similarity, containment, element frequency and composition in one tiny data structure built from a single pass over the input stream. Using an inverted index over 200,687 RefSeq sketches (159,567 organisms... Read more ›
A short learning path from a weekend project: I indexed my personal markdown notes (~800 chunks), tried a few local embedding models, stored the same vectors in four different backends, and wired up simple RAG. Not a production guide, just the basics, with honest results from a corpus small enough to reason about. The idea, without the jargon pile: Keyword search looks for shared words. Vector search converts text into a list of numbers (an embedding), treats that list as a point in space, an... Read more ›
I’m building a B+Tree as part of a database project, style but in Rust, inspired by QuillSQL and the Bustub by CMU. This post was a set of… Read more ›
Retries are one of the most widely adopted resilience patterns in distributed systems. Read more ›
Neuralwatt Cloud is the first AI inference service with energy-based pricing. Run inference with real visibility into power, cost, and efficiency. Use as a hosted service or deploy on your own infrastructure with Neuralwatt Deploy. Read more ›
Connect PostgreSQL and run SQL with built-in AI operators through samtSQL. Read more ›
Vespa implements several useful features for customizing and improving Vector Search. Here, we will go into detail on each of them. Read more ›
A grounded walk through Lume's retrieval core - field-aware BM25, two-stage roaring/Godel pruning, local GTR-T5 vectors via Shivvr, a significance-scored entity graph, the multiplicative blend that fuses them, and the knobs that tune it all. Read more ›
SCION: a next-generation inter-domain routing architecture. SCION (Scalability, Control, and Isolation On Next-generation networks) is a secure and reliable inter-domain routing protocol, designed to provide route control, failure isolation, and explicit trust information for end-to-end communication. Technology: the ideas and concepts behind SCION. Overview: SCION | Control Plane | Data P... Read more ›
All of your storage should not live in the same room. Read more ›
Letheo - Cognitive Runtime: agent memory engine (Rust + Python) - Abick91/letheo Read more ›
There’s a new paper out called “PivCo-Huffman” (HTML version with annotations here) and it’s very interesting. Normal Huffman decoding (and, to a lesser extent, encoding) is inherently quite serial. We can get explicit parallelism by using multiple streams, which scales just fine to moderate numbers of streams – something like 4-8 is usually not an […] Read more ›
Microsoft has introduced new tools and guidance to help organizations migrate large-scale data estates to Azure Storage more efficiently. The process begins with Azure Migrate, a centralized hub that discovers infrastructure, assesses readiness, and analyzes workload dependencies across on-premises and multicloud environments. To bridge the gap between planning and execution, a new AI-powered Azure Copilot Migration Agent is now available in preview to recommend specific storage services and ... Read more ›
Multi-agent LLM systems -- coding agents, devops agents, document agents -- now routinely run several agents in parallel against the same git tree, Kubernetes cluster, or document. As soon as two of them mutate shared state, they enter the regime classical concurrency control has studied for decades, but classical mechanisms fit LLM agents poorly. A single agent transaction spans minutes of inference, read sets are broad and opaque rather than s... Read more ›
Download our free template to create a service-level agreement with the performance and response time requirements that disaster recovery plans demand. Read more ›
Pierre Zemb is a staff engineer at Clever Cloud where he's building data layers API-compatible with services like Redis, PostgreSQL, and etcd on top of FoundationDB. Read more ›