🔄 Data Pipelines - widget101 · Scour

How to Build Data Pipelines That Resist Partition Drift

🏗️Data Platforms

hackernoon.com

Senior Data Engineer – Climate Friendly

🔧Data Engineering

au.seek.com · Hacker News

Deploying Vector High-Performance Observability Data Pipeline on Ubuntu 24.04

🚀DevOps Reference Tutorial

docs.vultr.com · DEV

DuckLake Spec, pg_background 2.0, and pgsql_tweaks 1.0.3 Enhance Database Ecosystem

💾Databases Blog

Less-relevant results

Choosing the right workflow orchestration service for your use case: Amazon MWAA and AWS Step Functions

☁️Cloud Computing Blog

aws.amazon.com

Snowflake Datastream: Kafka-native streaming in Snowflake

🌊Stream Processing

snowflake.com · Hacker News

Apache Kafka Explained: A Practical Beginner Guide for Data Engineers

🌊Stream Processing Blog

benseverndev-oss/goldenmatch: Zero-config entity resolution that scales from a CSV to 100M+ rows on a Ray cluster (verified: 100M deduped in 213s, 0.30 GB driver). Fuzzy + exact + probabilistic dedupe, identity graph, PPRL, LLM boost. Python + full TypeScript port; SQL-native in PostgreSQL & DuckDB; MCP/REST servers, dbt + Airflow recipes.

💾Databases Code

github.com · Hacker News

Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance

🌟spark Blog

cloud.google.com

Your RAG System Might Be Confidently Wrong

🔧Data Engineering

hackernoon.com

Gene dependency-informed inference of response to targeted cancer therapies

🔧Data Engineering Academic

Modern Data Stack Migration — Day 1: Scaling to 8+ Companies with DRY Architecture and Chasing a $2M Discrepancy

🏗️Data Platforms Blog

Spotify is licensing live concert video and reserving tickets for superfans

🌊Stream Processing News

thenextweb.com

The Considerate Data Modeler

oranlooney.com · Hacker News

Day 10 of 100 Days of ClickHouse®: What Makes ClickHouse SQL Different?

📊Column Stores

quantrail-data.com · DEV

Building a Lean, Single-Worker Broken URL Monitor for Data Pipelines

🔧Data Engineering Blog

I designed a 0.9B Mamba-2 / GLA hybrid LLM — the AI agents wrote the code. An honest build log.

🤖AI Code

github.com · DEV

Why the Modern Data Stack Trapped Data Engineers in Tools

🏗️Data Platforms

hackernoon.com

Microsoft allows BYOL for Amazon RDS. Repeat, Microsoft allows BYOL for Amazon RDS

☁️AWS Infrastructure News

theregister.com

From Data Quality Checks to Analytics-Ready Parquet with Python

📋CSV Processing Blog
