Scour
🔄 Data Pipelines
Apache Kafka, Airflow, dbt, streaming data
Scoured 106 posts in 5.9 ms
How to Build Data Pipelines That Resist Partition Drift
🏗️ Data Platforms · hackernoon.com · 1 day ago
Senior Data Engineer – Climate Friendly
🔧 Data Engineering · au.seek.com · 6 days ago · Hacker News
Deploying Vector High-Performance Observability Data Pipeline on Ubuntu 24.04
🚀 DevOps · Content type: Reference, Tutorial · docs.vultr.com · 8 hours ago · DEV
DuckLake Spec, pg_background 2.0, and pgsql_tweaks 1.0.3 Enhance Database Ecosystem
💾 Databases · Content type: Blog · dev.to · 2 days ago · DEV
Less-relevant results
Choosing the right workflow orchestration service for your use case: Amazon MWAA and AWS Step Functions
☁️ Cloud Computing · Content type: Blog · aws.amazon.com · 15 hours ago
Snowflake Datastream: Kafka-native streaming in Snowflake
🌊 Stream Processing · snowflake.com · 6 days ago · Hacker News
Apache Kafka Explained: A Practical Beginner Guide for Data Engineers
🌊 Stream Processing · Content type: Blog · dev.to · 2 days ago · DEV
benseverndev-oss/goldenmatch: Zero-config entity resolution that scales from a CSV to 100M+ rows on a Ray cluster (verified: 100M deduped in 213s, 0.30 GB driver). Fuzzy + exact + probabilistic dedupe, identity graph, PPRL, LLM boost. Python + full TypeScript port; SQL-native in PostgreSQL & DuckDB; MCP/REST servers, dbt + Airflow recipes.
💾 Databases · Content type: Code · github.com · 6 days ago · Hacker News
Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance
🌟 spark · Content type: Blog · cloud.google.com · 12 hours ago
Your RAG System Might Be Confidently Wrong
🔧 Data Engineering · hackernoon.com · 2 days ago
Gene dependency-informed inference of response to targeted cancer therapies
🔧 Data Engineering · Content type: Academic · nature.com · 3 days ago
Modern Data Stack Migration — Day 1: Scaling to 8+ Companies with DRY Architecture and Chasing a $2M Discrepancy
🏗️ Data Platforms · Content type: Blog · dev.to · 18 hours ago · DEV
Spotify is licensing live concert video and reserving tickets for superfans
🌊 Stream Processing · Content type: News · thenextweb.com · 2 days ago
The Considerate Data Modeler
💾 Databases · oranlooney.com · 6 days ago · Hacker News
Day 10 of 100 Days of ClickHouse®: What Makes ClickHouse SQL Different?
📊 Column Stores · quantrail-data.com · 1 day ago · DEV
Building a Lean, Single-Worker Broken URL Monitor for Data Pipelines
🔧 Data Engineering · Content type: Blog · dev.to · 13 hours ago · DEV
I designed a 0.9B Mamba-2 / GLA hybrid LLM — the AI agents wrote the code. An honest build log.
🤖 AI · Content type: Code · github.com · 6 days ago · DEV
Why the Modern Data Stack Trapped Data Engineers in Tools
🏗️ Data Platforms · hackernoon.com · 2 days ago
Microsoft allows BYOL for Amazon RDS. Repeat, Microsoft allows BYOL for Amazon RDS
☁️ AWS Infrastructure · Content type: News · theregister.com · 5 days ago
From Data Quality Checks to Analytics-Ready Parquet with Python
📋 CSV Processing · Content type: Blog · dev.to · 2 days ago · DEV