Scour
🏗️ Data Engineering
data pipelines, ETL, Apache Spark, data lakes
Scoured 468 posts in 5.2 ms
SDLC vs. AIDLC: Why Data Engineering is Pushing the Boundaries of Software Development
⚙️ ML Infra · Content type: Blog · medium.com · 5 days ago
Exclusive: MotherDuck adds agentic data ingestion to its cloud analytics service
🔄 MLOps · siliconangle.com · 11 hours ago
Run an Apache Airflow DAG with Docker Compose and PostgreSQL
☸️ K8S · pyimagesearch.com · 2 days ago
Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance
🏛️ Software Architecture · Content type: Blog · cloud.google.com · 6 hours ago
Calculating speed estimates with Apache Spark
🔄 MLOps · Content type: Blog · mapbox.com · 2 days ago
Do data quality frameworks have to be so complex?
🔄 MLOps · sparkdq-community.github.io · 6 days ago · r/Python
Daily Deal: The 2026 Data Engineering Bundle featuring Databricks
⚙️ ML Infra · techdirt.com · 6 hours ago
What Went Wrong with Data Lakes? A 15-Year Reality Check from the Field
🕸️ Distributed Systems · Content type: Academic · arxiv.org · 1 day ago
Deploying Vector High-Performance Observability Data Pipeline on Ubuntu 24.04
☸️ K8S · Content type: Reference, Tutorial · docs.vultr.com · 2 hours ago · DEV
Designing an ETL Application: Why I Started with a Modular Monolith Before Microservices
🏛️ Software Architecture · Content type: Blog · medium.com · 15 hours ago
Announcing general availability of Apache Spark 4.0 on Amazon EMR
⚙️ ML Infra · Content type: Blog · aws.amazon.com · 1 day ago
Senior Data Engineer – Climate Friendly
🏛️ Software Architecture · au.seek.com · 5 days ago · Hacker News
Linux Fundamentals for Data Engineering
🔄 MLOps · dev-to-uploads.s3.amazonaws.com · 2 days ago · DEV
Thermalright Introduces Peerless Assassin SE Series CPU Coolers
🕸️ Distributed Systems · techpowerup.com · 10 hours ago
Introducing Streamling: Performant and Extensible Data Streaming Framework
🏛️ Software Architecture · Content type: News · streamingdata.tech · 1 day ago
Automating Real-time Data Pipelines: Deploying Pub/Sub to BigQuery with Dataflow Custom Template…
🔄 MLOps · Content type: Blog · medium.com · 6 days ago
Exempt a specific container in MDC
☸️ K8S · techcommunity.microsoft.com · 1 day ago
Ionic solid-state cooling from Ventiva: when cooling in compact AI systems becomes an architectural question
🕸️ Distributed Systems · igorslab.de · 15 hours ago
benseverndev-oss/goldenmatch: Zero-config entity resolution that scales from a CSV to 100M+ rows on a Ray cluster (verified: 100M deduped in 213s, 0.30 GB driver). Fuzzy + exact + probabilistic dedupe, identity graph, PPRL, LLM boost. Python + full TypeScript port; SQL-native in PostgreSQL & DuckDB; MCP/REST servers, dbt + Airflow recipes.
🐍 Python · Content type: Code · github.com · 6 days ago · Hacker News
Nike’s Coolest Running Innovation in Years Is a Lot Bigger Than the Brand Initially Let On
🔄 MLOps · gearpatrol.com · 2 days ago