🏗️ Data Engineering
data pipelines, ETL, Apache Spark, data lakes
Scoured 478 posts in 8.1 ms
Ionic solid-state cooling from Ventiva: when cooling in compact AI systems becomes an architectural question
🕸️
Distributed Systems
igorslab.de
·
13 hours ago
benseverndev-oss/goldenmatch: Zero-config entity resolution that scales from a CSV to 100M+ rows on a Ray cluster (verified: 100M deduped in 213s, 0.30 GB driver). Fuzzy + exact + probabilistic dedupe, identity graph, PPRL, LLM boost. Python + full TypeScript port; SQL-native in PostgreSQL & DuckDB; MCP/REST servers, dbt + Airflow recipes.
🐍
Python
Content type:
Code
github.com
·
6 days ago
·
Hacker News
My First PyCon US
🐍
Python
Content type:
Blog
bhavaniravi.com
·
59 minutes ago
Claude Code for Research: Preventing Hallucinations
⏱️
Productivity
Content type:
News
Content type:
Blog
homeeconomics.substack.com
·
2 days ago
·
Substack
Embedding pipelines are the new ETL
⚙️
ML Infra
Content type:
Blog
infoworld.com
·
5 days ago
Gene dependency-informed inference of response to targeted cancer therapies
🔄
MLOps
Content type:
Academic
nature.com
·
2 days ago
CSU Student of Distinction: Anthony Arthur
📈
Career Growth
Content type:
Academic
csuohio.edu
·
12 hours ago
New comment by mkolarek in "Ask HN: Who wants to be hired? (June 2026)"
🐍
Python
Content type:
PDF
markokolarek.com
·
4 days ago
·
Hacker News
Azerbaijani Central Bank set to adopt data Lakehouse system in 2026
🏛️
Software Architecture
trend.az
·
1 day ago
Real-time data replication to your data warehouse, self-serve
🏛️
Software Architecture
artie.com
·
1 day ago
·
Hacker News
New Airflow Previews a New Design Language for Chrysler
⚙️
ML Infra
hagerty.com
·
5 days ago
Streaming and Batch Data Architectures with Microsoft Fabric to Azure Databricks
🏛️
Software Architecture
techcommunity.microsoft.com
·
1 day ago
Enhancements to Managed Service for Apache Spark clusters
🏛️
Software Architecture
Content type:
Blog
cloud.google.com
·
6 days ago
(PR) Rosewill Launches the FBM-Z Series Micro-ATX Cases
🕸️
Distributed Systems
techpowerup.com
·
14 hours ago
I built a NAS with enterprise SAS drives, and the hidden costs nearly matched new SATA drives
🕸️
Distributed Systems
xda-developers.com
·
12 hours ago
Announcing Spark Connect on Amazon EMR Serverless: Interactive PySpark development, anywhere
🏛️
Software Architecture
Content type:
Blog
aws.amazon.com
·
1 day ago
This Is the Sub-$40,000 SUV That’s Supposed to Save Chrysler
⏱️
Productivity
thedrive.com
·
6 days ago
Location: Lubbock, TX, USA Remote: Yes (Remote-friendly, US-based) Technologies:...
🐍
Python
Content type:
Discussion
news.ycombinator.com
·
8 hours ago
·
Hacker News
DuckDB Storage Engine for MariaDB. When the Sea Lion Learns to Quack.
🕸️
Distributed Systems
mariadb.org
·
1 day ago
·
Hacker News
Day 10 of 100 Days of ClickHouse®: What Makes ClickHouse SQL Different?
⏱️
Productivity
quantrail-data.com
·
18 hours ago
·
DEV