We Lost the Thread on the Data Lake
blog.matterbeam.com

In 2014, my last startup was acquired. We joined a fast-growing organization with a top-notch data team. They had invested heavily in data infrastructure. Data was strategic. They had "the hub," a Hadoop cluster built on HDFS. I thought: here’s a company doing things right.

Then I tried to analyze monthly active users. I found no fewer than 20 paths containing some version of MAU or "monthly active user" in the name. Some were kilobytes; many were multiple megabytes. All had different shapes and schemas when introspected. Dates were inconsistent or downright lying. Who created them? How were they computed? Which ones were the actual metrics being used by the business?

After some digital archeology, I …
