Data Products

Data Products by Chad Sanderson focuses on the development and management of data products, emphasizing modern data modeling, data quality, and the importance of data contracts. It tackles challenges in data pipelines, the shift away from traditional data modeling driven by Agile and engineering-led priorities, and the critical role of collaboration and data governance in improving data quality and paying down data debt within organizations.

Topics: Data Product Development, Modern Data Modeling, Data Quality, Data Contracts and Governance, Collaboration in Data Management, Data Pipelines and Architecture, Impact of Agile and Engineering on Data Practices, Data Debt and Its Management

The hottest Substack posts from Data Products, and their main takeaways
2 implied HN points 27 Feb 24
  1. Chad Sanderson announced an upcoming book on Data Contracts with O'Reilly, covering what data contracts are, how they work, how to implement them, examples, and their future implications. The book will delve into Data Quality and Governance.
  2. The first two chapters of the book are available for free on the O'Reilly website. They cover the importance of data contracts and the real goals of data quality initiatives, totaling about 45 pages of content.
  3. Chad Sanderson is currently selecting technical reviewers for the book. Interested individuals can reach out to him to share their thoughts on an advance copy.
5 implied HN points 08 Jan 24
  1. Data quality is crucial for machine learning projects; poor-quality data can have negative impacts on both society and individuals.
  2. Advances in Generative AI highlight the importance of high-quality data and the potential shortage of such data.
  3. Data quality affects the machine learning product development cycle, including ongoing maintenance costs of ML pipelines.
3 implied HN points 04 Dec 23
  1. Producers need to move towards consumer-defined data contracts to improve data quality and alignment with user needs.
  2. A phased approach of awareness, collaboration, and contract ownership helps in successful data contract adoption.
  3. Starting with consumer-defined contracts drives communication, awareness, and problem visibility, leading to long-term benefits.
13 implied HN points 22 Aug 22
  1. Data Contracts are like API agreements for data (a sketch follows this list).
  2. Garbage In, Garbage Out is a common challenge in data pipelines.
  3. Using Data Contracts can help improve trust and quality of data in production.
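To make "API agreements for data" concrete, here is a minimal, hypothetical sketch (not taken from the post): the producer validates events against an explicit, versioned schema before publishing, so "garbage in" is caught at the boundary rather than downstream. The `order_created` event and its fields are invented for illustration.

```python
# Hypothetical sketch: a data contract as an explicit, versioned schema that the
# producer validates against before emitting an event. Field names are invented.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class OrderCreated:
    """Contract v1 for the (hypothetical) 'order_created' event."""
    order_id: str
    customer_id: str
    amount_cents: int
    created_at: datetime

def validate(event: dict) -> OrderCreated:
    # Fail fast at the producer boundary instead of shipping "garbage in".
    try:
        return OrderCreated(
            order_id=str(event["order_id"]),
            customer_id=str(event["customer_id"]),
            amount_cents=int(event["amount_cents"]),
            created_at=datetime.fromisoformat(event["created_at"]),
        )
    except (KeyError, ValueError, TypeError) as exc:
        raise ValueError(f"order_created violates contract v1: {exc}") from exc

# Usage: call validate(raw_event) before publishing to the pipeline.
```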
11 implied HN points 06 Jun 22
  1. Data modeling is valuable for designing data structure and relationships, bridging data and real world, and enabling easy exploration by data consumers.
  2. In the era of the Modern Data Stack, there is a trend of moving away from robust data modeling, leading to data debt, slow insights, and data swamp.
  3. Factors such as Agile methodologies, engineering-led organizations, and implementation friction contribute to the decline of traditional data modeling, emphasizing the need for a new approach that is Agile, collaborative, and low in implementation friction.
2 HN points 23 Jun 23
  1. The difference between OLTP and OLAP systems can cause miscommunication between data producers and consumers.
  2. OLTP systems focus on serving end users quickly with specific product features, while OLAP systems handle complex analytics by scanning large amounts of data (see the sketch below).
  3. Empathy and communication between OLTP and OLAP teams are crucial to building scalable data products.
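As an illustration of that distinction (an invented example, not from the post), the same table serves very different query shapes: an OLTP-style indexed point lookup for one end user versus an OLAP-style scan and aggregation. The `shipments` table and its columns are hypothetical, and SQLite stands in for both kinds of system purely for brevity.

```python
# Illustrative sketch of the two access patterns against one (hypothetical) table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, lane TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO shipments (lane, cost) VALUES (?, ?)",
    [("SEA-LAX", 1200.0), ("SEA-LAX", 1350.0), ("CHI-NYC", 900.0)],
)

# OLTP-style: serve one end-user request quickly via an indexed point lookup.
one_shipment = conn.execute(
    "SELECT lane, cost FROM shipments WHERE id = ?", (1,)
).fetchone()

# OLAP-style: scan many rows to answer an analytical question.
avg_cost_by_lane = conn.execute(
    "SELECT lane, AVG(cost) FROM shipments GROUP BY lane"
).fetchall()

print(one_shipment, avg_cost_by_lane)
```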
7 implied HN points 30 May 22
  1. Data quality is a big problem in data-driven organizations, including the modern data stack.
  2. Challenges arise from coupling production services with analytics and downstream data transformations.
  3. Emphasizing a cultural shift to treat data as a product can help resolve data quality issues.
5 implied HN points 19 Sep 22
  1. Non-consensual APIs, i.e., downstream dependencies taken on data the producer never agreed to support, can lead to data quality issues.
  2. Prototype pipelines are useful for exploration but may not be reliable for external consumers.
  3. Production-grade pipelines are crucial for cases where data quality impacts ROI (sketched below).
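One way to picture the prototype-versus-production distinction (a hedged sketch, not the author's implementation) is that a production-grade pipeline adds an explicit quality gate before data reaches external consumers. The column name and threshold below are invented.

```python
# Hypothetical quality gate a production-grade pipeline might run before
# publishing a batch to external consumers; the threshold and column are invented.
from typing import Iterable

def passes_quality_gate(rows: Iterable[dict], max_null_rate: float = 0.01) -> bool:
    rows = list(rows)
    if not rows:
        return False  # an empty batch is treated as a failure, not silently shipped
    null_revenue = sum(1 for r in rows if r.get("revenue") is None)
    return (null_revenue / len(rows)) <= max_null_rate

batch = [{"revenue": 10.0}, {"revenue": None}, {"revenue": 7.5}]
if passes_quality_gate(batch):
    print("publish batch downstream")
else:
    # A prototype pipeline often skips this step; a production pipeline blocks the load.
    print("block the load and alert the producer")
```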
7 implied HN points 13 May 22
  1. Chad Sanderson leads the Data Platform team at Convoy, focusing on rebuilding a framework for modern data modeling.
  2. The newsletter covers philosophical musings around data, architecture, governance, semantics, and data APIs.
  3. Readers can expect deep insights into Convoy's work, including architecture designs, UX, and videos.
5 implied HN points 13 Jun 22
  1. Collaborative design is crucial in the modern data stack to prevent scalability issues.
  2. Data modeling and thoughtful design are essential for a successful data warehouse.
  3. Collaboration among stakeholders, iterative modeling, and applying product thinking can address key challenges in the modern data stack.
1 HN point 07 Jul 23
  1. Data requires a source of truth that microservices cannot inherently provide without a shift in software engineering practices.
  2. Not all data is equally valuable, so treating all data as microservices can be costly and restrictive.
  3. The data development lifecycle differs from software development, requiring flexibility, reuse, and tight coupling that conflict with typical microservices architecture.