Data Products

Data Products by Chad Sanderson focuses on the development and management of data products, emphasizing modern data modeling, data quality, and the importance of data contracts. It tackles challenges in data pipelines, the shift away from traditional data modeling driven by Agile and engineering-led priorities, and the critical role of collaboration and data governance in improving data quality and paying down data debt within organizations.

Topics: Data Product Development, Modern Data Modeling, Data Quality, Data Contracts and Governance, Collaboration in Data Management, Data Pipelines and Architecture, Impact of Agile and Engineering on Data Practices, Data Debt and Its Management

The hottest Substack posts from Data Products, and their main takeaways
2 implied HN points 27 Feb 24
  1. Chad Sanderson announced an upcoming book on Data Contracts with O'Reilly, covering what data contracts are, how they work, how to implement them, examples, and their future implications. The book will delve into Data Quality and Governance.
  2. The first two chapters of the book are available for free on the O'Reilly website. They cover the importance of data contracts and the real goals of data quality initiatives, totaling about 45 pages of content.
  3. Chad Sanderson is currently selecting technical reviewers for the book. Interested individuals can reach out to him to share their thoughts on an advance copy.
5 implied HN points 08 Jan 24
  1. Data quality is crucial for machine learning projects; poor-quality data can have negative impacts on both society and individuals.
  2. Advances in Generative AI highlight the importance of high-quality data and the potential shortage of such data.
  3. Data quality affects the machine learning product development cycle, including ongoing maintenance costs of ML pipelines.
3 implied HN points 04 Dec 23
  1. Producers need to move towards consumer-defined data contracts to improve data quality and alignment with user needs.
  2. A phased approach of awareness, collaboration, and contract ownership helps in successful data contract adoption.
  3. Starting with consumer-defined contracts drives communication, awareness, and problem visibility, leading to long-term benefits.
13 implied HN points 22 Aug 22
  1. Data Contracts are like API agreements for data (a sketch follows this list).
  2. Garbage In, Garbage Out is a common challenge in data pipelines.
  3. Using Data Contracts can help improve trust and quality of data in production.
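To make "API agreements for data" concrete, here is a minimal, hypothetical sketch (not taken from the post): the producer validates events against an explicit, versioned schema before publishing, so "garbage in" is caught at the boundary rather than downstream. The `order_created` event and its fields are invented for illustration.

```python
# Hypothetical sketch: a data contract as an explicit, versioned schema that the
# producer validates against before emitting an event. Field names are invented.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class OrderCreated:
    """Contract v1 for the (hypothetical) 'order_created' event."""
    order_id: str
    customer_id: str
    amount_cents: int
    created_at: datetime

def validate(event: dict) -> OrderCreated:
    # Fail fast at the producer boundary instead of shipping "garbage in".
    try:
        return OrderCreated(
            order_id=str(event["order_id"]),
            customer_id=str(event["customer_id"]),
            amount_cents=int(event["amount_cents"]),
            created_at=datetime.fromisoformat(event["created_at"]),
        )
    except (KeyError, ValueError, TypeError) as exc:
        raise ValueError(f"order_created violates contract v1: {exc}") from exc

# Usage: call validate(raw_event) before publishing to the pipeline.
```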
11 implied HN points 06 Jun 22
  1. Data modeling is valuable for designing data structure and relationships, bridging data and real world, and enabling easy exploration by data consumers.
  2. In the era of the Modern Data Stack, there is a trend of moving away from robust data modeling, leading to data debt, slow insights, and data swamp.
  3. Factors such as Agile methodologies, engineering-led organizations, and implementation friction contribute to the decline of traditional data modeling, emphasizing the need for a new approach that is Agile, collaborative, and low in implementation friction.
2 HN points 23 Jun 23
  1. The difference between OLTP and OLAP systems can cause miscommunication between data producers and consumers.
  2. OLTP systems focus on serving end users quickly with specific product features, while OLAP systems handle complex analytics by scanning large amounts of data (see the sketch below).
  3. Empathy and communication between OLTP and OLAP teams are crucial to building scalable data products.
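As an illustration of that distinction (an invented example, not from the post), the same table serves very different query shapes: an OLTP-style indexed point lookup for one end user versus an OLAP-style scan and aggregation. The `shipments` table and its columns are hypothetical, and SQLite stands in for both kinds of system purely for brevity.

```python
# Illustrative sketch of the two access patterns against one (hypothetical) table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, lane TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO shipments (lane, cost) VALUES (?, ?)",
    [("SEA-LAX", 1200.0), ("SEA-LAX", 1350.0), ("CHI-NYC", 900.0)],
)

# OLTP-style: serve one end-user request quickly via an indexed point lookup.
one_shipment = conn.execute(
    "SELECT lane, cost FROM shipments WHERE id = ?", (1,)
).fetchone()

# OLAP-style: scan many rows to answer an analytical question.
avg_cost_by_lane = conn.execute(
    "SELECT lane, AVG(cost) FROM shipments GROUP BY lane"
).fetchall()

print(one_shipment, avg_cost_by_lane)
```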
7 implied HN points 30 May 22
  1. Data quality is a big problem in data-driven organizations, including the modern data stack.
  2. Challenges arise from coupling production services with analytics and downstream data transformations.
  3. Emphasizing a cultural shift to treat data as a product can help resolve data quality issues.
5 implied HN points 19 Sep 22
  1. Non-consensual APIs, i.e., downstream dependencies taken on data the producer never agreed to support, can lead to data quality issues.
  2. Prototype pipelines are useful for exploration but may not be reliable for external consumers.
  3. Production-grade pipelines are crucial for cases where data quality impacts ROI (sketched below).
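One way to picture the prototype-versus-production distinction (a hedged sketch, not the author's implementation) is that a production-grade pipeline adds an explicit quality gate before data reaches external consumers. The column name and threshold below are invented.

```python
# Hypothetical quality gate a production-grade pipeline might run before
# publishing a batch to external consumers; the threshold and column are invented.
from typing import Iterable

def passes_quality_gate(rows: Iterable[dict], max_null_rate: float = 0.01) -> bool:
    rows = list(rows)
    if not rows:
        return False  # an empty batch is treated as a failure, not silently shipped
    null_revenue = sum(1 for r in rows if r.get("revenue") is None)
    return (null_revenue / len(rows)) <= max_null_rate

batch = [{"revenue": 10.0}, {"revenue": None}, {"revenue": 7.5}]
if passes_quality_gate(batch):
    print("publish batch downstream")
else:
    # A prototype pipeline often skips this step; a production pipeline blocks the load.
    print("block the load and alert the producer")
```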
7 implied HN points 13 May 22
  1. Chad Sanderson leads the Data Platform team at Convoy, focusing on rebuilding a framework for modern data modeling.
  2. The newsletter covers philosophical musings around data, architecture, governance, semantics, and data APIs.
  3. Readers can expect deep insights into Convoy's work, including architecture designs, UX, and videos.
5 implied HN points 13 Jun 22
  1. Collaborative design is crucial in the modern data stack to prevent scalability issues.
  2. Data modeling and thoughtful design are essential for a successful data warehouse.
  3. Collaboration among stakeholders, iterative modeling, and applying product thinking can address key challenges in the modern data stack.
1 HN point 07 Jul 23
  1. Data requires a source of truth that microservices cannot inherently provide without a shift in software engineering practices.
  2. Not all data is equally valuable, so treating all data as microservices can be costly and restrictive.
  3. The data development lifecycle differs from software development, requiring flexibility, reuse, and tight coupling that conflict with typical microservices architecture.