The hottest Data Modeling Substack posts right now

And their main takeaways
Minimal Modeling · 811 implied HN points · 02 Feb 25
  1. A key goal in data modeling is to make invalid data states impossible to create: the schema itself, not application code, should rule out incorrect combinations.
  2. The post's challenge is to design a way to track daily coffee consumption while preventing contradictory entries, like recording that a user both had coffee and was coffee-free on the same day.
  3. The solution must rely only on common database features and standard relational-model rules, avoiding tricks like JSON data types or triggers; one possible shape is sketched below.
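As a rough illustration of what a constraint-only answer could look like (this is not the author's published solution, and the table and column names are invented), storing at most one row per user per day makes the contradictory state impossible to insert:

```python
import sqlite3

# Sketch of one possible schema for the challenge. A composite primary
# key plus a CHECK constraint -- both standard relational features --
# rule out contradictory entries without JSON or triggers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coffee_log (
    user_id INTEGER NOT NULL,
    day     TEXT    NOT NULL,          -- ISO date, e.g. '2025-02-02'
    status  TEXT    NOT NULL
        CHECK (status IN ('coffee', 'coffee_free')),
    PRIMARY KEY (user_id, day)         -- at most one state per user per day
);
""")

conn.execute("INSERT INTO coffee_log VALUES (1, '2025-02-02', 'coffee')")
try:
    # Recording the opposite state for the same day violates the primary
    # key, so the database itself rejects the contradiction.
    conn.execute("INSERT INTO coffee_log VALUES (1, '2025-02-02', 'coffee_free')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```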
davidj.substack · 179 implied HN points · 25 Nov 24
  1. Medallion architecture is less a data-modeling technique than a high-level structure for organizing data processes; it helps visualize how data flows through a project.
  2. The architecture has three main layers: Bronze deals with cleaning and preparing data, Silver creates a structured data model, and Gold makes data easy to access and use (see the sketch after this list).
  3. The names Bronze, Silver, and Gold may appeal to non-technical users, but they say little about function; renaming the layers to reflect their actual roles in data handling would be more accurate.
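A minimal sketch of how the three layers relate, with invented table and view names and SQLite standing in for a real warehouse; it follows the layer roles as summarized above, not any specific project's definitions:

```python
import sqlite3

# Illustrative medallion layering; names and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bronze: raw feeds land here and are cleaned and prepared.
CREATE TABLE bronze_orders (raw_id TEXT, amount TEXT, ordered_at TEXT);

-- Silver: the structured data model -- typed, deduplicated entities.
CREATE VIEW silver_orders AS
SELECT DISTINCT raw_id      AS order_id,
       CAST(amount AS REAL) AS amount,
       DATE(ordered_at)     AS order_date
FROM bronze_orders
WHERE amount IS NOT NULL;

-- Gold: consumption-ready shapes that are easy to access and use.
CREATE VIEW gold_daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM silver_orders
GROUP BY order_date;
""")
```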
Joe Reis · 530 implied HN points · 20 Jan 24
  1. Experts define data modeling in different ways, but across definitions it serves to improve communication, provide utility, and solve problems.
  2. A data model is a structured representation that organizes data for both humans and machines to inform decision-making and facilitate actions.
  3. Data modeling is evolving to consider the needs of machines, different use cases, and a wider range of modeling approaches for various situations.
davidj.substack · 59 implied HN points · 06 Dec 24
  1. sqlmesh offers several model kinds, such as full, view, and embedded models, each with distinct functions and uses. Choose the kind based on how fresh the data must be and how often you need it refreshed.
  2. SCD Type 2 models are useful for managing records that change over time, as they track the history of changes. This can make analyzing data trends much easier and faster.
  3. External models let you reference database objects not managed by your project, simplifying modeling and documentation since they gather useful metadata automatically; a sketch of the model-definition syntax follows this list.
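For a flavor of how these kinds are declared, here is a sketch of two sqlmesh model files. The syntax is reproduced from memory of sqlmesh's MODEL DDL and all names are invented, so treat the details as approximate and verify against the sqlmesh documentation:

```python
from pathlib import Path

# A sqlmesh project keeps each model in its own SQL file; writing the two
# files below sketches a FULL model and an SCD Type 2 model. Exact kind
# names and options should be checked against the sqlmesh docs.
models = Path("models")
models.mkdir(exist_ok=True)

# FULL model: the table is rebuilt from scratch on every run.
(models / "orders.sql").write_text("""
MODEL (name analytics.orders, kind FULL);

SELECT order_id, amount FROM raw.orders;
""")

# SCD Type 2 model: sqlmesh tracks how each customer row changes over time.
(models / "customer_history.sql").write_text("""
MODEL (
  name analytics.customer_history,
  kind SCD_TYPE_2_BY_TIME (unique_key customer_id)
);

SELECT customer_id, status, updated_at FROM raw.customers;
""")
```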
Mindful Modeler · 479 implied HN points · 09 Jan 24
  1. Handling non-i.i.d. data properly in machine learning prevents data leakage, overfitting, and overly optimistic performance evaluation.
  2. For modeling data with dependencies, classical statistical approaches such as mixed-effects models can correctly estimate coefficients.
  3. With non-i.i.d. data, the data-splitting setup must match the model's real-world use case to avoid row-wise leakage and over-optimistic performance estimates; a group-aware split is sketched below.
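As one concrete way to align the split with the use case (a sketch, not necessarily the post's own example): scikit-learn's GroupKFold keeps all rows from one group in the same fold, so evaluation reflects predicting for unseen groups.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic stand-in data: 100 rows from 20 patients, several rows per
# patient, so rows within a patient are correlated (non-i.i.d.).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
patients = rng.integers(0, 20, size=100)  # grouping variable

# GroupKFold never splits a patient across train and test, preventing
# row-wise leakage and over-optimistic scores.
scores = cross_val_score(
    RandomForestClassifier(random_state=0),
    X, y,
    groups=patients,
    cv=GroupKFold(n_splits=5),
)
print("group-aware CV accuracy:", scores.mean())
```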
Joe Reis · 648 implied HN points · 22 Jul 23
  1. There are abundant tools and computing power available, but focusing on delivering business value with data is still crucial.
  2. Data modeling techniques like Kimball's dimensional model remain relevant for effective analytics despite advances in technology (a toy star schema in that style follows this list).
  3. Ignoring data modeling in favor of performance considerations can lead to a loss of understanding, business value, and overall impact.
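For readers who haven't seen Kimball's approach, a toy star schema looks roughly like this (names invented): descriptive attributes live in dimension tables, measurements in a fact table whose grain is one row per order line.

```python
import sqlite3

# Toy Kimball-style star schema; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    region       TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    date     TEXT,
    month    TEXT
);

-- Fact table at the order-line grain: foreign keys to the dimensions
-- plus the numeric measurements the business wants to analyze.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    amount       REAL
);
""")
```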
imperfect offerings · 239 implied HN points · 02 Feb 24
  1. The research economy is increasingly focused on speed over quality, especially with the rise of generative AI, which can have negative impacts on reproducibility and diverse fields of knowledge.
  2. Data models in research need to be carefully scrutinized for accuracy and not blindly relied upon, even in specialized areas like protein folding, climate science, or medical diagnostics.
  3. Speed and heuristics shouldn't overshadow the importance of deliberation, qualitative research, and embracing complexity in arriving at meaningful solutions to multidimensional problems.
SeattleDataGuy’s Newsletter · 612 implied HN points · 21 Nov 23
  1. Normalization structures data to reduce duplication and ensure integrity.
  2. Goals of normalization include eliminating redundancy, minimizing data mutation issues, and protecting data integrity.
  3. Denormalization strategically introduces redundancy to improve read performance, which suits reporting, analytics, and read-heavy applications; both shapes are sketched below.
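A minimal sketch of the two shapes (invented names, SQLite for brevity): the normalized form stores each fact exactly once, while the denormalized copy pays the join up front in exchange for faster reads.

```python
import sqlite3

# The same customer/order data, normalized vs denormalized.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: a customer's name lives in exactly one place, so an
-- update cannot leave two rows disagreeing.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL
);

-- Denormalized copy for read-heavy reporting: the join is paid once at
-- build time, at the cost of duplicating the customer name per order.
CREATE TABLE orders_reporting AS
SELECT o.order_id, o.amount, c.customer_id, c.name AS customer_name
FROM orders o JOIN customers c USING (customer_id);
""")
```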
timo's substack · 294 implied HN points · 28 Feb 23
  1. Marketing analytics, BI, and product analytics have different requirements for source data and data handling.
  2. Product analytics involves more exploration and pattern-finding compared to marketing analytics and BI.
  3. Adopting product analytics requires a different approach, mindset, and tooling than traditional analytics setups.
Data Engineering Central · 255 implied HN points · 10 Jul 23
  1. Data Modeling involves distinct approaches for relational databases and Lake Houses.
  2. Key concepts like logical normalization, business use case analysis, and physical data localization are crucial for effective data modeling.
  3. Understanding the 'grain' of the data, the lowest level of detail a record captures, is essential to a successful data model; see the illustration below.
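A small illustration of grain, with made-up data: rows stored at order-line grain can always be rolled up to order grain, but never the other way around, which is why picking the grain is a foundational decision.

```python
import sqlite3

# Grain demo: this table's grain is one row per order line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_id INT, product_id INT, qty INT, price REAL);
INSERT INTO order_items VALUES (1, 10, 2, 5.0), (1, 11, 1, 3.0), (2, 10, 1, 5.0);
""")

# Rolling up from line grain to order grain is a simple aggregation...
print(conn.execute("""
    SELECT order_id, SUM(qty * price) AS order_total
    FROM order_items
    GROUP BY order_id
""").fetchall())

# ...but if only order totals had been stored, the per-product detail
# could never be recovered. A too-coarse grain silently loses information.
```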
The Orchestra Data Leadership Newsletter · 59 implied HN points · 29 Apr 24
  1. Ensure rock-solid infrastructure for your Snowflake implementation to prevent pipeline failures and maintain data quality.
  2. Set clear expectations and prioritize projects to manage scope and quality, fostering trust and collaboration.
  3. Start thinking of data as a product during the Snowflake implementation to minimize costs, stabilize usage, and accelerate trust in the data team.
timo's substack · 117 implied HN points · 06 Feb 24
  1. Data modeling for event data involves handling various source data and supporting diverse analysis use cases.
  2. Event data modeling can be organized into layers, from raw source data to consumption-ready data for analytics tools.
  3. Qualifying raw events into named activities improves data usability and the user experience in analytics tools; the layering idea is sketched below.
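A sketch of the layering idea with invented event names: a raw source layer holds whatever the tracker emits, and a consumption layer qualifies selected events into named activities an analytics tool can use directly.

```python
import sqlite3

# Illustrative event-data layers; event and activity names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw source layer: whatever the tracker emits, unfiltered.
CREATE TABLE raw_events (user_id INT, event_name TEXT, ts TEXT, props TEXT);

-- Consumption layer: only events qualified as meaningful activities,
-- renamed and trimmed for the analytics tool.
CREATE VIEW activities AS
SELECT user_id,
       CASE event_name
           WHEN 'checkout_completed' THEN 'Purchased'
           WHEN 'signup_submitted'   THEN 'Signed Up'
       END AS activity,
       ts
FROM raw_events
WHERE event_name IN ('checkout_completed', 'signup_submitted');
""")
```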
Minimal Modeling · 405 implied HN points · 12 Oct 23
  1. The author has been sharing daily Twitter threads on data-related topics, which will continue as long as there is something to write about.
  2. The author is considering crossposting these threads to Substack Notes and is seeking feedback on reader interest.
  3. The author has found changing the cadence and format of their posts to be refreshing and an interesting experiment.
Joe Reis · 176 implied HN points · 17 Jun 23
  1. Data professionals interpret the concept of 'model' in various ways, leading to confusion and inconsistency in the field.
  2. Establishing a shared understanding through high-level data modeling can promote consistent and reliable models in organizations.
  3. The use of AI tools in programming has become widespread, indicating a shift in the nature of programming but emphasizing the importance of understanding and verifying AI-generated code.
Minimal Modeling · 202 implied HN points · 07 Sep 23
  1. Data modeling involves layers like actual business domain, logical model, physical model, and database storage optimization.
  2. Focus primarily on the logical model and how it maps to the physical model for practical advice on table structures.
  3. Key areas within the scope of data modeling include the basic logical model, handling either/or data, modeling polymorphic data, template repetitions, the basic physical model, and secondary data.
davidj.substack · 95 implied HN points · 01 Nov 23
  1. Having a standard interface for semantic layers is crucial to prevent failure and ensure compatibility among different layers.
  2. SQL APIs offered by semantic layers may not be truly SQL, leading to potential confusion and challenges in querying data.
  3. Supporting REST HTTP interfaces enables a broader range of use cases, including data applications for internal and external purposes; a hypothetical request is sketched below.
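A hypothetical REST query against a semantic layer might look like the following; the endpoint URL and payload shape are invented to mirror the common metrics-and-dimensions pattern, not any specific product's API.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical semantic-layer query: metrics, dimensions, and filters as
# plain JSON that any internal or external data application can send.
query = {
    "metrics": ["revenue"],
    "dimensions": ["order_date"],
    "filters": [{"field": "region", "op": "=", "value": "EU"}],
}

req = Request(
    "https://semantic-layer.example.com/api/v1/query",  # placeholder URL
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:  # needs a real endpoint to succeed
    print(json.load(resp))
```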
VuTrinh. · 39 implied HN points · 31 Oct 23
  1. Data engineers are becoming more important in the tech world as they handle vast amounts of data. Their role is focused on building systems that allow for efficient data handling and analysis.
  2. Levels of abstraction in data engineering can be confusing, leading to challenges in understanding systems. It’s important to find a balance between using abstractions and being able to see the underlying processes.
  3. Good data modeling practices can help organizations make better use of their time-series data. Understanding how to structure data effectively is key to unlocking its value.
Minimal Modeling · 101 implied HN points · 24 Jul 23
  1. In modeling, consider defining links based on sentence structures of the form 'anchor, verb, anchor'.
  2. Carefully distinguish false links from actual links to avoid modeling mistakes.
  3. Identifying and managing the different types of links prevents confusion and improves database accuracy; a minimal link table is sketched below.
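A sketch of how an 'anchor, verb, anchor' sentence becomes a table (invented names): the sentence 'user has visited place' maps to a link table between the two anchors, holding one row per pair for which the sentence is true.

```python
import sqlite3

# Anchors and the link between them; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY);
CREATE TABLE places (place_id INTEGER PRIMARY KEY);

-- The link: one row per (user, place) pair for which the sentence
-- "this user has visited this place" is true.
CREATE TABLE user_visited_place (
    user_id  INTEGER REFERENCES users(user_id),
    place_id INTEGER REFERENCES places(place_id),
    PRIMARY KEY (user_id, place_id)
);
""")
```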
🔮 Crafting Tech Teams · 19 implied HN points · 12 Jul 23
  1. The post discusses the evolution of data with a focus on concepts like MapReduce, Data Warehouses, and Lakes.
  2. It mentions being inspired by the book 'Designing Data-Intensive Applications' by Martin Kleppmann and drawing parallels with modern data tools.
  3. Readers are invited to subscribe to 'Crafting Tech Teams' for more content and a 7-day free trial.
Data: Made Not Found (by danah) · 51 implied HN points · 13 Jun 23
  1. Focusing on low-stakes data modeling failures is important to understand how algorithms are shaping minor aspects of our lives.
  2. Supply chains and service-based businesses are facing challenges from flawed data modeling, affecting customers, workers, and businesses.
  3. Everyday interactions like car rentals and food delivery are revealing flaws in data modeling, leading to frustration and distrust in brands.
ciamweekly · 2 HN points · 26 Feb 24
  1. Data modeling involves the choice between normalizing data and using denormalized data, each with its own strengths and tradeoffs.
  2. Normalized data leads to less data duplication and easier data updates, but may result in challenges with historical data and performance.
  3. CIAM systems, like IAM and directory systems, normalize user data to centralize customer information. This yields benefits such as easy querying and centralized authentication, but also introduces challenges such as session handling and keeping data updated across systems.
Joe Reis · 2 HN points · 24 Jun 23
  1. Data modeling needs to adapt to modern business workflows and technologies.
  2. There is a need to address the underlying issues in databases and data warehouses before implementing AI solutions.
  3. Practices like conceptual and logical data modeling should be revitalized and made simpler and more iterative.
Chaos Engineering · 5 implied HN points · 24 Feb 23
  1. ChatGPT can learn some superficial aspects of finance but needs explicit training to become a financial expert.
  2. For ChatGPT to learn fintech, a hybrid architecture combining its pretrained model with a specific ML model optimized for financial tasks is necessary.
  3. Improving ChatGPT's understanding of finance requires training it on structured financial data and updating its architecture to process dense, numeric data.
Data Products · 2 HN points · 23 Jun 23
  1. The difference between OLTP and OLAP systems can cause miscommunication among data producers and consumers.
  2. OLTP systems focus on serving end users quickly with specific product features, while OLAP systems handle complex analytics by scanning large amounts of data.
  3. Empathy and communication between OLTP and OLAP teams are crucial to building scalable data products; the contrast is sketched below.
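A toy contrast between the two access patterns (made-up table): the OLTP query is an indexed point lookup serving a single end user, while the OLAP query scans and aggregates the whole table.

```python
import sqlite3

# The same table seen through OLTP and OLAP eyes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'EU', 9.0), (2, 'US', 12.5), (3, 'EU', 4.0);
""")

# OLTP-style access: an indexed point lookup touching one row; it must
# return in milliseconds to serve a product feature.
print(conn.execute("SELECT * FROM orders WHERE order_id = 2").fetchone())

# OLAP-style access: scans the whole table to aggregate -- cheap here,
# expensive at billions of rows, which is why the systems diverge.
print(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall())
```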
Making Things · 1 implied HN point · 06 Nov 23
  1. Many semantic layers are built with YAML for its readability and quick setup, but it can lead to a poor developer experience.
  2. YAML lacks immediate feedback for complex expressions, forcing users into a guessing game when writing configurations.
  3. Implementing a real programming language, rather than just a configuration DSL, gives instant feedback and supports complex data modeling; a minimal contrast is sketched below.
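A minimal contrast of the two approaches, with invented names: the YAML metric is an opaque string whose mistakes surface only when the layer runs, while the same definition in a real language is checked the moment it is written.

```python
from dataclasses import dataclass

# In YAML, a typo in `expr` is just text until the semantic layer
# evaluates it at run time.
yaml_metric = """
metrics:
  - name: revenue
    expr: sum(amount)
"""

@dataclass
class Metric:
    name: str
    expr: str

def metric(name: str, expr: str) -> Metric:
    # In a programming language the definition can be validated, typed,
    # autocompleted, and refactored as soon as it is written.
    if not expr.strip():
        raise ValueError(f"metric {name!r} has an empty expression")
    return Metric(name, expr)

revenue = metric("revenue", "sum(amount)")
print(revenue)
```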
Simplicity is SOTA · 0 implied HN points · 17 Jul 23
  1. A model of everything predicts final and intermediate goals of a company, is causal, and covers significant inputs.
  2. Foundational choices in building a model of everything include deciding the scope, complexity of relationships, and optimization strategy.
  3. Financial forecasting often relies on models of everything built in spreadsheets, an approach that may not carry over well to machine learning models.