The hottest Data Modeling Substack posts right now

And their main takeaways
Minimal Modeling · 811 implied HN points · 02 Feb 25
  1. A key goal in data modeling is to make invalid data states impossible to create: the schema itself, not application code, should rule out incorrect combinations.
  2. The post's challenge is to design a way to track daily coffee consumption while preventing contradictory entries, like recording that a user both had coffee and was coffee-free on the same day.
  3. The solution must rely only on common database features and standard relational-model rules, avoiding tricks like JSON data types or triggers; one possible shape is sketched below.
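As a rough illustration of what a constraint-only answer could look like (this is not the author's published solution, and the table and column names are invented), storing at most one row per user per day makes the contradictory state impossible to insert:

```python
import sqlite3

# Sketch of one possible schema for the challenge. A composite primary
# key plus a CHECK constraint -- both standard relational features --
# rule out contradictory entries without JSON or triggers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coffee_log (
    user_id INTEGER NOT NULL,
    day     TEXT    NOT NULL,          -- ISO date, e.g. '2025-02-02'
    status  TEXT    NOT NULL
        CHECK (status IN ('coffee', 'coffee_free')),
    PRIMARY KEY (user_id, day)         -- at most one state per user per day
);
""")

conn.execute("INSERT INTO coffee_log VALUES (1, '2025-02-02', 'coffee')")
try:
    # Recording the opposite state for the same day violates the primary
    # key, so the database itself rejects the contradiction.
    conn.execute("INSERT INTO coffee_log VALUES (1, '2025-02-02', 'coffee_free')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```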
davidj.substack · 179 implied HN points · 25 Nov 24
  1. Medallion architecture is less a data-modeling technique than a high-level structure for organizing data processes; it helps visualize how data flows through a project.
  2. The architecture has three main layers: Bronze deals with cleaning and preparing data, Silver creates a structured data model, and Gold makes data easy to access and use (see the sketch after this list).
  3. The names Bronze, Silver, and Gold may appeal to non-technical users, but they say little about function; renaming the layers to reflect their actual roles in data handling would be more accurate.
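A minimal sketch of how the three layers relate, with invented table and view names and SQLite standing in for a real warehouse; it follows the layer roles as summarized above, not any specific project's definitions:

```python
import sqlite3

# Illustrative medallion layering; names and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bronze: raw feeds land here and are cleaned and prepared.
CREATE TABLE bronze_orders (raw_id TEXT, amount TEXT, ordered_at TEXT);

-- Silver: the structured data model -- typed, deduplicated entities.
CREATE VIEW silver_orders AS
SELECT DISTINCT raw_id      AS order_id,
       CAST(amount AS REAL) AS amount,
       DATE(ordered_at)     AS order_date
FROM bronze_orders
WHERE amount IS NOT NULL;

-- Gold: consumption-ready shapes that are easy to access and use.
CREATE VIEW gold_daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM silver_orders
GROUP BY order_date;
""")
```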
Joe Reis · 530 implied HN points · 20 Jan 24
  1. Experts define data modeling in different ways, but across definitions it serves to improve communication, provide utility, and solve problems.
  2. A data model is a structured representation that organizes data for both humans and machines to inform decision-making and facilitate actions.
  3. Data modeling is evolving to consider the needs of machines, different use cases, and a wider range of modeling approaches for various situations.
davidj.substack · 59 implied HN points · 06 Dec 24
  1. sqlmesh offers several model kinds, such as full, view, and embedded models, each with distinct functions and uses. Choose the kind based on how fresh the data must be and how often you need it refreshed.
  2. SCD Type 2 models are useful for managing records that change over time, as they track the history of changes. This can make analyzing data trends much easier and faster.
  3. External models let you reference database objects not managed by your project, simplifying modeling and documentation since they gather useful metadata automatically; a sketch of the model-definition syntax follows this list.
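For a flavor of how these kinds are declared, here is a sketch of two sqlmesh model files. The syntax is reproduced from memory of sqlmesh's MODEL DDL and all names are invented, so treat the details as approximate and verify against the sqlmesh documentation:

```python
from pathlib import Path

# A sqlmesh project keeps each model in its own SQL file; writing the two
# files below sketches a FULL model and an SCD Type 2 model. Exact kind
# names and options should be checked against the sqlmesh docs.
models = Path("models")
models.mkdir(exist_ok=True)

# FULL model: the table is rebuilt from scratch on every run.
(models / "orders.sql").write_text("""
MODEL (name analytics.orders, kind FULL);

SELECT order_id, amount FROM raw.orders;
""")

# SCD Type 2 model: sqlmesh tracks how each customer row changes over time.
(models / "customer_history.sql").write_text("""
MODEL (
  name analytics.customer_history,
  kind SCD_TYPE_2_BY_TIME (unique_key customer_id)
);

SELECT customer_id, status, updated_at FROM raw.customers;
""")
```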
Mindful Modeler · 479 implied HN points · 09 Jan 24
  1. Handling non-i.i.d. data properly in machine learning prevents data leakage, overfitting, and overly optimistic performance evaluation.
  2. For modeling data with dependencies, classical statistical approaches such as mixed-effects models can correctly estimate coefficients.
  3. With non-i.i.d. data, the data-splitting setup must match the model's real-world use case to avoid row-wise leakage and over-optimistic performance estimates; a group-aware split is sketched below.
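As one concrete way to align the split with the use case (a sketch, not necessarily the post's own example): scikit-learn's GroupKFold keeps all rows from one group in the same fold, so evaluation reflects predicting for unseen groups.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic stand-in data: 100 rows from 20 patients, several rows per
# patient, so rows within a patient are correlated (non-i.i.d.).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
patients = rng.integers(0, 20, size=100)  # grouping variable

# GroupKFold never splits a patient across train and test, preventing
# row-wise leakage and over-optimistic scores.
scores = cross_val_score(
    RandomForestClassifier(random_state=0),
    X, y,
    groups=patients,
    cv=GroupKFold(n_splits=5),
)
print("group-aware CV accuracy:", scores.mean())
```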
Joe Reis · 648 implied HN points · 22 Jul 23
  1. There are abundant tools and computing power available, but focusing on delivering business value with data is still crucial.
  2. Data modeling techniques like Kimball's dimensional model remain relevant for effective analytics despite advances in technology (a toy star schema in that style follows this list).
  3. Ignoring data modeling in favor of performance considerations can lead to a loss of understanding, business value, and overall impact.
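For readers who haven't seen Kimball's approach, a toy star schema looks roughly like this (names invented): descriptive attributes live in dimension tables, measurements in a fact table whose grain is one row per order line.

```python
import sqlite3

# Toy Kimball-style star schema; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    region       TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    date     TEXT,
    month    TEXT
);

-- Fact table at the order-line grain: foreign keys to the dimensions
-- plus the numeric measurements the business wants to analyze.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    amount       REAL
);
""")
```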
imperfect offerings · 239 implied HN points · 02 Feb 24
  1. The research economy is increasingly focused on speed over quality, especially with the rise of generative AI, which can have negative impacts on reproducibility and diverse fields of knowledge.
  2. Data models in research need to be carefully scrutinized for accuracy and not blindly relied upon, even in specialized areas like protein folding, climate science, or medical diagnostics.
  3. Speed and heuristics shouldn't overshadow the importance of deliberation, qualitative research, and embracing complexity in arriving at meaningful solutions to multidimensional problems.
SeattleDataGuy’s Newsletter · 612 implied HN points · 21 Nov 23
  1. Normalization structures data to reduce duplication and ensure integrity.
  2. Goals of normalization include eliminating redundancy, minimizing data mutation issues, and protecting data integrity.
  3. Denormalization strategically introduces redundancy to improve read performance, which suits reporting, analytics, and read-heavy applications; both shapes are sketched below.
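A minimal sketch of the two shapes (invented names, SQLite for brevity): the normalized form stores each fact exactly once, while the denormalized copy pays the join up front in exchange for faster reads.

```python
import sqlite3

# The same customer/order data, normalized vs denormalized.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: a customer's name lives in exactly one place, so an
-- update cannot leave two rows disagreeing.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL
);

-- Denormalized copy for read-heavy reporting: the join is paid once at
-- build time, at the cost of duplicating the customer name per order.
CREATE TABLE orders_reporting AS
SELECT o.order_id, o.amount, c.customer_id, c.name AS customer_name
FROM orders o JOIN customers c USING (customer_id);
""")
```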
timo's substack · 294 implied HN points · 28 Feb 23
  1. Marketing analytics, BI, and product analytics have different requirements for source data and data handling.
  2. Product analytics involves more exploration and pattern-finding compared to marketing analytics and BI.
  3. Adopting product analytics requires a different approach, mindset, and tooling than traditional analytics setups.
Data Engineering Central · 255 implied HN points · 10 Jul 23
  1. Data Modeling involves distinct approaches for relational databases and Lake Houses.
  2. Key concepts like logical normalization, business use case analysis, and physical data localization are crucial for effective data modeling.
  3. Understanding the 'grain' of the data, the lowest level of detail a record captures, is essential to a successful data model; see the illustration below.
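A small illustration of grain, with made-up data: rows stored at order-line grain can always be rolled up to order grain, but never the other way around, which is why picking the grain is a foundational decision.

```python
import sqlite3

# Grain demo: this table's grain is one row per order line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_id INT, product_id INT, qty INT, price REAL);
INSERT INTO order_items VALUES (1, 10, 2, 5.0), (1, 11, 1, 3.0), (2, 10, 1, 5.0);
""")

# Rolling up from line grain to order grain is a simple aggregation...
print(conn.execute("""
    SELECT order_id, SUM(qty * price) AS order_total
    FROM order_items
    GROUP BY order_id
""").fetchall())

# ...but if only order totals had been stored, the per-product detail
# could never be recovered. A too-coarse grain silently loses information.
```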
The Orchestra Data Leadership Newsletter · 59 implied HN points · 29 Apr 24
  1. Ensure rock-solid infrastructure for your Snowflake implementation to prevent pipeline failures and maintain data quality.
  2. Set clear expectations and prioritize projects to manage scope and quality, fostering trust and collaboration.
  3. Start thinking of data as a product during the Snowflake implementation to minimize costs, stabilize usage, and accelerate trust in the data team.
timo's substack · 117 implied HN points · 06 Feb 24
  1. Data modeling for event data involves handling various source data and supporting diverse analysis use cases.
  2. Event data modeling can be organized into layers, from raw source data to consumption-ready data for analytics tools.
  3. Qualifying raw events into named activities improves data usability and the user experience in analytics tools; the layering idea is sketched below.
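A sketch of the layering idea with invented event names: a raw source layer holds whatever the tracker emits, and a consumption layer qualifies selected events into named activities an analytics tool can use directly.

```python
import sqlite3

# Illustrative event-data layers; event and activity names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw source layer: whatever the tracker emits, unfiltered.
CREATE TABLE raw_events (user_id INT, event_name TEXT, ts TEXT, props TEXT);

-- Consumption layer: only events qualified as meaningful activities,
-- renamed and trimmed for the analytics tool.
CREATE VIEW activities AS
SELECT user_id,
       CASE event_name
           WHEN 'checkout_completed' THEN 'Purchased'
           WHEN 'signup_submitted'   THEN 'Signed Up'
       END AS activity,
       ts
FROM raw_events
WHERE event_name IN ('checkout_completed', 'signup_submitted');
""")
```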
Minimal Modeling · 405 implied HN points · 12 Oct 23
  1. The author has been sharing daily Twitter threads on data-related topics, which will continue as long as there is something to write about.
  2. The author is considering crossposting these threads to Substack Notes and is seeking feedback on reader interest.
  3. The author has found changing the cadence and format of their posts to be refreshing and an interesting experiment.
Joe Reis · 176 implied HN points · 17 Jun 23
  1. Data professionals interpret the concept of 'model' in various ways, leading to confusion and inconsistency in the field.
  2. Establishing a shared understanding through high-level data modeling can promote consistent and reliable models in organizations.
  3. The use of AI tools in programming has become widespread, indicating a shift in the nature of programming but emphasizing the importance of understanding and verifying AI-generated code.
Minimal Modeling · 202 implied HN points · 07 Sep 23
  1. Data modeling involves layers like actual business domain, logical model, physical model, and database storage optimization.
  2. Focus primarily on the logical model and how it maps to the physical model for practical advice on table structures.
  3. Key areas within the scope of data modeling include the basic logical model, handling either/or data, modeling polymorphic data, template repetitions, the basic physical model, and secondary data.
davidj.substack · 95 implied HN points · 01 Nov 23
  1. Having a standard interface for semantic layers is crucial to prevent failure and ensure compatibility among different layers.
  2. SQL APIs offered by semantic layers may not be truly SQL, leading to potential confusion and challenges in querying data.
  3. Supporting REST HTTP interfaces enables a broader range of use cases, including data applications for internal and external purposes; a hypothetical request is sketched below.
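A hypothetical REST query against a semantic layer might look like the following; the endpoint URL and payload shape are invented to mirror the common metrics-and-dimensions pattern, not any specific product's API.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical semantic-layer query: metrics, dimensions, and filters as
# plain JSON that any internal or external data application can send.
query = {
    "metrics": ["revenue"],
    "dimensions": ["order_date"],
    "filters": [{"field": "region", "op": "=", "value": "EU"}],
}

req = Request(
    "https://semantic-layer.example.com/api/v1/query",  # placeholder URL
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:  # needs a real endpoint to succeed
    print(json.load(resp))
```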
VuTrinh. · 39 implied HN points · 31 Oct 23
  1. Data engineers are becoming more important in the tech world as they handle vast amounts of data. Their role is focused on building systems that allow for efficient data handling and analysis.
  2. Levels of abstraction in data engineering can be confusing, leading to challenges in understanding systems. It’s important to find a balance between using abstractions and being able to see the underlying processes.
  3. Good data modeling practices can help organizations make better use of their time-series data. Understanding how to structure data effectively is key to unlocking its value.
Minimal Modeling · 101 implied HN points · 24 Jul 23
  1. In modeling, consider defining links based on sentence structures of the form 'anchor, verb, anchor'.
  2. Carefully distinguish false links from actual links to avoid modeling mistakes.
  3. Identifying and managing the different types of links prevents confusion and improves database accuracy; a minimal link table is sketched below.
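A sketch of how an 'anchor, verb, anchor' sentence becomes a table (invented names): the sentence 'user has visited place' maps to a link table between the two anchors, holding one row per pair for which the sentence is true.

```python
import sqlite3

# Anchors and the link between them; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY);
CREATE TABLE places (place_id INTEGER PRIMARY KEY);

-- The link: one row per (user, place) pair for which the sentence
-- "this user has visited this place" is true.
CREATE TABLE user_visited_place (
    user_id  INTEGER REFERENCES users(user_id),
    place_id INTEGER REFERENCES places(place_id),
    PRIMARY KEY (user_id, place_id)
);
""")
```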
🔮 Crafting Tech Teams · 19 implied HN points · 12 Jul 23
  1. The post discusses the evolution of data with a focus on concepts like MapReduce, Data Warehouses, and Lakes.
  2. It mentions being inspired by the book 'Designing Data-Intensive Applications' by Martin Kleppmann and drawing parallels with modern data tools.
  3. Readers are invited to subscribe to 'Crafting Tech Teams' for more content and a 7-day free trial.
Data: Made Not Found (by danah) · 51 implied HN points · 13 Jun 23
  1. Focusing on low-stakes data modeling failures is important to understand how algorithms are shaping minor aspects of our lives.
  2. Supply chains and service-based businesses are facing challenges from flawed data modeling, affecting customers, workers, and businesses.
  3. Everyday interactions like car rentals and food delivery are revealing flaws in data modeling, leading to frustration and distrust in brands.
ciamweekly · 2 HN points · 26 Feb 24
  1. Data modeling involves the choice between normalizing data and using denormalized data, each with its own strengths and tradeoffs.
  2. Normalized data leads to less data duplication and easier data updates, but may result in challenges with historical data and performance.
  3. CIAM systems, like IAM and directory systems, normalize user data to centralize customer information. This yields benefits such as easy querying and centralized authentication, but also introduces challenges such as session handling and keeping data updated across systems.
Joe Reis · 2 HN points · 24 Jun 23
  1. Data modeling needs to adapt to modern business workflows and technologies.
  2. There is a need to address the underlying issues in databases and data warehouses before implementing AI solutions.
  3. Practices like conceptual and logical data modeling should be revitalized and made simpler and more iterative.
Chaos Engineering · 5 implied HN points · 24 Feb 23
  1. ChatGPT can learn some superficial aspects of finance but needs explicit training to become a financial expert.
  2. For ChatGPT to learn fintech, a hybrid architecture combining its pretrained model with a specific ML model optimized for financial tasks is necessary.
  3. Improving ChatGPT's understanding of finance requires training it on structured financial data and updating its architecture to process dense, numeric data.
Data Products · 2 HN points · 23 Jun 23
  1. The difference between OLTP and OLAP systems can cause miscommunication among data producers and consumers.
  2. OLTP systems focus on serving end users quickly with specific product features, while OLAP systems handle complex analytics by scanning large amounts of data.
  3. Empathy and communication between OLTP and OLAP teams are crucial to building scalable data products; the contrast is sketched below.
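A toy contrast between the two access patterns (made-up table): the OLTP query is an indexed point lookup serving a single end user, while the OLAP query scans and aggregates the whole table.

```python
import sqlite3

# The same table seen through OLTP and OLAP eyes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'EU', 9.0), (2, 'US', 12.5), (3, 'EU', 4.0);
""")

# OLTP-style access: an indexed point lookup touching one row; it must
# return in milliseconds to serve a product feature.
print(conn.execute("SELECT * FROM orders WHERE order_id = 2").fetchone())

# OLAP-style access: scans the whole table to aggregate -- cheap here,
# expensive at billions of rows, which is why the systems diverge.
print(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall())
```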
Making Things · 1 implied HN point · 06 Nov 23
  1. Many semantic layers are built with YAML for its readability and quick setup, but it can lead to a poor developer experience.
  2. YAML lacks immediate feedback for complex expressions, forcing users into a guessing game when writing configurations.
  3. Implementing a real programming language, rather than just a configuration DSL, gives instant feedback and supports complex data modeling; a minimal contrast is sketched below.
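A minimal contrast of the two approaches, with invented names: the YAML metric is an opaque string whose mistakes surface only when the layer runs, while the same definition in a real language is checked the moment it is written.

```python
from dataclasses import dataclass

# In YAML, a typo in `expr` is just text until the semantic layer
# evaluates it at run time.
yaml_metric = """
metrics:
  - name: revenue
    expr: sum(amount)
"""

@dataclass
class Metric:
    name: str
    expr: str

def metric(name: str, expr: str) -> Metric:
    # In a programming language the definition can be validated, typed,
    # autocompleted, and refactored as soon as it is written.
    if not expr.strip():
        raise ValueError(f"metric {name!r} has an empty expression")
    return Metric(name, expr)

revenue = metric("revenue", "sum(amount)")
print(revenue)
```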
Simplicity is SOTA · 0 implied HN points · 17 Jul 23
  1. A model of everything predicts final and intermediate goals of a company, is causal, and covers significant inputs.
  2. Foundational choices in building a model of everything include deciding the scope, complexity of relationships, and optimization strategy.
  3. Financial forecasting often relies on models of everything built in spreadsheets, an approach that may not carry over well to machine learning models.