The hottest SQL Substack posts right now

And their main takeaways
Minimal Modeling 304 implied HN points 15 Mar 26
  1. Treat queries as functions and start by defining anchors: maintain a compact one‑column list of unique IDs for each entity and document retention/archive rules so input data quality is clear.
  2. Represent attributes and links as clean two‑column datasets (anchor ID + value or anchor ID + anchor ID), filter out NULLs and sentinel values, canonicalize values, use only atomic types, and ensure uniqueness.
  3. Materialize those compact datasets and keep them updated with a pipeline so your data is correct by construction; from these trusted pieces you can build flat tables while avoiding common issues like duplicates, unclear identity, and messy JSON.
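The anchor, attribute and link datasets described above can be sketched with SQLite. This is a minimal, hypothetical example: the table names (`users`, `user_email`, `user_follows_user`) and the sentinel values `""` and `"n/a"` are assumptions made for illustration, not the post's own schema. The flat table is then assembled only from the constrained pieces.

```python
import sqlite3

# Hypothetical schema: one anchor, one attribute, one link, each a small constrained dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);              -- anchor: unique IDs only
CREATE TABLE user_email (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),     -- at most one value per ID
    email   TEXT NOT NULL CHECK (email = lower(trim(email)))   -- non-NULL and canonical
);
CREATE TABLE user_follows_user (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)                     -- no duplicate links
);
""")

SENTINELS = {"", "n/a"}  # assumed placeholder values to filter out
raw = [(1, " Ann@Example.com"), (2, None), (3, "n/a"), (4, "bob@example.com")]

conn.executemany("INSERT INTO users VALUES (?)", [(uid,) for uid, _ in raw])
# Drop NULLs and sentinels, canonicalize the rest before loading the attribute.
clean = [(uid, e.strip().lower()) for uid, e in raw
         if e is not None and e.strip().lower() not in SENTINELS]
conn.executemany("INSERT INTO user_email VALUES (?, ?)", clean)
conn.execute("INSERT INTO user_follows_user VALUES (1, 4)")

# Flat table built from the trusted pieces: exactly one row per anchor ID.
flat = conn.execute("""
SELECT u.user_id, e.email,
       (SELECT count(*) FROM user_follows_user f WHERE f.follower_id = u.user_id) AS following
FROM users u LEFT JOIN user_email e ON e.user_id = u.user_id
ORDER BY u.user_id
""").fetchall()
print(flat)
```

Every attribute is keyed by the anchor ID, so the final LEFT JOIN cannot fan out into duplicate rows.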
Minimal Modeling 304 implied HN points 29 Jan 26
  1. Lock a subtype/status column to a single value with a CHECK so subtype tables can only hold rows for that exact status, and reference the main table with a composite foreign key (id, status) to prevent contradictory data.
  2. Give the main table a unique (id, status) pair and make subtype tables include a defaulted, immutable status plus their own keys so you can model both single- and multi-row status-specific information without NULLs.
  3. This is a pure relational, NULL-free way to encode subtypes/status-dependent data using only standard constraints (CHECK, PK, FK), moving integrity into the schema and making the design extensible even if it isn’t commonly taught.
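The CHECK plus composite foreign key technique above can be tried in SQLite. In this sketch, the `orders` / `shipped_orders` tables and the two status values are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and the `UNIQUE (id, status)` constraint is what allows the composite key to be referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE orders (
    id     INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'shipped')),
    UNIQUE (id, status)                         -- referenced by the composite FK below
);
-- Subtype table: a row may exist only for an order whose status is 'shipped'.
CREATE TABLE shipped_orders (
    id          INTEGER PRIMARY KEY,
    status      TEXT NOT NULL DEFAULT 'shipped' CHECK (status = 'shipped'),
    tracking_no TEXT NOT NULL,
    FOREIGN KEY (id, status) REFERENCES orders (id, status)
);
""")
conn.execute("INSERT INTO orders VALUES (1, 'pending'), (2, 'shipped')")
conn.execute("INSERT INTO shipped_orders (id, tracking_no) VALUES (2, 'TRK-2')")  # accepted

attempts = [
    "INSERT INTO shipped_orders (id, tracking_no) VALUES (1, 'TRK-1')",  # order 1 is pending: FK fails
    "INSERT INTO shipped_orders VALUES (1, 'pending', 'TRK-1')",         # status is locked: CHECK fails
    "UPDATE orders SET status = 'pending' WHERE id = 2",                 # would contradict the subtype row
]
rejected = []
for stmt in attempts:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))
print(len(rejected), "of", len(attempts), "contradictory statements rejected")
```

None of the subtype columns is nullable. The schema, not application code, rejects an order whose status disagrees with its subtype row.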
Minimal Modeling 202 implied HN points 12 Jan 26
  1. Model joins by attaching a nested dataset to each outer row and then flattening by duplicating the outer row for each inner row; if the inner set is empty you skip the outer row for INNER JOIN or replace it with a single NULL row for LEFT JOIN.
  2. The inner part of a query becomes very simple: INNER JOIN is just a filtered SELECT, GROUP BY is an aggregated filtered SELECT, and LEFT JOIN is a filtered SELECT plus a conditional UNION ALL NULL row, so no special-casing is needed.
  3. Splitting queries into an outer table and a per-row inner dataset gives a clear, teachable mental model and a single canonical flattening rule you can reuse to reason about more complex SQL patterns like correlated subqueries.
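The nested-dataset model of joins described above can be sketched in plain Python. It has two steps: attach a filtered inner dataset to each outer row, then apply one flattening rule. The function names `nest` and `flatten` are hypothetical, and rows are plain dicts with disjoint column names.

```python
def nest(outer, inner, matches):
    """Attach to each outer row its own inner dataset: a plain filtered SELECT over `inner`."""
    return [(o, [i for i in inner if matches(o, i)]) for o in outer]


def flatten(nested, inner_columns, left=False):
    """Canonical flattening rule: repeat the outer row once per inner row.
    An empty inner set drops the outer row (INNER JOIN) or, for LEFT JOIN,
    is replaced by a single all-NULL row (the conditional UNION ALL NULL row)."""
    result = []
    for outer_row, inner_rows in nested:
        if not inner_rows and left:
            inner_rows = [dict.fromkeys(inner_columns)]  # every inner column set to None
        for inner_row in inner_rows:
            result.append({**outer_row, **inner_row})  # assumes disjoint column names
    return result


customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}]
orders = [{"oid": 10, "cust": 1}, {"oid": 11, "cust": 1}]
nested = nest(customers, orders, lambda c, o: o["cust"] == c["cid"])
inner_join = flatten(nested, ["oid", "cust"])
left_join = flatten(nested, ["oid", "cust"], left=True)
print(len(inner_join), len(left_join))  # 2 3
```

Only the `left` flag differs between the two joins. The inner part stays a simple filter in both cases.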
Data Analysis Journal 353 implied HN points 22 Mar 23
  1. Analytics engineers bridge the gap between data engineers and data analysts by focusing on producing high-quality data.
  2. Analytics engineers use tools like dbt to streamline data modeling, testing, and documentation.
  3. Data quality is crucial in decision-making, making analytics engineering more important than ever.
Data Engineering Central 216 implied HN points 13 Feb 23
  1. Data engineers often struggle to implement unit tests because of factors such as a focus on moving fast and a historical lack of emphasis on testing.
  2. Unit testable code in data engineering involves keeping functions small, minimizing side effects, and ensuring reusability.
  3. Implementing unit tests can elevate a data team's performance and lead to better software quality and bug control.
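A small, side-effect-free function is the kind of unit-testable code described above. In this minimal sketch, `dedupe_latest` and its record shape (`id`, `updated_at`) are hypothetical. The function takes plain data and returns plain data, with no I/O or global state, so a standard `unittest` case can check it directly.

```python
import unittest


def dedupe_latest(records):
    """Keep the newest record per id. Pure: no I/O, no globals, same input -> same output."""
    latest = {}
    for rec in records:
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
    return sorted(latest.values(), key=lambda r: r["id"])


class DedupeLatestTest(unittest.TestCase):
    def test_keeps_newest_per_id(self):
        rows = [
            {"id": 1, "updated_at": 1, "v": "old"},
            {"id": 1, "updated_at": 2, "v": "new"},
            {"id": 2, "updated_at": 1, "v": "only"},
        ]
        self.assertEqual([r["v"] for r in dedupe_latest(rows)], ["new", "only"])

    def test_empty_input(self):
        self.assertEqual(dedupe_latest([]), [])


if __name__ == "__main__":
    unittest.main(argv=["dedupe_tests"], exit=False)  # run the tests without exiting the interpreter
```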
The Orchestra Data Leadership Newsletter 19 implied HN points 16 Nov 23
  1. SQL is a powerful data manipulation language that comes in several dialects and has evolved over time to fit the needs of different database systems.
  2. New SQL tools like dbt, SQLMesh, and Semantic Data Fabric aim to improve data testing, quality, and governance in data engineering processes.
  3. The value in data engineering lies more in processes, culture, and diligence, rather than solely relying on fancy tools to prevent mistakes.
Minimal Modeling 101 implied HN points 11 Jul 23
  1. In minimal modeling, links are defined with two anchors, not three.
  2. Two-way links (between two anchors) can model the examples effectively without needing three-way links.
  3. Links in minimal modeling can introduce confusion when sentences in natural language aren't validated as actual links.
I Am Not a Robot 69 HN points 22 Jun 23
  1. Language models can generate SQL queries, but they can also produce malicious queries if you aren't careful.
  2. Running infinite loops or allowing data exfiltration are risks with generated SQL queries.
  3. To reduce SQL injection risks from language models, consider restricting permissions, making the database read-only, and guarding against prompt injection.
Leading Developers 3 HN points 13 Feb 24
  1. SQL skills are crucial for managers because they can help answer business questions, understand technical designs, and provide a huge return on effort invested.
  2. Don't stop at joins. Moving on to CTEs, window functions, and partitioning can greatly improve your ability to write complex queries.
  3. Window functions in SQL, such as ranking functions, aggregation functions, and positional functions, can help in advanced query writing by allowing calculations across sets of rows or returning a single value from a specific row within partitions.
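The three families of window functions named above (ranking, aggregate and positional) can be shown in one query. This is a small sketch run through SQLite, which supports window functions from version 3.25. The `sales` table and its rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("east", "ann", 300), ("east", "bob", 500), ("east", "cy", 400),
    ("west", "dee", 200), ("west", "eve", 400),
])
rows = conn.execute("""
SELECT region, rep, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC)           AS rank_in_region, -- ranking
       SUM(amount) OVER (PARTITION BY region)                           AS region_total,   -- aggregate
       FIRST_VALUE(rep) OVER (PARTITION BY region ORDER BY amount DESC) AS top_rep         -- positional
FROM sales
ORDER BY region, amount DESC
""").fetchall()
for row in rows:
    print(row)
```

Each row keeps its own detail while also carrying values computed across its partition.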
ingest this! 1 HN point 19 Feb 24
  1. Build data apps with Markdown and SQL using the Evidence framework, a way to create polished data products.
  2. Explore the future synergy of knowledge graphs and large language models (LLMs) for enhanced technologies.
  3. Catch up on the latest in data engineering with a full exploration of the open-source data engineering landscape for 2024.
Conserving CPU's cycles ... 0 implied HN points 26 Jun 24
  1. Incremental sort was added in PostgreSQL 13 (released in 2020) to improve sorting strategies and efficiency when handling large datasets and analytical queries.
  2. Estimation instability in PostgreSQL's sort operations can lead to unexpected query plans and performance differences, emphasizing the importance of careful estimation.
  3. A weakness in PostgreSQL's optimizer code shows how the choice of expression evaluation can affect query performance, pointing to room for optimizer improvements.
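The general idea behind incremental sort can be illustrated in a few lines of Python. This is a conceptual sketch, not PostgreSQL's implementation. If the input is already ordered by a prefix key, it only needs to be sorted within each group of equal prefix values. Emitting groups one at a time also lets a LIMIT stop early. The function name and sample rows are hypothetical.

```python
from itertools import groupby, islice
from operator import itemgetter


def incremental_sort(rows, prefix_key, suffix_key):
    """Input must already be ordered by prefix_key. Sort each run of equal
    prefix values by suffix_key and yield it before reading further groups."""
    for _, group in groupby(rows, key=prefix_key):
        yield from sorted(group, key=suffix_key)  # many small sorts instead of one big one


rows = [(1, "c"), (1, "a"), (2, "b"), (2, "a"), (3, "z")]  # already ordered by column 0
ordered = list(incremental_sort(rows, itemgetter(0), itemgetter(1)))
first_two = list(islice(incremental_sort(rows, itemgetter(0), itemgetter(1)), 2))  # like LIMIT 2
print(ordered)
print(first_two)
```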
The Orchestra Data Leadership Newsletter 0 implied HN points 31 Oct 23
  1. Understanding incremental models is crucial for managing big data: they help complex queries run efficiently while maintaining data quality.
  2. Design patterns in data modeling, such as Star Schema and Data Vault, play a significant role in how dbt models are structured and managed.
  3. Using Jinja templating and implementing continuous data integration processes are key elements in handling big models effectively and ensuring data reliability.
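The core idea of an incremental model is to process only rows that arrived since the last run, rather than rebuilding the whole table. A common way to do this is a high-water-mark filter. This is a minimal sketch in SQLite, not dbt's own implementation. The table names and the `loaded_at` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events   (id INTEGER PRIMARY KEY, user_id INTEGER, loaded_at INTEGER);
CREATE TABLE events_model (id INTEGER PRIMARY KEY, user_id INTEGER, loaded_at INTEGER);
""")


def run_incremental(conn):
    """Insert only source rows newer than the model's current high-water mark."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(loaded_at), -1) FROM events_model").fetchone()
    cur = conn.execute(
        "INSERT INTO events_model SELECT id, user_id, loaded_at FROM raw_events WHERE loaded_at > ?",
        (watermark,),
    )
    conn.commit()
    return cur.rowcount  # number of rows added in this run


conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [(1, 10, 100), (2, 11, 101)])
first = run_incremental(conn)   # first run loads everything
conn.execute("INSERT INTO raw_events VALUES (3, 10, 102)")
second = run_incremental(conn)  # later runs load only the new row
print(first, second)
```

With this simple design, late-arriving rows are skipped if their `loaded_at` is older than the current watermark. Real pipelines often add a lookback window or a merge step for that case.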
DataSketch’s Substack 0 implied HN points 07 Oct 24
  1. Window functions let you do calculations across rows related to your current row without losing any details. This helps you get both summarized and detailed data at the same time.
  2. Window functions can simplify complex data tasks such as ranking items or computing running totals. They are especially useful in fields like healthcare, for example when analyzing patient data to improve efficiency.
  3. It's important to test how window functions perform on a smaller dataset before using them widely. Combining multiple window functions and partitioning your data smartly can also boost performance.
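A running total shows window functions keeping every detail row while adding a cumulative value. The SQLite sketch below uses a hypothetical `visits` table with made-up numbers. It partitions by patient so each patient's total restarts. It also uses an explicit `ROWS` frame so the running sum is row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, visit_day INTEGER, cost INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", [
    (1, 1, 100), (1, 5, 50), (1, 9, 25), (2, 2, 300), (2, 3, 10),
])
rows = conn.execute("""
SELECT patient_id, visit_day, cost,
       SUM(cost) OVER (PARTITION BY patient_id ORDER BY visit_day
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_cost
FROM visits
ORDER BY patient_id, visit_day
""").fetchall()
for row in rows:
    print(row)
```

A small table like this is also a convenient place to check results (and, on real data, timings) before running the query at scale.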
Expand Mapping with Mike Morrow 0 implied HN points 14 Jul 25
  1. You can choose how SQL query results are stored in Hex, either in memory or in the database. This affects how quickly you can run follow-up queries.
  2. Hex offers two types of SQL commands: one queries the database directly, and the other queries a local in-memory dataframe. This choice affects how your data is used.
  3. Hex allows you to chain SQL queries, which makes handling complex tasks easier. However, you need to be aware of where each query pulls data from to avoid surprises.
Making Things 0 implied HN points 13 Nov 23
  1. A semantic data model includes pre-built calculations and relationships.
  2. There are two main types of queries: lookup and aggregating.
  3. In a semantic data model, querying involves selecting dimensions and measures, simplifying the process.
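Querying by selecting dimensions and measures can be sketched as a tiny compiler from a selection to SQL. This is a hypothetical sketch, not the API of any particular semantic-layer product. `SemanticModel`, its fields, and the `orders` table are all illustrative, and relationships (joins) are left out for brevity. A selection with no measures yields a lookup query. Adding measures yields an aggregating query grouped by the chosen dimensions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SemanticModel:
    table: str
    dimensions: dict  # name -> SQL expression
    measures: dict    # name -> pre-built aggregate expression

    def query(self, dims, measures):
        """Compile a selection of dimensions and measures into SQL."""
        select = [f"{self.dimensions[d]} AS {d}" for d in dims]
        select += [f"{self.measures[m]} AS {m}" for m in measures]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if measures and dims:  # aggregating query: group by the chosen dimensions
            sql += " GROUP BY " + ", ".join(self.dimensions[d] for d in dims)
        return sql


model = SemanticModel(
    table="orders",
    dimensions={"country": "country", "order_id": "id"},
    measures={"revenue": "SUM(amount)", "order_count": "COUNT(*)"},
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "DE", 10.0), (2, "DE", 5.0), (3, "FR", 7.5)])

agg_sql = model.query(["country"], ["revenue", "order_count"])  # aggregating query
lookup_sql = model.query(["order_id", "country"], [])           # lookup query
print(agg_sql)
print(conn.execute(agg_sql).fetchall())
```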
Making Things 0 implied HN points 23 Nov 23
  1. If you can make something 10x more efficient, you have a winner.
  2. Malloy aims to replace SQL for asking questions of data.
  3. Malloy's efficiency shines when multiple queries are involved, offering reusability and speed.
Reflective Software Engineering 0 implied HN points 12 Jan 24
  1. Having unit tests for SQL queries can help catch bugs introduced during code refactorings or changes.
  2. When writing unit tests for SQL queries, focus on testing the specific parts responsible for building the query rather than the entire method.
  3. Refactoring for testability can involve moving pure functions out of the class so they are easier to test, and simplifying methods so each one focuses on a specific task.
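The query-building part can be tested separately from execution, as suggested above. In this minimal sketch, `build_user_query`, the `users` table and the allowed filter columns are hypothetical. The builder is a pure function returning `(sql, params)`, so tests compare strings and lists without a database. Column names come from a whitelist, and values are passed as bound parameters.

```python
ALLOWED_FILTERS = {"name", "status"}  # hypothetical filterable columns


def build_user_query(filters, limit=None):
    """Pure query builder: returns (sql, params) and never touches a database."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    sql = "SELECT id, name FROM users"
    clauses, params = [], []
    for column in sorted(filters):  # sorted for a deterministic clause order
        clauses.append(f"{column} = ?")
        params.append(filters[column])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params


def test_no_filters():
    assert build_user_query({}) == ("SELECT id, name FROM users", [])


def test_filters_and_limit():
    sql, params = build_user_query({"status": "active", "name": "ann"}, limit=5)
    assert sql == "SELECT id, name FROM users WHERE name = ? AND status = ? LIMIT ?"
    assert params == ["ann", "active", 5]


if __name__ == "__main__":
    test_no_filters()
    test_filters_and_limit()
    print("all tests passed")
```

A refactoring that changes the WHERE clause by accident now fails a fast string comparison instead of surfacing later as a wrong result.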