The hottest Data Transformation Substack posts right now

And their main takeaways
Category: Top Technology Topics
The Orchestra Data Leadership Newsletter · 39 implied HN points · 18 Apr 24
  1. Advantages of running dbt-core on GitHub Actions include easy workflow definition in Git, immediate access to the latest code, and no need to provision instances when using GitHub-hosted runners.
  2. Disadvantages include being constrained by GitHub-hosted runners' capacity, a 'fire and forget' execution model, and extra overhead when connecting to external services.
  3. GitHub Actions workflows can be triggered from external sources such as orchestrators via the repository_dispatch or workflow_dispatch events, which makes it possible to fold GitHub's CI/CD capabilities into larger automation strategies (see the sketch below).
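To make the triggering mechanism concrete, here is a minimal Python sketch of an external orchestrator firing the workflow_dispatch event through the GitHub REST API; the repository name, workflow file name, and input names are hypothetical placeholders, not values from the post.

```python
# Minimal sketch: triggering a dbt-core GitHub Actions workflow from an
# external orchestrator via the workflow_dispatch REST endpoint.
# "my-org/analytics", "dbt_run.yml", and the input names are placeholders.
import os
import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]   # token with permission to run workflows
REPO = "my-org/analytics"                   # hypothetical owner/repo
WORKFLOW_FILE = "dbt_run.yml"               # hypothetical workflow file in .github/workflows/


def trigger_dbt_workflow(ref: str = "main", target: str = "prod") -> None:
    """Fire the workflow_dispatch event; GitHub responds 204 with no body."""
    url = f"https://api.github.com/repos/{REPO}/actions/workflows/{WORKFLOW_FILE}/dispatches"
    resp = requests.post(
        url,
        headers={
            "Authorization": f"Bearer {GITHUB_TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        # inputs must be declared under workflow_dispatch in the workflow file
        json={"ref": ref, "inputs": {"dbt_target": target}},
        timeout=30,
    )
    resp.raise_for_status()


trigger_dbt_workflow(ref="main", target="prod")
```

The dispatch endpoint returns 204 without a run identifier, which is exactly the 'fire and forget' behaviour noted above: an orchestrator that wants to track the run has to poll the workflow-runs API separately.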
davidj.substack · 71 implied HN points · 16 Feb 24
  1. Data teams face challenges when separated from product engineering, leading to loss of metadata and concerns about data quality. Data contracts can help address these issues by defining the nature, completeness, and format of shared data (a minimal contract sketch follows this list).
  2. Integrating data professionals within product teams can enhance understanding and usage of data, reducing the need for separate contracts. This approach allows for direct-to-consumer, organic data processes.
  3. Centralized data platform teams can establish common standards and infrastructure, enabling embedded data personnel in product teams to work efficiently. This collaborative model streamlines data transformation and enhances data accessibility.
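As an illustration of the first point, here is a minimal sketch of a data contract enforced in code, assuming pydantic is available; the OrderEvent fields and the validation flow are hypothetical examples, not taken from the post.

```python
# Minimal sketch of a data contract enforced in code (assumes pydantic).
# The OrderEvent fields are illustrative, not from the article.
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, ValidationError


class OrderEvent(BaseModel):
    """Contract for order events a product team shares with the data team."""
    order_id: str                        # nature: stable business key, never null
    amount_cents: int                    # format: integer minor units, not floats
    currency: str                        # format: ISO 4217 code, e.g. "USD"
    placed_at: datetime                  # completeness: event timestamp is required
    coupon_code: Optional[str] = None    # explicitly optional, so absence is not a quality issue


def validate_payload(payload: dict) -> Optional[OrderEvent]:
    """Reject records that break the contract instead of silently loading them."""
    try:
        return OrderEvent(**payload)
    except ValidationError as exc:
        print(f"Contract violation: {exc}")
        return None


validate_payload({"order_id": "o-1", "amount_cents": 1299,
                  "currency": "USD", "placed_at": "2024-02-16T10:00:00Z"})
```

The same schema can be published by a central platform team as a shared standard, which is the collaborative model the third takeaway describes.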
The Orchestra Data Leadership Newsletter · 39 implied HN points · 09 Jan 24
  1. The article discusses building a data release pipeline to analyze HubSpot data using Coalesce, a no-code ELT tool on Snowflake.
  2. One key issue was HubSpot's data model, which made it difficult to consolidate form-fill data and messages into a meaningful view.
  3. Setting up Coalesce involves defining storage mappings, granting access to Coalesce users, and carefully handling environments so that development work does not overwrite production data.
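For the access-granting step, a hedged sketch of what the Snowflake side might look like, using the Snowflake Python connector; the warehouse, database, schema, and role names are placeholders rather than the article's actual setup, and the point is that the dev role only ever sees dev storage.

```python
# Minimal sketch, not the article's setup: granting a hypothetical
# COALESCE_DEV role access to a dev database only, so Coalesce environments
# cannot overwrite production objects. Uses the Snowflake Python connector.
import os
import snowflake.connector

GRANTS = [
    "GRANT USAGE ON WAREHOUSE TRANSFORM_WH TO ROLE COALESCE_DEV",
    "GRANT USAGE ON DATABASE ANALYTICS_DEV TO ROLE COALESCE_DEV",
    "GRANT USAGE, CREATE TABLE, CREATE VIEW ON SCHEMA ANALYTICS_DEV.HUBSPOT TO ROLE COALESCE_DEV",
    # Deliberately no grants on the production database: the dev environment's
    # storage mapping points only at ANALYTICS_DEV.
]

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SECURITYADMIN",  # a role allowed to manage grants
)
try:
    cur = conn.cursor()
    for stmt in GRANTS:
        cur.execute(stmt)
finally:
    conn.close()
```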
The Orchestra Data Leadership Newsletter · 19 implied HN points · 26 Nov 23
  1. Data can be structured in a hierarchy similar to Maslow's Hierarchy of Needs, where each level is necessary for the enjoyment of the level above it. This concept applies to data engineering pipelines.
  2. Data pipelines are crucial for deriving business value, even if they are complex and not directly visible. Architectural considerations and infrastructure choices play a significant role in making data a priority in a business.
  3. When considering data infrastructure, such as data ingestion tools, cloud warehouses, BI tools, and others, it's important to plan the entire stack and not just jump to specific infrastructure. Consider aspects like version control, security, integration, and orchestration.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots · 19 implied HN points · 06 Apr 23
  1. Visual Programming tools are being used to connect prompts in applications, making it easier to create conversational interfaces.
  2. Chaining prompts involves transforming and organizing the data returned by one prompt before feeding it into the next, improving the final output and decision-making in AI applications (see the sketch after this list).
  3. Good design of these tools includes making it easy to build, edit, and debug chains while also allowing users to interact flexibly with the AI.
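A minimal, tool-agnostic Python sketch of the chaining idea: one prompt's response is parsed and normalised before it shapes the next prompt. The call_llm function is a stand-in for whichever model API a given visual tool wraps, not a real library call.

```python
# Minimal sketch of prompt chaining: the first prompt's raw response is
# transformed (parsed, trimmed) before being injected into the second prompt.
import json


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (OpenAI, Anthropic, a local model, ...)."""
    raise NotImplementedError


def chain(user_question: str) -> str:
    # Step 1: extract structured facts from the user's message.
    extraction_prompt = (
        "Extract the product name and the customer's intent from this message "
        f"as JSON with keys 'product' and 'intent':\n{user_question}"
    )
    raw = call_llm(extraction_prompt)

    # Transform: parse and normalise the intermediate output before reuse.
    facts = json.loads(raw)
    product = facts.get("product", "unknown product").strip()
    intent = facts.get("intent", "general question").strip().lower()

    # Step 2: the transformed data shapes the next prompt in the chain.
    answer_prompt = (
        f"You are a support assistant. The customer asked about '{product}' "
        f"with intent '{intent}'. Draft a short, helpful reply."
    )
    return call_llm(answer_prompt)
```

The debugging point in the last takeaway follows naturally: because each step's transformed output is an explicit value, a good tool can surface and edit it between nodes in the chain.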
Simplicity is SOTA · 2 HN points · 27 Mar 23
  1. The concept of 'embedding' in machine learning has evolved and become widely used, replacing terms like vectors and representations.
  2. Embeddings can be applied to various types of data, come from different layers in a neural network, and are not always about reducing dimensions.
  3. Defining 'embedding' has become challenging due to its widespread use, but the essence is about learned transformations that make data more useful.
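A short PyTorch sketch of the second point, offered as an assumption-laden illustration rather than anything from the post: one "embedding" comes from a learned lookup table, another from an intermediate layer of an ordinary network, and the latter increases dimensionality rather than reducing it.

```python
# Minimal sketch, assuming PyTorch: two things people now call "embeddings".
# Neither model is trained here; the point is only where the vectors come from.
import torch
import torch.nn as nn

# 1. A classic learned lookup table: token id -> dense vector.
token_embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=64)
token_ids = torch.tensor([12, 407, 9001])
lookup_vectors = token_embedding(token_ids)          # shape (3, 64)

# 2. An intermediate layer of any network can also serve as an embedding,
#    and it may *increase* dimensionality rather than reduce it.
encoder = nn.Sequential(
    nn.Linear(8, 128),   # 8 raw features -> 128-dim hidden representation
    nn.ReLU(),
    nn.Linear(128, 2),   # final task head (e.g. binary classification)
)
raw_features = torch.randn(5, 8)
hidden_embedding = encoder[:2](raw_features)         # activations after the first two layers
print(lookup_vectors.shape, hidden_embedding.shape)  # (3, 64) and (5, 128)
```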