The hottest Data Tools Substack posts right now

And their main takeaways
SeattleDataGuy’s Newsletter 400 implied HN points 17 Jan 25
  1. The data tools market is seeing a lot of consolidation lately, with companies merging or getting acquired. This means there are fewer companies competing, but it can lead to better tools overall.
  2. Acquisitions can be a mixed bag for customers. While some products improve after being bought, others might lose their features or support, making it risky for users.
  3. There's a push for bundled data solutions where customers want fewer, but more comprehensive tools. This could change how data companies operate and how startups survive in the future.
SeattleDataGuy’s Newsletter 612 implied HN points 07 Jan 25
  1. Iceberg will become popular, but not every business will adopt it. Many companies want simpler solutions that fit their needs without needing lots of complicated tools.
  2. SQL isn't going anywhere; it still works well for managing and querying data. People have realized that a bit of order in data is important for getting meaningful insights.
  3. AI use will become more practical, focusing on real-world applications rather than just hype. Companies will find specific tasks to automate using AI, making their workflows more efficient.
davidj.substack 59 implied HN points 12 Feb 25
  1. SDF and SQLMesh are alternatives to dbt for data transformation. They are both built with modern tech and aim to provide better ease of use and performance.
  2. SDF has a built-in local database, allowing developers to test queries without costs from a cloud data warehouse. This can speed up development and reduce costs.
  3. Both tools offer column-level lineage to track changes, but SQLMesh provides a better workflow for managing breaking changes. SQLMesh also has unique features like Virtual Data Environments that enhance developer experience.
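To make "column-level lineage" concrete, here is a toy Python sketch of the idea: mapping each output column of a query back to its source table. This is an invented illustration, not how SDF or SQLMesh actually implement it; real tools use full SQL parsers to trace columns through joins, CTEs, and expressions.

```python
import re

def column_lineage(sql: str) -> dict:
    """Toy column-level lineage: map each selected column to its source table.

    Handles only the simple form 'SELECT a, b FROM t' -- real lineage tools
    parse the full SQL grammar instead of pattern-matching.
    """
    match = re.match(r"SELECT\s+(.+?)\s+FROM\s+(\w+)", sql, re.IGNORECASE)
    if not match:
        raise ValueError("unsupported statement")
    columns, table = match.groups()
    return {col.strip(): table for col in columns.split(",")}

lineage = column_lineage("SELECT order_id, amount FROM orders")
# {'order_id': 'orders', 'amount': 'orders'}
```

Even this toy version shows why lineage helps with breaking changes: if `orders.amount` is renamed, every downstream column mapped to it can be flagged automatically.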
The Data Ecosystem 439 implied HN points 28 Jul 24
  1. Data quality isn't just a simple fix; it's a complex issue that requires a deep understanding of the entire data landscape. You can't just throw money at it and expect it to get better.
  2. It's crucial to identify and prioritize your most important data assets instead of trying to fix everything at once. Focusing on what truly matters will help you allocate resources effectively.
  3. Implementing tools for data quality is important but should come after you've set clear standards and strategies. Just using technology won’t solve problems if you don’t understand your data and its needs.
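The prioritize-first advice above can be sketched as code: a minimal Python example (table names and thresholds are invented for illustration) that runs a null-rate check only against the assets a team has ranked as critical, rather than everything at once.

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)

# Hypothetical priority list: check critical assets first, not every table.
CRITICAL_ASSETS = {
    "orders": {"column": "customer_id", "max_null_rate": 0.01},
    "payments": {"column": "amount", "max_null_rate": 0.0},
}

def run_checks(tables):
    """Return (table, column, observed_rate) for each failed check."""
    failures = []
    for name, rule in CRITICAL_ASSETS.items():
        rate = null_rate(tables.get(name, []), rule["column"])
        if rate > rule["max_null_rate"]:
            failures.append((name, rule["column"], rate))
    return failures

tables = {
    "orders": [{"customer_id": 1}, {"customer_id": None}],
    "payments": [{"amount": 10.0}],
}
print(run_checks(tables))  # orders fails: 50% nulls exceeds the 1% threshold
```

The point of the sketch is the shape, not the check itself: the standards (which assets, which thresholds) come first, and the tooling just enforces them.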
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 119 implied HN points 29 Jul 24
  1. Agentic applications are AI systems that can perform tasks and make decisions on their own, using advanced models. They can adapt their actions based on user input and the environment.
  2. OpenAgents is a platform designed to help regular users interact with AI agents easily. It includes different types of agents for data analysis, web browsing, and integrating daily tools.
  3. For these AI agents to work well, they need to be user-friendly, quick, and handle mistakes gracefully. This is important to ensure that everyone can use them, not just tech experts.
davidj.substack 83 implied HN points 21 Nov 24
  1. BI tools often get replaced every 2 to 3 years, but switching them is tough. You have to deal with many dashboards and how people have used them over time.
  2. Many teams stick with tools they know well, like Power BI or Tableau, because of comfort and familiarity. Sometimes it’s easier to choose what they’ve seen work at past jobs.
  3. The best BI tool really isn't a tool at all. It's about how someone uses data to make better choices and understand what's happening, with the tool just being a support for that process.
Gradient Flow 139 implied HN points 04 Apr 24
  1. Unstructured data processing is crucial for AI applications like GenAI and LLMs. Extracting and transforming data from various formats like HTML, PDF, and images is necessary to leverage unstructured data.
  2. Data preparation involves tasks like cleaning, standardization, and enrichment. This enhances data quality, making it more suitable for AI applications like Generative AI.
  3. Data utilization in AI integration includes retrieval, visualization, and model serving. Efficient querying, visualizing data trends, and seamless integration of data with AI models are key aspects of successful AI implementation.
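As a simplified illustration of the cleaning and standardization step described above, this Python sketch normalizes raw text scraped from formats like HTML before it is handed to a downstream model; the sample input is invented.

```python
import html
import re

def clean_text(raw: str) -> str:
    """Basic cleaning for unstructured text: unescape HTML entities,
    replace control characters, and collapse whitespace."""
    text = html.unescape(raw)                  # '&amp;' -> '&'
    text = re.sub(r"[\x00-\x1f]", " ", text)   # newlines, tabs -> spaces
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return text

raw = "Q3&amp;Q4   revenue\nreport\t(draft)"
print(clean_text(raw))  # 'Q3&Q4 revenue report (draft)'
```

Real pipelines layer many more steps on top (deduplication, language detection, PII removal), but they follow this same normalize-then-enrich pattern.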
GEM Energy Analytics 339 implied HN points 13 Oct 23
  1. There are many websites that provide valuable data on electricity generation and energy prices, especially in Europe. These resources can help understand the energy market better.
  2. Tools like Ember Climate and Electricity Maps offer useful visualizations to track emissions and power generation in various regions.
  3. The International Energy Agency and the U.S. Energy Information Administration are great sources for reliable energy data and insights globally.
Technically 12 implied HN points 07 Jan 25
  1. Alteryx is a tool that helps teams make sense of messy data without needing to code. It allows people to clean and analyze their data easily.
  2. Many companies have limited access to specialized data teams, which makes tools like Alteryx important for non-technical users.
  3. Alteryx started with a simple workflow builder for data cleaning but has grown to include many other analytics tools over time.
The Orchestra Data Leadership Newsletter 79 implied HN points 18 Mar 24
  1. CEOs are moving away from hiring full data teams and are opting for small consultancies to set up their data stack, reducing risk and cost.
  2. One-person data teams in startups face overwhelming responsibilities, leading to chaos and potentially costly decisions.
  3. New technologies like Orchestra help single-person data teams maintain visibility and orchestration without expensive tools, speeding up the delivery of value from data to the business.
Joe Reis 196 implied HN points 05 Aug 23
  1. There are a lot of advanced data tools available, but many struggle with how to use them effectively.
  2. The main challenge in the data industry today is a lack of understanding of basic data practices and best tool practices.
  3. Data teams need to focus on standardizing their knowledge and competencies to increase the value they provide to the business.
Sarah's Newsletter 359 implied HN points 22 Feb 22
  1. Data quality tools are essential for maintaining trust in data and preventing stakeholders from resorting to workaround solutions.
  2. Choosing the right data quality tool involves understanding the specific needs of your organization and considering factors like budget, technical resources, and overall data quality goals.
  3. There are different types of data quality tools available, including auto-profiling data tools, pipeline testing tools, infrastructure monitoring tools, and integrated solutions, each with unique characteristics and considerations for selection.
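To make the "auto-profiling" category concrete, here is a minimal Python sketch of the per-column statistics such tools compute; real products add type inference, distributions, and anomaly detection on top.

```python
from collections import Counter

def profile(rows):
    """Per-column profile: row count, null count, distinct values, top value."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "top": Counter(non_null).most_common(1),
        }
    return report

rows = [
    {"country": "DE", "amount": 10},
    {"country": "DE", "amount": None},
    {"country": "FR", "amount": 7},
]
print(profile(rows)["amount"])
# {'rows': 3, 'nulls': 1, 'distinct': 2, 'top': [(10, 1)]}
```

Profiles like this are what let an auto-profiling tool alert when a column's null rate or cardinality suddenly shifts between pipeline runs.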
Sarah's Newsletter 299 implied HN points 19 Apr 22
  1. Having modern tools doesn't guarantee value; what matters is how analytics teams use those tools to drive organizational change.
  2. The focus should be on delivering value to the organization rather than just building data platforms or using the most modern tools.
  3. Start simple with the minimum viable data stack and only add complexity when necessary - focus on solving real problems and evaluating tools based on problem-solving, maintenance, and scalability.
Gradient Flow 179 implied HN points 01 Dec 22
  1. Efficient and Transparent Language Models are needed in the field of Natural Language Processing for better understanding and improved performance.
  2. Selecting the right table format is crucial when migrating to a modern data warehouse or data lakehouse.
  3. DeepMind's work on controlling commercial HVAC facilities using reinforcement learning resulted in significant energy savings.
Sung’s Substack 79 implied HN points 10 Jul 23
  1. Software engineering is evolving to impact more than just data tools and practices - it's influencing identity within the industry.
  2. The data industry is experiencing a significant shift towards merging software and data engineering, requiring a new level of ownership and empathy between the two.
  3. The goal is to create a world where data pipelines are proactive rather than reactive, data as a product is ubiquitous, and pain within the industry is minimized, leading to personal and professional growth.
Gradient Flow 179 implied HN points 20 Oct 22
  1. Data and AI job markets are showing signs of slowdown with declines in job postings, except for specific areas like data governance, DataOps, and MLflow.
  2. The technology job market, despite overall softening, still seeks specific technical skills with recruiters actively reaching out.
  3. The AutoML market is poised for significant growth, estimated to reach $14.5 billion in revenue by 2030, presenting immense potential for accelerating product development.
Kristina God's Online Writing Club 99 implied HN points 07 Dec 22
  1. Google Trends can help you find what topics people are searching for. Just type in a keyword and check the related queries for new blog ideas.
  2. AnswerThePublic shows popular questions about a keyword, which you can answer in your blog. This can attract more visitors from search engines.
  3. In 2023, Medium may start paying writers for traffic coming from outside sources like Google, making it more rewarding to write content that draws readers.
The Orchestra Data Leadership Newsletter 19 implied HN points 23 Jan 24
  1. The data market is not consolidating; it's expanding with many players offering differentiated products and little consolidation happening.
  2. There is a growing complexity in data operations, leading to the necessity of more specialized tools rather than all-in-one platforms.
  3. The future of the data market may see a trend towards out-of-the-box connectivity to address the increasing complexity and interoperability challenges faced by data teams.
The Orchestra Data Leadership Newsletter 19 implied HN points 16 Nov 23
  1. SQL is a powerful data manipulation tool that has different dialects and evolved over time to fit various database software needs.
  2. New SQL tools like dbt, SQLMesh, and Semantic Data Fabric aim to improve data testing, quality, and governance in data engineering processes.
  3. The value in data engineering lies more in processes, culture, and diligence, rather than solely relying on fancy tools to prevent mistakes.
The Orchestra Data Leadership Newsletter 19 implied HN points 13 Nov 23
  1. Zero ELT aims to streamline data processing by eliminating traditional extraction, loading, and transformation tools.
  2. Zero ELT tools are evolving to specialize by use case rather than by function, creating a trade-off between stack complexity and having the best tool for each job.
  3. Zero ELT tools, while promising in simplifying processes, may create data silos, lack interoperability with other tools, and bring about stack complexity issues.
Gradient Flow 119 implied HN points 23 Sep 21
  1. The 2021 NLP Industry Survey received responses from 655 people worldwide, providing insights into how companies are using language applications today.
  2. Tools like Hugging Face NLP Datasets and TextDistance library are making data processing and comparison easier in Python.
  3. There is a trend towards low-code and no-code development tools that are boosting developer productivity and extending the pool of software application creators.
Sung’s Substack 3 HN points 08 May 24
  1. Open source is a beautiful pursuit that allows people to solve problems they love while connecting with others.
  2. Career paths can evolve, leading to new opportunities and self-discovery in pursuing work that aligns with personal values and passions.
  3. Improvements in data tools and workflows, like understanding SQL deeply and prioritizing statefulness, can revolutionize data work and make processes more intuitive and efficient.
Gradient Flow 19 implied HN points 15 Jul 21
  1. The newsletter discusses next-gen dataflow orchestration and automation systems like Prefect, a startup that helps manage dataflows.
  2. It introduces cool new open source tools like Greykite, a flexible and fast library for time-series forecasting.
  3. BytePlus, a new division of ByteDance, is offering the technology behind TikTok to websites and apps, presenting interesting challenges in the global market.
Data Science Weekly Newsletter 19 implied HN points 10 Dec 20
  1. Machine learning needs systematic approaches to create strong systems for real-world use. This means looking beyond just algorithms to see the bigger picture.
  2. Deep neural networks are powerful, but understanding how they work can be tricky. Tools like network dissection can help us figure out what these networks are really doing.
  3. Feature stores are becoming important for machine learning. They allow teams to share and manage data better for creating and deploying models quickly.
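The feature-store idea in the last point can be sketched in a few lines of Python: a shared keyed lookup that both training and serving code read from, so features are defined once and reused. This is a toy in-memory stand-in, not a real system.

```python
class FeatureStore:
    """Toy in-memory feature store keyed by (entity_id, feature name)."""

    def __init__(self):
        self._store = {}

    def put(self, entity_id, feature, value):
        self._store[(entity_id, feature)] = value

    def get(self, entity_id, features):
        """Fetch a feature vector for one entity; None marks missing values."""
        return [self._store.get((entity_id, f)) for f in features]

store = FeatureStore()
store.put("user_42", "avg_order_value", 37.5)
store.put("user_42", "orders_last_30d", 4)

# Training and serving both read the same feature definitions.
print(store.get("user_42", ["avg_order_value", "orders_last_30d"]))
# [37.5, 4]
```

Production feature stores add what this toy omits: point-in-time correctness for training data, low-latency online serving, and shared feature definitions across teams.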
Wadds Inc. newsletter 19 implied HN points 19 Oct 20
  1. The COVID-19 Communications Industry Report highlights how professionals adapted and innovated during the crisis, showing resilience and new opportunities.
  2. There are new tools designed to help with research, time tracking, and media relations, aimed at making marketing and PR work more efficient.
  3. A new privacy standard allows users to control their personal data better by instructing websites not to sell or share their information.
The Orchestra Data Leadership Newsletter 0 implied HN points 17 Nov 23
  1. The role of Data Product Manager is gaining importance in the data industry, with a focus on delivering value and advocating for data to drive business outcomes.
  2. Tools like Fivetran, dbt, Snowflake, and platforms like Orchestra are simplifying data team setups and enabling Product Managers with less technical skills to handle data initiatives effectively.
  3. Federated teams, marketplace functionalities by Databricks and Snowflake, and the evolving concept of data quality and productization are shaping the field of data management towards a more product-led approach.