The hottest Data Management Substack posts right now

And their main takeaways

The World's Largest Search Doesn't Want You to Search

The Honest Broker • 45746 implied HN points • 19 Feb 25

🕹 Technology Search Engines AI Digital Platforms Privacy Data Management

Search engines, especially Google, are moving away from their main job of helping people find information. Instead, they want to keep users on their platforms with AI results that don’t always give good answers.
Google prioritizes its advertising and profitability over providing reliable search results. People often end up with low-quality information or ads instead of what they are really looking for.
Many users are losing trust in Google and other big tech companies because they feel the platforms are not serving their needs. If this trend continues, it could lead to serious consequences for these companies.

RIP Skype

The PhilaVerse • 123 implied HN points • 28 Feb 25

🕹 Technology Communication Software Platforms User Experience Data Management

Microsoft is shutting down Skype on May 5, 2025, after more than two decades of service. They are focusing on Teams now for communication.
Users have 10 weeks to move their data from Skype to Teams or export their information. After that, user data will be kept until the end of 2025 before it is deleted.
Skype had a big drop in users, going from 300 million at its peak to only 36 million daily users by 2023, which is why Microsoft made this decision.

The National Parking Platform is a big, exciting deal and you should know about it

Odds and Ends of History • 1608 implied HN points • 22 May 25

🇺🇸 U.S. Politics Government Policy Transport Infrastructure Data Management

The National Parking Platform (NPP) is a new data system that makes paying for parking easier by allowing any payment app to work with any car park. This means you won't have to download many apps just to park your car.
This platform collects data from all car parks, which helps local authorities manage parking better and reduce traffic by making sure spaces are used efficiently.
The NPP could lead to new ways of thinking about parking, like offering discounts for electric cars or using real-time data to help drivers find available spots before they arrive.

Job listing: Assistant Sports Analyst

Silver Bulletin • 30 implied HN points • 26 Feb 25

🎾 Sports Sports Analysis Job Listings Data Visualization Statistical Modeling Data Management

An Assistant Sports Analyst position is open, mostly focusing on improving sports models for NFL, NBA, and college basketball. It's part-time and could turn into full-time.
Candidates need skills in Stata, Python, and data analysis, along with a strong interest in sports. Pay ranges from $40 to $50 per hour, depending on the work done.
To apply, email with your materials and be prepared for interviews in early April. The deadline to apply is March 25, 2024.

Iceberg + Single Node Engines

Ju Data Engineering Newsletter • 515 implied HN points • 17 Oct 24

🕹 Technology Data Engineering Cloud Computing Big Data Software Development Data Management

The use of Iceberg allows for separate storage and compute, making it easier to connect single-node engines to the data pipeline without needing extra steps.
There are different approaches to integrating single-node engines, including running all processes in one worker or handling each transformation with separate workers.
Partitioning data can improve efficiency by allowing independent processing of smaller chunks, which reduces the limitations of memory and speeds up data handling.
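
The partitioning point in the last takeaway can be sketched in plain Python, with no Iceberg or query-engine API involved. The row layout, the `day` partition key, and the `summarize` helper are all hypothetical. The only point is that each partition is aggregated on its own, so a worker needs memory for one chunk at a time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def partition_rows(rows: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    """Group rows by a partition key (a stand-in for partitions of a table)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def summarize(partition: List[Row]) -> float:
    """Process one partition independently: here, total its 'amount' column."""
    return sum(float(r["amount"]) for r in partition)


# Hypothetical sample data; a real pipeline would read each partition separately.
rows = [
    {"day": "2024-10-01", "amount": 10.0},
    {"day": "2024-10-01", "amount": 5.0},
    {"day": "2024-10-02", "amount": 7.5},
]

# Each partition could be handed to a separate worker; here they run in a loop.
totals = {day: summarize(part) for day, part in partition_rows(rows, "day").items()}
```

Because `summarize` never looks outside its own partition, the same function works whether one worker handles every partition in turn or each partition goes to a separate worker.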

Clouded Judgement 6.13.25 - The Battle for Data Ownership

Clouded Judgement • 7 implied HN points • 13 Jun 25

🕹 Technology Data Management Software Development Artificial Intelligence Cloud Computing SaaS

You might think you own your data, but companies can make it hard to use. For example, Slack has new rules that limit how you can access your own conversation data.
If other apps like Salesforce or Workday follow Slack's lead, it could become really tough for companies to use their data in AI projects. This means you might not have as much control as you thought.
The fight for data ownership is a big deal right now. As software shifts towards AI, who controls the data will be a key factor in how companies operate.

Uber’s Big Data Revolution: From MySQL to Hadoop and Beyond

VuTrinh. • 279 implied HN points • 14 Sep 24

🕹 Technology Data Engineering Big Data Cloud Computing Data Management Data Analytics

Uber evolved from simple data management with MySQL to a more complex system using Hadoop to handle huge amounts of data efficiently.
They faced challenges with data reliability and latency, which slowed down their ability to make quick decisions.
Uber introduced a system called Hudi that allowed for faster updates and better data management, helping them keep their data fresh and accurate.

How to Keep Your Data Team From Becoming a Money Pit

SeattleDataGuy’s Newsletter • 282 implied HN points • 23 May 25

💼 Business Data Management Consulting Project management Business strategy

It's important to focus on outcomes, not just outputs. Creating a lot of dashboards means nothing if they don't help people make better decisions.
Making good data work requires engaging with stakeholders. Understanding what users actually need can lead to more effective solutions.
Success in data teams means having clear ownership and goals. Projects can fail if no one knows who is responsible for them or what they should achieve.

Issue #14 - The Forgotten Guiding Role of Data Modelling

The Data Ecosystem • 659 implied HN points • 14 Jul 24

🕹 Technology Data science Data Management Information Systems Business Intelligence Database Design

Data modeling is like a blueprint for organizing information. It helps people and machines understand data, making it easier for businesses to make decisions.
There are different types of data models, including conceptual, logical, and physical models. Each type serves a specific purpose and helps bridge business needs with data organization.
Not having a structured data model can lead to confusion and problems. It's important for organizations to invest in good data modeling to improve data quality and business outcomes.

Issue #16 - The Data Quality Conundrum (Part 2 – Solving)

The Data Ecosystem • 439 implied HN points • 28 Jul 24

🕹 Technology Data Quality Data Governance Data Management Data Tools Data Strategy Data Analytics

Data quality isn't just a simple fix; it's a complex issue that requires a deep understanding of the entire data landscape. You can't just throw money at it and expect it to get better.
It's crucial to identify and prioritize your most important data assets instead of trying to fix everything at once. Focusing on what truly matters will help you allocate resources effectively.
Implementing tools for data quality is important but should come after you've set clear standards and strategies. Just using technology won’t solve problems if you don’t understand your data and its needs.

16 Cybersecurity Startups Selected for Google Growth Academy

The Security Industry • 11 implied HN points • 16 Feb 25

🕹 Technology Cybersecurity Artificial Intelligence Startups Data Management Cloud Computing

IT-Harvest is part of Google's Growth Academy for 2025, focusing on supporting cybersecurity startups. This helps them connect with experts and gain valuable resources.
The platform has evolved to meet the needs of security teams, showing strong interest in their data tools and features. Users can now map their security tools to important frameworks like NIST CSF.
They are using AI to streamline data collection and analysis, which makes understanding cybersecurity products faster and easier. This change has made their tools more appealing to companies and consultants alike.

Revenue Operations Pros Are Work Horses

beyondrevenueoperations • 39 implied HN points • 12 Oct 24

💼 Business Sales Strategy Data Management Process Optimization Team Collaboration

Revenue Operations focuses on aligning sales, marketing, and customer support to boost overall revenue. This means all teams need to work together to improve the customer experience.
Data accuracy and management are crucial in Revenue Operations. Keeping customer data clean helps everyone make better decisions and understand what drives sales.
Ongoing support and training empower teams to succeed. Providing the right tools and resources ensures that all revenue-generating teams can perform at their best.

Data's final format

Data People Etc. • 391 implied HN points • 09 Dec 24

🕹 Technology Data Management Cloud Computing Software Development Tech Innovation

Apache Iceberg™ is a popular way to manage data, offering features like scalability and openness. However, using it can feel complicated and less exciting than expected.
CSV format is an easy and humble way to manage data, requiring no special knowledge or complex setups. It’s simple and widely understood, making it a go-to choice for many.
The post compares the transformation of data management through formats like Iceberg™ to building a transcontinental railroad. It's a huge effort aimed at improving the way we process and use information in the modern world.

Is It Time to Say Goodbye to Data Engineers?

SeattleDataGuy’s Newsletter • 812 implied HN points • 06 Feb 25

🕹 Technology Data Engineering Software Development Data Management Business Intelligence Analytics

Data engineers are often seen as roadblocks, but cutting them out can lead to major problems later on. Without them, the data can become messy and unmanageable.
Initially, removing data engineers may seem like a win because things move quickly. However, this speed can cause chaos as data quality suffers and standards break down.
A solid data strategy needs structure and governance. Rushing without proper planning can lead to a situation where everything collapses under the weight of disorganization.

Hype cycles

Bite code! • 10520 implied HN points • 24 Jun 23

🕹 Technology Programming Web Development Software Architecture Data Management Cloud Computing

XML was once believed to be the future, but turned out to create technical debt instead.
Blindly following every hype in technology can lead to failed projects and wasted money.
Using the right tool for the right job is crucial in software development, avoiding unnecessary complexity and costs.

Podcast guest appearance

Minimal Modeling • 202 implied HN points • 23 Dec 24

🕹 Technology Software Development Database Design Data Management Programming

The podcast discussed database design and Minimal Modeling for almost two hours. It shared valuable insights on how to create better database structures.
The speaker is open to appearing on other podcasts and is willing to talk about topics like data documentation and software development processes.
There's a recent podcast episode available, but it is in Russian, limiting its audience. If you need help with databases, the speaker is approachable.

Postgres in a box

benn.substack • 920 implied HN points • 06 Dec 24

🕹 Technology Databases Cloud Computing Software AI Data Management

Software has changed from being sold in boxes in stores to being bought as subscriptions online. This makes it easier and cheaper for businesses to manage.
The new trend is separating storage from computing in databases. This lets companies save money by only paying for the data they actually use and the calculations they perform.
There's a push towards making data from different sources easily accessible, so you can use various tools without being trapped in one system. This could streamline how businesses work with their data.

The History and Evolution of Open Table Formats - Part II

Practical Data Engineering Substack • 79 implied HN points • 18 Aug 24

🕹 Technology Data Management Software Development Open Source Cloud Computing Database Systems

The evolution of open table formats has improved how we manage data by introducing log-oriented designs. These designs help us keep track of data changes and make data management more efficient.
Modern open table formats like Apache Hudi and Delta Lake offer database-like features on data lakes, ensuring data integrity and allowing for easier updates and querying.
New projects are working on creating a unified table format that can work with different technologies. This means that in the future, switching between data formats could be simpler and more streamlined.
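
To make the log-oriented idea concrete, here is a toy sketch in which a table's state is derived by replaying an append-only list of commits. It is not the actual design of Apache Hudi, Delta Lake, or any other format. The `Commit` structure, the `live_files` helper, and the file names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Commit:
    """One entry in a hypothetical append-only table log."""
    version: int
    added: Tuple[str, ...] = ()
    removed: Tuple[str, ...] = ()


def live_files(log: List[Commit], as_of: Optional[int] = None) -> Set[str]:
    """Replay the log in version order to find the data files that make up the table."""
    files: Set[str] = set()
    for commit in sorted(log, key=lambda c: c.version):
        if as_of is not None and commit.version > as_of:
            break  # stop early to reconstruct an older snapshot
        files -= set(commit.removed)
        files |= set(commit.added)
    return files


log = [
    Commit(0, added=("part-0.parquet",)),
    Commit(1, added=("part-1.parquet",)),
    # An update rewrites part-0 into part-2 instead of modifying it in place.
    Commit(2, added=("part-2.parquet",), removed=("part-0.parquet",)),
]

current = live_files(log)            # latest table state
snapshot_v1 = live_files(log, as_of=1)  # state as of version 1
```

Since the log is never rewritten, reading an older version is just a shorter replay, which is one way a log can keep track of data changes.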

Odds and Ends #53: Now TfL is betting on AI cameras to improve safety

Odds and Ends of History • 469 implied HN points • 20 Jan 25

🕹 Technology AI Public Policy Transportation Data Management Governance

Transport for London is planning to use AI cameras to make transportation safer. This technology aims to enhance safety measures in public transport.
A discussion is taking place about how AI could help improve government services. Experts want to focus on real solutions rather than just hype or negativity.
There are concerns about why governments might be hesitant to take action. Some believe that fear of power is stopping them from making necessary changes.

Issue #12 – The Three Biggest Data Problems Companies Face

The Data Ecosystem • 239 implied HN points • 30 Jun 24

💼 Business Data Management Organizational Structure Process Improvement Data Quality Technology Integration

Companies often struggle with a data operating model that doesn't connect well with their other teams. This leads to isolation among data specialists, making it hard to work effectively.
Data models, which are important for understanding and using data correctly, are often overlooked. When organizations don’t reference these models, they can drift further away from their goals.
Many data quality issues come from deeper problems within the organization, like poor data governance and inconsistent processes. Fixing just the visible data quality issues won't solve the bigger problems.

Data Models, Types, or Schemas?

The API Changelog • 4 implied HN points • 14 Feb 25

🕹 Technology APIs Data Models Software Development Programming Data Management

Naming things is tough, especially when it comes to defining API data. Different people use different terms like data model, data type, or schema, which can lead to confusion.
A data model helps to represent and organize information, while a data type defines the kind of data values it can hold. However, people often associate data types with simple categories like strings and numbers.
The term 'schema' is commonly used to describe the structure and format of API data. Many standards, like OpenAPI and GraphQL, reference schemas to clarify how to define input and output data.

Faster computers afford dumber solutions

Wednesday Wisdom • 104 implied HN points • 18 Dec 24

🕹 Technology Software Development Systems Architecture Data Management Machine Learning

Faster computers let us use simpler solutions instead of complicated ones. This means we can solve problems more easily, without all the stress of complex systems.
In the past, computers were so slow that we had to be very clever to get things done. Now, with stronger machines, we can just get the job done without excessive tweaking.
Sometimes, when faced with a problem, it's worth it to think about simpler approaches. These 'dumb' solutions can often work just as well for many situations.

All I want in AI is some context and a chat window

Pedram's Data Based • 20 implied HN points • 22 May 25

🕹 Technology AI Software Interfaces Automation Data Management

Having a simple chat interface makes it easy for non-technical people to use AI tools. This helps in accessing valuable resources without needing complex setups.
Providing relevant context is crucial for the effectiveness of AI. When the right information is fed to AI, it can give much better and more accurate responses.
Integrating tools and data sources can improve AI's capabilities but remains a challenge. Companies need better systems to pull together all the necessary information for their teams.

Do we need the Lakehouse architecture?

VuTrinh. • 399 implied HN points • 20 Apr 24

🕹 Technology Data architecture Data Management Machine Learning Analytics

Lakehouse architecture combines the strengths of data lakes and data warehouses. It aims to solve the problems that arise from keeping these two systems separate.
This new approach allows for better data management, including features like ACID transactions and efficient querying of big datasets. It enables real-time analytics on raw data without needing complex data movements.
With the help of technologies like Delta Lake and similar systems, the Lakehouse can handle both structured and unstructured data efficiently, making it a promising solution for modern data needs.

Where AI wins.

The Uncertainty Mindset (soon to become tbd) • 199 implied HN points • 12 Jun 24

🕹 Technology AI Ethics Data Management Human-Machine Interaction Machine Learning

AI is great at handling large amounts of data, analyzing it, and following specific rules. This is because it can process things faster and more consistently than humans.
However, AI systems can't make meaning on their own; they need humans to help interpret complex data and decide what's important.
The best use of AI is when it works alongside humans, each doing what they do best. This way, we can create workflows that are safe and effective.

27 Unique Dev Challenges: A Recent Study Explored the Top Challenges Faced by LLM Developers

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 20 Aug 24

🕹 Technology AI Development API Integration Data Management Security Issues

Developers face many challenges when working with large language models (LLMs), including issues with API calls and integrating them into existing systems.
Common problems also involve managing large datasets and ensuring data privacy and security while using LLMs for tasks like text generation.
Understanding unpredictable outputs from LLMs is essential, as it affects the reliability and performance of applications built with these models.

Issue #10 - The Data Lifecycle

The Data Ecosystem • 159 implied HN points • 16 Jun 24

🕹 Technology Data science Data Management Data security Data Analysis Data Engineering Data Visualization

The data lifecycle includes all the steps from when data is created until it is no longer needed. This helps organizations understand how to manage and use their data effectively.
Different people and companies might describe the data lifecycle in slightly different ways, which can be confusing. It's important to have a clear understanding of what each term means in context.
Properly managing data involves stages like storage, analysis, and even disposal or archiving. This ensures data remains useful and complies with regulations.

Tech the Heck? - CloudTruth

Cloud Irregular • 2069 implied HN points • 19 Feb 24

🕹 Technology Tech news Software Data Management Product Marketing Startups

Explaining complex tech products in simple language is important for understanding and adoption.
Developers may value different aspects of a tech product compared to business decision-makers, causing a mismatch in communication.
CloudTruth focuses on managing crucial configuration data, highlighting the importance of precision in language and clear communication.

Analytics Team: Strategic Partner or Service Org?

Elena's Growth Scoop • 1139 implied HN points • 30 Jun 23

💼 Business Data Management Company Culture Decision-making

Having a data-driven culture is important for making informed decisions and connecting actions to business outcomes.
In many companies, data is poorly managed, which leads to frustration when a data-driven culture is pushed too soon.
Striking a balance and ensuring data accuracy is crucial before pushing for a data-driven culture.

Issue #9 - Clarifying Data Terminology

The Data Ecosystem • 159 implied HN points • 09 Jun 24

🕹 Technology Data Management Data Analysis Data Governance Data Literacy Data science

Data can mean many things, from raw collections to curated evidence used in decisions. It's important to define what data means in each situation to avoid confusion.
Poorly defined data terms can lead to problems in data literacy, collection, and management. This can create issues for organizations trying to use data effectively.
Understanding different categories of data, like data types and processing stages, helps in managing and analyzing data better. Knowing these categories makes it easier to communicate and use data in an organization.

Analyzing "Sorting a million 32-bit integers in 2MB of RAM using Python"

Bite code! • 1223 implied HN points • 27 Jan 24

🕹 Technology Programming Python Software Development Algorithms Data Management

The article discusses sorting a million 32-bit integers in 2MB of RAM using Python.
The approach uses rarely used modules like struct, array, and heapq, along with generators.
The solution involves a two-phase multiway merge sort that optimizes memory usage.
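
The following is a minimal sketch of a two-phase multiway merge sort built from `array`, `heapq`, and generators. It is not the article's code: the function names, the chunk size, and the use of temporary files are assumptions, and `struct` is left out because `array` handles the binary encoding here. It also assumes the `"I"` type code is 4 bytes wide on the platform, which is typical but not guaranteed.

```python
import heapq
import tempfile
from array import array
from typing import BinaryIO, Iterable, Iterator, List


def _write_run(chunk: array) -> BinaryIO:
    """Sort one in-memory chunk and spill it to a temporary file (a sorted 'run')."""
    run = tempfile.TemporaryFile()
    array("I", sorted(chunk)).tofile(run)
    run.seek(0)
    return run


def _read_run(run: BinaryIO, block: int = 1024) -> Iterator[int]:
    """Lazily yield integers from a run, holding only one small block in memory."""
    itemsize = array("I").itemsize
    while True:
        data = run.read(block * itemsize)
        if not data:
            break
        buf = array("I")
        buf.frombytes(data)
        yield from buf


def external_sort(values: Iterable[int], chunk_size: int = 1000) -> Iterator[int]:
    """Yield the values in ascending order while keeping only small buffers in memory."""
    runs: List[BinaryIO] = []
    chunk = array("I")
    # Phase 1: cut the input into sorted runs of at most chunk_size integers.
    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            runs.append(_write_run(chunk))
            chunk = array("I")
    if chunk:
        runs.append(_write_run(chunk))
    # Phase 2: merge all runs at once; heapq.merge pulls from each generator lazily.
    try:
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()
```

Peak memory is roughly one chunk during the first phase and one read block per run during the second, so `chunk_size` and `block` are the knobs to adjust for a given memory budget.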

sqlmesh plan

davidj.substack • 59 implied HN points • 10 Dec 24

🕹 Technology Software Data Management Cloud Computing Analytics Development

Virtual data environments in SQLMesh let you test changes without affecting the main data. This means you can quickly see how something would work before actually doing it.
Using snapshots, you can create different versions of data models easily. Each version is linked to a unique fingerprint, so they don't mess with each other.
Creating and managing development environments is much easier now. With just a command, you can set up a new environment that looks just like production, making development smoother.

My Definition of Data Modeling (for today)

Joe Reis • 530 implied HN points • 20 Jan 24

🕹 Technology Data Modeling AI Venture Capital Data Management

Experts define data modeling in different ways, but the definitions share common aims: improving communication, providing utility, and solving problems.
A data model is a structured representation that organizes data for both humans and machines to inform decision-making and facilitate actions.
Data modeling is evolving to consider the needs of machines, different use cases, and a wider range of modeling approaches for various situations.

Issue #1 - We Need to Rethink Data

The Data Ecosystem • 259 implied HN points • 13 Apr 24

🕹 Technology Data Management Information Systems Data Strategy Business Intelligence Analytics

The data industry is really complicated and often misunderstood. People usually talk about symptoms, like bad data quality, instead of getting to the real problems underneath.
It's important to see the entire data ecosystem as connected, not just as separate parts. Understanding how these parts work together can help us find new opportunities and improve how we use data.
This newsletter aims to break down complex data topics into simple ideas. It's like a cheat sheet for everything related to data, helping readers understand what each part is and why it matters.

In the race between humans and AIs, Data Governance might be the only lever we can pull, to charge for a fair win against the machines

The Diary of a #DataCitizen • 19 implied HN points • 28 Aug 24

🕹 Technology AI Governance Data Management Ethics Human Impact Technology Trends

Data governance is important for keeping technology human-friendly. It helps us make sure that tech doesn't take over our lives.
The rise of AI has changed the game, making data and AI governance even more crucial. We need to focus on using technology in ways that benefit everyone.
Good tech creates real value for people. It's about how well technology works for the users, not just its shiny features or capabilities.

sqlmesh model kinds - 2

davidj.substack • 47 implied HN points • 09 Dec 24

🕹 Technology Data Management Software Development Data Warehousing APIs

There are three types of incremental models in sqlmesh: Incremental by Partition, Unique Key, and Time Range. Each type has its own unique method for handling how data updates are processed.
Incremental models can efficiently replace old data with new data, and sqlmesh offers better state management compared to other tools like dbt. This allows for smoother updates without the need for a full refresh.
Understanding how to set up these models can save time and resources. Properly configuring them allows for collaboration and clarity in data management, which is especially useful in larger teams.

Learning from the Past: Comparing the Hype Cycles of Big Data and GenAI

Gradient Flow • 159 implied HN points • 02 May 24

🕹 Technology Artificial Intelligence Data Management

Adopt a measured approach to GenAI implementation by learning from past technology hype cycles like Big Data.
Organizations should clearly define business problems before adopting GenAI to avoid misalignment and wasted resources.
In navigating the GenAI landscape, prioritize data quality, governance, talent investment, and leveraging open-source solutions for successful adoption.

Best Practices in Retrieval Augmented Generation

Gradient Flow • 599 implied HN points • 19 Oct 23

🕹 Technology AI Data Management Information Retrieval LLM

Retrieval Augmented Generation (RAG) enhances language models by integrating external knowledge sources for more accurate responses.
Evaluating RAG systems requires meticulous component-wise and end-to-end assessments, with metrics like Retrieval_Score and Quality_Score being crucial.
Data quality is pivotal for RAG systems as it directly impacts the accuracy and informativeness of the generated responses.
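
The retrieval step of RAG can be sketched with nothing but the standard library. This toy version ranks documents by bag-of-words cosine similarity and pastes the best match into a prompt. Production systems typically use learned embeddings and a vector index instead, and the `retrieve` and `build_prompt` helpers and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import List


def _vector(text: str) -> Counter:
    """Very rough text representation: lowercase word counts."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = _vector(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 1) -> str:
    """Augment the question with retrieved context before it goes to a language model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "Iceberg is an open table format for large analytic datasets",
    "Skype was a voice and video calling service",
]
prompt = build_prompt("what is an open table format", docs)
```

The sketch also shows why data quality matters: whatever text sits in `docs` is exactly what ends up in the prompt as context.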

What I Learned This Week #6

Eventually Consistent • 59 implied HN points • 01 Jul 24

🕹 Technology Data Management Concurrency Cloud Computing

Data partitioning helps manage query loads by distributing large datasets across multiple disks and processors. Considerations include rebalancing for even distribution, distributed query execution, and dealing with hot spots.
Partitioning secondary indexes can be done locally or globally, with tradeoffs between keeping related data together versus faster lookups for certain queries. Routing queries in distributed systems may use coordination services or gossip protocols for efficiency.
Transactions provide a way to manage concurrency and software failures by ensuring operations either fully succeed or fully fail. The post also covers the worker model AWS Lambda uses for task execution and Rust atomics for controlling memory ordering across threads.
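
The partitioning takeaway above can be illustrated with hash partitioning, one common way to spread keys evenly and reduce hot spots. This is a sketch under assumptions: the partition count, the key names, and the `partition_for` helper are made up, and the post does not prescribe this exact scheme.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical keys: sequential user IDs would all land together under
# range partitioning, but hashing scatters them across partitions.
keys = [f"user-{i}" for i in range(10_000)]
load = Counter(partition_for(k, 4) for k in keys)
```

Taking the hash modulo the partition count is the simplest option, but changing that count moves most keys to a different partition, which is one reason rebalancing needs separate consideration.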

Introduction To Event-Based Analytics - Issue 142

Data Analysis Journal • 569 implied HN points • 03 May 23

🕹 Technology Data Analytics Product Analytics Data Management

Event-based analytics is crucial for understanding user behavior and product performance.
Session-based analytics focuses on website traffic, while event-based analytics tracks user interactions like clicks and actions.
Implementing and maintaining event-based analytics can be challenging due to issues with data integration and interpretation.
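
The relationship between the two views can be sketched by deriving sessions from raw events. The 30-minute inactivity timeout, the event names, and the `sessionize` helper are assumptions for illustration, not something the post specifies.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (timestamp, event name)


def sessionize(events: List[Event], timeout: timedelta = timedelta(minutes=30)) -> List[List[Event]]:
    """Group one user's events into sessions separated by gaps longer than `timeout`."""
    sessions: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e[0]):
        last_seen = sessions[-1][-1][0] if sessions else None
        if last_seen is not None and event[0] - last_seen <= timeout:
            sessions[-1].append(event)  # still within the current session
        else:
            sessions.append([event])    # gap too long: start a new session
    return sessions


# Hypothetical event stream for a single user.
events = [
    (datetime(2023, 5, 3, 9, 0), "page_view"),
    (datetime(2023, 5, 3, 9, 5), "click_signup"),
    (datetime(2023, 5, 3, 14, 0), "page_view"),
]
sessions = sessionize(events)
```

Sessions can be rebuilt from events in this way, but individual interactions cannot be recovered from session counts alone, which is part of why event-level tracking carries more information.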