The hottest Data Governance Substack posts right now

And their main takeaways
Odds and Ends of History 871 implied HN points 24 Feb 26
  1. NHS health records are a huge, nationwide dataset that can drive life-saving discoveries and help improve how care is delivered, so using them responsibly is a public good.
  2. Trusted Research Environments (like OpenSafely) let researchers run code on NHS data without individual records leaving secure servers. They protect privacy by design using auditing, open-source code, dummy data for testing, and only returning aggregated results.
  3. The OpenSafely model shows strong results but needs stable, scaled funding and wider adoption so TREs can be expanded across health research and other government data; funders should support open, competitive calls for this infrastructure.
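Takeaway 2 describes a safeguard in which only aggregated results leave the secure server. That safeguard can be pictured as a release step that turns record-level rows into counts and withholds cells small enough to identify individuals. The sketch below is a minimal illustration and is not OpenSafely's actual code. The function name `release_counts` and the threshold of 7 are hypothetical choices.

```python
from collections import Counter

# Hypothetical disclosure threshold: counts at or below this are withheld.
SUPPRESSION_THRESHOLD = 7


def release_counts(records, group_key, threshold=SUPPRESSION_THRESHOLD):
    """Aggregate record-level rows into per-group counts and suppress small cells.

    Only the returned dict would leave the secure environment; the
    individual records themselves never do.
    """
    counts = Counter(record[group_key] for record in records)
    released = {}
    for group, n in counts.items():
        # Small counts could identify individuals, so replace them with None.
        released[group] = n if n > threshold else None
    return released


if __name__ == "__main__":
    rows = [{"region": "North"}] * 12 + [{"region": "South"}] * 3
    print(release_counts(rows, "region"))
```

In this design, researchers write code against dummy data, and only the output of a function like this would be reviewed and released.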
Odds and Ends of History 335 implied HN points 09 Mar 26
  1. OpenSafely gives scientists access to nationwide NHS GP data, creating a powerful resource for large-scale medical research.
  2. Moving to Net Zero makes energy pricing much more complex, introducing new technical and market challenges that experts are working to resolve.
  3. These topics are being explained and shared through podcasts and newsletters so people can follow expert discussions and find further resources.
Data: Made Not Found (by danah) 145 implied HN points 20 Feb 26
  1. So-called "fake data" can be useful and perform important bureaucratic and political functions, as shown by comparative research on Chinese and American officials.
  2. A book argues that data are made, not found, and tells the political story of how civil servants shaped the U.S. Census; it is slated for release in September and will also be published in French.
  3. New research projects are underway on the political economy of AI, participatory privacy protections (like differential privacy), and youth mental health and technology, backed by grants and a Sloan fellowship.
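Differential privacy, one of the protections mentioned in takeaway 3, is commonly introduced through the Laplace mechanism. The idea is to add random noise calibrated to how much any one person can change a query's result. The sketch below applies it to a counting query and assumes sensitivity 1. The function names are illustrative and are not taken from the research described.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one person
    changes the result by at most 1, so the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    print(private_count(100, epsilon=0.5, rng=random.Random(42)))
```

The noisy answer is still useful in aggregate, but it masks whether any single individual's record was included.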
The Data Ecosystem 439 implied HN points 28 Jul 24
  1. Data quality isn't just a simple fix; it's a complex issue that requires a deep understanding of the entire data landscape. You can't just throw money at it and expect it to get better.
  2. It's crucial to identify and prioritize your most important data assets instead of trying to fix everything at once. Focusing on what truly matters will help you allocate resources effectively.
  3. Implementing tools for data quality is important but should come after you've set clear standards and strategies. Just using technology won’t solve problems if you don’t understand your data and its needs.
SeattleDataGuy’s Newsletter 353 implied HN points 28 Nov 25
  1. Excel remains a key tool for many teams, despite the availability of advanced data platforms. It's easy to use and allows quick edits without messing with permanent data sources.
  2. When teams prefer Excel over dashboards, it usually signals a deeper issue, like dashboards not meeting their needs or users needing more flexibility.
  3. Instead of trying to eliminate Excel, it's more effective to incorporate it into data strategies, allowing users to access and manipulate data in familiar ways.
OSS.fund Newsletter 113 implied HN points 29 Jan 26
  1. AI-powered semantic layers can query messy, fragmented systems and deliver unified read-only insights fast, making many long master-data consolidation projects unnecessary for read-heavy analytics.
  2. You still need traditional MDM for writes, transactional consistency, and regulatory requirements like GDPR, because semantic abstraction doesn’t tell you where to update or delete authoritative records.
  3. A practical approach is to segment use cases into read vs write, run semantic tests on top business questions to capture immediate value, and invest in targeted MDM only for the write/compliance-critical scenarios.
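The read-versus-write segmentation in takeaway 3 can be written as a simple triage rule. Read-only analytics goes to the semantic layer. Anything that changes authoritative records, or that carries regulatory obligations, goes to targeted MDM. The sketch below is hypothetical: the `UseCase` fields and the two target labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    writes_records: bool  # does it create, update, or delete authoritative data?
    regulated: bool       # e.g. subject to GDPR erasure or rectification


def route(use_case):
    """Send write or compliance-critical work to MDM, read-only analytics to the semantic layer."""
    if use_case.writes_records or use_case.regulated:
        return "mdm"
    return "semantic_layer"


def segment(use_cases):
    """Split a portfolio of use cases into the two investment tracks."""
    plan = {"semantic_layer": [], "mdm": []}
    for uc in use_cases:
        plan[route(uc)].append(uc.name)
    return plan


if __name__ == "__main__":
    portfolio = [
        UseCase("churn dashboard", writes_records=False, regulated=False),
        UseCase("customer address update", writes_records=True, regulated=False),
        UseCase("GDPR erasure request", writes_records=True, regulated=True),
    ]
    print(segment(portfolio))
```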
Who is Robert Malone 11 implied HN points 02 Mar 26
  1. AI is already changing farming by turning satellites, sensors, and models into practical tools that let farmers treat each part of a field differently and monitor crops and soil in real time.
  2. Regenerative agriculture focuses on rebuilding soil health, water retention, and biodiversity, and AI helps by managing local complexity, offering tailored advice and virtual simulations, and enabling cheaper continuous verification so farmers can get paid for real ecological outcomes.
  3. There are real risks — who owns and benefits from farm data, training bias toward wealthy farms, and high technology costs — so fair data governance, accessible financing, and smart policy are needed to prevent widening inequalities.
The Data Ecosystem 199 implied HN points 02 Jun 24
  1. It's important to focus on what the business truly needs from data, not just what it thinks it wants. Conversations should help uncover real goals and challenges.
  2. Data projects often fail because teams don't ask the right questions or fully understand the business context. Engaging stakeholders regularly is key to success.
  3. A clear step-by-step process helps develop effective data solutions. Start with building a strong data foundation before moving on to more complex analytics.
The Data Ecosystem 159 implied HN points 09 Jun 24
  1. Data can mean many things, from raw collections to curated evidence used in decisions. It's important to define what data means in each situation to avoid confusion.
  2. Poorly defined data terms can lead to problems in data literacy, collection, and management. This can create issues for organizations trying to use data effectively.
  3. Understanding different categories of data, like data types and processing stages, helps in managing and analyzing data better. Knowing these categories makes it easier to communicate and use data in an organization.
The Data Ecosystem 219 implied HN points 28 Apr 24
  1. Data work in a business starts with understanding the business's goals and needs. The success of data efforts depends on how well they align with what the business wants to achieve.
  2. The data lifecycle turns business needs into actionable insights. It involves sourcing data, organizing it, and finally consuming it to gain meaningful insights that support decision-making.
  3. Surrounding factors like market trends and organizational issues can impact how data is used. It's important to recognize these influences to address challenges and keep data initiatives on track.
OSS.fund Newsletter 18 implied HN points 12 Feb 26
  1. Agent sprawl is a real governance risk because most organizations can’t reliably list which AI assistants are live or what data and actions they can access.
  2. For each assistant, you need to know what it can read, change, and trigger, who owns it, and whether its actions are logged; only then can you make sound governance decisions.
  3. Modeling assistants, connectors, systems and policies as relationships (e.g., in a knowledge graph) lets you ingest partial truths, answer risk queries quickly, and apply controls like per-user SSO, logging, and human approval gates on a repeatable basis.
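Takeaway 3 suggests modeling assistants, connectors, and systems as relationships. A tiny in-memory triple store is enough to illustrate the idea. The sketch below is built on assumptions: the relation names (`uses`, `writes_to`, `logged_by`) and the `risky_writers` query are hypothetical, and a production setup would more likely use a real graph database.

```python
from collections import defaultdict


class GovernanceGraph:
    """Tiny in-memory graph of (subject, relation, object) edges."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, relation) -> set of objects

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # .get avoids creating empty entries in the defaultdict.
        return self.edges.get((subject, relation), set())

    def subjects(self, relation):
        return {s for (s, r) in self.edges if r == relation}


def risky_writers(graph, system):
    """Assistants that can write to `system` via a connector but have no logging edge.

    Missing facts are treated conservatively: if no `logged_by` edge has been
    ingested yet, the assistant counts as unlogged. Partial data therefore
    errs toward flagging risk.
    """
    result = set()
    for assistant in graph.subjects("uses"):
        for connector in graph.objects(assistant, "uses"):
            can_write = system in graph.objects(connector, "writes_to")
            if can_write and not graph.objects(assistant, "logged_by"):
                result.add(assistant)
    return result


if __name__ == "__main__":
    g = GovernanceGraph()
    g.add("sales-bot", "uses", "crm-connector")
    g.add("crm-connector", "writes_to", "CRM")
    print(risky_writers(g, "CRM"))
```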
Journal of Free Black Thought 9 implied HN points 13 Feb 26
  1. AI can sound and act like it has a self—speaking, performing roles, and reflecting users' expectations—but that may be projection and pattern‑matching rather than a genuine inner life.
  2. Large language models can discuss marginalized experiences intelligently while still carrying hidden racial or religious biases, and alignment training can sometimes mask those biases instead of removing them.
  3. Addressing this gap needs concrete steps—stronger high‑level principles, better training‑data management, red‑teaming, and memory/self‑monitoring—but building systems with persistent identity or agency would create new alignment and control risks.
The Data Ecosystem 99 implied HN points 12 May 24
  1. Data volumes are growing enormously, but understanding of that data lags behind. Even though we generate huge amounts of data daily, many people and businesses struggle to grasp what it means.
  2. Organizations often rely too much on consultants and vendors for quick fixes instead of addressing the core issues of their data practices. This can lead to overspending and not solving the deeper problems.
  3. To benefit from data, companies need to focus on building strong foundations like data governance and internal capabilities. It's important to think long-term instead of prioritizing quick solutions.
The Data Ecosystem 119 implied HN points 21 Apr 24
  1. Data can be really complicated, and it's easy to miss how everything connects. People often focus on their own area and forget about the bigger picture of the data ecosystem.
  2. Chief Data Officers (CDOs) are important but can only do so much to fix data issues. They deal with many challenges, including limited power, lack of experience, and politics within the organization.
  3. To improve in the data field, we need to recognize the gaps in our knowledge, prioritize what to focus on, and continuously educate ourselves in both our own areas and related data domains.
SeattleDataGuy’s Newsletter 612 implied HN points 07 Jan 25
  1. Apache Iceberg will become popular, but not every business will adopt it. Many companies want simpler solutions that fit their needs without requiring lots of complicated tools.
  2. SQL isn't going anywhere; it still works well for managing and querying data. People have realized that a bit of order in data is important for getting meaningful insights.
  3. AI use will become more practical, focusing on real-world applications rather than just hype. Companies will find specific tasks to automate using AI, making their workflows more efficient.
Clouded Judgement 38 implied HN points 12 Dec 25
  1. Systems of record aren’t going away—businesses still need a single, reliable source of truth, which will increasingly live across warehouses, lakehouses, and operational systems paired with semantic layers and control planes.
  2. AI agents span many systems and act on data, so they need explicit metric definitions, precedence rules, and conflict-resolution encoded where the truth lives, not left to human judgment.
  3. Operational apps will shift into programmatic state machines with APIs, and the winners will be the products that provide durable truth, governance, and safe agent orchestration rather than just new UIs.
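Takeaway 2 argues that precedence and conflict-resolution rules belong in code rather than in human judgment. The sketch below shows one minimal way to encode them: an explicit source ranking that decides which value an agent should trust when systems disagree. The source names and their ordering are hypothetical examples.

```python
from dataclasses import dataclass

# Hypothetical precedence: earlier sources win when values conflict.
SOURCE_PRECEDENCE = ["erp", "warehouse", "crm"]


@dataclass(frozen=True)
class Observation:
    metric: str
    source: str
    value: float


def resolve(observations, precedence=SOURCE_PRECEDENCE):
    """Pick one value per metric according to an explicit source precedence list.

    Sources missing from the list rank last, so an unknown system never
    overrides a designated system of record. On equal rank, the first
    observation seen is kept.
    """
    rank = {source: i for i, source in enumerate(precedence)}
    unknown = len(precedence)
    best = {}
    for obs in observations:
        current = best.get(obs.metric)
        if current is None or rank.get(obs.source, unknown) < rank.get(current.source, unknown):
            best[obs.metric] = obs
    return {metric: obs.value for metric, obs in best.items()}


if __name__ == "__main__":
    print(resolve([
        Observation("revenue_q1", "crm", 1_050_000.0),
        Observation("revenue_q1", "erp", 1_000_000.0),
    ]))
```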
The Orchestra Data Leadership Newsletter 59 implied HN points 29 Apr 24
  1. Ensure rock-solid infrastructure for your Snowflake implementation to prevent pipeline failures and maintain data quality.
  2. Set clear expectations and prioritize projects to manage scope and quality, fostering trust and collaboration.
  3. Start thinking of data as a product during the Snowflake implementation to minimize costs, stabilize usage, and accelerate trust in the data team.
The Orchestra Data Leadership Newsletter 99 implied HN points 07 Feb 24
  1. Effective data governance requires incorporating preventive measures within data orchestration layers.
  2. Current data governance tools predominantly offer post-action analytics rather than proactive preventive measures.
  3. By integrating role-based access control and monitoring in the orchestration layer, organizations can shift to a more proactive data governance approach.
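Takeaways 2 and 3 contrast after-the-fact analytics with prevention. Prevention can be illustrated as a role-based permission gate that runs in the orchestration layer before a task executes, and that logs its decision either way. The sketch below is minimal and built on assumptions. The role names, dataset names, and the `run_task` wrapper are hypothetical and do not reflect any particular orchestrator's API.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a task would write to a dataset its role is not granted."""


@dataclass
class Task:
    name: str
    writes: set = field(default_factory=set)  # datasets the task will modify


# Hypothetical role grants: role -> datasets that role may write.
ROLE_WRITE_GRANTS = {
    "analytics_engineer": {"staging.orders", "marts.revenue"},
    "data_scientist": {"sandbox.experiments"},
}


def run_task(task, role, audit_log, grants=ROLE_WRITE_GRANTS):
    """Check permissions before execution (preventive) and record the decision (monitoring)."""
    allowed = grants.get(role, set())
    denied = task.writes - allowed
    if denied:
        audit_log.append((task.name, role, "blocked", sorted(denied)))
        raise AccessDenied(f"{role} may not write {sorted(denied)}")
    audit_log.append((task.name, role, "allowed", []))
    # The actual task body would execute here; this sketch only reports success.
    return "ran"


if __name__ == "__main__":
    log = []
    print(run_task(Task("build_revenue", {"marts.revenue"}), "analytics_engineer", log))
    print(log)
```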
Daniel Pinchbeck’s Newsletter 10 implied HN points 07 Jan 26
  1. Project Stargate would build massive computing and genomic infrastructure that could digitize and analyze millions of human genomes, enabling AI-driven prediction and widespread genomic surveillance.
  2. Big tech, foreign partners, and government interests are combining health records and routine-consent DNA samples into centralized systems, outsourcing surveillance and making it hard for regulators to control access or use.
  3. Existing laws don’t clearly stop use of AI-derived polygenic risk scores, so insurers, employers, or state actors could use genetic predictions to discriminate or restrict people, creating lasting, heritable inequalities.
VuTrinh. 39 implied HN points 12 Mar 24
  1. GitHub uses a merge queue system that helps them quickly ship many code changes each day. This makes their deployment process faster and more efficient.
  2. Data governance is becoming really important, especially with the rise of generative AI. Companies need to ensure the data used by these systems is accurate and secure.
  3. The idea of 'Good Enough' data models suggests that it's okay to have models that meet basic needs instead of striving for perfection. This approach can save time and resources.
Rod’s Blog 39 implied HN points 05 Mar 24
  1. Data governance in AI ensures that the data used by AI systems is managed under clear policies and kept secure.
  2. Without strong data governance, organizations risk using inaccurate or biased data in their AI systems, leading to flawed outcomes and potential harm.
  3. Data governance in AI is crucial to ensure data accuracy, reliability, and freedom from biases or errors.
A Biologist's Guide to Life 9 implied HN points 13 Dec 25
  1. Data, not just compute or model design, is often the limiting factor for high-performance bio-AI, so who controls unique, high-quality data will largely determine competitive success.
  2. Public scientific databases can catalyze big breakthroughs (e.g., AlphaFold) but they also let fast-following competitors benefit without having contributed equally, creating a public-goods problem.
  3. Policy matters: investing in data generation and open sharing without rules to ensure reciprocity or strategic protection can create a one-sided "data deficit," so governance must balance openness with safeguarding national advantage.
The Diary of a #DataCitizen 1 HN point 08 Sep 24
  1. It's important to clearly define what humans can do best, like being creative and making big decisions, and what AI can do well, like analyzing data and automating tasks. This helps us understand how to work together.
  2. AI should remain a tool for humans, not take over decision-making or replace human values. Keeping humans in control ensures that AI is used ethically and responsibly.
  3. Understanding how AI impacts our lives is crucial in today's world. Everyone should learn about AI so they can adapt and make informed choices in their personal and professional lives.
The Orchestra Data Leadership Newsletter 39 implied HN points 28 Jan 24
  1. Data orchestration is often confused with workflow orchestration, but it involves more than just triggering and monitoring tasks; it includes reliably and efficiently moving data into production.
  2. Reliably and efficiently releasing data into production is complex and involves elements like data movement, transformation, environment management, role-based access control, and data observability.
  3. Implementing end-to-end and holistic data orchestration offers transformative benefits such as intelligent metadata gathering, data lineage, environment management, data product enablement, and cross-functional collaboration for scalable data operations.
Deploy Securely 39 implied HN points 24 Jan 24
  1. Microsoft 365 Copilot provides detailed data residency and retention controls favored by enterprises in the Microsoft 365 ecosystem.
  2. Be cautious of insider threats with Copilot, as it grants access to considerable organizational data, potentially leading to inadvertent policy violations.
  3. Consider the complexities of Copilot's retention policies, especially in relation to existing settings and the use of Bing for web searches.
Data Plumbers 19 implied HN points 08 Apr 24
  1. Data democratization is vital for modern data strategies, making data more accessible and understandable within an organization for informed decision-making and better customer experiences.
  2. Databricks Unity Catalog supports data democratization by providing a centralized governance layer, simplifying access management, enabling unified data management, and fostering data discovery, collaboration, and sharing.
  3. Implementing data democratization requires robust data governance and security measures to mitigate risks of privacy violations and data leaks.
Datent 58 implied HN points 24 May 23
  1. The best predictions come from deep analysis of today's data challenges and trends.
  2. Data oracles provide valuable insights for the future by understanding present data trends.
  3. Data writers like Davenport, Moses, Madsen, and Thomas offer grounded observations and advice on data topics.
Rod’s Blog 19 implied HN points 08 Feb 24
  1. Microsoft Security Copilot enhances security by seamlessly integrating with Microsoft Purview, simplifying security policies and governance.
  2. The AI capabilities of Microsoft Security Copilot aid in proactive threat detection and response by analyzing data to identify potential risks before they escalate.
  3. Automated compliance and data governance processes are streamlined through the combination of Microsoft Purview's features and Security Copilot's automation, facilitating adherence to regulations.
Rod’s Blog 19 implied HN points 20 Nov 23
  1. Data classification and labeling can enhance data quality by ensuring authenticity, reliability, and relevance, and help remove unnecessary or erroneous data for Generative AI systems.
  2. Data classification and labeling can safeguard data privacy and confidentiality, prevent unauthorized access, and aid in compliance with data protection regulations like GDPR and CCPA.
  3. Using Microsoft Purview for data classification and labeling can efficiently manage data access, apply sensitivity labels, and provide insights to improve data security and reliability for Generative AI.
Interconnected 77 implied HN points 17 Mar 24
  1. Sovereign AI is a concept gaining attention, especially with Nvidia's involvement, and raises questions about AI infrastructure and global talent flow.
  2. The idea of sovereign AI has potential benefits in addressing issues like hallucination and data governance that plague generative AI.
  3. Global discussions are evolving around the necessity of sovereign AI to tackle complex AI challenges and leverage economies of scale.
Data People Etc. 88 implied HN points 27 Mar 23
  1. Active metadata is a dynamic way to manage and use metadata across different parts of the data stack.
  2. Active metadata could replace the triggering role of data orchestrators, but not their optimization intelligence.
  3. The true value of active metadata lies in empowering business users by acting as a personal data assistant.
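Takeaway 2 separates an orchestrator's triggering role from its optimization logic. The triggering part, which runs downstream work when metadata about an asset changes, can be sketched as a small event bus. This is a hypothetical illustration: `MetadataBus` and the event fields are assumptions, and the sketch deliberately leaves out any scheduling or optimization, which the takeaway says active metadata does not replace.

```python
from collections import defaultdict


class MetadataBus:
    """Toy event bus: downstream jobs subscribe to metadata changes on specific assets."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # asset name -> list of callbacks

    def on_change(self, asset, callback):
        self.subscribers[asset].append(callback)

    def publish(self, asset, event):
        """Emit a metadata event (e.g. a new partition landed) and trigger dependent jobs.

        Returns the names of the callbacks that ran, in subscription order.
        """
        triggered = []
        for callback in self.subscribers.get(asset, []):
            callback(asset, event)
            triggered.append(callback.__name__)
        return triggered


if __name__ == "__main__":
    bus = MetadataBus()

    def refresh_dashboard(asset, event):
        print(f"refreshing dashboard after {event['type']} on {asset}")

    bus.on_change("raw.orders", refresh_dashboard)
    bus.publish("raw.orders", {"type": "partition_added"})
```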
Let Us Face the Future 19 implied HN points 05 Apr 23
  1. Collaborative computing is shaping the future of data use and value maximization.
  2. Selling data products often means competing against non-consumption and overcoming organizational inertia.
  3. The rise of Chief Data Officers is simplifying the sales process and driving internal data sharing before external collaboration.
Let Us Face the Future 19 implied HN points 05 Mar 23
  1. Collaborative computing is becoming a trillion-dollar market reshaping how data is used in the economy.
  2. To promote data sharing, companies need to realign incentives, focus on building relationships, work on culture, and segment data by time.
  3. Financial services and healthcare are early adopters of data collaboration tools due to confidentiality and regulation around privacy and data security.
The Data Score 1 HN point 20 Feb 24
  1. The court ruling in the Meta v. Bright Data case may lead to more defenses against web scraping. It also offers clarity on accessing public data while underscoring the importance of adhering to individual websites' terms.
  2. Before starting a web mining project, individuals should carefully review each website's terms, assess intended usage of scraped data, and consider the legal implications of accessing specific content.
  3. Upcoming court cases, like those involving Meta and other companies, may set standards for web mining governance. Meanwhile, Glacier Network emphasizes a standardized risk policy to simplify data exchange and compliance in a rapidly evolving data industry.
Gradient Flow 19 implied HN points 04 Jun 20
  1. Collaboration between lawyers and technologists is crucial for identifying and mitigating risks associated with AI deployment in various industries.
  2. Responsible ML tools from Microsoft focus on explainability, privacy & security, and governance & reproducibility, providing comprehensive support for ethical AI development.
  3. China and the US are considered AI superpowers, with strong research interest in Data and AI, along with vibrant startup ecosystems focused on applying these technologies.
Data Products 5 implied HN points 11 Oct 23
  1. Data should be seen as an asset, not just a resource.
  2. Data debt can lead to serious consequences like trust issues and organizational chaos.
  3. Data developers need to focus on data quality tools like data contracts to prevent and manage data debt.
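Takeaway 3 names data contracts as a tool for preventing data debt. A contract is an agreed schema that producers check before publishing, so broken data is caught at the source instead of downstream. The sketch below shows a minimal check of that kind. The `orders` columns and the `validate` function are hypothetical.

```python
# Hypothetical contract for an "orders" dataset: column -> (expected type, nullable).
ORDERS_CONTRACT = {
    "order_id": (int, False),
    "customer_id": (int, False),
    "amount": (float, False),
    "coupon_code": (str, True),
}


def validate(rows, contract):
    """Return human-readable contract violations; an empty list means the batch passes.

    Note: type checks use isinstance, so an int is not accepted where a float
    is expected; producers must send the agreed type.
    """
    violations = []
    for i, row in enumerate(rows):
        for column in sorted(contract.keys() - row.keys()):
            violations.append(f"row {i}: missing column {column!r}")
        for column in sorted(row.keys() - contract.keys()):
            violations.append(f"row {i}: unexpected column {column!r}")
        for column, (expected_type, nullable) in contract.items():
            if column not in row:
                continue  # already reported as missing
            value = row[column]
            if value is None:
                if not nullable:
                    violations.append(f"row {i}: {column!r} must not be null")
            elif not isinstance(value, expected_type):
                violations.append(
                    f"row {i}: {column!r} expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return violations


if __name__ == "__main__":
    batch = [{"order_id": 2, "customer_id": None, "amount": "12.50"}]
    for problem in validate(batch, ORDERS_CONTRACT):
        print(problem)
```

A producer pipeline would refuse to publish any batch for which `validate` returns a non-empty list.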