The hottest Data Substack posts right now

And their main takeaways
benn.substack 5830 implied HN points 06 Mar 26
  1. Our phones and apps already record almost everything we do, and that data is collected and sold across companies and marketplaces.
  2. Privacy has mostly depended on the annoying difficulty of combining messy logs, so ordinary lives stayed unexamined because it was a pain to do so.
  3. AI automates the grunt work of stitching together those logs, making it trivially easy for governments, companies, or anyone with access to buy or assemble detailed profiles at scale.
BIG by Matt Stoller 28534 implied HN points 17 Feb 26
  1. The idea that current AI is a godlike, sentient force is mostly hype and a marketing push to grab money, resources, and political protection.
  2. Big tech is racing to build personal AI agents that will control data and commerce. Without rules forcing those agents to act for users, companies can manipulate people and set prices to their advantage.
  3. AI is already being used to cut jobs, hike costs, and steal likenesses, so democratic regulation—like fiduciary duties for agents, limits on ad‑funding, and stronger copyright protections—is needed to protect people and markets.
Chartbook 515 implied HN points 09 Mar 26
  1. India’s trade deficit is largely driven by oil imports, with other goods accounting for the rest of the shortfall.
  2. Chocolate production has a significant CO2 footprint, showing that everyday foods can carry meaningful environmental costs.
  3. The network of US military bases in Italy is a notable strategic and political factor, influencing both regional geopolitics and domestic debates.
SemiAnalysis 15456 implied HN points 06 Jan 26
  1. Scaling reinforcement learning (post‑training) is the main engine of recent capability and utility gains, with labs pouring compute into RL and using broad real‑world evals like GDPval to measure progress.
  2. Building RL environments and datasets is a large, specialized industry — firms clone UIs, create coding and software gyms, and hire domain experts to write tasks and rubrics, spawning many vendors and "RL as a service" offerings.
  3. Applying RL to science and biology requires closed‑loop physical experiments and robotics, faces long costly rollouts and sparse rewards, and will push models and labs toward specialized, non‑commodified solutions.
Kerman Kohli 99 implied HN points 29 Oct 24
  1. RPC calls to blockchain nodes succeed only about 78.5% of the time on average, so roughly one request in five fails to return the data you need.
  2. The performance of nodes varies depending on the blockchain you’re accessing, the RPC provider you choose, and even the time of day you make your requests.
  3. To improve reliability, use multiple node providers rather than depending on just one, so a failed request can fall back to another provider.
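The fallback advice can be sketched in a few lines. This is a minimal illustration, not the post's code: the provider callables are hypothetical stand-ins for real RPC clients, and `combined_success_rate` assumes provider failures are independent, which chain-wide congestion or time-of-day load often violates.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def call_with_fallback(providers: Sequence[Callable[[], T]]) -> T:
    """Call each provider in order and return the first successful result."""
    errors: list[Exception] = []
    for call in providers:
        try:
            return call()
        except Exception as exc:  # a real client would catch narrower RPC/network errors
            errors.append(exc)
    raise AllProvidersFailed(errors)


def combined_success_rate(p: float, n: int) -> float:
    """Probability that at least one of n providers succeeds.

    ASSUMPTION: failures are independent, each provider succeeding with probability p.
    """
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    def flaky_node() -> str:
        raise ConnectionError("node timed out")

    def healthy_node() -> str:
        return "0x10"  # e.g. a block number encoded as hex

    print(call_with_fallback([flaky_node, healthy_node]))  # prints 0x10
    print(round(combined_success_rate(0.785, 2), 4))  # prints 0.9538
```

With the post's 78.5% average, two independent providers would give 1 − 0.215² ≈ 95.4%; correlated failures would make the real gain smaller.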
COVID Reason 832 implied HN points 16 Oct 24
  1. The FBI initially reported a drop in violent crime for 2022, but later revised the numbers to show a significant increase, changing the narrative without much public notice.
  2. Revisions included thousands more cases of serious crimes, raising questions about the accuracy and transparency of the FBI's data.
  3. Many crimes go unreported, so the data are incomplete and trust in official crime statistics suffers, which weakens public understanding of safety.
Dana Blankenhorn: Facing the Future 39 implied HN points 30 Oct 24
  1. Nvidia's rise marked the start of the AI boom, with companies heavily buying chips for AI tools. This growth continues, and Nvidia is now a leading company.
  2. Google's cloud revenue is growing quickly at 35%, while overall revenue growth is slower at 15%. This shows strong demand for AI services from Google.
  3. Despite overall revenue growth, Google's search revenue is lagging, rising only 12%, which could mean it is losing some search market share.
Knowingless 1836 implied HN points 26 Feb 26
  1. An interactive site lets you explore a massive fetish survey of about 960,000 people and ~900 questions by picking x/y axes, filtering by demographics, and choosing weighted or unweighted views.
  2. The site includes a search, a question generator, tools to show random or statistically significant correlations, and a summary that displays exact survey wording, with some chart types still being improved.
  3. Early explorations already surface notable patterns—age-linked trends, apparent gender confounds in reported partner counts, low neuroticism predicting enjoyment of sex work, and subs reporting more interest in violent porn—so it can help people find new, testable correlations.
Odds and Ends of History 670 implied HN points 12 Mar 26
  1. A featured podcast episode covers opening NHS data for scientific research and explains how the Net Zero transition makes electricity pricing much more complicated.
  2. Coverage mixes politics and tech, with pieces on what the collapse of communism teaches the abundance movement, analysis of Labour’s 'hero voters', and tech stories like a possible EV charging/battery breakthrough plus a sharp takedown of a bad AI argument.
  3. There’s a short take on Britain’s Eurovision entry and its chances, and longer essay content is behind a subscription (a 7‑day free trial is offered), though the planned essay has been delayed by illness.
Generating Conversation 186 implied HN points 12 Mar 26
  1. Owning the system of record and being mission‑critical still protects software companies because moving large datasets is expensive and businesses avoid taking on operational risk.
  2. Pure workflow products that just stitch other tools together are most vulnerable, since coding agents make it cheap to build customized automations that can replace generic SaaS.
  3. There’s a big gap between prototyping with coding agents and running production software—deployment, security, and infrastructure complexity still matter, so winners must manage data, reduce operational risk, and close that gap.
Knowingless 1364 implied HN points 12 Feb 26
  1. A very large fetish-survey dataset (about 970,000 responses) has been released along with metadata and survey structure so others can explore and analyze it.
  2. The public release was heavily anonymized and downsampled into a representative subset: many demographic fields were binned or removed and multiple layers of noise were added, so correlations remain but are generally reduced by roughly 15–30%.
  3. The sample is limited to ages 14–32 from Western countries, some extreme fetish items were removed, and there may still be occasional cleaning errors, so verify any surprising findings before drawing strong conclusions.
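Why added noise shrinks correlations can be seen in a small simulation. The noise model here (independent Gaussian noise added to both variables) is an assumption for illustration only; the post does not say this is its anonymization method, and the numbers below are synthetic, not drawn from the dataset.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n: int = 20_000, noise_sd: float = 0.5, seed: int = 0) -> tuple[float, float]:
    """Return (clean r, noisy r) for a synthetic pair of related variables."""
    rng = random.Random(seed)
    # Underlying relationship: y = x + independent variation, so the true r is 1/sqrt(2) ≈ 0.71.
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]
    # Hypothetical "anonymization": add independent noise to every released value.
    noisy_xs = [x + rng.gauss(0, noise_sd) for x in xs]
    noisy_ys = [y + rng.gauss(0, noise_sd) for y in ys]
    return pearson(xs, ys), pearson(noisy_xs, noisy_ys)


if __name__ == "__main__":
    clean, noisy = simulate()
    print(f"clean r = {clean:.3f}, noisy r = {noisy:.3f}, reduction = {1 - noisy / clean:.0%}")
```

In theory, independent noise with variance σ² scales the correlation by √(Var x / (Var x + σ²)) · √(Var y / (Var y + σ²)); with the values above that factor is about 0.84, a reduction of roughly 16%.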
Big Technology 8006 implied HN points 21 Nov 25
  1. Google made a strong comeback in 2025 after a rough start in AI, focusing on improving its models and products. The turnaround lifted its stock significantly and restored market confidence.
  2. A major part of Google's success came from centralizing its AI research and development under Google DeepMind, which allowed for better collaboration and faster decision-making in product development.
  3. The company's search and cloud divisions also grew significantly, with increased revenue and innovation in AI products, showing that Google can still compete effectively in the evolving tech landscape.
High ROI Data Science 615 implied HN points 06 Oct 24
  1. Many businesses love the idea of AI but find it hard to put into practice. It often looks easy on paper, but the reality is very different when trying to make it work.
  2. Data is really important for AI to work well. Companies need good data to build effective AI products, and often, they realize this too late after facing challenges.
  3. AI projects often fail because businesses don’t fully understand what they need to achieve. Companies should focus on solving real problems rather than just using the latest technology.
Stealing Signals 599 implied HN points 03 Oct 24
  1. Route data is essential for understanding how well players are performing, but different sources count routes in different ways, which can create confusion.
  2. The NFL has started providing its own routes data, which could help standardize how we analyze player performance. This might make comparisons easier and clearer moving forward.
  3. Stats like TPRR (Targets Per Route Run) help us understand player efficiency, but they need to be used alongside other context like player roles and QB performance for better insights.
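TPRR is a simple ratio: targets divided by routes run. The sketch below uses hypothetical player numbers to show why the source of the route count matters: the same target total over a different route count yields a different TPRR.

```python
def tprr(targets: int, routes_run: int) -> float:
    """Targets Per Route Run: the share of a receiver's routes that drew a target."""
    if routes_run <= 0:
        raise ValueError("routes_run must be positive")
    return targets / routes_run


# Hypothetical receiver: 60 targets, but two data providers disagree on routes run.
targets = 60
for source, routes in {"provider_a": 240, "provider_b": 260}.items():
    print(f"{source}: TPRR = {tprr(targets, routes):.3f}")
```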
SatPost by Trung Phan 631 implied HN points 13 Feb 26
  1. Big SaaS companies need large teams because they run mission-critical, globally regulated systems at huge scale, so they require lots of sales, support, engineering, security, and legal staff to ensure uptime, compliance, and customer integrations.
  2. AI coding agents will automate much of code production and shift value toward product taste, orchestration, proprietary data, and reliability/security expertise, forcing companies to rethink roles and org structure.
  3. Software demand won’t vanish — AI will create more software but change who captures the value, pressuring per-seat pricing and pushing SaaS firms to become systems of record or adopt usage- and outcome-based models to stay defensible.
Tanay’s Newsletter 113 implied HN points 03 Mar 26
  1. AI erodes labor-based moats like switching costs, application-layer scale, and generic process advantages, making it cheaper and faster to build features, migrate systems, and iterate.
  2. Defensibility shifts to hard-to-reproduce assets: proprietary first-party data, real marketplace liquidity and reputation, regulatory or physical rails, and unique processes that rely on exclusive signals.
  3. Some powers strengthen or split — model and infrastructure scale plus institutional trust grow in importance, while marketing-driven consumer brand shortcuts weaken as agents can deeply evaluate options.
Marcus on AI 11106 implied HN points 07 Aug 25
  1. GPT-5 has been released, but it hasn't made as big an impact as many expected. It's good but not revolutionary.
  2. While some improvements have been made, GPT-5 is still seen as part of the group rather than a major leader in AI.
  3. There are concerns about the accuracy of the data shared during its launch, which raises questions about its real-world performance.
The Fry Corner 186 HN points 15 Sep 24
  1. AI can change our world significantly, but we must handle it carefully to avoid negative outcomes. It's crucial to put rules in place for how AI is developed and used.
  2. Humans and AI have different strengths; machines can process data faster, but humans have emotions and creativity that machines can't replicate. We shouldn't be too quick to believe AI can think like us.
  3. The growth of AI might disrupt many industries and change how we live. We need to be aware of these changes and adapt, ensuring that technology serves humanity rather than harms it.
Marcus on AI 11264 implied HN points 21 Jun 25
  1. Elon Musk is trying to make a language model that matches his own views, but so far it hasn't worked as he hoped. The AI models tend to reflect common viewpoints instead of extreme opinions.
  2. Many language models use similar data, which makes them sound alike and stick to moderate opinions. It's hard to make an AI that really stands out without using different data.
  3. Musk's plan to rewrite information to fit his beliefs is concerning. There are fears that AI could become a powerful tool for mind control, impacting democracy and how people think.
Contemplations on the Tree of Woe 2239 implied HN points 21 Nov 25
  1. The U.S. sees AI as crucial to winning its power struggle against China. Investing in AI can help improve its military, economy, and technology.
  2. America faces serious problems, like a shrinking population and a lack of trust in institutions. Many think AI is the only way to revive the economy and society.
  3. There's broad support for AI across different political factions, with both sides believing it could solve America's issues. There seems to be no backup plan if AI fails.
benn.substack 1687 implied HN points 14 Nov 25
  1. Not knowing can mean different things. It can show disinterest, annoyance, or a humble uncertainty in conversations.
  2. Technology and AI are unpredictable, and the next big breakthrough can happen by chance, often in unexpected ways.
  3. To succeed in tech, it’s important to take action and build things, rather than just thinking about ideas. Typing and doing lead to real progress.
benn.substack 1508 implied HN points 21 Nov 25
  1. Building strong connections with various data sources is important for creating valuable AI products. This way, the product can understand context and provide better outcomes.
  2. Platforms may not be as essential as we think. Sometimes, focusing on being a good producer and providing unique intelligence can be more beneficial than trying to build a large platform.
  3. As AI tools evolve, they learn from each other. This means that context is not just about gathering data, but also about interpreting and using that data intelligently.
Don't Worry About the Vase 1926 implied HN points 13 Nov 25
  1. Everybody seems to agree that AI is important, but opinions vary on how to manage its growth and impact. Many believe we should keep humans in charge when dealing with powerful AI.
  2. There's a lot of skepticism around AI and its effects on jobs and life, with some believing it will cause major disruptions. Others think it will be a positive change overall.
  3. There's a sentiment that as AI becomes more prevalent, people need to be cautious and thoughtful about how it's integrated into daily life and big decisions, ensuring strong safeguards are in place.
Marcus on AI 14386 implied HN points 03 Feb 25
  1. Deep Research tools can quickly generate articles that sound scientific but might be full of errors. This can make it hard to trust information online.
  2. Many people may not check the facts from these AI-generated writings, leading to false information entering academic work. This could cause problems in important fields like medicine.
  3. As more of this low-quality content spreads, it could harm the credibility of scientific literature and complicate the peer review process.
Big Technology 13260 implied HN points 31 Jan 25
  1. OpenAI is focusing more on building apps rather than just creating AI models. This shift reflects a need to stay competitive and profitable in the changing AI landscape.
  2. The market for AI applications is growing, and OpenAI's ChatGPT is performing well, far ahead of its competitors in earnings. This positions OpenAI favorably as it continues to innovate its products.
  3. While OpenAI aims to develop artificial general intelligence, it faces challenges as competition increases and cost structures change in the AI industry. Staying ahead will require continuous product improvements.
Technically 31 implied HN points 12 Mar 26
  1. Kalshi handled about 203 million trades and roughly $41.7 billion in volume, generating about $545.6 million in trading fee revenue from those trades.
  2. Over 82% of the activity is sports (including parlays), so the platform functions a lot like a sportsbook even though users trade peer-to-peer and Kalshi also acts as a market participant and liquidity provider.
  3. Fees follow a formula tied to P*(1-P), where C is the number of contracts and P is the contract price in dollars: taker fee ≈ round up(0.07·C·P·(1-P)) and maker fee ≈ 0.0175·C·P·(1-P). This makes fees highest near 50% probability and lower at extreme odds. Resolution practices and regulatory treatment remain somewhat manual and unsettled.
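The fee formula can be made concrete with a short sketch. Assumptions: C is the number of contracts, P is the contract price in dollars (read as the implied probability), the taker fee rounds up to the next cent, and the maker fee is left unrounded because the summary gives no rounding rule for it. Check Kalshi's published fee schedule before relying on any of this.

```python
from decimal import ROUND_CEILING, Decimal

TAKER_RATE = Decimal("0.07")
MAKER_RATE = Decimal("0.0175")
CENT = Decimal("0.01")


def _variance_term(contracts: int, price: str) -> Decimal:
    """C * P * (1 - P), with P the contract price in dollars (0 < P < 1)."""
    p = Decimal(price)
    if not 0 < p < 1:
        raise ValueError("price must be strictly between 0 and 1 dollars")
    return Decimal(contracts) * p * (1 - p)


def taker_fee(contracts: int, price: str) -> Decimal:
    """0.07 * C * P * (1 - P), rounded UP to the next cent (cent granularity assumed)."""
    return (TAKER_RATE * _variance_term(contracts, price)).quantize(CENT, rounding=ROUND_CEILING)


def maker_fee(contracts: int, price: str) -> Decimal:
    """0.0175 * C * P * (1 - P), left unrounded as in the summary's formula."""
    return MAKER_RATE * _variance_term(contracts, price)


if __name__ == "__main__":
    for price in ("0.10", "0.50", "0.90"):
        print(price, taker_fee(100, price), maker_fee(100, price))
```

Because P·(1-P) peaks at P = 0.5, 100 contracts cost $1.75 in taker fees at 50¢ but $0.63 at 10¢ or 90¢. Taking the reported totals at face value, $545.6 million in fees on $41.7 billion of volume is an effective rate of about 1.3%.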
OpenTheBooks Substack 199 implied HN points 07 Feb 26
  1. A new platform will combine a huge private government spending database with AI-indexed public officials’ remarks so people can compare what politicians say with what they do and spend.
  2. The tool uses pattern recognition and prediction to spot areas prone to waste, fraud, and abuse, aiming to help prevent scandals in real time.
  3. The project relies on massive scale—about 10 billion data points from OpenTheBooks—giving journalists, policymakers, and citizens unprecedented transparency and accountability tools.
Chartbook 543 implied HN points 31 Dec 25
  1. The economy is becoming K-shaped, with some sectors and people recovering strongly while others fall further behind.
  2. China shows an east–west split where a new data-and-energy economy is concentrating growth in some regions while others lag.
  3. A cultural reflection on 'mourning a hoplite' uses classical imagery to explore themes of loss, memory, and changing identity.
General Robots 732 implied HN points 16 Dec 25
  1. They scale teleoperation data collection by sending thousands of gloves to people’s homes, with 500+ active collectors, which gives much more diverse and easily scalable data than robot farms.
  2. The robot design prioritizes safety and reach — back-drivable limbs and a low tipping hazard combined with a 2.13 m workspace and the ability to lift 6 kg at about an 80 cm reach.
  3. Simple, well-engineered hands (two fingers with two DOFs and a fixed thumb) deliver versatile, precise grasps in real tasks like table clearing and making espresso, though live demos can still trigger occasional failure modes.
The Data Ecosystem 339 implied HN points 04 Aug 24
  1. The People, Process, Technology framework helps organizations balance these three key areas but often misses the importance of data. Companies should not just focus on technology but also consider how people and processes interact.
  2. A new framework that includes data is called People, Process, Technology & Data. This approach shows how these four components work together, helping organizations make better decisions and manage change more effectively.
  3. Using structured questions and understanding the roles of each component can enhance planning and execution in businesses. It's essential to revisit these elements regularly to stay aligned with goals and adapt as needed.
Jacob’s Tech Tavern 3280 implied HN points 30 Jun 25
  1. Data is essential for making applications work smoothly, acting like the oil in a machine. Without it, everything would grind to a halt.
  2. The Foundation library has been around for a long time, helping with things like data management and networking. It's getting a modern upgrade to work better across different platforms.
  3. Understanding how Data is built in swift-foundation shows why it matters and how it works under the hood, which is crucial knowledge for developers.
Marcus on AI 8378 implied HN points 22 Dec 24
  1. Many experts feel that the recent test called ARC-AGI should not have been labeled as such. It wasn't a proper test for Artificial General Intelligence.
  2. The presentation was confusing and didn't clearly show what the AI was tested on. This left people with the impression that the AI performed better than it actually did.
  3. There's a need for more scientific scrutiny of the results. Until we get that, we can't really compare the AI's performance fairly with humans.
Nicolas Bustamante 132 implied HN points 04 Feb 26
  1. LLM chat interfaces are replacing specialized software UIs, so the interface moat that once locked in users is disappearing.
  2. With interfaces commoditized, competition becomes API vs API and only truly proprietary, non-replicable data keeps pricing power; if data can be licensed or scraped, margins and retention will collapse.
  3. Winners will be LLM/chat owners, proprietary data holders, and API-first startups, while interface-dependent vertical software, many UX-focused firms, and aggregators who don’t control the chat layer are at risk.
Odds and Ends of History 670 implied HN points 27 Nov 25
  1. The Budget outlines the government's economic strategy and priorities for the country. It's a critical event that influences the political landscape.
  2. There are both positive and negative aspects to the Budget, reflecting a mix of good and bad policy decisions. This is similar to how we see different stories unfold in a TV show.
  3. The discussion around the Budget also hints at its impact on individual political careers, particularly for certain politicians.
The Garden of Forking Paths 2869 implied HN points 10 Jan 24
  1. The internet largely runs through undersea cables spanning about 900,000 miles, connecting the world in a hidden network.
  2. Early undersea cables were made possible by materials like gutta-percha and played a key role in rapid communication during events like the US Civil War.
  3. Specialized ships lay and repair undersea cables made of fiber optics, and even guard against threats like sharks and sabotage by SCUBA divers.
Marcus on AI 7074 implied HN points 28 Nov 24
  1. ChatGPT has been popular for two years, but many of the initial uses people expected, like taking over Google, haven't happened. Companies are not as impressed with its real-world results.
  2. Despite promises of improvement, ChatGPT still struggles with inaccuracies and generating false information. Users continue to experience 'hallucinations' where the AI makes things up.
  3. The investment in AI is huge, but the fundamental issues with reliability and factual accuracy haven't improved significantly. There's a call for new approaches to make AI more trustworthy.
Net Interest 39 implied HN points 20 Feb 26
  1. AI coding assistants let non-technical people automate tasks such as indexing archives and getting daily idea suggestions by learning from their past content. They still can't fully surface private experiences or write in someone's exact voice.
  2. AI adoption in finance is still limited, with many analysts barely using generative tools, but early adopters report meaningful productivity gains—around 20% time saved—and are building AI-first cultures.
  3. AI is changing how market data is accessed and could weaken incumbents' competitive moats as firms and individuals build custom tools to replace traditional terminals. Data providers need to reposition themselves to stay relevant in an AI-first world.
System Design Classroom 659 implied HN points 01 Jun 24
  1. The type of caching strategy you choose depends on your read and write ratios. If you read a lot, caching is very helpful, but if you write often, you need a more complex approach.
  2. Data consistency is crucial for some applications. Using methods like Write-Through helps keep data in cache and databases aligned, while other methods, like Write-Behind, prioritize speed over immediate consistency.
  3. To check whether your caching is effective, track metrics such as the hit ratio: the share of requests served from the cache (hits) versus those that fall through to the database (misses).
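The trade-offs above can be sketched with an in-memory cache in front of a dict that stands in for the database. This is an illustration under simplifying assumptions (single thread, no eviction, no TTLs), and the class and method names are hypothetical rather than taken from the post.

```python
from collections import deque
from typing import Any


class Cache:
    """Toy cache supporting write-through or write-behind, with hit/miss counters."""

    def __init__(self, db: dict, mode: str = "write_through") -> None:
        if mode not in ("write_through", "write_behind"):
            raise ValueError("mode must be 'write_through' or 'write_behind'")
        self.db = db                  # stand-in for the backing database
        self.mode = mode
        self.store: dict = {}         # cached entries
        self.pending: deque = deque() # writes queued for write-behind
        self.hits = 0
        self.misses = 0

    def get(self, key: Any) -> Any:
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.db.get(key)      # on a miss, read from the database and populate the cache
        if value is not None:
            self.store[key] = value
        return value

    def put(self, key: Any, value: Any) -> None:
        self.store[key] = value
        if self.mode == "write_through":
            self.db[key] = value      # synchronous: cache and database stay aligned
        else:
            self.pending.append((key, value))  # deferred: faster, but the database lags

    def flush(self) -> None:
        """Apply queued write-behind writes to the database in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Write-through pays for a database write on every `put`, so cache and database stay aligned; write-behind returns immediately and defers writes to `flush`, so the database lags the cache and any queued writes are lost if the process dies first.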
Marcus on AI 7153 implied HN points 10 Nov 24
  1. The belief that more scaling in AI will always lead to better results might be fading. It's thought we might have reached a limit where simply adding more data and computing power is no longer effective.
  2. There are concerns that scaling laws, which have worked before, are just temporary trends, not true laws of nature. They don’t actually solve issues like AI making mistakes or hallucinations.
  3. If rumors are true about a major change in the AI landscape, it could lead to a significant loss of trust in these scaling approaches, similar to a bank run.
Marcus on AI 4703 implied HN points 09 Feb 25
  1. Large language models (LLMs) can make mistakes, sometimes creating false information that is hard to spot. This is a recurring issue that has not been fully addressed over the years.
  2. Google has been called out for its ongoing issues with LLMs failing to provide accurate results, as these problems seem to occur regularly.
  3. The idea of rapid improvements in AI technology may be overhyped, as the same mistakes keep happening, indicating slower progress than expected.