The hottest Data Analysis Substack posts right now

And their main takeaways
LatchBio 41 implied HN points 26 Dec 25
  1. SpatialBench is a realistic suite of 146 verifiable spatial biology problems across five platforms and seven task types that recreates real analyst workspaces using snapshots of data and images.
  2. Current agent models perform poorly overall (roughly 20–38% accuracy) and vary widely by task and platform, and the choice of execution harness or wrapper can change outcomes as much as changing the base model.
  3. Inspecting agent trajectories reveals clear failure modes and productive strategies, showing that detailed traces help explain performance and that benchmarks like this are a practical first step toward engineering agents that can reliably automate spatial biology analysis.
The New Urban Order 119 implied HN points 01 May 24
  1. Close is an interactive map that helps people find neighborhoods with the amenities that matter to them, such as public schools, so that walkability is assessed on personal terms.
  2. Close uses free spatial datasets and user feedback to build a detailed destinations roster, showing a commitment to accuracy and continuous improvement.
  3. Close differs from tools like Walkscore by focusing on transparency, user customization, and the 'time to furthest important destination' approach to assess walkability in cities.
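One way to read the 'time to furthest important destination' idea is as a maximum over the amenities a user cares about. The sketch below is only an interpretation of that phrase, not Close's actual implementation; the function name, the amenity labels, and the travel times are all hypothetical.

```python
def time_to_furthest(travel_minutes, important):
    """Return the longest travel time among the amenities the user marked important.

    travel_minutes: dict mapping amenity type -> minutes to the nearest one
                    (hypothetical input; Close's real data model is not described).
    important: iterable of amenity types the user selected.
    """
    return max(travel_minutes[amenity] for amenity in important)


# Hypothetical travel times from one block to the nearest amenity of each type.
block = {"public school": 12, "grocery": 7, "park": 4, "library": 18}

# A user who only cares about schools and groceries.
score = time_to_furthest(block, ["public school", "grocery"])
print(score)
```

Under this reading, a block is only as walkable as its least convenient important destination, which is why changing the selected amenities changes the score.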
SeattleDataGuy’s Newsletter 788 implied HN points 30 Nov 24
  1. Data teams should focus on projects that really matter to the business, not just completing tasks. It's important to pick work that makes a difference.
  2. Understanding how your business works is key to finding valuable projects. Ask questions about the data to see what's impacting your important metrics.
  3. Shift your mindset from being a regular team member to thinking like a business owner. This means taking initiative and seeking out projects that align with overall business goals.
Mindful Modeler 1018 implied HN points 20 Dec 22
  1. Model predictions should consider uncertainty to make informed decisions. Decisions relying only on point predictions can be risky.
  2. Conformal prediction is a method that can provide rigorous uncertainty scores, giving probabilistic guarantees of covering the true outcome.
  3. Conformal prediction is simple to apply, often with just 3 lines of code. It is model-agnostic, distribution-free, and comes with coverage guarantees.
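The core of split conformal prediction for regression can be sketched in plain Python. This is a minimal illustration, assuming absolute residuals as the conformity score and data that are exchangeable; the toy `model`, the calibration pairs, and the helper names are invented for the example and are not taken from the post.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def predict_interval(model, x, q):
    """Symmetric interval around the point prediction with half-width q."""
    y_hat = model(x)
    return (y_hat - q, y_hat + q)


# Hypothetical already-fitted model and held-out calibration set (x, y).
model = lambda x: 2.0 * x
calibration = [(1.0, 2.1), (2.0, 3.8), (3.0, 6.3), (4.0, 7.9), (5.0, 10.4),
               (6.0, 11.7), (7.0, 14.2), (8.0, 16.1), (9.0, 17.6)]

# Conformity scores: absolute residuals on the calibration set.
scores = [abs(y - model(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.2)
low, high = predict_interval(model, 10.0, q)
print(low, high)
```

The model is treated as a black box here, which is what makes the approach model-agnostic: only its predictions on the calibration set are needed.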
Syncretica 471 implied HN points 05 Oct 23
  1. China's Strategic Petroleum Reserve size can be estimated through accounting methods.
  2. China uses its SPR to manage oil imports and impact global oil markets.
  3. China's high diesel demand raises questions about surplus and export quotas.
Liberty’s Highlights 452 implied HN points 18 Oct 23
  1. It's liberating to realize that most fields are understandable to an interested outsider, focusing on big ideas.
  2. Exploring new fields and combining knowledge from different areas can lead to rich and interesting discoveries.
  3. Calculated risk-taking combined with thorough preparation can lead to successful business decisions, including knowing when to push all the chips in.
Wyclif's Dust 2146 implied HN points 11 Nov 23
  1. Birth order and parental age influence outcomes in opposite ways.
  2. Within families, birth order and parental age have a high correlation.
  3. Even though birth order effects are big, they explain very little of the variation in outcomes.
Gradient Flow 559 implied HN points 04 May 23
  1. NLP pipelines are shifting to include large language models (LLMs) for accuracy and user-friendliness.
  2. Effective prompt engineering is crucial for crafting useful input prompts tailored to generative AI models.
  3. Future prompt engineering tools need to be interoperable, transparent, and capable of handling diverse data types for collaboration and model sharing.
Import AI 459 implied HN points 25 Sep 23
  1. China released open access language models trained on both English and Chinese data, emphasizing safety practices tailored to China's social context.
  2. Google and collaborators created a digital map of smells, pushing AI capabilities to not just recognize visual and audio data but also scents, opening new possibilities for exploration and understanding.
  3. An economist outlines possible societal impacts of AI advancement, predicting a future where superintelligence prompts dramatic changes in governance structures, requiring adaptability from liberal democracies.
benn.substack 843 implied HN points 18 Oct 24
  1. The way we value companies might be changing. Instead of just looking at numbers, people are considering things like hype and public interest.
  2. Being data-driven used to be seen as a key to success, but now it seems less effective for some businesses. There are successful examples, but many companies struggle to use data well.
  3. Cultural factors, or 'taste', are becoming more important in the business world than just relying on data. This shift might mean that how people feel about a company matters just as much as the finances.
Thái | Hacker | Kỹ sư tin tặc 1517 implied HN points 12 Jul 22
  1. Solving cybercrime cases during a pandemic can be challenging but rewarding, leading to new ideas and career advancements.
  2. Investigating cyber incidents requires thinking like a hacker to anticipate their next moves and gather crucial evidence.
  3. Learning from mistakes and conducting thorough investigations are crucial in cybersecurity to prevent future attacks and uncover hidden clues.
SeattleDataGuy’s Newsletter 612 implied HN points 07 Jan 25
  1. Iceberg will become popular, but not every business will adopt it. Many companies want simpler solutions that fit their needs without needing lots of complicated tools.
  2. SQL isn't going anywhere; it still works well for managing and querying data. People have realized that a bit of order in data is important for getting meaningful insights.
  3. AI use will become more practical, focusing on real-world applications rather than just hype. Companies will find specific tasks to automate using AI, making their workflows more efficient.
Category Pirates 452 implied HN points 15 Mar 23
  1. Category Science uses broader and weirder data analysis for business growth.
  2. Understanding customer outcomes drives the Net Promoter Score and business decisions.
  3. Top-performing content aligns with factors like hyper-targeted audience, clear outcomes, frameworks, practical applications, and effective marketing.
Data Science Weekly Newsletter 339 implied HN points 01 Dec 23
  1. Data science is evolving quickly, and it's important to stay updated with new advances and tools. Courses and reading lists can help you catch up and enhance your skills.
  2. Using machine learning to solve real-world problems, like correctly attributing quotes, shows the practical applications of data science. Collaboration between universities and organizations can lead to innovative solutions.
  3. The job market for data scientists is challenging right now. Many applicants are competing for limited positions, so if you're looking for a job, patience is key.
sebjenseb 196 implied HN points 10 Feb 24
  1. Assortative mating occurs between races, with individuals who date outside their race being more similar to each other in terms of intelligence, height, and risk-taking behaviors.
  2. Current literature suggests that interracial relationships may have a higher likelihood of ending or experiencing domestic violence issues, and mixed-race children might be more prone to mental/behavioral problems, possibly due to self-selection rather than social factors.
  3. Attractiveness was a weak predictor of interracial dating across all races, indicating that mate value or race exchanges based on mate value were not significant factors in interracial dating.
UX Psychology 396 implied HN points 26 May 23
  1. Qualitative data analysis involves examining non-numerical data, like interviews or observations, to find patterns and insights. This process requires a more nuanced approach compared to quantitative data analysis.
  2. Qualitative coding offers benefits like unveiling new insights, enhancing study validity, and providing contextual understanding of users' behaviors and motivations.
  3. There are different types of qualitative data analysis methods such as content analysis, thematic analysis, discourse analysis, and grounded theory. Choosing the right method depends on your research question, the type of data collected, and available resources.
American Inequality 393 implied HN points 07 Aug 23
  1. Alzheimer's is a major problem in the US: it affects millions of people, and the number of cases is expected to double in the next 25 years.
  2. Inequality plays a significant role in Alzheimer's, with different communities and demographics being impacted differently.
  3. More focus is needed on training caregivers, analyzing data on minority communities, and educating about new drugs to address Alzheimer's inequalities.
Mindful Modeler 339 implied HN points 07 Nov 23
  1. Focus on creating an end-to-end pipeline first, experiment with simple models, and then scale up gradually for better results in machine learning challenges.
  2. Success in a challenge correlates with time invested, so choose challenges that motivate you and spend time understanding the data before committing.
  3. Adopt a strategy to pick challenges that interest you, prioritize an experimentation loop, and aim to optimize later for overall success.
Brad DeLong's Grasping Reality 176 implied HN points 24 Jul 25
  1. AI is reshaping jobs and how companies operate, especially in Silicon Valley where big players are fighting for profit. It's changing the game of technology investment and control.
  2. Investors need to carefully consider whether they're joining a genuine revolution or just chasing another tech bubble like cryptocurrency. Understanding the real nature of AI is crucial.
  3. AI is really about complex models that process information, not the magical intelligence people often hype it up to be. There’s a big difference between the promise of AI and what it can actually do right now.
Venture Prose 379 implied HN points 26 Sep 23
  1. Understanding your user behaviors over time through cohorts is crucial for business success.
  2. Clear frameworks and precise data help separate vanity from reality in business trends.
  3. Invest time and effort into mastering the basics of qualitative and quantitative feedback to increase long-term success.
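A cohort view of user behavior can be sketched as a small retention table. This is an illustrative sketch only, assuming activity is logged as (user, month) pairs and that a user's cohort is the month they were first seen; the function name and the sample events are hypothetical and not from the post.

```python
from collections import defaultdict


def cohort_retention(events):
    """Build {cohort_month: {months_since_first: fraction_of_cohort_active}}.

    events: iterable of (user_id, month_index) activity records.
    """
    # A user's cohort is the first month in which they were active.
    first_seen = {}
    for user, month in events:
        first_seen[user] = min(month, first_seen.get(user, month))

    cohort_sizes = defaultdict(int)
    for month in first_seen.values():
        cohort_sizes[month] += 1

    # Distinct active users per (cohort, months since first activity).
    active = defaultdict(set)
    for user, month in events:
        cohort = first_seen[user]
        active[(cohort, month - cohort)].add(user)

    table = defaultdict(dict)
    for (cohort, offset), users in active.items():
        table[cohort][offset] = len(users) / cohort_sizes[cohort]
    return {c: dict(sorted(row.items())) for c, row in sorted(table.items())}


# Hypothetical activity log: users a and b join in month 0, user c in month 1.
events = [("a", 0), ("b", 0), ("a", 1), ("c", 1), ("c", 2), ("a", 2), ("b", 2)]
retention = cohort_retention(events)
print(retention)
```

Reading across a row shows how one cohort's engagement changes over time, which is harder to see in a single aggregate active-user count.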
Policy Tensor 373 implied HN points 29 Apr 23
  1. Extreme poverty statistics may not be reliable due to potential biases in measurement methods.
  2. Evidence indicates inconsistencies between poverty rates and key indicators like life expectancy, raising concerns about the accuracy of poverty data.
  3. The World Bank's numbers show discrepancies that suggest a need for further scrutiny and possible revision of poverty measurement techniques.
patternventures 198 implied HN points 16 Feb 24
  1. Venture capital is a great field for using data because it can really improve the investment process. By analyzing data, investors can more easily find and support promising startups.
  2. Some key performance indicators (KPIs) have been shown to correlate with the success of funds. For example, funds scoring above 30% on specific KPIs are much more likely to provide high returns.
  3. While data-driven strategies are helpful, they aren't perfect. Investors still need solid experience and networks to truly understand fund performance and secure access to the best opportunities.
Mindful Modeler 99 implied HN points 16 Apr 24
  1. Many COVID-19 classification models built on X-ray images during the pandemic were found to be ineffective because of issues such as overfitting and bias.
  2. Generalization in machine learning goes beyond just low test errors and involves understanding real-world complexities and data-generating processes.
  3. Generalization of insights from machine learning models to real-world phenomena and populations is a challenging process that requires careful consideration and assumptions.
Data Analysis Journal 353 implied HN points 22 Mar 23
  1. Analytics engineers bridge the gap between data engineers and data analysts by focusing on producing high-quality data.
  2. Analytics engineers use tools like dbt to streamline data modeling, testing, and documentation.
  3. Data quality is crucial in decision-making, making analytics engineering more important than ever.
Silver Bulletin 405 implied HN points 17 Feb 25
  1. The latest pollster ratings show which pollsters are most accurate and transparent based on their past performances. This helps understand which ones might do well in future elections.
  2. New data added to the ratings includes results from the 2024 presidential, congressional, and gubernatorial elections. Lots of new polls have shifted some ratings, but the top pollsters generally stayed the same.
  3. The ratings measure pollster accuracy with several scores that account for factors such as bias toward a political party and how close each pollster's predictions were to actual results.
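The two ingredients named above, partisan bias and closeness to the result, can be illustrated with a signed and an absolute error. This is a deliberately simplified sketch and not Silver Bulletin's actual rating methodology; the function name, the sign convention, and the poll numbers are hypothetical.

```python
def pollster_error_summary(polls):
    """Return (mean signed error, mean absolute error) in percentage points.

    polls: list of (predicted_margin, actual_margin), where a positive margin
           means party A is ahead (an assumed convention for this example).
    """
    errors = [predicted - actual for predicted, actual in polls]
    mean_bias = sum(errors) / len(errors)                      # direction of misses
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)  # size of misses
    return mean_bias, mean_abs_error


# Hypothetical polls from one pollster: (predicted margin, actual margin).
polls = [(4.0, 2.0), (-1.0, 1.0), (3.0, 3.0), (6.0, 2.0)]
bias, abs_error = pollster_error_summary(polls)
print(bias, abs_error)
```

A pollster can have a small mean signed error while still missing badly in both directions, which is why the two numbers are worth tracking separately.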
LatchBio 54 implied HN points 13 Nov 25
  1. SpatialBench offers a set of 98 evaluation packs to measure how well spatial agents perform on real tasks, helping to compare different technologies effectively.
  2. The evaluations are designed from actual tasks scientists face, making them useful to assess real-world analysis abilities in biology.
  3. There's a need for specialized tools and resources in biology since standard coding methods don’t easily translate to biological analysis tasks.
bad cattitude 183 implied HN points 26 Jun 25
  1. There was a noticeable drop in birth rates after the rollout of COVID vaccinations, with data showing that vaccinated women had fewer births than unvaccinated women. This trend has raised many questions.
  2. A recent study highlighted the difference in conception rates between vaccinated and unvaccinated women, showing that unvaccinated women had significantly more births. However, the study also had limitations such as potential biases.
  3. Researchers suggest that looking into specific batches of the vaccine might help clarify the impact on birth rates, which could lead to more conclusive evidence about the vaccine's effects on pregnancy.
Oleg’s Substack 37 HN points 24 Jun 24
  1. AlphaFold 3 can predict how drug-like molecules bind to proteins better than existing programs, and it does so without needing a 3D structure of the target.
  2. Data redundancy in scientific datasets can impact the performance and interpretation of machine learning models.
  3. AlphaFold 3 occasionally misses obvious problems, such as overlapping atoms, which raises questions about its learning methods and performance.
In My Tribe 410 implied HN points 25 Jan 25
  1. Many experts believe that relying on government decisions can be inefficient because it often favors those with political power instead of addressing real needs.
  2. Inequality is a natural part of society, and efforts to eliminate it through government action can lead to problems, including promoting wokeness.
  3. Economic data can often be misleading due to measurement errors, making it hard to trust figures such as GDP that inform important decisions like monetary policy.