The hottest Data Analysis Substack posts right now

And their main takeaways
Marcus on AI 4663 implied HN points 24 Nov 24
  1. Scaling laws in AI aren't as reliable as people once thought. They're more like general ideas that can change, rather than hard rules.
  2. The new approach to scaling, which focuses on how long you train a model, can be costly and doesn't always work better for all problems.
  3. Instead of just making existing models bigger or training them longer, the field needs fresh ideas and innovations to improve AI.
benn.substack 1150 implied HN points 01 Aug 25
  1. Automating analysis is tricky because we can't confirm if the results are accurate without understanding how they were made. This means we often have to trust the source instead of verifying the information ourselves.
  2. AI can create complex spreadsheets or charts, but we can't easily check their correctness. Unlike with other software, we can't just test whether a chart 'works' without digging into how it was made.
  3. In finance, companies are using strategies like buying crypto to boost their stock prices, even if these tactics seem irrational. This shows that sometimes getting attention matters more than the actual business fundamentals.
Franz likes to code 39 implied HN points 05 Sep 24
  1. If you're having trouble with the Google Trends Python package, you can switch to Wikipedia's page view statistics instead. They come from an official, publicly documented source and work well as a proxy for search interest.
  2. Wikipedia provides a rich API for fetching daily or monthly view counts for specific articles, which helps show how interest in a topic changes over time.
  3. A short Python script can retrieve page views for any Wikipedia article, making it an easy replacement for Google Trends in your research.
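The page-view approach can be sketched with only the standard library. This is a minimal sketch, not the post's code: it assumes the Wikimedia REST API's per-article pageviews endpoint and its JSON shape (an `items` list whose entries carry `timestamp` and `views`). The helper names `pageviews_url`, `parse_pageviews`, and `fetch_pageviews`, and the User-Agent string, are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Per-article endpoint of the Wikimedia REST (Pageviews) API.
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia.org",
                  access="all-access", agent="user", granularity="daily"):
    """Build the request URL. start/end are YYYYMMDD strings; this endpoint
    accepts 'daily' or 'monthly' granularity."""
    # Titles use underscores for spaces; percent-encode everything else, including "/".
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return f"{API_BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def parse_pageviews(body):
    """Convert a JSON response body into (timestamp, views) pairs."""
    data = json.loads(body)
    return [(item["timestamp"], item["views"]) for item in data.get("items", [])]


def fetch_pageviews(article, start, end, **kwargs):
    """Fetch and parse page views for one article (makes a network call)."""
    url = pageviews_url(article, start, end, **kwargs)
    # Wikimedia asks API clients to send a descriptive User-Agent;
    # this contact string is a placeholder to replace with your own.
    request = urllib.request.Request(
        url, headers={"User-Agent": "pageviews-sketch/0.1 (contact: you@example.com)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_pageviews(response.read().decode("utf-8"))
```

For example, `fetch_pageviews("Data analysis", "20240101", "20240107")` would return one `(timestamp, views)` pair per day. Setting `agent="user"` asks the API to exclude traffic it classifies as crawlers or automated.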
Elizabeth Laraki 419 implied HN points 28 May 24
  1. Kerry Rodden, a UX researcher, helped YouTube understand how users navigated the site. By deeply analyzing user data, they found out what people really wanted from YouTube.
  2. One big surprise was that most YouTube sessions didn't start on the homepage. Instead, many users went directly to watch videos they found elsewhere on the internet.
  3. Kerry created clear visualizations of user data that showed how people moved through YouTube. This helped the company improve its homepage and focus on personalizing content for users.
CalculatedRisk Newsletter 33 implied HN points 18 Feb 26
  1. Housing starts rose in December to a seasonally adjusted annual rate (SAAR) of 1,404,000, up 6.2% from November but 7.3% below December 2024.
  2. Building permits climbed to a 1,448,000 SAAR in December, improving month-to-month but still modestly below a year earlier; single-family activity was weaker while multi-family starts increased.
  3. For all of 2025, total starts were down 0.6% versus 2024, with single-family starts falling about 6.9% and multi-family rising roughly 18%, and housing units under construction remain elevated.
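The percentage changes above let you back out the earlier levels they compare against. This is a small arithmetic sketch, not part of the newsletter; the helper name `implied_base` is hypothetical, and because published percentages are rounded, the implied levels are approximate and may differ slightly from the reported figures.

```python
def implied_base(current, pct_change):
    """Back out the earlier level from a current level and its percent change."""
    # current = base * (1 + pct_change / 100)  =>  base = current / (1 + pct_change / 100)
    return current / (1 + pct_change / 100)


december = 1_404_000                      # December SAAR from the summary above
november = implied_base(december, 6.2)    # up 6.2% from November
dec_2024 = implied_base(december, -7.3)   # 7.3% below December 2024

print(f"Implied November level:      {november:,.0f}")
print(f"Implied December 2024 level: {dec_2024:,.0f}")
```

This puts November at roughly 1.32 million and December 2024 at roughly 1.51 million (annualized).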
The Honest Broker Newsletter 2973 implied HN points 27 Jan 25
  1. In 2024, there were a lot of major hurricanes, tying with 2015 for the highest since records began, which raises questions about climate patterns.
  2. Despite the increase in hurricane landfalls, there hasn't been a clear trend showing that hurricanes are becoming more intense or frequent over time.
  3. Experts believe that while human activity may influence hurricanes, detecting these changes amidst natural variability is very challenging.
Astral Codex Ten 8534 implied HN points 05 Mar 24
  1. The Annual Forecasting Contest on astralcodexten.com asks participants to predict the outcomes of various questions, helping test whether a single identifiable genius or mathematically aggregated predictions do better at foreseeing the future.
  2. The winners of the contest were both amateurs and seasoned forecasting veterans, showcasing a mix of skill and luck in predicting outcomes.
  3. Metaculus outperformed prediction markets, superforecasters, and the wisdom of crowds in the contest, suggesting that consistent high performance might be rare but achievable with specific methods like those used by superforecaster Ezra Karger.
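One common way to compare individual forecasters with an aggregate is to take the median forecast on each question and score everyone with the Brier score (mean squared error against 0/1 outcomes). This is a generic sketch on hypothetical forecasts; the contest's actual scoring rule and aggregation methods may differ.

```python
from statistics import mean, median

# Hypothetical binary questions (1 = happened) and three forecasters' probabilities.
outcomes = [1, 0, 1]
forecasts = {
    "A": [0.9, 0.2, 0.4],
    "B": [0.6, 0.4, 0.9],
    "C": [0.7, 0.1, 0.5],
}


def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return mean((p - o) ** 2 for p, o in zip(probs, outcomes))


# Aggregate forecast: the median of all forecasters on each question.
aggregate = [median(ps) for ps in zip(*forecasts.values())]

scores = {name: brier(ps, outcomes) for name, ps in forecasts.items()}
scores["median aggregate"] = brier(aggregate, outcomes)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.3f}")
```

With real contest data, the same comparison shows whether any single forecaster consistently beats the aggregate across many questions.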
Democratizing Automation 902 implied HN points 07 Aug 25
  1. GPT-5 has been received with mixed feelings because it didn't fully meet the high expectations set before its launch. However, most users find it effective and beneficial.
  2. The upgrade in GPT-5 focuses on balancing performance, price, and user experience, making it one of the more affordable AI options.
  3. Progress in AI will continue, but it may be slower than some hope. The industry is shifting towards practical improvements over radical breakthroughs.
Implications, by Scott Belsky 1356 implied HN points 04 Jan 24
  1. The future will be personalized to your preferences, with digital experiences tailored to you.
  2. Local OS-native AI models will improve everyday life and redefine consumer AI, focusing on personalization, trust, and privacy.
  3. Small brands will become more competitive with big brands, AI will influence purchase decisions, and education will undergo a significant transformation.
Import AI 439 implied HN points 06 May 24
  1. People are skeptical of AI safety policy partly because different people reach different views from the same technical information, which makes it important to consider varied perspectives.
  2. Chinese researchers have developed a method called SOPHON to openly release AI models while preventing finetuning for misuse, offering a solution for protecting against subsequent harm.
  3. Datasets like OpenStreetView-5M will improve the training of machine learning systems for geolocation, helping automate intelligence analysis, with potential applications in both military intelligence and the civilian sector.
Neeloy’s Substack 119 implied HN points 24 Jul 24
  1. Many International Math Olympiad gold medalists end up pursuing careers in different fields, not just in finance or academia. It's interesting to see how their paths vary after such early success.
  2. Data collection on these medalists shows a clear trend where China dominates in terms of gold medals, with a majority of their students achieving this top honor. This highlights the competitive environment in math education in that country.
  3. The dataset used to track these medalists has its limitations, particularly due to language and cultural barriers in finding information. However, the findings still provide valuable insights into the outcomes of these talented individuals.
The Counterfactual 199 implied HN points 27 Jun 24
  1. Always look at the whole distribution of data, not just the average. The average can be affected by extreme values, so it's crucial to see the bigger picture to understand what the data really tells us.
  2. Consider the baseline or reference point when evaluating numbers. Knowing how a number compares to others helps us understand if it's large or small, which gives us better context.
  3. Understand the story behind the data-generating process. This means recognizing the factors that led to the results we see, which helps in identifying possible biases or alternative explanations.
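A toy illustration of the first point, using Python's standard `statistics` module on hypothetical numbers (not data from the post):

```python
import statistics

# Hypothetical sample: mostly small values plus one extreme outlier.
values = [1, 2, 2, 3, 3, 3, 4, 100]

mean = statistics.mean(values)                  # pulled up by the outlier
median = statistics.median(values)              # barely affected by the outlier
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles describe the spread

print(f"mean={mean}, median={median}, quartiles=({q1}, {q2}, {q3})")
```

Here the single value of 100 drags the mean to 14.75 while the median stays at 3, and the quartiles (2.0, 3.0, 3.75) show that most of the data sits in a narrow band far below the mean.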
Nail It and Scale It 119 implied HN points 22 Jul 24
  1. Many online advertising benchmarks are unreliable because they don't account for differences in pricing and offers. This means you might be comparing apples to oranges, leading to wrong conclusions.
  2. To get better benchmarks, focus on two key metrics: Cost-Per-Click (CPC) and Conversion Rate. These give you a clearer picture of how your ads are performing compared to others.
  3. Joining groups or talking to industry experts can help you find more accurate conversion rates for your products. Sharing data with peers is a good way to understand what's normal in your field.
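The two metrics combine into cost per conversion: each conversion takes, on average, 1 / conversion rate clicks, so cost per conversion = CPC / conversion rate. A minimal sketch with hypothetical numbers (not figures from the post); the function name is illustrative.

```python
def cost_per_conversion(cpc, conversion_rate):
    """Cost per conversion implied by cost per click and click-to-conversion rate."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be a fraction in (0, 1]")
    # Each conversion takes 1 / conversion_rate clicks on average.
    return cpc / conversion_rate


# Hypothetical comparison: a pricier click can still win on cost per conversion.
ours = cost_per_conversion(cpc=2.00, conversion_rate=0.05)       # about $40
benchmark = cost_per_conversion(cpc=1.20, conversion_rate=0.02)  # about $60
print(f"ours ${ours:.2f} vs benchmark ${benchmark:.2f}")
```

Comparing either metric alone against a benchmark would suggest the opposite conclusion from comparing the combined cost per conversion.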
Richard Hanania's Newsletter 3657 implied HN points 07 Oct 24
  1. Many people incorrectly believe that immigration leads to higher crime rates. In reality, data shows that most immigrants, especially legal ones, tend to commit less crime than native-born citizens.
  2. Some politicians use scary language about immigrants increasing crime to push their agenda. This can create a false narrative that makes the public fearful and misinformed about the actual impact of immigration.
  3. Immigrants often face more crime themselves and can actually help reduce crime rates in communities by starting businesses and contributing to the economy. So, they can serve as a buffer against crime rather than a cause of it.
Musings on Markets 1099 implied HN points 05 Jan 24
  1. The analysis includes all companies, not just large ones, to give a full picture. This helps avoid bias and gives a more accurate view of each industry.
  2. The data covers many financial variables that help explain company decisions about investment, financing, and dividends. It also uses its own methods to calculate statistics for more accurate insights.
  3. The statistics are updated regularly to reflect the latest available information. Users should apply the data carefully and watch for changes in accounting standards and currency effects.
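The summary does not spell out the calculation methods. One approach for industry statistics that keeps every firm in the picture (an assumption about what is meant here) is an aggregate ratio, which sums numerators and denominators across firms instead of averaging firm-level ratios that outliers and loss-making firms can distort. A sketch with hypothetical firms:

```python
from statistics import mean

# Hypothetical firms in one industry: (market value, net income), in $ millions.
firms = [(1000, 50), (200, 2), (50, -10), (5000, 250)]

# Simple average of firm-level P/E ratios: one tiny-earnings firm (P/E 100)
# and one loss-maker (negative, meaningless P/E) skew the result.
avg_pe = mean(mv / ni for mv, ni in firms)

# Aggregate P/E: total market value over total net income, so every firm
# counts in proportion to its size and losses still enter the denominator.
agg_pe = sum(mv for mv, _ in firms) / sum(ni for _, ni in firms)

print(f"average of ratios: {avg_pe:.2f}, aggregate ratio: {agg_pe:.2f}")
```

The average of ratios comes out at 33.75, while the aggregate ratio is about 21.4, much closer to the two large firms that dominate the industry.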
Day One 758 implied HN points 24 Feb 24
  1. Building trust and authority through valuable content is essential for selling products or services online.
  2. Testimonials and free, high-quality content can strongly persuade potential customers to buy.
  3. Addressing objections, providing ongoing support, and reducing buyer's remorse are key to maintaining customer satisfaction and loyalty.
Weight and Healthcare 818 implied HN points 10 Feb 24
  1. The Tirzepatide study showed that participants' weight loss slowed after 36 weeks. Over the following 52 weeks, those switched to placebo regained weight, while those who continued the drug lost slightly more.
  2. Side effects of Tirzepatide included gastrointestinal issues like nausea, diarrhea, constipation, and vomiting. Close to 82% of participants reported experiencing at least one adverse event during the treatment period.
  3. The findings indicate that a significant percentage of participants taking Tirzepatide did not reach the weight-reduction thresholds. The study design also has issues: participants were not diverse, and there was no weight-neutral comparator group.
Nepetalactone Newsletter 1670 implied HN points 30 Apr 23
  1. There are two types of scientists: those who worship hierarchy and those who understand hierarchy is a cancer to the scientific method.
  2. The EMA raised several objections to Pfizer's data, indicating that it did not meet GMP standards.
  3. Concerns were raised by the EMA about Pfizer's data integrity, lack of biological characterization, and inconsistencies in the data provided.
Cremieux Recueil 803 implied HN points 22 Jul 25
  1. Statistical controls aren’t a magic solution; using them incorrectly can lead to wrong conclusions. It's important to understand the underlying relationships between variables before just plugging numbers into an equation.
  2. Matching groups in studies to control for variables often isn't enough. You might still end up with biases if the controls aren’t comprehensive or well-measured.
  3. Over-controlling, or trying to account for too many factors, can muddle the results. Sometimes less control gives a clearer picture, much as a comparison of fast food and fine dining should keep each one's distinct qualities intact.
Points And Figures 746 implied HN points 31 Jul 25
  1. The Fed is politically influenced, as seen in its recent decision to keep interest rates unchanged despite some members wanting to lower them.
  2. Recent PCE data indicates inflation is rising, which might justify keeping rates steady even in light of other pressures for cuts.
  3. Changes in tariffs are likened to taxes that can slow down the economy, and the current money supply suggests potential recession signs, complicating the decision on whether to ease rates.
The Uncertainty Mindset (soon to become tbd) 99 implied HN points 24 Jul 24
  1. AI systems look like they can think independently, but they really can't. They are tools that need humans to make decisions about value.
  2. Meaning-making is a core human skill that AI lacks. Only humans can decide what actions are meaningful and worthwhile.
  3. When we treat AI as if it can make important decisions, we risk misusing it. It's crucial to keep humans involved in the decision-making process.
The AI Frontier 79 implied HN points 01 Aug 24
  1. Vibes-based evaluations are a helpful starting point for assessing AI quality, especially when specific metrics are hard to define. They allow for initial impressions based on user interactions rather than strict guidelines.
  2. Customers often have unique and unexpected requests that can't easily fit into predefined test sets. Vibes allow for flexibility in understanding real-world usage.
  3. Vibes are useful but have downsides, such as over-weighting first impressions and giving limited feedback. Combining vibes with structured evaluations gives a better overall picture of an AI's performance.
Resilient Cyber 79 implied HN points 01 Aug 24
  1. The Exploit Prediction Scoring System (EPSS) helps predict how likely a software vulnerability is to be exploited. It provides a score, so organizations can focus on the vulnerabilities that really matter.
  2. About 94% of reported vulnerabilities are never exploited in the wild. Organizations therefore waste a lot of resources on vulnerabilities that pose no real threat, which is why it matters to focus on the ones actually being exploited.
  3. EPSS works better for prioritization than older systems like the Common Vulnerability Scoring System (CVSS), helping organizations focus their efforts and making vulnerability management more efficient.
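Prioritizing by exploitation probability rather than severity alone can be sketched as a filter and sort. The CVE identifiers and scores below are hypothetical, and the 0.1 threshold is an arbitrary example rather than an EPSS recommendation; in practice the scores would come from the published EPSS data.

```python
from dataclasses import dataclass


@dataclass
class Vuln:
    cve: str      # identifier (hypothetical values below)
    cvss: float   # severity score, 0-10
    epss: float   # estimated probability of exploitation, 0-1


def prioritize(vulns, epss_threshold=0.1):
    """Keep vulns whose EPSS meets the threshold, highest probability first
    (CVSS breaks ties). The default threshold is an arbitrary example."""
    selected = [v for v in vulns if v.epss >= epss_threshold]
    return sorted(selected, key=lambda v: (v.epss, v.cvss), reverse=True)


backlog = [
    Vuln("CVE-EXAMPLE-1", cvss=9.8, epss=0.02),
    Vuln("CVE-EXAMPLE-2", cvss=7.5, epss=0.65),
    Vuln("CVE-EXAMPLE-3", cvss=5.3, epss=0.30),
    Vuln("CVE-EXAMPLE-4", cvss=8.1, epss=0.001),
]

for v in prioritize(backlog):
    print(v.cve, v.epss, v.cvss)
```

Sorting by CVSS alone would put `CVE-EXAMPLE-1` first even though its estimated chance of exploitation is low; the EPSS-based queue surfaces the two entries most likely to be exploited.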
SemiAnalysis 7576 implied HN points 27 Sep 23
  1. Moore's Law in semiconductors and Eroom's Law (Moore spelled backward) in drug research are critical lenses for analyzing progress in terms of time, money, and output.
  2. Healthcare, a $4 trillion industry, lags behind in technological progress driven by Moore's Law.
  3. An Nvidia acquisition of Illumina could bridge the gap in genomics, addressing bottlenecks and enabling full-stack healthcare solutions.
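The contrast between the two laws is a contrast between exponential growth and exponential decay. This sketch uses common rule-of-thumb rates, which are assumptions rather than figures from the post: Moore's Law as transistor counts doubling roughly every 2 years, and Eroom's Law as new drugs per inflation-adjusted R&D dollar halving roughly every 9 years.

```python
def growth_factor(years, doubling_period):
    """Multiplicative change after `years` for a quantity that doubles every
    `doubling_period` years (a negative period means it halves instead)."""
    return 2 ** (years / doubling_period)


# Rule-of-thumb rates (assumed): Moore ~ doubling every 2 years,
# Eroom ~ halving every 9 years.
years = 18
moore = growth_factor(years, 2)    # transistor-count multiplier
eroom = growth_factor(years, -9)   # drugs-per-dollar multiplier
print(f"Over {years} years: Moore x{moore:g}, Eroom x{eroom:g}")
```

Over 18 years, the same function gives a 512-fold increase on one side and a fall to a quarter on the other.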
Richard Hanania's Newsletter 853 implied HN points 30 Jun 25
  1. Data collection in Sub-Saharan Africa is very poor, leading to unreliable statistics about important issues like GDP and murder rates. This makes it hard to understand the region's actual progress.
  2. The Democratic Party is not resonating with young men, who tend to be healthier and less victim-oriented than the party's current messaging assumes. This might shift how they are seen in political discussions.
  3. A recent article highlights that free trade may have stronger ties to the left than the right, suggesting the motivations behind protectionism can often be less than noble.
Jampa’s Substack 40 HN points 21 Aug 24
  1. Finding a place to live in a small, low-tech city can be really challenging. There aren't many real estate options or online listings, so one might need to explore the area by driving around.
  2. Using tools like OpenStreetMap and AI can help identify neighborhoods and evaluate their quality, saving a lot of time compared to traditional methods.
  3. It's important to check the neighborhood in person, even after using tech tools. Seeing the area first-hand can give a better understanding of what to expect and help find suitable homes.
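The OpenStreetMap step might look like the sketch below, which builds an Overpass QL query for one kind of amenity inside a neighborhood's bounding box and counts the elements in the JSON response. This is an assumption about the workflow, not the author's code. The helper names and coordinates are hypothetical, and which tags signal a "good" neighborhood is a judgment call.

```python
import json


def overpass_query(tag_key, tag_value, bbox):
    """Build an Overpass QL query for nodes with one tag inside a bounding box.

    bbox is (south, west, north, east) in decimal degrees, the order Overpass expects.
    """
    south, west, north, east = bbox
    return (
        "[out:json][timeout:25];"
        f'node["{tag_key}"="{tag_value}"]({south},{west},{north},{east});'
        "out;"
    )


def count_elements(body):
    """Count the elements returned in an Overpass JSON response."""
    return len(json.loads(body).get("elements", []))


# Hypothetical neighborhood bounding box and amenity of interest.
query = overpass_query("amenity", "school", (40.0, -75.0, 40.01, -74.99))
print(query)
```

The query text would be sent to an Overpass API instance. Repeating it per neighborhood and per tag (schools, pharmacies, parks) gives a rough, comparable profile to check against an in-person visit.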
Alberto Cairo's The Art of Insight 139 implied HN points 01 Jul 24
  1. Visualization can be a powerful tool for learning. When you create visuals with clear learning objectives, it helps the viewer understand and remember the message better.
  2. In legal settings, visuals can be persuasive. They help juries see the facts in a more impactful way, making it easier to follow along and draw conclusions.
  3. Creating visuals is a shared experience. When designers and their audience connect over a visualization, it can lead to moments of discovery and understanding together.
Scott's Substack 805 implied HN points 11 Jan 24
  1. Scientific manuscripts should strike a balance between readability and accuracy.
  2. State the specific target parameter of each research question clearly, since it should drive the choice of methods.
  3. Different methods in scientific research identify different treatment effects; state target parameters upfront to guide method selection.
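Why the target parameter matters can be shown with a toy example in which treatment effects differ across units, so the average treatment effect (ATE) and the average effect on the treated (ATT) diverge. The data are hypothetical, and both potential outcomes are known only because this is a constructed example.

```python
from statistics import mean

# Hypothetical units: (outcome if untreated, outcome if treated, actually treated?)
units = [
    (10, 12, False),
    (10, 12, False),
    (10, 20, True),
    (10, 20, True),
]

effects = [y1 - y0 for y0, y1, _ in units]

# Average treatment effect over everyone.
ate = mean(effects)
# Average treatment effect among those who were actually treated.
att = mean(e for e, (_, _, treated) in zip(effects, units) if treated)

print(f"ATE = {ate}, ATT = {att}")
```

Here the ATE is 6 and the ATT is 10, so a method that identifies one is not an estimate of the other. Deciding up front which one the question is about narrows the set of appropriate methods.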
LatchBio 33 implied HN points 06 Feb 26
  1. scBench is a realistic benchmark of 394 verifiable single-cell RNA‑seq problems spanning six sequencing platforms and seven task types, using real data snapshots and deterministic graders to mimic the decisions bioinformaticians make.
  2. Frontier models do better on scRNA‑seq than on spatial data but are still unreliable overall: the best model scores about 52.8% and tasks requiring scientific judgment (cell typing, clustering, differential expression) are the hardest while procedural steps (normalization, QC) are easiest.
  3. Which sequencing platform the data come from matters as much as, or more than, model choice: platforms drive large accuracy swings. Trustworthy automation will therefore require platform‑aware tooling, better harness design, and more representative training data.
An Insult to Intuition 1277 implied HN points 22 May 23
  1. An effort to educate Massachusetts state representatives about proposed bills protecting individual rights was hampered by low attendance from legislators.
  2. The presentation highlighted concerns about the safety and efficacy of mRNA vaccines, questioning the data and potential negative outcomes.
  3. Issues were raised about biased reporting by a news service, labeling presenters as 'vaccine skeptics' and not fully representing their evidence-based arguments.
SeattleDataGuy’s Newsletter 506 implied HN points 08 Aug 25
  1. Self-service analytics hasn't delivered as promised. Companies still struggle to find basic answers and often just switch tools instead of addressing the real issues.
  2. Dashboard fatigue is a real problem. Many dashboards go unused because they are complicated and not user-friendly, making executives reluctant to engage with them.
  3. AI is not a cure-all for self-service problems. It needs carefully prepared data and clear questions from users to be effective, and many people still rely heavily on traditional tools like spreadsheets.
The Data Ecosystem 159 implied HN points 16 Jun 24
  1. The data lifecycle includes all the steps from when data is created until it is no longer needed. This helps organizations understand how to manage and use their data effectively.
  2. Different people and companies might describe the data lifecycle in slightly different ways, which can be confusing. It's important to have a clear understanding of what each term means in context.
  3. Properly managing data involves stages like storage, analysis, and even disposal or archiving. This ensures data remains useful and complies with regulations.
SeattleDataGuy’s Newsletter 541 implied HN points 17 Jul 25
  1. Before creating a dashboard, ask what decisions it will help make. It’s important that the data leads to real actions, not just interesting numbers.
  2. Clarify what success looks like for the stakeholder. Knowing their goals can help you make a dashboard that meets their needs instead of guessing.
  3. After delivering a dashboard, follow up with users to ensure they understand and are using it. This helps prevent wasted effort and keeps the dashboard relevant.
Democratizing Automation 562 implied HN points 12 Jul 25
  1. Grok 4 is a powerful AI model that performs well on benchmarks but struggles in practical usability, making it hard for users to switch from existing AI tools.
  2. The model's unique selling point is its ability to use multiple agents for complex tasks, but its overall performance can be inconsistent and relies heavily on search functions.
  3. Despite achieving high scores, Grok 4 faces significant challenges, including a lack of differentiation in a crowded market, where simply being better isn't enough to attract users.