The hottest Data Analysis Substack posts right now

And their main takeaways

Do Masks Work?

Uncharted Territories • 5149 implied HN points • 28 Feb 23

🏥 Health & Wellness Public Health Research Masks Infections Data Analysis

The debate around mask efficacy is contentious and the science is complex.
Properly worn masks can reduce infection rates, especially when used in community settings.
Some studies in the meta-analysis may have been weighted inaccurately, resulting in misleading conclusions.

🤘ACDC (not that one)

Gonzo ML • 63 implied HN points • 29 Jan 25

🕹 Technology Artificial Intelligence Machine Learning Neural Networks Data Analysis Automation

The paper introduces a method called ACDC that automates the process of finding important circuits in neural networks. This can help us better understand how these networks work.
Researchers follow a three-step workflow to study model behavior, and ACDC fully automates the last step which helps identify connections that matter for a specific task.
While ACDC shows promise, it isn't perfect. It may miss some important connections and needs adjustments for different tasks to improve its accuracy.

Public Universal Friend • 79 implied HN points • 02 Sep 24

💼 Business Marketing Customer Engagement Data Analysis Growth Strategies

Using a customer engagement platform like Customer.io can help marketers improve their targeting and maximize growth. It offers better data management and less need for technical support.
Spring is a great time for businesses to focus on improving conversions through digital marketing strategies. Real-time data can help companies get more return on their investment.
Personal connections and genuine interactions are valuable, even in business communication. Taking the time to show real interest can make a difference.

Why Data Teams Need to Understand Metrics: A Look at Starbucks’ Comparable Store Sales

SeattleDataGuy’s Newsletter • 447 implied HN points • 08 Nov 24

💼 Business Metrics Data Analysis Corporate strategy Revenue management

Data teams need to know the main numbers that matter for their business. This helps them understand how the company is performing.
High-level metrics like revenue and expenses can seem too big to grasp. Breaking these down into smaller parts makes them easier to understand.
These smaller, detailed metrics can reveal valuable insights that affect decisions and strategies for the business.

Meta seeks to hide harms from teens

Platformer • 2476 implied HN points • 10 Jan 24

🕹 Technology Social media Artificial Intelligence Regulation Data Analysis Digital Advertising

Meta announced new measures to protect users under 18 from harmful content on its platforms.
There is a growing focus on child safety in social media regulations, shifting from speech-related issues.
Lawmakers and social networks need to find common ground to make real progress in improving teen mental health.

Independent SAGE has joined substack!

Independent SAGE continues • 1418 implied HN points • 20 Mar 24

🏥 Health Politics Public Health Pandemic response Vaccination Data Analysis

Independent SAGE has launched a Substack to share insights about Covid research and data. They aim to provide valuable information directly from experts to the public.
They plan to post updates roughly every two weeks, including responses to important new research and news. This helps keep everyone informed about the ongoing situation.
The Substack will remain free for subscribers, encouraging more people to stay updated on Covid developments and public health measures.

Measuring developer productivity: A clear-eyed view

Engineering Enablement • 21 implied HN points • 05 Feb 25

🕹 Technology Software Development Developer Experience Productivity Metrics Engineering Management Data Analysis

Metrics for developers should help improve their work experience, not just measure their output. Goodhart's Law reminds us that once metrics are tied to rewards, they can become misleading.
Developer experience is more about effectiveness than happiness. Measuring how developers feel needs to focus on the frustrations they face, and not just on making them comfortable.
Using benchmarks is important but context is key. Just like medical tests, numbers need interpretation to make sense; comparing different teams requires understanding their unique challenges.

Polygenic Risk Scores: Ready for Prime Time?

Ground Truths • 3980 implied HN points • 19 Feb 24

🏥 Health & Wellness Genetics Risk Assessment Prevention Data Analysis

Polygenic risk scores can provide valuable information on high genetic risk for diseases like heart disease and cancer, beyond traditional clinical risk factors.
The use of polygenic risk scores is advancing thanks to efforts like the eMERGE consortium, incorporating multi-ancestry data and rigorous validation.
Actionable polygenic risk scores have the potential to reduce health disparities and enhance preventive strategies in medical practice.

AI Hallucinations on the Decline

Jakob Nielsen on UX • 21 implied HN points • 13 Feb 25

🕹 Technology AI Usability UX Design Data Analysis

AI models are getting better at reducing false information, called hallucinations. This means they are less likely to make things up over time.
Bigger AI models generally make fewer mistakes. As AI technology improves, we can expect even fewer errors from future models.
While waiting for better AI, improving user experience can help users spot and double-check misleading information, making it easier to trust AI outputs.

The cult of data has gone too far

Wednesday Wisdom • 113 implied HN points • 01 Jan 25

🕹 Technology Data Analysis Decision-making Corporate culture Consulting Tech industry

Relying too much on data can lead to wrong decisions because numbers don't always tell the full story. Sometimes, human judgment or understanding is needed.
Data can create a false sense of certainty, making people ignore the uncertainties and assumptions behind those numbers. It's important to be honest about what the data truly represents.
Setting goals based on numbers can make teams lose sight of the real-world processes they are supposed to improve. Chasing metrics blindly can lead to poor outcomes.

How to do linear regression and correlation analysis

Lenny's Newsletter • 3144 implied HN points • 02 May 23

🚌 Education Data Analysis Statistical Analysis Product Analytics

Correlation analysis shows how closely two variables are connected.
Linear regression goes further by showing how much one variable affects another and helps predict behavior.
Use product analytics tools for faster confirmation of relationships between metrics and user activity.
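
As a rough illustration of the difference between the two techniques, here is a minimal sketch in plain Python. The data (`sessions_per_week`, `items_purchased`) and the function names are made up for this example and do not come from the post. The correlation coefficient says how tightly the two variables move together, while the fitted line says how much the outcome changes per unit of the input and can be used for prediction.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship, from -1 to 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical data: weekly sessions per user vs. items purchased.
sessions_per_week = [1, 2, 3, 4, 5]
items_purchased = [2, 4, 5, 4, 5]

r = pearson_r(sessions_per_week, items_purchased)
slope, intercept = fit_line(sessions_per_week, items_purchased)
predicted_at_6 = slope * 6 + intercept  # prediction for an unseen input

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Neither number says anything about causation on its own; the fitted slope only describes the association in the sample.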

Finding fake followers (Bluesky edition)

Conspirador Norteño • 68 implied HN points • 18 Jan 25

🕹 Technology Social media Internet Data Analysis Spam Detection Artificial Intelligence

You can spot fake followers on Bluesky by looking for accounts with similar join dates and generic profiles. These accounts often have no posts and repetitive bios.
Tracking the followers of suspected fake accounts can help identify whole networks of fake followers. By downloading and filtering their follower lists, you can map out these networks.
The Bluesky platform has a real-time feature called the firehose that makes it easier to catch fake follower activity as it happens. However, this can give some false positives, so users need to be careful.

Nvidia Illumina Acquisition? The AI Foundry for Healthcare – The Hardware of Life

SemiAnalysis • 7576 implied HN points • 27 Sep 23

🕹 Technology Healthcare Genomics AI Compute Data Analysis

Moore's Law in semiconductors and Eroom's Law in drug research are central to the analysis, which compares time, money, and output in the two fields.
Healthcare, a $4 trillion industry, lags behind in technological progress driven by Moore's Law.
An acquisition of Illumina by Nvidia could bridge the gap in genomics, addressing bottlenecks and enabling full-stack healthcare solutions.

Google Trends API alternative: Wikipedia Pageview Statistics

Franz likes to code • 39 implied HN points • 05 Sep 24

🕹 Technology Programming APIs Data Analysis Python Wikipedia

If you're having trouble with the Google Trends Python package, you can switch to using Wikipedia's page view statistics instead. It's a reliable and official way to get data on search trends.
Wikipedia provides a rich API that allows you to fetch daily or hourly view counts for specific articles. This can help analyze how topics gain interest over time.
You can use a short Python script to find the page views for any Wikipedia article, making it easy to replace Google Trends in your research and get the data you need.
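
The post's own script is not reproduced here. The following is a minimal sketch of what such a lookup might look like, assuming the per-article endpoint of the public Wikimedia REST API and daily granularity. The helper names (`pageviews_url`, `parse_views`, `fetch_views`) and the User-Agent string are placeholders invented for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Per-article pageviews endpoint of the Wikimedia REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(article, start, end, project="en.wikipedia", granularity="daily"):
    """Build the request URL; start and end are dates in YYYYMMDD format."""
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/all-agents/{title}/{granularity}/{start}/{end}"

def parse_views(payload):
    """Map each timestamp in the JSON response to its view count."""
    return {item["timestamp"]: item["views"] for item in payload.get("items", [])}

def fetch_views(article, start, end):
    """Fetch and parse view counts (makes a network call)."""
    # Placeholder User-Agent: replace with your own contact details.
    headers = {"User-Agent": "pageviews-example/0.1 (you@example.com)"}
    request = Request(pageviews_url(article, start, end), headers=headers)
    with urlopen(request, timeout=30) as response:
        return parse_views(json.load(response))

print(pageviews_url("Data analysis", "20240101", "20240131"))
```

Calling `fetch_views("Data analysis", "20240101", "20240131")` would return a dictionary keyed by timestamp, which can then be plotted or compared across articles.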

Diving deep into OpenAI’s new study on LLM’s and bioweapons

Marcus on AI • 2845 implied HN points • 04 Feb 24

🕹 Technology AI Data Analysis Risk Assessment Ethics Research

OpenAI's study on GPT-4 and bioweapons raises concerns due to potential risks.
The statistical analysis in the study may have downplayed the actual risk indicated.
There is a need for follow-up research and accurate interpretation of AI-related safety-critical studies.

How One UX Researcher Ignited Sweeping Changes to YouTube

Elizabeth Laraki • 419 implied HN points • 28 May 24

🕹 Technology User Experience Data Analysis Product Design Digital Media Research Methods

Kerry Rodden, a UX researcher, helped YouTube understand how users navigated the site. By deeply analyzing user data, they found out what people really wanted from YouTube.
One big surprise was that most YouTube sessions didn't start on the homepage. Instead, many users went directly to watch videos they found elsewhere on the internet.
Kerry created clear visualizations of user data that showed how people moved through YouTube. This helped the company improve its homepage and focus on personalizing content for users.

New study out: Systemic Racism Does Not Explain Variation in Race Gaps on Cognitive Tests

Just Emil Kirkegaard Things • 1513 implied HN points • 06 Jan 24

🔬 Science Racism Intelligence Data Analysis Social Disparities Meritocracy

Systemic racism theory predictions were not supported by the study results.
US counties with more Republicans had smaller racial gaps in cognitive tests.
Racial achievement gaps were smaller in counties with higher White population shares.

Economics Links, 1/25/2025

In My Tribe • 410 implied HN points • 25 Jan 25

💰 Finance Economics Investing Market Trends Data Analysis Political Economy

Many experts believe that relying on government decisions can be inefficient because it often favors those with political power instead of addressing real needs.
Inequality is a natural part of society, and efforts to eliminate it through government action can lead to problems, including promoting wokeness.
Economic data can often be misleading due to measurement errors, making it hard to trust figures that inform important decisions like GDP or monetary policies.

8 Forecasts & Implications for the Years Ahead :: 2024+

Implications, by Scott Belsky • 1356 implied HN points • 04 Jan 24

🕹 Technology AI Education Entertainment Organizational Design Data Analysis

The future will be personalized to your preferences, with digital experiences tailored to you.
Local OS-native AI models will improve everyday life and redefine consumer AI, focusing on personalization, trust, and privacy.
Small brands will become more competitive with big brands, AI will influence purchase decisions, and education will undergo a significant transformation.

Avoiding Epistemological Nihilism

Richard Hanania's Newsletter • 3657 implied HN points • 12 Feb 24

🔬 Science Research Data Analysis

Social scientists often resort to statistical relationships when randomized experiments are not feasible, which can lead to flawed conclusions due to selection effects and confounding variables.
Flawed data is often worse than having no data at all, as it can mislead individuals into making decisions based on inaccurate information.
To form reasonable opinions on social, political, and economic issues, it is essential to prioritize well-grounded ideas backed by theoretical reasoning and empirical data over blindly following data from flawed social science research.

Import AI 371: CCP vs Finetuning; why people are skeptical of AI policy; a synthesizer for a LLM

Import AI • 439 implied HN points • 06 May 24

🕹 Technology AI Research Data Analysis Medical AI Image Generation Internet culture

People are skeptical of AI safety policy as different views arise from the same technical information, making it important to consider varied perspectives.
Chinese researchers have developed a method called SOPHON to openly release AI models while preventing finetuning for misuse, offering a solution for protecting against subsequent harm.
Datasets like OpenStreetView-5M will improve the training of machine learning systems for geolocation and help automate intelligence analysis, with potential applications in both military intelligence and civilian sectors.

Measuring Lines/Function

Software Design: Tidy First? • 132 implied HN points • 05 Dec 24

🕹 Technology Software Design Programming Data Analysis Development Practices

Measuring lines of code in functions can be more complicated than expected. It's helpful to keep track of this while working on software projects.
Looking for patterns in software, like Pareto distributions, can provide valuable insights. It's good practice to analyze your own code for these patterns.
Documenting your findings is important. Sharing your experiences can help others who are trying to understand their software better.

Where have the International Math Olympiad Gold Medallists Ended Up? Part One of Three

Neeloy’s Substack • 119 implied HN points • 24 Jul 24

🚌 Education Mathematics Competitions Career Paths Data Analysis

Many International Math Olympiad gold medalists end up pursuing careers in different fields, not just in finance or academia. It's interesting to see how their paths vary after such early success.
Data collection on these medalists shows a clear trend where China dominates in terms of gold medals, with a majority of their students achieving this top honor. This highlights the competitive environment in math education in that country.
The dataset used to track these medalists has its limitations, particularly due to language and cultural barriers in finding information. However, the findings still provide valuable insights into the outcomes of these talented individuals.

How to evaluate statistical claims

The Counterfactual • 199 implied HN points • 27 Jun 24

🚌 Education Statistics Research Methods Data Analysis Critical Thinking

Always look at the whole distribution of data, not just the average. The average can be affected by extreme values, so it's crucial to see the bigger picture to understand what the data really tells us.
Consider the baseline or reference point when evaluating numbers. Knowing how a number compares to others helps us understand if it's large or small, which gives us better context.
Understand the story behind the data-generating process. This means recognizing the factors that led to the results we see, which helps in identifying possible biases or alternative explanations.
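
The first point can be made concrete with a small sketch. The salary figures below are invented for illustration: a single extreme value pulls the mean far away from what a typical observation looks like, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the last value is an extreme outlier.
salaries = [40, 42, 45, 47, 50, 52, 55, 58, 60, 1000]

avg = mean(salaries)
mid = median(salaries)
share_below_avg = sum(s < avg for s in salaries) / len(salaries)

print(f"mean = {avg}, median = {mid}, share below mean = {share_below_avg:.0%}")
```

Here nine of the ten values sit below the mean, so reporting the average alone would misdescribe almost everyone in the sample.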

DeepSeek: Does a Small AI Model Invalidate Big Models?

Jakob Nielsen on UX • 27 implied HN points • 30 Jan 25

🕹 Technology AI Models Machine Learning Computing Data Analysis Investments

DeepSeek's AI model is cheaper and uses a lot less computing power than other big models, but it still performs well. This shows smaller models can be very competitive.
Investments in AI are expected to keep growing, even with cheaper models available. Companies will still spend billions to advance AI technology and achieve superintelligence.
As AI gets cheaper, more people will use it and businesses will likely spend more on AI services. The demand for AI will increase as it becomes more accessible.

Why Most Online Advertising Benchmarks Are BS (And What To Do Instead)

Nail It and Scale It • 119 implied HN points • 22 Jul 24

💼 Business Marketing Advertising Data Analysis E-commerce Growth Strategies

Many online advertising benchmarks are unreliable because they don't account for differences in pricing and offers. This means you might be comparing apples to oranges, leading to wrong conclusions.
To get better benchmarks, focus on two key metrics: Cost-Per-Click (CPC) and Conversion Rate. These give you a clearer picture of how your ads are performing compared to others.
Joining groups or talking to industry experts can help you find more accurate conversion rates for your products. Sharing data with peers is a good way to understand what's normal in your field.
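
To see why these two metrics are useful together, consider the following sketch. The campaign numbers are hypothetical, and the derived figure (cost per customer, computed as CPC divided by conversion rate) is standard arithmetic rather than something taken from the post.

```python
def cost_per_acquisition(cpc, conversion_rate):
    """Average ad spend needed to win one customer."""
    if conversion_rate <= 0:
        raise ValueError("conversion_rate must be positive")
    return cpc / conversion_rate

# Hypothetical campaigns: B pays twice as much per click but converts better.
campaign_a = cost_per_acquisition(cpc=1.50, conversion_rate=0.02)
campaign_b = cost_per_acquisition(cpc=3.00, conversion_rate=0.05)

print(f"A: ${campaign_a:.2f} per customer, B: ${campaign_b:.2f} per customer")
```

In this example the campaign with the higher CPC is still cheaper per customer, which is the kind of comparison a single blended benchmark tends to hide.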

Data Update 1 for 2024: The Data Speaks, but what is it saying?

Musings on Markets • 1099 implied HN points • 05 Jan 24

💰 Finance Investing Valuation Data Analysis Corporate Finance Market Trends

All companies are included in data analysis to get a full picture, not just big ones. This helps avoid bias and shows a more accurate view of industries.
The data covers many financial variables that help understand company decisions about investment, financing, and dividends. It also uses unique ways to calculate statistics for more accurate insights.
The statistics are updated regularly to reflect the latest available information. Users should utilize the data wisely and be aware of any changes in accounting standards or currency issues.

Lessons From Over 13 Million Naira Worth of Organic WhatsApp Sales

Day One • 758 implied HN points • 24 Feb 24

💼 Business Sales Marketing Content creation Data Analysis Customer Support

Building trust and authority through valuable content is essential for selling products or services online.
Utilizing testimonials and free high-quality content can greatly persuade potential customers to make a purchase.
Addressing objections, providing ongoing support, and reducing buyer's remorse are key to maintaining customer satisfaction and loyalty.

Zepbound/Mounjaro Tirzepatide for Weight Loss Part 3

Weight and Healthcare • 818 implied HN points • 10 Feb 24

🏥 Health & Wellness Weight Loss Data Analysis Side Effects Research Ethics

The study on Tirzepatide showed that weight loss slowed after 36 weeks. Over the following 52 weeks, participants who switched to placebo regained weight, while those who continued the drug had a slight further weight reduction.
Side effects of Tirzepatide included gastrointestinal issues like nausea, diarrhea, constipation, and vomiting. Close to 82% of participants reported experiencing at least one adverse event during the treatment period.
A significant percentage of participants taking Tirzepatide did not meet the weight reduction thresholds. The study design also had issues: participants lacked diverse representation, and there was no weight-neutral comparator group.

But it's GMP!

Nepetalactone Newsletter • 1670 implied HN points • 30 Apr 23

🏥 Health & Wellness Science Vaccines Controversy Safety Data Analysis

There are two types of scientists: those who worship hierarchy and those who understand hierarchy is a cancer to the scientific method.
The EMA found several objections to Pfizer's data, showing that it did not meet GMP standards.
Concerns were raised by the EMA about Pfizer's data integrity, lack of biological characterization, and inconsistencies in the data provided.

Spam in the firehose

Conspirador Norteño • 128 implied HN points • 06 Dec 24

🕹 Technology Social media Spam Detection Data Analysis Online Security Programming

Monitoring the Bluesky firehose can help quickly spot fake accounts. By looking for repeated names and profiles, it's easier to identify spam activity.
A large number of spam accounts often share similar biographies. One group had over a thousand accounts with variations of the same few phrases.
Many spam accounts use stolen images as profile pictures. This makes them look less authentic and easier to identify as spam.
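
The idea of grouping accounts by near-identical biographies can be sketched as follows. This is not the author's code: the account handles, the bios, and the normalization rule (lowercase, letters only) are assumptions made for this example, and a real pipeline reading from the firehose would need more careful matching to limit false positives.

```python
import re
from collections import defaultdict

def normalize_bio(bio):
    """Lowercase the bio and drop digits, punctuation, and emoji."""
    text = re.sub(r"[^a-z\s]", "", bio.lower())
    return " ".join(text.split())

def group_by_bio(accounts, min_size=3):
    """Return bios shared by at least min_size accounts, with their handles."""
    groups = defaultdict(list)
    for handle, bio in accounts:
        key = normalize_bio(bio)
        if key:  # ignore empty bios
            groups[key].append(handle)
    return {bio: handles for bio, handles in groups.items() if len(handles) >= min_size}

# Hypothetical accounts as (handle, biography) pairs.
accounts = [
    ("a1.example", "Crypto lover! 🚀"),
    ("a2.example", "crypto lover"),
    ("a3.example", "Crypto Lover 2024"),
    ("b1.example", "Birdwatcher in Leeds"),
]

suspicious = group_by_bio(accounts)
print(suspicious)
```

A shared bio is only a signal, not proof, so flagged groups still need a manual look before being treated as spam.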

Pfizer's own data implicates their contamination

Nepetalactone Newsletter • 1650 implied HN points • 28 Apr 23

🔬 Science Research Medical Data Analysis Bioinformatics Sequencing

Pfizer's own data suggests contamination in their vaccines.
Pfizer did not provide raw sequence data for their RNA-Sequencing data.
Pfizer's RNA-Sequencing picked up plasmid contamination, indicating an issue.

AI's seductive mirage

The Uncertainty Mindset (soon to become tbd) • 99 implied HN points • 24 Jul 24

🕹 Technology AI Automation Data Analysis Human-AI Interaction Product Development Ethics

AI systems look like they can think independently, but they really can't. They are tools that need humans to make decisions about value.
Meaning-making is a core human skill that AI lacks. Only humans can decide what actions are meaningful and worthwhile.
When we treat AI as if it can make important decisions, we risk misusing it. It's crucial to keep humans involved in the decision-making process.

Whose responsibility is it anyway?

Just Emil Kirkegaard Things • 825 implied HN points • 30 Jan 24

🔬 Science Psychometrics Intelligence Political Science Psychology Data Analysis

Intelligence correlates with different ideological and political beliefs.
More intelligence predicts more libertarian views about the government's role.
Smart people are more likely to be nihilists.

In defense of vibes-based evaluations

The AI Frontier • 79 implied HN points • 01 Aug 24

🕹 Technology AI Research Product Development Evaluation Metrics User Experience Data Analysis

Vibes-based evaluations are a helpful starting point for assessing AI quality, especially when specific metrics are hard to define. They allow for initial impressions based on user interactions rather than strict guidelines.
Customers often have unique and unexpected requests that can't easily fit into predefined test sets. Vibes allow for flexibility in understanding real-world usage.
While vibes are useful, they also have downsides, like strong first impressions and limited feedback. A mix of vibes and structured evaluations can provide a better overall understanding of an AI's performance.

Vulnerability Exploitation in the Wild

Resilient Cyber • 79 implied HN points • 01 Aug 24

🕹 Technology Cybersecurity Data Analysis Software Development Vulnerability Management Risk Assessment

The Exploit Prediction Scoring System (EPSS) helps predict how likely a software vulnerability is to be exploited. It provides a score, so organizations can focus on the vulnerabilities that really matter.
Most vulnerabilities that are reported, about 94%, aren’t even exploited in real life. This means organizations waste a lot of resources on vulnerabilities that pose no threat, highlighting the importance of focusing on the ones that are actually exploited.
EPSS works better than older systems like the Common Vulnerability Scoring System (CVSS) for this purpose. It helps organizations prioritize their efforts and makes vulnerability management more efficient.
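
A small sketch of how the two scores can lead to different priorities is shown below. The findings, their scores, and the 0.1 threshold are all hypothetical values chosen for illustration. The only assumptions about the scoring systems themselves are that EPSS is a probability between 0 and 1 and CVSS is a severity score between 0 and 10.

```python
# Hypothetical findings: identifiers and scores are invented for this example.
findings = [
    {"id": "VULN-A", "cvss": 9.8, "epss": 0.02},
    {"id": "VULN-B", "cvss": 7.5, "epss": 0.91},
    {"id": "VULN-C", "cvss": 5.3, "epss": 0.40},
    {"id": "VULN-D", "cvss": 8.1, "epss": 0.01},
]

def prioritize(items, epss_threshold=0.1):
    """Keep findings at or above the threshold, most likely exploited first."""
    urgent = [f for f in items if f["epss"] >= epss_threshold]
    return sorted(urgent, key=lambda f: f["epss"], reverse=True)

by_epss = [f["id"] for f in prioritize(findings)]
by_cvss = [f["id"] for f in sorted(findings, key=lambda f: f["cvss"], reverse=True)]

print("EPSS order:", by_epss)
print("CVSS order:", by_cvss)
```

In this made-up data the two highest-severity findings fall below the exploitation threshold, so a CVSS-first queue would spend effort on them before the ones most likely to be exploited.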

Updating HarvestIQ

The Security Industry • 10 implied HN points • 03 Feb 25

🕹 Technology Cybersecurity AI Tools Product Development Software Data Analysis

HarvestIQ now combines two assistants into one, simplifying interactions for users. This helps reduce confusion and makes it easier to get information about cybersecurity vendors and products.
Users can ask the Cyber Assistant for various tasks like product comparisons, SWOT analyses, and customized news summaries. These features aim to enhance decision-making in cybersecurity.
The IT-Harvest Dashboard and HarvestIQ serve different purposes. The Dashboard is great for exploring detailed data, while HarvestIQ is more about getting direct answers and insights.

Wardriving for a place to live

Jampa’s Substack • 40 HN points • 21 Aug 24

🕹 Technology AI Tools Data Analysis Real Estate Software Development Urban planning

Finding a place to live in a small, low-tech city can be really challenging. There aren't many real estate options or online listings, so one might need to explore the area by driving around.
Using technology like OpenStreetMaps and AI can help in identifying neighborhoods and evaluating their quality. This can save a lot of time compared to traditional methods.
It's important to check the neighborhood in person, even after using tech tools. Seeing the area first-hand can give a better understanding of what to expect and help find suitable homes.

Mixtape Mailbag #7: What Happens in Difference-in-Differences if Parallel Trends is satisfied but No Anticipation is Violated?

Scott's Substack • 786 implied HN points • 22 Jan 24

🚌 Education Causal Inference Data Analysis Subscribers

Parallel trends is a core identifying assumption in Difference-in-Differences analysis.
No anticipation is a separate assumption, and it needs to be understood and checked alongside parallel trends.
When no anticipation is violated, the results can be biased even if parallel trends is satisfied.
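
A simplified two-group, two-period sketch shows how this bias can arise. All numbers are hypothetical. Both groups share the same underlying trend of +3, so parallel trends holds, and the true post-treatment effect is 5. In the second scenario the treated group starts responding before treatment, which raises its pre-period outcome by 2.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: change in treated minus change in control."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical group means.
control_pre, control_post = 8, 11      # untreated trend: +3
treated_pre, treated_post = 10, 18     # same +3 trend plus a true effect of 5

no_anticipation = did_estimate(treated_pre, treated_post, control_pre, control_post)

# Anticipation: treated units react early, lifting the pre-period mean by 2.
anticipation_effect = 2
with_anticipation = did_estimate(
    treated_pre + anticipation_effect, treated_post, control_pre, control_post
)

print(f"no anticipation: {no_anticipation}, with anticipation: {with_anticipation}")
```

In the second scenario the estimate equals the post-period effect minus the pre-period anticipation effect, so it understates the true effect even though the trends are parallel.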

Chartbook • 400 implied HN points • 21 Oct 24