The hottest Statistical Analysis Substack posts right now

And their main takeaways

Twin heritability models can tell you whatever you want to hear

The Infinitesimal • 719 implied HN points • 09 Aug 24

Twin heritability models can produce different estimates of how much traits are influenced by genetics versus environment. This can lead to confusion about what is truly inherited and what is shaped by upbringing.
Cultural factors along with genetic factors play a significant role in shaping traits. Sometimes, what seems genetic can actually be environmental influences like parenting styles, which complicate our understanding of inheritance.
Recent studies suggest that assumptions made in traditional twin studies might not be entirely accurate. By including more family relationships and considering cultural impacts, researchers can get a clearer picture of what really contributes to traits.

"You Couldn't Replicate Our Study Because You're Ugly"

Cremieux Recueil • 416 implied HN points • 03 Dec 24

🔬 Science Research Methods Social Psychology Statistical Analysis Methodology

Attractiveness studies may not be very reliable because their methods can be flawed. It's important to be careful about how these studies are designed and what they claim.
Different studies use different ways to measure attractiveness, which can lead to confusion and mismatched results. It's not always clear which findings are valid.
Racial preference in dating apps can be hard to measure correctly. Good research design is key, and many studies may not handle these issues well, leading to uncertain conclusions.

How to do linear regression and correlation analysis

Lenny's Newsletter • 3144 implied HN points • 02 May 23

🚌 Education Data Analysis Statistical Analysis Product Analytics

Correlation analysis shows how closely two variables are connected.
Linear regression goes further by showing how much one variable affects another and helps predict behavior.
Use product analytics tools for faster confirmation of relationships between metrics and user activity.
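
As a minimal sketch of the two techniques summarized above, the following computes a Pearson correlation and a least-squares line using only the Python standard library. The data and the names `sessions_per_week` and `days_retained` are hypothetical, invented for illustration and not taken from the post.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical product metrics (assumption, not data from the post).
sessions_per_week = [1, 2, 3, 4, 5]
days_retained = [2, 4, 5, 4, 5]

r = pearson_r(sessions_per_week, days_retained)            # strength of the linear link
slope, intercept = fit_line(sessions_per_week, days_retained)
predicted = slope * 6 + intercept                           # prediction for 6 sessions
print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

The correlation summarizes how tightly the points follow a line, while the slope estimates how much the outcome changes per unit of the input, which is what makes prediction possible.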

RERF experts claim their bomb survivor solid cancer mortality data is non-linear

Gordian Knot News • 87 implied HN points • 08 Jan 25

🔬 Science Cancer Research Health Effects Statistical Analysis Public Health

RERF experts found that solid cancer mortality data from bomb survivors shows a non-linear pattern. This means that cancer rates do not rise in simple proportion to radiation dose, contrary to what was previously thought.
They noticed an upward curve in cancer risk among both men and women, with a more pronounced effect in women. This matters for understanding how radiation affects each sex.
The researchers also highlighted a 'High Dose Effect' where fewer cancers seem to occur at very high radiation doses. This challenges some existing theories about radiation and cancer risk.

The vast deserts of useless science

Wyclif's Dust • 804 implied HN points • 19 Oct 24

🔬 Science Statistical Analysis

Correlation does not mean causation, yet many scientists treat it as if it does. This can lead to misleading conclusions and a lack of real progress in research.
Many fields, like veterinary science, contain many poorly conducted studies that don't really prove anything. This is concerning because it affects how animals are treated: common practices lack good supporting evidence.
The scientific community needs to hold itself accountable and produce reliable research. Right now, there isn't enough incentive for some researchers to conduct proper studies, leading to a lot of flawed findings.


An Appreciation of Joe Stiglitz

Something to Consider • 139 implied HN points • 03 Jul 24

💼 Business Economics Market Theory Statistical Analysis Public Policy

Markets work best when everyone has the same information, but that's rarely the case in reality. Stiglitz shows us how imperfect information affects economic decisions.
Share-cropping has its own risks and benefits. It allows landlords to provide safety nets for tenants, but it can also limit tenants' work incentives.
When companies pay higher wages, they can improve worker effort and reduce turnover. This is known as the efficiency wage theory, which explains why some businesses might choose to hire fewer employees at higher salaries.

Bridging the Gap: From Statistical Distributions to Machine Learning Loss Functions

Mindful Modeler • 818 implied HN points • 14 Nov 23

🕹 Technology Machine Learning Statistical Analysis Optimization

Understanding the distribution of the target variable is key in choosing statistical analysis or machine learning loss functions.
Certain loss functions in machine learning correspond to maximum likelihood estimation for specific distributions, creating a bridge between statistical modeling and machine learning.
While connecting distributions to loss functions is insightful, the real power in machine learning lies in the flexibility to design custom loss functions rather than being constrained by specific distributions.
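
One way to see the link between loss functions and distributions is the simplest possible model, a single constant prediction. Squared error corresponds to maximum likelihood under a Gaussian with fixed variance, and its minimizer is the mean; absolute error corresponds to a Laplace distribution, and its minimizer is the median. The sketch below checks this numerically with a grid search. The data and helper names are hypothetical and not taken from the post.

```python
def squared_loss(data, c):
    """Sum of squared errors when predicting the constant c."""
    return sum((x - c) ** 2 for x in data)

def absolute_loss(data, c):
    """Sum of absolute errors when predicting the constant c."""
    return sum(abs(x - c) for x in data)

def argmin_on_grid(loss, data, grid):
    """Return the grid value with the smallest loss (brute-force search)."""
    return min(grid, key=lambda c: loss(data, c))

# Hypothetical target values with one outlier (assumption for illustration).
data = [1, 2, 3, 4, 10]
grid = [i / 100 for i in range(0, 1001)]  # candidates 0.00, 0.01, ..., 10.00

best_squared = argmin_on_grid(squared_loss, data, grid)    # lands on the mean
best_absolute = argmin_on_grid(absolute_loss, data, grid)  # lands on the median
print(best_squared, best_absolute)
```

The outlier pulls the squared-error solution upward but leaves the absolute-error solution at the middle value, which is the practical consequence of choosing one loss over the other.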

We Should Try to Directly Measure the Value of Scientific Papers

Nonsense on Stilts • 59 implied HN points • 20 Jul 24

🔬 Science Research Methods Information Theory Statistical Analysis Scientific Integrity Decision Theory

We should measure the value of scientific papers to understand their real impact. If a paper doesn't change how people act or think, then it may not be worth much.
To figure out the value of a paper, we can use a formula that compares what outcomes we expect with the information from the paper versus without it. This helps us see if the research is actually useful.
It's important to have good estimates and decisions tied to the research to see its true worth. By doing this, we can better judge which scientific papers are really making a difference.
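
The comparison described above can be sketched as the expected payoff of the best decision made with the information minus the expected payoff of the best decision made without it. The version below assumes the simplest case, where the paper resolves the uncertainty completely; the post's own formula is not reproduced here, and the states, actions, probabilities, and payoffs are all hypothetical.

```python
def expected_value(action_payoffs, probs):
    """Expected payoff of one action under the given state probabilities."""
    return sum(probs[state] * action_payoffs[state] for state in probs)

def value_of_information(payoffs, probs):
    """Expected gain from learning the true state before choosing an action."""
    # Without the paper: commit to the single action with the best expected payoff.
    without_info = max(expected_value(p, probs) for p in payoffs.values())
    # With the paper (assumed fully informative): pick the best action per state.
    with_info = sum(
        probs[state] * max(p[state] for p in payoffs.values()) for state in probs
    )
    return with_info - without_info

# Hypothetical decision problem (assumption for illustration).
probs = {"treatment_works": 0.3, "treatment_fails": 0.7}
payoffs = {
    "adopt": {"treatment_works": 100.0, "treatment_fails": -40.0},
    "skip": {"treatment_works": 0.0, "treatment_fails": 0.0},
}

voi = value_of_information(payoffs, probs)
print(f"value of information: {voi:.1f}")
```

If the best action is the same in every state, the function returns zero, which matches the point that a paper that changes no decision carries little value under this measure.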

Don’t Trust, Verify ... Your Visual Stats

FILWD • 294 implied HN points • 16 Jan 24

🔬 Science Data Visualization Statistical Analysis Normalization

Develop intellectual curiosity and explore different solutions in data transformations and visualizations.
Always verify and normalize data to avoid base rate bias and misleading conclusions.
Consider both aggregating and disaggregating data to reveal different insights in visualizations.
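
A small sketch of why normalization matters: ranking regions by raw counts mostly reflects population size, while ranking by rate per 100,000 people can reverse the order. The region names and numbers are hypothetical and not taken from the post.

```python
# Hypothetical incident counts and populations (assumption for illustration).
regions = {
    "Metro": {"incidents": 900, "population": 3_000_000},
    "Town": {"incidents": 120, "population": 150_000},
    "Village": {"incidents": 15, "population": 10_000},
}

def rate_per_100k(row):
    """Incidents per 100,000 residents."""
    return row["incidents"] / row["population"] * 100_000

# Ranking by raw count versus ranking by normalized rate.
by_count = sorted(regions, key=lambda name: regions[name]["incidents"], reverse=True)
by_rate = sorted(regions, key=lambda name: rate_per_100k(regions[name]), reverse=True)

print("by count:", by_count)
print("by rate: ", by_rate)
```

A map or bar chart built from the raw counts would highlight the largest region, whereas the normalized view highlights the smallest one.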

454: Biggest Bond Crash in 150 Years, Microsoft + Activision, ARM, Wikipedia, Jimmy Buffett, Semiconductors, and Tokyo Digital Twin

Liberty’s Highlights • 471 implied HN points • 16 Oct 23

💰 Finance Investing Markets Economics Statistical Analysis Trends

The biggest bond market rout in 150 years is happening now.
Once-in-a-century events appear more frequently due to statistical misunderstanding, evolving baselines, and increased detection and reporting.
Statistical probabilities can explain why 'rare' events seem to be happening more often in recent times.
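
One simple way to illustrate the statistical point is to count how many different quantities are being watched. If each has a 1% chance per year of a "once-in-a-century" extreme, the chance that at least one of them hits that extreme in a given year grows quickly with the number tracked. The sketch assumes the quantities are independent, which is an illustrative simplification and not a claim from the post.

```python
def prob_at_least_one(annual_prob, n_independent_series):
    """Chance that at least one of n independent series has its rare event this year."""
    return 1 - (1 - annual_prob) ** n_independent_series

p_single = 0.01  # a "once-in-a-century" event for one series
p_any_of_100 = prob_at_least_one(p_single, 100)

for n in (1, 10, 100):
    print(n, round(prob_at_least_one(p_single, n), 3))
```

With 100 independent series the probability is roughly 0.63, so seeing some "hundred-year" record in the news most years is expected under this assumption.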

Stealing Signals, Week 8, Part 1

Stealing Signals • 439 implied HN points • 31 Oct 23

🎾 Sports Analytics Film analysis Statistical Analysis Data interpretation

NFL teams may not give 100% effort in every game, for strategic reasons.
Watching games can give a big advantage in fantasy football over just looking at stats.
The first-read targets dataset may not accurately reflect offensive intentions in play calling and should be analyzed cautiously.

How To Run An A/B Testing On Low Traffic - Issue 181

Data Analysis Journal • 137 implied HN points • 10 Jan 24

🕹 Technology Data science Analytics A/B Testing Experimentation Statistical Analysis

There are no specific rules on when to start A/B testing or on the minimum number of users required.
Consider adjusting thresholds when experimenting with small sample sizes.
Address factors like confidence levels and test timelines for effective decision-making.
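
To make the threshold trade-off concrete, the sketch below uses the standard normal-approximation formula for comparing two conversion rates and shows how relaxing the significance level reduces the number of users needed per arm. The conversion rates and the function name `sample_size_per_arm` are hypothetical; the newsletter may recommend a different approach.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical test: 10% baseline conversion, hoping to detect a lift to 12%.
n_strict = sample_size_per_arm(0.10, 0.12, alpha=0.05)
n_relaxed = sample_size_per_arm(0.10, 0.12, alpha=0.10)
print(n_strict, n_relaxed)
```

The relaxed threshold needs fewer users but accepts a higher false-positive rate, so the choice depends on how costly a wrong launch decision would be.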

Data Science Weekly - Issue 535

Data Science Weekly Newsletter • 99 implied HN points • 23 Feb 24

🕹 Technology Data science Artificial Intelligence Machine Learning Software Engineering Data Engineering Statistical Analysis

Scaling AI tools like ChatGPT involves overcoming many engineering challenges to handle large user demands. It's important to manage growth effectively to keep users satisfied.
There's a lot of information out there about generative AI, making it hard to keep up. A guidebook can help condense this information and provide practical insights.
Linear regression is still a valuable tool in data science. Sometimes going back to basics can yield better results than relying on complex models.

How Intelligent is the Average Rationalist?

sebjenseb • 157 implied HN points • 03 Jul 23

🔬 Science Research Data Analysis Statistical Analysis Psychometrics

The average IQ of rationalists may not be as high as self-reported values suggest, with estimates pointing to an average IQ between 125 and 130.
Analysis of SAT and IQ scores of rationalists indicates an estimated average IQ of about 133.6 after accounting for biases.
Educational attainment and plausible assumptions suggest the average IQ of internet rationalists is between 125 and 130, considering selection for educational attainment.

Data Science Weekly - Issue 497

Data Science Weekly Newsletter • 199 implied HN points • 02 Jun 23

🕹 Technology Data science Machine Learning Artificial Intelligence Software Engineering Statistical Analysis

Data drift doesn't always hurt model performance, so it's important to analyze the context before reacting to it.
Work on solving bigger problems as you grow in your career, instead of waiting for difficult tasks to be handed to you.
To improve a model's reasoning skills, reward it for each correct step in problem-solving, not just the final answer.

Improbable Results: More votes than voters and garbage inputs

The End(s) of Argument • 19 implied HN points • 08 May 24

🇺🇸 U.S. Politics Misinformation Statistical Analysis

It's important to verify the accuracy of numbers before using them for calculations or claims.
Improbable results claims in elections often rely on misleading statistics or inaccurate information.
Just 'doing the math' isn't reliable if the numbers provided are incorrect.

Czech Republic data: vaccinated women are 66% less likely to give birth compared to unvaccinated women

Steve Kirsch's newsletter • 12 implied HN points • 01 Feb 25

🏥 Health Politics Public Health Vaccination Maternal health Statistical Analysis Policy Critique

In the Czech Republic, vaccinated women are giving birth 66% less often than unvaccinated women. This is a sharp decline in birth rates.
Despite the concerning data, the government isn't addressing it publicly and claims it's a normal trend for birth rates to fall.
In the US, health officials still recommend COVID vaccines for pregnant women, even while evidence shows a significant difference in birth rates between vaccinated and unvaccinated women.

A new visualization software for biotech

LatchBio • 6 implied HN points • 08 Nov 24

🕹 Technology Biotech Software Data Integration Collaboration Statistical Analysis

Biologists need better tools to work with their data, focusing on integration, transparency, and collaboration. Old software often doesn't meet these needs.
Latch Plots is new software that allows scientists to easily bring in data from various sources and customize their analyses without coding skills. It makes working with data more efficient and user-friendly.
This software also supports developers by allowing them flexibility in coding while enabling scientists to create standardized templates, making teamwork and data visualization much smoother.

p < 0.05 considered harmful

Simplicity is SOTA • 122 HN points • 10 Apr 23

🔬 Science Experimentation Statistical Analysis Decision-making Tech industry

The standard use of p < 0.05 as a threshold in experiment analysis may not be as useful as commonly believed.
The choice of p < 0.05 as a significance level in experiments is a default that was set nearly a century ago.
In the tech industry, where the goal is to find real product improvements, the risk of false negatives should also be carefully considered, not just false positives.
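
The false-negative concern can be quantified with a rough power calculation. The sketch below approximates the chance of missing a real improvement (one minus power) for a two-proportion z-test at two different significance thresholds. The conversion rates, sample size, and function name are hypothetical and are not taken from the post.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in proportions."""
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_variant * (1 - p_variant) / n_per_arm
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_variant - p_control) / se
    # Ignores the tiny chance of rejecting in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)

# Hypothetical experiment: a real lift from 10% to 12% with 1,000 users per arm.
miss_strict = 1 - power_two_proportions(0.10, 0.12, 1000, alpha=0.05)
miss_loose = 1 - power_two_proportions(0.10, 0.12, 1000, alpha=0.20)
print(f"miss rate at 0.05: {miss_strict:.2f}, at 0.20: {miss_loose:.2f}")
```

In this setup the stricter threshold misses the real improvement far more often, which is the cost that a default of 0.05 hides when only false positives are considered.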

Explaining the Surprising Rebound in Consumer Confidence

The People's Economist with Anthony Chan • 19 implied HN points • 22 Jan 24

💰 Finance Consumer confidence Inflation Statistical Analysis Global Economy

Consumer sentiment is influenced by consumer prices and inflation rates.
There is a negative correlation between consumer sentiment and the yearly rise in consumer prices.
The recent rebound in consumer confidence suggests a shift towards optimism.

February 2025, Week-1

The Parlour • 4 implied HN points • 05 Feb 25

💰 Finance Machine Learning Risk management Statistical Analysis

The study on Network Linear Covariance Models shows that using GNAR models can help better predict stock price movements in the S&P 500, especially during busy trading times.
Agent-Based Modelling is a new method introduced to simulate financial markets, which can help us understand market behavior more clearly.
These research efforts highlight how machine learning techniques can be applied to finance, providing insights that can improve trading strategies.

Santa Clara County non-COVID all-cause mortality increased 50% over baseline in elderly in Q1 of 2021

Steve Kirsch's newsletter • 7 implied HN points • 02 Dec 24

🏥 Health Politics Public Health Vaccination Mortality Rates Elderly Care Statistical Analysis

In Santa Clara County, elderly non-COVID deaths rose by 50% in early 2021, a significant increase compared to previous years. This data points to a concerning spike in mortality rates during the rollout of COVID vaccines.
The health department did not explain the increase in deaths, which raises questions about the safety of the vaccines for older adults. Many believe that the COVID vaccinations might be linked to these higher death rates.
Given the unexpected rise in non-COVID deaths, experts suggest halting vaccine recommendations for the elderly until a clearer understanding of the causes can be established. This is a cautious approach to ensure the safety of older populations.

Learn from Experiences of Experts - Running Trustworthy A/B Test

Machine Learning Diaries • 7 implied HN points • 27 Nov 24

🕹 Technology Machine Learning Data science Statistical Analysis Experimental Design

A/B tests are important for businesses because they help test ideas and make informed decisions. Many companies have seen significant revenue increases by using A/B tests.
It's crucial to define the right performance metrics for A/B tests to ensure long-term success. Focus on metrics that show real customer engagement, not just short-term results.
Pay close attention to statistical principles when running A/B tests. Misunderstanding p-values and making hasty conclusions can lead to incorrect results and poor decisions.

New paper shows COVID boosters increased mortality in nursing home residents. The effect was highly statistically significant after 4 weeks.

Steve Kirsch's newsletter • 8 implied HN points • 18 Oct 24

🏥 Health Politics Public Health Vaccination Epidemiology Policy Analysis Statistical Analysis

COVID boosters seem to increase death rates in nursing home residents, especially after four weeks. This suggests the boosters might be doing more harm than good.
Initial vaccinations showed a tiny benefit, but it quickly faded and was not strong enough to justify the ongoing use of vaccines in nursing homes.
Vaccinating nursing home staff appeared to negatively affect residents, leading to higher deaths. This data raises serious concerns about the overall effectiveness of these vaccines.

Data Science Weekly - Issue 440

Data Science Weekly Newsletter • 19 implied HN points • 28 Apr 22

🕹 Technology AI Machine Learning Data science Software Development Statistical Analysis

AI is getting smarter, but we need a better way to understand how it makes decisions. A common language with AI could help us communicate our questions and concerns.
Creating more synthetic data can help when there's not enough real data for training models. Techniques like data augmentation can help make our data better.
Making data more accessible can solve big problems for society. If we can use available data properly, it can lead to more health and happiness for everyone.

Data Science Weekly - Issue 439

Data Science Weekly Newsletter • 19 implied HN points • 21 Apr 22

🕹 Technology Data science Machine Learning AI Data Visualization Statistical Analysis

Building recommendation systems requires careful planning and quick processing to handle live requests effectively. It's not just about creating a model but also about deploying it at scale.
Contrastive learning is a powerful technique in machine learning that helps in improving model performance. New insights in this area can lead to better model training and application.
Understanding different probability distributions is crucial in data science. It helps in modeling data accurately and predicting outcomes better.

Data Science Weekly - Issue 435

Data Science Weekly Newsletter • 19 implied HN points • 24 Mar 22

🕹 Technology Data science Machine Learning Artificial Intelligence Healthcare Technology Statistical Analysis

Algorithmic assessments can help ensure that healthcare technology benefits everyone involved. It's important to evaluate how data is used in these systems.
Relying solely on deep learning for electronic medical records may not be the best idea right now. Instead, better IT support is needed to improve healthcare systems.
Many claims about explaining AI technology are misleading. Experts agree that what we currently call 'explainable AI' often falls short of being truly understandable.

Data Science Weekly - Issue 390

Data Science Weekly Newsletter • 19 implied HN points • 13 May 21

🕹 Technology Data science Machine Learning AI Software Engineering Statistical Analysis

A crossword-solving AI named Dr. Fill has shown that machines can solve puzzles like humans, but humans still have their unique strengths.
The concept of 'trees' in biology is more complex than it seems: many plants we call trees don't fit a simple definition, and their evolutionary lineages are interleaved with non-trees.
Advancements in synthetic data generation allow for the creation of realistic images, making it useful for training models even when real data is scarce.

Data Science Weekly - Issue 376

Data Science Weekly Newsletter • 19 implied HN points • 04 Feb 21

🕹 Technology Data science Machine Learning Statistical Analysis Software Development

Data quality is super important for AI, especially in high-stakes situations like medical diagnoses. Poor data can lead to serious mistakes in predictions.
DanNet revolutionized deep learning by being the first successful deep CNN in competitions. Its success marked a turning point in computer vision.
Cohort analysis is a powerful way to examine customer data over time, helping businesses improve their user engagement and marketing strategies.

Data Science Weekly - Issue 371

Data Science Weekly Newsletter • 19 implied HN points • 31 Dec 20

🕹 Technology Machine Learning Data science AI Research Technology Tools Statistical Analysis

Real-time machine learning is becoming important for many companies. Some have invested heavily in the right infrastructure and are seeing good results.
There are many new tools for machine learning and MLOps. Keeping track of these tools can help in improving workflow and project success.
Understanding concepts like Markov models can help in planning routines, such as workouts, based on previous choices. This helps in making smart decisions about what to do next.
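
A first-order Markov model of a workout routine can be sketched as a table of transition probabilities, where tomorrow's workout depends only on today's. The workout names and probabilities below are hypothetical and are not taken from the linked article.

```python
import random

# Hypothetical transition probabilities: transitions[today][tomorrow].
transitions = {
    "run": {"run": 0.1, "lift": 0.6, "rest": 0.3},
    "lift": {"run": 0.5, "lift": 0.1, "rest": 0.4},
    "rest": {"run": 0.5, "lift": 0.5, "rest": 0.0},  # never rest two days in a row
}

def next_workout(today, rng):
    """Sample tomorrow's workout given only today's (the Markov property)."""
    options = transitions[today]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

def plan_week(start, days, seed=0):
    """Generate a sequence of workouts by repeatedly sampling the next state."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    plan = [start]
    for _ in range(days - 1):
        plan.append(next_workout(plan[-1], rng))
    return plan

week = plan_week("run", 7)
print(week)
```

Setting a transition probability to zero, as with consecutive rest days here, is a simple way to encode a hard rule in the routine.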

Data Science Weekly - Issue 277

Data Science Weekly Newsletter • 19 implied HN points • 14 Mar 19

🕹 Technology Data science AI Research Programming Machine Learning Statistical Analysis

Data science teams perform better with generalists instead of specialists. This approach helps teams adapt and innovate rather than just focusing on increasing productivity.
R is a powerful programming language for data analysis, with many surprising capabilities beyond statistics. It has features that can impress even those in the computer science field.
China is expected to surpass the U.S. in AI research output soon. This shift highlights the increasing importance of global competition in technology and research.

Data Science Weekly - Issue 238

Data Science Weekly Newsletter • 19 implied HN points • 14 Jun 18

🕹 Technology Machine Learning Data science AI Development Statistical Analysis Data Visualization

Neural networks can struggle to tell jokes if they don't have enough examples to learn from. Giving them more data might help improve their humor.
Machine learning is becoming more efficient with smaller, low-power chips, which could solve many current problems. This trend is expected to grow in the future.
Data cleaning takes a lot of time in data science, with up to 80% of the effort spent on it. Learning tools like Python's Pandas can really help with this task.

Data Science Weekly - Issue 235

Data Science Weekly Newsletter • 19 implied HN points • 24 May 18

🕹 Technology Data science Machine Learning Artificial Intelligence Neural Networks Statistical Analysis

Deep learning models are making it easier to categorize images, like those used in Airbnb listings.
New research suggests that the brain may store information in a discrete way, which could change our understanding of brain and technology interactions.
There are many resources available for learning data science, including online programs and tutorials that cover various tools and techniques.

Data Science Weekly - Issue 190

Data Science Weekly Newsletter • 19 implied HN points • 13 Jul 17

🕹 Technology Data science Machine Learning Artificial Intelligence Software Development Statistical Analysis

Technical debt in machine learning can build up quickly and affect project timelines. Even skilled teams might struggle to manage it and can face major setbacks.
The role of a data product manager is becoming important as companies rely more on data. This new position will be vital for guiding product decisions based on data insights.
Using deep learning models can significantly improve tasks like diagnosing health conditions from data, often outperforming specialists in accuracy.

Data Science Weekly - Issue 144

Data Science Weekly Newsletter • 19 implied HN points • 25 Aug 16

🕹 Technology Data science Machine Learning Artificial Intelligence Programming Data Visualization Statistical Analysis

Neural networks are inspired by how our brain's neurons work and help simulate intelligent behavior. They have a long history and have evolved significantly over time.
Counting can be surprisingly difficult in data science, often requiring more effort than expected. Even experienced data scientists face challenges with counting tasks.
Data-driven decision making is important, but we must be cautious. Ignoring the nuances can lead to pitfalls, so it's crucial to stay aware and informed.

Data Science Weekly - Issue 112

Data Science Weekly Newsletter • 19 implied HN points • 14 Jan 16

🕹 Technology Data science AI Machine Learning Neural Networks Deep Learning Statistical Analysis

The value of information is important in decision-making. Knowing how much to pay for good information can help you make better choices.
AI is getting better at understanding humor. It was thought machines couldn't grasp humor, but advancements are changing that view.
Participating in hackathons can fast-track your learning. Working with others on projects can teach you more than studying alone for months.

Data Science Weekly - Issue 111

Data Science Weekly Newsletter • 19 implied HN points • 07 Jan 16

🕹 Technology Data science Machine Learning Algorithms Statistical Analysis Artificial Intelligence

Using machine learning can create fun things, like generating levels for video games. It's a cool way to combine tech and entertainment.
Too much agreement in a decision-making process can sometimes indicate problems. It’s important to question even unanimous decisions to avoid errors.
Understanding different algorithms behind systems like Netflix's recommendations can help us see the business value of data science. It shows how data can drive decisions in companies.

Data Science Weekly - Issue 88

Data Science Weekly Newsletter • 19 implied HN points • 30 Jul 15

🕹 Technology Data science Machine Learning Programming Statistical Analysis Artificial Intelligence

Hadley Wickham is a famous statistician known for his work with R, a programming language. He has made a big impact in the stats community, and people admire his contributions.
Computers are moving beyond just calculations; they can now assess human character. This development raises questions about how we see technology's role in our lives.
The concept of Dropout is key in modern neural networks, and there are simple ways to implement it in Python. Learning this can help improve machine learning projects.
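
As one simple illustration of the idea, the sketch below implements "inverted" dropout on a plain list of activations: each value is zeroed with probability `drop_prob` during training, and the survivors are scaled up so the expected activation is unchanged. This is a common variant chosen for illustration, not necessarily the implementation the linked tutorial uses, and the function signature is hypothetical.

```python
import random

def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout: zero each value with probability drop_prob, rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # at inference time the layer is a no-op
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # seeded for reproducibility
hidden = [1.0] * 1000
dropped = dropout(hidden, drop_prob=0.5, rng=rng)

print(sum(dropped) / len(dropped))  # stays close to the original mean of 1.0
```

Because the scaling happens during training, no adjustment is needed at inference time, which is why this variant is convenient in practice.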

Axial Discovery - Clinical trial statistical analysis

Discovery by Axial • 1 implied HN point • 08 Sep 23

🔬 Science Clinical Trials Statistical Analysis Software Development Data Integration Data Visualization

Clinical trial statistical analysis involves collecting and interpreting data to evaluate new treatments.
Startups have opportunities to develop software for automating and streamlining statistical analysis processes due to increasing data complexity.
Software development for data integration, visualization, and communication can improve efficiency in clinical trial statistical analysis.

Data Science Weekly - Issue 51

Data Science Weekly Newsletter • 19 implied HN points • 13 Nov 14

🕹 Technology Data science Machine Learning Artificial Intelligence Big Data Statistical Analysis

Data science often blends different fields like statistics and machine learning. This combination helps us solve complex problems and make better predictions.
Understanding both text and images is key to getting a complete view of information. Analyzing them together gives us a clearer picture of reality.
There's a strong demand for data scientists, and many companies struggle to find qualified candidates. This shows how important this skill set is becoming in today's job market.