The hottest Data Analysis Substack posts right now

And their main takeaways

Mohamed Salah is irreplaceable. Let's use data to replace him, anyway.

Grace on Football • 687 implied HN points • 23 Jan 24

Mohamed Salah's exceptional performance and consistency make him irreplaceable on the field.
Using data and statistical analysis can help identify potential players to fill Salah's role.
To replace Salah, focus on finding players who can contribute in areas like goal-scoring, creativity, and left-footedness to maintain team balance.

So you're interested in the national IQ of a country, what do you do?

Just Emil Kirkegaard Things • 648 implied HN points • 31 Jan 24

🔬 Science Research Data Analysis

Check the data behind a national IQ estimate using Becker's 2019 database.
Iran's national IQ estimate of around 84 seems accurate, since it matches what would be expected given the country's development level.
Iran's strong performance in math Olympiads may be attributed to talent searches by the government, rather than a large smart fraction.

Issue #10 - The Data Lifecycle

The Data Ecosystem • 159 implied HN points • 16 Jun 24

🕹 Technology Data science Data Management Data security Data Analysis Data Engineering Data Visualization

The data lifecycle includes all the steps from when data is created until it is no longer needed. This helps organizations understand how to manage and use their data effectively.
Different people and companies might describe the data lifecycle in slightly different ways, which can be confusing. It's important to have a clear understanding of what each term means in context.
Properly managing data involves stages like storage, analysis, and even disposal or archiving. This ensures data remains useful and complies with regulations.

The Open Source Stack for Biological Imaging

LatchBio • 17 implied HN points • 29 Jan 25

🔬 Science Data Analysis Open Source

There are many open-source tools for biological imaging like Napari, ImageJ, Cellpose, CellProfiler, and Suite2p. Each tool has unique features and helps scientists visualize and analyze complex biological data.
Using these tools, scientists can perform tasks such as tracking embryo development, analyzing protein interactions, segmenting cells, and studying neural activity. This technology makes research more efficient and accurate.
Modern data infrastructure can greatly improve the use of these imaging tools. Centralizing resources, using container templates, and optimizing data transfer enhances research productivity and collaboration among teams.

Case Study: Scaling customer intelligence

Artificial Ignorance • 117 implied HN points • 27 Nov 24

💼 Business Sales Marketing AI Data Analysis Customer Insights

AI can help analyze a large number of sales calls quickly instead of relying on humans to do it manually. This makes it easier to understand customer behaviors and needs.
Choosing the right AI model is important. Higher-quality models may cost more, but they can provide better and more accurate results than cheaper options.
It’s essential to make the data user-friendly. Organizing and making information accessible helps teams use insights from the analysis effectively.


Issue #9 - Clarifying Data Terminology

The Data Ecosystem • 159 implied HN points • 09 Jun 24

🕹 Technology Data Management Data Analysis Data Governance Data Literacy Data science

Data can mean many things, from raw collections to curated evidence used in decisions. It's important to define what data means in each situation to avoid confusion.
Poorly defined data terms can lead to problems in data literacy, collection, and management. This can create issues for organizations trying to use data effectively.
Understanding different categories of data, like data types and processing stages, helps in managing and analyzing data better. Knowing these categories makes it easier to communicate and use data in an organization.

Thank You for 100k: A Journey Through 2024 and Beyond

SeattleDataGuy’s Newsletter • 294 implied HN points • 31 Dec 24

💼 Business Consulting Data Analysis Content creation Leadership Community Building

In 2024, I gained over 100,000 subscribers on both YouTube and Substack. I really appreciate the support and plan to create even better content next year.
This year showed trends like cloud data migrations and smaller, fractional data teams, which are changing how companies handle data. It's important to keep an eye on these shifts in the data world.
Looking ahead to 2025, I want to finish my book on data leadership and offer more webinars and mini-courses. I'm excited to engage even more with my readers and build a community.

Metrics Are The Map, Not The Territory

Push to Prod • 59 implied HN points • 30 Jul 24

🕹 Technology Metrics Systems Data Analysis Software Performance

Metrics give us a view of our systems, but they won't show the complete picture. It's like looking at a map; it can guide us but doesn't capture all the details.
When we check the data, we might miss important moments because of how we sample information. This can lead to misunderstandings about our system's performance.
Understanding that metrics are imperfect helps us make better decisions. We should use them to create theories, not think they tell us everything.
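
The sampling point above can be illustrated with a minimal sketch. The latency series, the 5-second spike, and the 10-second polling interval are all made-up values chosen for illustration; they are not taken from the post.

```python
# Synthetic per-second latency: a steady 100 ms with a 5-second spike to 900 ms.
# All numbers here are invented for illustration.
latency_ms = [100] * 60
for second in range(31, 36):
    latency_ms[second] = 900


def sample(series, every):
    """Keep one reading every `every` seconds, as a polling collector might."""
    return series[::every]


full_max = max(latency_ms)                 # what actually happened
sampled_max = max(sample(latency_ms, 10))  # what the sampled metric reports

print(f"true max: {full_max} ms, sampled max: {sampled_max} ms")
```

With these assumed numbers the spike falls entirely between two samples, so the sampled view shows a flat line. That is one way the map can differ from the territory.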

Why Data Teams Need to Understand Metrics: A Look at Starbucks’ Comparable Store Sales

SeattleDataGuy’s Newsletter • 447 implied HN points • 08 Nov 24

💼 Business Metrics Data Analysis Corporate strategy Revenue management

Data teams need to know the main numbers that matter for their business. This helps them understand how the company is performing.
High-level metrics like revenue and expenses can seem too big to grasp. Breaking these down into smaller parts makes them easier to understand.
These smaller, detailed metrics can reveal valuable insights that affect decisions and strategies for the business.

Year-on-year COVID comparisons

Logging the World • 1056 implied HN points • 01 Oct 23

🏥 Health Politics COVID-19 Data Analysis Public Health Vaccination Epidemiology

Overall, COVID admissions and death rates in 2023 are lower than on the corresponding days in 2022, suggesting positive progress in managing the virus.
Comparisons of primary beds occupied 'for COVID' show similar positive trends in 2023 compared to 2022, indicating improved conditions.
The data suggests that in 2023, COVID outcomes have improved significantly compared to previous years, with lower deaths and better management, showcasing progress in handling the pandemic.

Thus ends my Twitter addiction

Alex's Personal Blog • 32 implied HN points • 11 Jan 25

🕹 Technology Social media AI Impact Digital Communication Data Analysis

The author has become less fond of Twitter due to its negative impact on news gathering, especially during events like the LA fires. They now prefer niche subreddits for information.
AI is causing tech companies to stop hiring new staff as they optimize productivity with technology, which may affect job growth in the industry.
The slowdown in hiring at big tech companies could lower the value of talent in acquihire situations, affecting startup exit strategies.

Can AI learn language like we do?

AI Supremacy • 491 implied HN points • 09 Feb 24

🔬 Science AI Child development Neural Networks Data Analysis

An AI model was trained using video footage from a baby to learn language and concepts.
The AI model demonstrated the ability to link words to their visual counterparts based on limited real-world experiences.
This study could help reshape our understanding of how AI and humans learn language and concepts.

Mantic Monday 4/24/23

Astral Codex Ten • 3923 implied HN points • 25 Apr 23

🕹 Technology AI Data Analysis API Funding

Using AI for forecasting future world events is a growing field with potential benefits over human forecasters.
Metaculus has been found to be more accurate than low-information priors and its competitor Manifold Markets, showing the potential of crowdsourcing for predictions.
Exploring AI forecasting through platforms like Polymarket, Metaculus, and Manifold provides insight into trends, such as the interest in prediction markets among sci-tech celebrities.

Does birth order matter?

Wyclif's Dust • 2146 implied HN points • 11 Nov 23

🔬 Science Genetics Environment Family Dynamics Research Methods Data Analysis

Birth order and parental age influence outcomes in opposite ways.
Within families, birth order and parental age have a high correlation.
Even though birth order effects are big, they explain very little of the variation in outcomes.

Re-Examining Opponent Breakdowns

MatchQuarters • 452 implied HN points • 05 Feb 24

🎾 Sports Data Analysis

Defensive coordinators should approach game planning holistically instead of solely relying on numbers.
Offenses aim to create space, while defenses work to constrain it.
Simplify the process of breaking down opponents by focusing on key formations, movements, and plays to develop a comprehensive game plan.

Everything You Wanted to Know About 2️⃣0️⃣2️⃣3️⃣ (and weren't shy to ask!)

Africa: The Big Deal • 511 implied HN points • 18 Jan 24

💼 Business Start-ups Data Analysis Social media Networking

The post provides key analysis of 2023 start-up funding trends in Africa.
Content includes high-res slides for use in decks or social media.
A deals database is available with almost 3,000 deals worth $16.5 billion.

Neural Network Fail Compilation

Top Carbon Chauvinist • 59 implied HN points • 21 Jul 24

🕹 Technology AI Machine Learning Data Analysis Automation Software Development

AI systems, like large language models, struggle with reasoning and can often give wrong answers to simple questions. They rely on patterns rather than true understanding.
Generative AI can produce flawed code and lead to increased mistakes in programming. This raises concerns about the overall quality and security of software.
AI tools can create misleading or totally false news articles. Their results can be unreliable, which poses risks when using them for information or news reporting.

The Youth Mental Health Crisis is International, Unless You Rely on a Flawed International Dataset

After Babel • 1118 implied HN points • 03 Jan 24

🏥 Health & Wellness Mental health Data Analysis Youth Trends Misinformation

Researchers should stop using the Global Burden of Disease study for analyzing mental health trends.
The youth mental health crisis is not just limited to America, but is an international issue in many Western countries with high levels of smartphone adoption.
The Global Burden of Disease study underestimates changes in mental health statistics since 2010, especially in depression, self-harm, and suicide rates.

My Parkinson's data - at last!

Rory’s Always On Newsletter • 1368 implied HN points • 12 Jul 23

🏥 Health & Wellness Parkinson's Data Analysis Treatment Monitoring Health tech

The author has been seeking hard data about his Parkinson's symptoms to understand their severity and response to medication.
Monitoring technology like PD Monitor can provide detailed insights into symptom presence and medication effectiveness over time.
The data revealed that the effectiveness of the author's medication peaks before 11am, making it clear that eating a big breakfast close to pill time can impact absorption.

How to build your first LLM evaluation

The AI Frontier • 159 implied HN points • 16 May 24

🕹 Technology AI Machine Learning Data Analysis Software Development

AI needs to show real value to its customers, which means proving it can create real profits. Without this, it’s hard to justify the excitement around AI.
To understand how well AI products perform, it’s important to create custom evaluations that target specific goals. Generic measurements like MMLU don't provide useful insights for particular applications.
Improving AI evaluations is a continuous process that requires careful scoring and can benefit from community feedback. It's crucial to identify weaknesses and refine metrics for more accurate assessments.
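
A custom evaluation of the kind described above might look roughly like the sketch below. Everything in it is hypothetical: `toy_model` stands in for a real model call, and the prompts, expected terms, and keyword-based scoring rule are invented for illustration rather than taken from the post.

```python
def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned answers."""
    canned = {
        "Refund policy for damaged items?": "Damaged items can be refunded within 30 days.",
        "Do you ship to Canada?": "We currently ship only within the US.",
    }
    return canned.get(prompt, "I am not sure.")


# Each case targets one specific product goal (made-up examples).
CASES = [
    {"prompt": "Refund policy for damaged items?", "must_include": ["refund", "30 days"]},
    {"prompt": "Do you ship to Canada?", "must_include": ["yes"]},
    {"prompt": "What are your support hours?", "must_include": ["9", "5"]},
]


def score_case(answer: str, must_include: list[str]) -> bool:
    """Pass only if every required term appears in the answer (case-insensitive)."""
    text = answer.lower()
    return all(term.lower() in text for term in must_include)


def run_eval(model, cases):
    """Return overall accuracy plus per-case results so weak spots are visible."""
    results = [score_case(model(c["prompt"]), c["must_include"]) for c in cases]
    return sum(results) / len(results), results


accuracy, per_case = run_eval(toy_model, CASES)
print(f"accuracy: {accuracy:.2f}, per case: {per_case}")
```

Keeping the per-case results alongside the aggregate score is a deliberate choice here: the failing cases are where the scoring rule or the product needs refinement.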

Do OpenAI's New Reasoning Models (o1 Series) Differ Politically from Their Predecessors?

Rozado’s Visual Analytics • 150 implied HN points • 28 Jan 25

🕹 Technology AI Models Data Analysis Reasoning Political Bias Model development

OpenAI's new o1 models are designed to solve problems better by thinking through their answers first. However, they are much slower and cost more to run than previous models.
The political preferences of these new models are similar to earlier versions, despite the new reasoning abilities. This means they still lean left when answering political questions.
Even with their advanced reasoning, these models didn't change their political views, which leads to questions about how reasoning and political bias work together in AI.

The KQL Mysteries: Chapter 5

Rod’s Blog • 456 implied HN points • 18 Jan 24

🕹 Technology Cybersecurity Data Analysis Hacking AI Threat intelligence

Jon and Sofia successfully identified and captured the teenage threat actors behind a financial breach using KQL queries and OSINT techniques.
The threat actors were operating from a suburban house in Seattle, Washington, and were quickly apprehended by authorities, leading to the recovery of the funds.
Despite the success, Jon remains suspicious about the involvement of the Night Princess hacker group, hinting at a potential unresolved mystery for the next chapter.

Correction: 2024 Cybersecurity Investments hit $16.1 Billion!

The Security Industry • 25 implied HN points • 03 Jan 25

🕹 Technology Cybersecurity Investments Data Analysis Market Trends Business Growth

In 2024, investments in cybersecurity reached an impressive $16.1 billion, which is a big jump of 60% from the previous year.
A total of 432 cybersecurity companies received funding, with many rounds exceeding $100 million, showing strong interest in the industry.
Looking ahead, experts believe that funding in 2025 could surpass 2024, indicating a growing demand for tech and security services.

The KQL Mysteries: Chapter 6

Rod’s Blog • 416 implied HN points • 22 Jan 24

🕹 Technology Cybersecurity Data Analysis Network Security

Jon discovers that the Night Princess was behind the cyber-attacks on his company, manipulating data, planting false clues, and covering her tracks.
Jon uses KQL skills to investigate the Night Princess's activities by analyzing logon events and network events in the company's database.
Collaboration between the Night Princess, CyberGhost, and DarkAngel in the cyber-attacks surfaces, raising questions about the Night Princess's identity and motives.

How to Use AI to Do Stuff: An Opinionated Guide

One Useful Thing • 1801 implied HN points • 15 Jul 23

🕹 Technology AI Machine Learning Programming Data Analysis Education

Increasingly powerful AI systems are being released rapidly without proper user documentation.
The major Large Language Models in use currently are GPT-3.5, GPT-4, Bard, Pi, and Claude 2.
AI can assist with writing, generating images, coming up with ideas, making videos, and working with documents and data, but users must be cautious of biases and ethical concerns.

The KQL Mysteries: Chapter 3

Rod’s Blog • 456 implied HN points • 05 Jan 24

🕹 Technology Cybersecurity Data Analysis Threat intelligence Cryptocurrency Malware

Jon and Sofia's financial accounts were compromised by hackers, leading them to investigate the breach and work towards recovering the stolen funds.
Through KQL queries and Microsoft Sentinel workspace, Jon and Sofia uncovered details about the malware used in the cyberattack and the group of threat actors behind it.
Jon and Sofia utilized Microsoft Defender Threat Intelligence and various online resources to track the remote servers, cryptocurrency wallets, and patterns involved in the financial heist, narrowing down their search for the threat actors.

Under The Clearance Rate Data Hood

Jeff-alytics • 412 implied HN points • 15 Jan 24

🇺🇸 U.S. Politics Data Analysis Crime Rates

Reported clearance rates for all crimes have fallen since 2019.
Clearance rates significantly dropped in 2020, 2021, and 2022, affecting big cities and suburbs.
Accessing clearance rate data from different years can be challenging due to varied reporting methods and data constraints.

Daily Chartbook #268

Daily Chartbook • 1493 implied HN points • 29 Aug 23

💰 Finance Economy Investing Markets Stocks Data Analysis

Tender rejections in the freight market are at a 6-month high.
High yield default rate could reach 6% due to the strong uptrend in defaults.
Households' interest payments are increasing as a share of disposable income.

#8: How to Blow $250 on Ads For Your Self-Published Book

Shades of Greaves • 412 implied HN points • 12 Jan 24

💼 Business Self-Publishing Advertising Book Sales Marketing Strategies Data Analysis

The author tried investing $250 in ads for their self-published book but didn't see good returns, highlighting the challenges of advertising for self-published authors.
Despite spending on ads, the author sold very few copies through Facebook and Amazon, underscoring the risk of not getting desired results from advertising efforts.
Data from the failed ad campaigns is seen by the author as a way to learn and refine future advertising strategies, showing the importance of using past experiences to improve future marketing efforts.

AI SOC Automation isn't the right problem to solve

Frankly Speaking • 152 implied HN points • 14 Jan 25

🕹 Technology AI Cybersecurity Information Security Data Analysis Automation

Focusing on better detection engineering is key in security operations. It helps identify threats more effectively rather than just automating processes.
Many traditional security operations centers (SOCs) may not be necessary for most companies. Smaller, more efficient models or managed detection services can be better alternatives.
The future of SOCs is likely to involve fewer human analysts and more automation, emphasizing custom detections that fit the specific needs of a business.

Life is Getting Better: Why Don’t We Believe it?

Extropic Thoughts • 373 implied HN points • 20 Jan 24

🚌 Education Trends Beliefs Improvements Data Analysis

Many people believe life is getting worse, despite evidence showing improvement over time.
Negative views about the present and future can hinder progress and lead to costly policy decisions.
The media's focus on negative news, combined with human psychology, contributes to unrealistically gloomy beliefs about the world.

Is the Fed data-driven or data-ridden?

Stay-At-Home Macro (SAHM) • 668 implied HN points • 09 May 23

💰 Finance Monetary Policy Inflation Data Analysis

The Fed's decisions are influenced by data but ultimately rely on judgment.
It's important to not let data-driven approaches overlook crucial data points.
Outsiders may misinterpret when the Fed says it is data-driven, leading to confusion in markets.

The KQL Mysteries: Chapter 4

Rod’s Blog • 396 implied HN points • 09 Jan 24

🕹 Technology Cybersecurity Cryptocurrency Data Analysis

Jon and Sofia used KQL queries and tools like Microsoft Defender Threat Intelligence to track down threat actors behind a financial breach, targeting remote servers and the master wallet separately.
Jon discovered malicious activities on servers using methods like port scanning and DNS spoofing, eventually finding a network of servers communicating over Tor.
Sofia tracked cryptocurrency transactions and wallets, identifying techniques like CoinJoin and stealth addresses, and used tools like Chainalysis to follow the money trail.

AI in DeFi

DeFi Education • 499 implied HN points • 29 Nov 23

🔮 Crypto DeFi AI Blockchain Smart Contracts Data Analysis

Large Language Models (LLMs) are making it easier for people without coding skills to interact with the DeFi space. Now, you can ask questions and get quick responses without needing to be a tech expert.
AI can help enhance the security of DeFi by automating smart contract audits and identifying vulnerabilities. This means it can make DeFi safer, but there’s also a risk that hackers might use AI for malicious purposes.
LLMs can streamline tasks like monitoring Discord communities by filtering out spam and detecting issues. This could make managing online crypto communities much more efficient.

5 Lessons from Stanford's COVID Conference

Vinay Prasad's Observations and Thoughts • 129 implied HN points • 06 Oct 24

🏥 Health Politics Pandemic response Public Health Education & Policy Mental health Data Analysis

Closing elementary schools during the pandemic may have been a bad idea because kids were not significant spreaders of COVID-19. Some experts, like Anders Tegnell from Sweden, believed this from the start.
Many people now agree that long school closures were harmful, but some didn't speak up about it at the time. It shows the importance of questioning popular opinions instead of just following the crowd.
Countries that had less income inequality tended to handle the pandemic better than those with more inequality. Access to basic healthcare might have played a bigger role than strict lockdowns or border closures.

A Selection Of SQL Tutorials - Issue 141

Data Analysis Journal • 628 implied HN points • 26 Apr 23

🕹 Technology Data Analysis Interview Preparation

SQL is a must-have skill in the data field today for increasing trust in data and data cleanliness.
Learning SQL through free tutorials and practice sites is effective and recommended over expensive programs.
Practicing SQL in a local database, using tools like SQLFiddle, and following online tutorials are great ways to improve SQL skills.
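
One low-friction way to practice SQL locally is an in-memory SQLite database driven from Python's standard library. The sketch below assumes a hypothetical `orders` table with invented rows; it is not drawn from any of the tutorials the post lists.

```python
import sqlite3

# An in-memory database: nothing is written to disk, so it is safe to experiment.
conn = sqlite3.connect(":memory:")

# Hypothetical practice table and rows, invented for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5)],
)

# A typical practice query: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

print(rows)
conn.close()
```

The same statements can be pasted into any SQLite shell, so the exercise carries over once the basics feel comfortable.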

What the media won't tell you about ... Wildfires

The Honest Broker Newsletter • 2227 implied HN points • 08 Jun 23

🔬 Science Climate Data Analysis Climate Policy

The IPCC does not attribute fire occurrence to human-caused climate change.
Global emissions from wildfires have decreased, but some regions have seen an increase.
Understanding the complexities and local contexts of wildfires is crucial for effective management.

Machine learning never cheats but it may play flawed games

Mindful Modeler • 259 implied HN points • 27 Feb 24

🕹 Technology Machine Learning Data Analysis Statistics Data generation

Machine learning models may use shortcuts or exploit quirks in data, but it's important to consider them as playing the game according to the rules set by the data.
Detecting flaws in prediction games is crucial, as models can unintentionally learn and act on misleading information from the data.
Designing prediction games effectively requires a deep understanding of the data-generating process. Tools like sampling theory and design of experiments, together with a statistical mindset, can be valuable in shaping prediction tasks.

A data-driven insight into German transfer fraud: 7.594 cases

Altay's Blog • 1 HN point • 30 Sep 24

💰 Finance Fraud Banking E-commerce Scams Data Analysis

Many people in Germany lose money to transfer fraud each year because scammers trick them into thinking their payments are safe. They use methods like fake online shops to steal money without delivering any products.
Scammers often use tricks to hide their identities, like opening bank accounts under fake names or recruiting unsuspecting people to help. These tactics make it hard for banks to catch them right away.
There are rules called Know-Your-Customer (KYC) that banks must follow to verify customer identities. When these rules are not strong, it can lead to more fraud, but better KYC practices can help reduce these scams.
