The hottest Data science Substack posts right now

And their main takeaways

AI-First in 2025: Hype, Reality, and Your Next Move

Rémi Ounadjela • 6 implied HN points • 06 Aug 25

🕹 Technology Data science

AI can be useful in many areas of a company, but it's important to choose the right tools carefully. Think about the problems you want to solve first.
There are different levels of AI tools, ranging from basic productivity helpers to complex systems that can perform tasks on their own. Each level comes with its own benefits and risks.
As you use more advanced AI tools, remember that higher risks come with higher rewards. Make sure to set up good guardrails and track how well things are working.

Staying Human in the Age of Data.

The Future Does Not Fit In The Containers Of The Past • 20 implied HN points • 15 Dec 24

💼 Business Data science

Data is important, but focusing too much on it can harm the long-term success of both businesses and people. It's crucial to balance numbers with human emotions and culture.
Leaders should encourage open discussions about tough topics and avoid wasting time in unnecessary meetings. This helps create a culture where everyone feels comfortable sharing their thoughts.
Successful companies need to remember that their employees are not just numbers. Investing in their development and well-being leads to a more motivated and productive workforce.

Landscape of Sequencing-based Spatial RNA Technology

LatchBio • 15 implied HN points • 27 Feb 25

🕹 Technology Data science

Spatial RNA technology helps us see how cells interact in their natural environment. It gives a clearer picture than traditional methods, which measure gene activity but not where in the tissue it occurs.
There are many ways to capture and analyze spatial gene data, like using specially barcoded slides or microfluidic methods. Each approach has its pros and cons depending on what researchers want to study.
Advancements in technology are making it possible to analyze tiny details, like individual cells or even parts of cells. This opens new doors for understanding biology and diseases.

Mistakes from my Failed Startup in Scalping Concert Tickets

Jay's Data Stream • 23 implied HN points • 30 Oct 24

💼 Business Data science

The concert ticket market is built on false pricing, where tickets are sold for less than their actual value. This means people often pay much more on resale markets.
Making money by reselling tickets is much harder than it seems. Success requires understanding a lot about the market and using technology to navigate tough ticketing systems.
Creating a startup in this space is complicated and needs more than just good ideas. It's about having the right infrastructure to turn those ideas into profitable actions.

2023 Wrap up

RSS DS+AI Section • 53 implied HN points • 31 Dec 23

🕹 Technology Data science

The focus for the year was 'Effective and Efficient Data Science' to highlight the critical aspects of the field beyond hype.
Various events and discussions were held throughout the year to promote best practices in Data Science.
Engagement with the community through events, surveys, and articles was emphasized to ensure diverse voices are heard in influencing policy.

Locating Machine Learning Engineers

Gradient Flow • 59 implied HN points • 27 Jan 22

🕹 Technology Data science

The role of 'machine learning engineer' has emerged as a key position for implementing data science in production, bridging the gap between data products and machine learning models.
Geographically, machine learning engineers are distributed across various regions, with companies and industries in different locations employing them.
Advances in computer hardware design, coupled with improvements in models and algorithms, are expected to significantly enhance model training efficiency.

A Guide for Building High-Quality Data Products

Inside Data by Mikkel Dengsøe • 16 implied HN points • 16 Jan 25

🕹 Technology Data science

Start by clearly defining how you will use data. This helps set the purpose for your data products.
It's important to have clear ownership of data and understand what needs testing. This makes accountability easier.
Continuously monitor and improve your data quality. Regular reviews help catch issues early and keep trust in your data.
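As a rough illustration of what "understand what needs testing" could look like in practice, here is a minimal sketch of two common data tests: required fields are present, and a key column is unique. The function name, the column names (`order_id`, `amount`), and the sample rows are hypothetical and are not taken from the post.

```python
def check_rows(rows, required, unique_key):
    """Return a list of human-readable data quality issues found in rows."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Test 1: every required column must be present and non-null.
        for col in required:
            if row.get(col) is None:
                issues.append(f"row {i}: missing {col}")
        # Test 2: the key column must not repeat across rows.
        key = row.get(unique_key)
        if key is not None:
            if key in seen:
                issues.append(f"row {i}: duplicate {unique_key}={key}")
            seen.add(key)
    return issues


if __name__ == "__main__":
    # Hypothetical sample data with one null and one duplicate key.
    sample = [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 1, "amount": None},
    ]
    for issue in check_rows(sample, ["order_id", "amount"], "order_id"):
        print(issue)
```

Running a check like this on a schedule, and routing the issues to whoever owns the dataset, is one simple way to combine the ownership and monitoring points above.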

January Newsletter

RSS DS+AI Section • 17 implied HN points • 01 Jan 25

🕹 Technology Data science

Data science and AI are rapidly evolving fields, with 2024 being a particularly exciting year for advancements. As we move into 2025, the trends and stories from last year will continue to shape the future.
Ethics in AI is a crucial topic that remains relevant, especially around issues like bias and safety. The way AI is developed and used needs careful consideration to align with human interests.
There are many practical applications and resources available for learning about data science and AI. From tutorials to real-world examples, there are plenty of opportunities to get involved and apply AI technologies.

The 10 Most Popular Posts of the Palindrome in 2025

The Palindrome • 1 implied HN point • 23 Dec 25

🚌 Education Data science

The most-read posts emphasize math and foundational CS for machine learning, covering topics like a mathematics roadmap, algorithmic analysis, graph theory, and practical skills such as coding on paper and representing graphs.
A holiday promotion offers a 30% lifetime discount on the annual paid subscription, which unlocks paid-only content and helps fund more math and machine learning material for the community.
Subscriber-count milestones will unlock community perks (mini-courses, a dedicated Manim animator, and a full-time writer), and the publication invites feedback while planning to expand and reinvest in 2026.

Newsletter #7: Advanced Prompt Engineering

Decoding Coding • 19 implied HN points • 30 Mar 23

🕹 Technology Data science

Zero-shot prompting lets a model answer questions without examples. It's useful when there's no data to guide the model.
Few-shot prompting gives the model a few examples to improve its answers. This helps the model understand the context better.
Chain-of-thought prompting breaks down complex problems into steps. It helps the model reason through tasks more effectively.
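The three prompting styles above differ only in how the prompt text is assembled, which the following minimal sketch tries to show. The function names and the `Q:`/`A:` template are illustrative assumptions, not taken from the newsletter, and no model is called.

```python
def zero_shot(question: str) -> str:
    """Ask the question directly, with no worked examples."""
    return f"Q: {question}\nA:"


def few_shot(question: str, examples: list[tuple[str, str]]) -> str:
    """Prepend a handful of solved examples before the real question."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"


def chain_of_thought(question: str) -> str:
    """Nudge the model to reason step by step before answering."""
    return f"Q: {question}\nA: Let's think step by step."


if __name__ == "__main__":
    # Hypothetical examples and question, chosen only for illustration.
    examples = [("What is 2 + 2?", "4"), ("What is 3 + 5?", "8")]
    question = "What is 7 + 6?"
    print(zero_shot(question), end="\n\n")
    print(few_shot(question, examples), end="\n\n")
    print(chain_of_thought(question))
```

The resulting strings would be sent to whichever model API is in use; the exact wording that works best tends to vary by model.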

"Mechanistic interpretability" for LLMs, explained

The Counterfactual • 1 HN point • 08 Jul 24

🕹 Technology Data science

Mechanistic interpretability helps us understand how large language models (LLMs) like ChatGPT work, breaking down their 'black box' nature. This understanding is important because we need to predict and control their behavior.
Different research methods, like classifier probes and activation patching, are used to explore how components in LLMs contribute to their predictions. These techniques help researchers pinpoint which parts of the model are responsible for specific tasks.
There's a growing interest in this field, as researchers believe that knowing more about LLMs can lead to safer and more effective AI systems. Understanding how they work can help prevent issues like bias and deception.
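To make activation patching a little more concrete, here is a toy sketch on a tiny hand-written network rather than a real LLM. The weights, inputs, and function names are all hypothetical. The idea is to run the network on a "corrupted" input, overwrite one hidden activation with its value from a "clean" run, and see how much the output moves.

```python
# Toy two-layer network with hand-picked weights (hypothetical values).
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hidden units, 2 inputs
W2 = [2.0, 0.0, 1.0]                       # output ignores hidden unit 1


def hidden(x):
    """Hidden activations: ReLU of a linear layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h):
    """Scalar output: weighted sum of hidden activations."""
    return sum(w * hi for w, hi in zip(W2, h))


def patch_effects(x_clean, x_corrupt):
    """For each hidden unit, copy its clean activation into the corrupted
    run and report the resulting change in the output."""
    h_clean, h_corrupt = hidden(x_clean), hidden(x_corrupt)
    base = output(h_corrupt)
    effects = []
    for i in range(len(h_clean)):
        h_patched = list(h_corrupt)
        h_patched[i] = h_clean[i]
        effects.append(output(h_patched) - base)
    return effects


if __name__ == "__main__":
    print(patch_effects([1.0, 0.0], [0.0, 1.0]))  # prints [2.0, 0.0, 0.0]
```

In this toy case only hidden unit 0 has a nonzero effect, so it is the component "responsible" for the difference between the two inputs. Real experiments apply the same logic to attention heads or layer outputs of a trained model.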

LLaMA Fever

Sector 6 | The Newsletter of AIM • 19 implied HN points • 04 Apr 23

🕹 Technology Data science

Hugging Face recently launched Vicuna-13B, a new model based on Meta's LLaMA. It was created at a very low cost compared to similar models.
Stanford University's Alpaca was another recent launch based on LLaMA, also developed affordably. It shows that advanced AI can be accessible to more people now.
The new chatbot using Vicuna-13B is performing really well, matching ChatGPT and Bard in quality. It's also beating many other models in most tests, showing its high capability.

I’m making an AI powered scraper

serious web3 analysis • 26 implied HN points • 15 Aug 24

🕹 Technology Data science

FetchFox is an AI-powered Chrome extension that makes web scraping easy for everyone, even if you can't code. Just a few clicks allow you to gather useful data from any website.
Traditional web scraping requires programming skills and can be time-consuming. FetchFox simplifies the process, letting anyone scrape data in minutes rather than hours.
FetchFox is designed to work like a human visitor, which helps it avoid being blocked by websites. This means it can extract data more effectively than traditional methods.

The types of AI, visually explained 🤖

Year 2049 • 15 implied HN points • 16 Jan 25

🕹 Technology Data science

AI comes in different types, and it's good to know what they are. Understanding the types helps us see how AI works in our daily lives.
Machines become intelligent by learning over time. Understanding this process helps explain how AI evolves.
It's helpful to share knowledge about AI with others. Teaching friends and family can make everyone more aware of how AI impacts us.

Vesuvius Challenge Progress Prizes: November Edition

Vesuvius Challenge • 14 implied HN points • 23 Jan 25

🕹 Technology Data science

Community members contributed a lot to the Vesuvius Challenge, earning prizes for their work. This shows how teamwork can lead to great progress!
Some projects focused on improving how we visualize 3D scrolls and extracting data from images. These tools could really help researchers understand ancient texts better.
Awards are given for various types of contributions, encouraging creativity and technical skills. It’s exciting to see different approaches being recognized in the community.

DSW Slack Group Invitation Link

Data Science Weekly Newsletter • 19 implied HN points • 04 May 23

🕹 Technology Data science

There's a Slack group for those who subscribe to Data Science Weekly. It's a great place to connect and learn together.
The invite link for the Slack group is exclusive to paid subscribers, so make sure to keep it private.
The group aims to help members interact, learn, and support each other in the field of data science.

A primer on sparse autoencoders

Nick’s Substack • 1 HN point • 03 Jul 24

🕹 Technology Data science

Sparse autoencoders are tools that help us understand how language models work by breaking down their process into simpler parts. They help identify important features in the model that contribute to its outputs.
The idea of sparsity means only a few features are needed to describe something, while superposition lets a lot of different features exist in a small space. This makes learning and processing more efficient for the model.
Using sparse autoencoders opens up new ways to interact with language models. Instead of just inputting text and getting answers, we can manipulate features and explore the model's internal workings more creatively.
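The sparsity and superposition ideas above can be sketched with a tiny hand-built example. This is not a trained autoencoder and not the post's code: the dictionary, the tied encoder/decoder weights, and the penalty weight are all assumptions chosen so the arithmetic is easy to follow. Four feature directions share a 2-dimensional activation space, and any given input activates only a few of them.

```python
# Hypothetical dictionary: 4 feature directions in a 2-dimensional
# activation space (more features than dimensions).
DECODER = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
ENCODER = DECODER  # tied weights, an assumption made for brevity
L1_COEFF = 0.1     # hypothetical sparsity penalty weight


def encode(x):
    """Feature activations: ReLU of the projection onto each direction."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in ENCODER]


def decode(f):
    """Reconstruct the activation as a weighted sum of feature directions."""
    return [sum(fi * d[k] for fi, d in zip(f, DECODER)) for k in range(2)]


def loss(x):
    """Squared reconstruction error plus an L1 penalty on the features.
    Features are non-negative after ReLU, so their sum is the L1 norm."""
    f = encode(x)
    x_hat = decode(f)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + L1_COEFF * sum(f)


if __name__ == "__main__":
    x = [1.0, -3.0]
    f = encode(x)
    print("features:", f)  # only 2 of the 4 features are active
    print("reconstruction:", decode(f))
    print("loss:", loss(x))
```

In a real sparse autoencoder the encoder and decoder weights are learned by minimizing this kind of loss over many model activations, and the dictionary is far larger than the activation dimension.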

ChatGPT Fails UPSC Prelims

Sector 6 | The Newsletter of AIM • 19 implied HN points • 01 Mar 23

🕹 Technology Data science

ChatGPT has performed well in various exams, including MBA and medical tests, showing that it can answer many questions correctly.
However, when tested on the UPSC Prelims, ChatGPT only answered 54 out of 100 questions correctly, demonstrating its limitations.
This highlights that while AI can be smart, it might still struggle with complex and diverse challenges like tough civil service exams.

Do You Believe In Second Chances?

Sector 6 | The Newsletter of AIM • 19 implied HN points • 21 Feb 23

🕹 Technology Data science

Indian IT companies failed to automate their operations before the pandemic, but now they have a new chance with advanced AI tools. This could help them become more innovative and efficient.
The introduction of large language models, like ChatGPT, could improve how IT companies operate and serve their customers. There's a lot of potential for better efficiency.
Experts believe that using AI in IT could change many processes for the better, making companies more focused on customer needs and improving their overall performance.

The Data Science Education Consolidation

Sector 6 | The Newsletter of AIM • 39 implied HN points • 06 Jun 22

🚌 Education Data science

Edtech companies like BYJU'S and upGrad are buying smaller firms to strengthen their position in data science education. This shows a trend of growth and consolidation in the industry.
Traditional training institutions like NIIT and Aptech are struggling to keep up with these changes. They seem to be losing relevance in the fast-paced education market.
BYJU'S made a big impact last year by acquiring ten companies for $2.5 billion. This highlights the scale of investment happening in the education sector, particularly in data science.

What do Alpacas have to do with AI's future?

do clouds feel vertigo? • 19 implied HN points • 20 Mar 23

🕹 Technology Data science

AI training costs are dropping significantly, which makes it easier for more people to create their own AI models.
AI models may become commonplace and can even be borrowed from others, which raises questions about ownership and competition.
Companies now face a choice between buying AI capabilities or building their own, affecting how they manage privacy and efficiency.

How machines learn, visually explained 🧠

Year 2049 • 13 implied HN points • 17 Jan 25

🕹 Technology Data science

AI systems learn from data, so the quality of that data is really important. Better data means smarter machines.
Machines can become biased if they are trained on biased data. It's important to watch out for this when developing AI.
This is just one part of a series explaining AI. More episodes will cover different aspects of how machines learn and behave.

E2- Basics of Large Language Models for Product Managers 🤖

The Product Channel By Sid Saladi • 16 implied HN points • 17 Nov 24

🕹 Technology Data science

Large language models (LLMs) are AI systems that understand and generate human language. They can summarize texts, translate languages, and even write code.
LLMs are changing many industries by powering chatbots, helping create content, and giving personalized product recommendations. This makes services smarter and more helpful.
Building custom LLMs requires a lot of money and data. Companies must invest millions and gather vast amounts of information to develop effective models.

There Are Many Roads to Machine Learning

The Palindrome • 5 implied HN points • 05 Jul 25

🕹 Technology Data science

There are many ways to get into machine learning. You don't need to follow strict rules or have a specific background.
You can start with just basic math skills. High school math is enough to begin your journey in machine learning.
Whether you want to be a generalist or a specialist in machine learning, both paths are valid. Choose what fits your goals best.

Most can learn analysis, but won't become analysts

Counting Stuff • 54 implied HN points • 04 Jul 23

🕹 Technology Data science

Everyone can learn to analyze, but not everyone will make a good analyst.
Analysis is fundamental and necessary in daily life and professional settings.
Being a data analyst requires juggling different domains and approaches.

E1- Introduction to General AI for Product Managers 🤖

The Product Channel By Sid Saladi • 16 implied HN points • 10 Nov 24

🕹 Technology Data science

AI is changing how products are made and used. Product managers need to understand AI to stay ahead in their industry.
There are many AI applications, like chatbots and recommendation systems, that can improve user experience. Learning about these tools can help product managers create better products.
While AI has benefits, it also brings risks like bias and job losses. It's important for product managers to think about these issues and apply AI responsibly.

Calls to actions, "like and subscribe", are (sadly) necessary

Counting Stuff • 54 implied HN points • 27 Jun 23

🕹 Technology Data science

Calls to action such as 'like and subscribe' are necessary for engagement.
It's challenging to get all users to take a desired action, even with technology.
Human behavior cannot be completely solved with technology alone.

July RSS AI and Data Science newsletter - anything to contribute?

RSS DS+AI Section • 5 implied HN points • 21 Jun 25

🕹 Technology Data science

The next AI and Data Science newsletter will be sent out in early July.
If you have anything to share, like announcements or job openings, please send it directly to the author.
Contributions are welcome from everyone in the community, so don't hesitate to participate.

AI is Racing Forward – on a Very Long Road

Am I Stronger Yet? • 15 implied HN points • 12 Nov 24

🕹 Technology Data science

AI is making rapid progress, but it is not close to achieving artificial general intelligence (AGI). Many tasks still require human capabilities, showing that there is still a long way to go.
Current AIs excel at specific tasks but struggle with complex, nuanced tasks that require extensive context or emotional intelligence, like managing a classroom or writing a novel.
While AI is advancing in exciting ways, the journey towards true intelligence is more like crossing a vast ocean than running a quick sprint, and many challenges lie ahead.

Stable Point Aware 3D, Cosmos, Autonomous game characters and Digits by Nvidia, Qwen Chat, Hailuo's Subject Reference, rStar-Math, Text-to-Video gen with Transparency, Cohere's North, STAR, & more

AI Brews • 12 implied HN points • 10 Jan 25

🕹 Technology Data science

Stability AI has released a new tool called Stable Point Aware 3D, which lets you edit 3D objects from just one image really quickly. It's free to use for everyone.
Microsoft has made its Phi-4 model open-source and introduced rStar-Math, a new technique that improves math solving in smaller language models.
Qwen Chat is a new web app allowing users to interact with various Qwen models, making it easy to compare their capabilities all in one place.

Introducing the Kirsch Cumulative Outcomes Ratio (KCOR) analysis: A powerful yet simple new technique for accurately assessing the impact of any intervention on any outcome

Steve Kirsch's newsletter • 6 implied HN points • 18 May 25

🏥 Health Politics Data science

The KCOR method is a new, simple technique to analyze how different interventions, like vaccines, affect outcomes such as mortality. It uses basic data like date of birth, date of death, and vaccination date to provide clear results.
The analysis suggests that COVID vaccines may have increased mortality rates, indicating the vaccines could be more harmful than helpful. This counters many previous claims about the vaccines saving lives.
KCOR is designed to be objective and straightforward, allowing for accurate comparisons without needing complex data adjustments, making it a powerful tool for understanding health interventions.

On TikTok, The Algorithm Optimizes YOU

Never Met a Science • 55 implied HN points • 31 May 23

🕹 Technology Data science

TikTok's algorithm shapes content creators' behavior based on feedback and viral success.
The algorithm aims to keep both creators and consumers engaged, but risks leading to repetitive content.
Data science and algorithms in platforms like TikTok create simplified simulations of reality for optimization, focusing on subjective metrics.

Monosemanticity at Home: My Attempt at Replicating Anthropic's Interpretability Research from Scratch

Jake Ward's Blog • 2 HN points • 30 Apr 24

🕹 Technology Data science

Large language models like ChatGPT have complex, learned logic that is difficult to interpret due to 'superposition', where single neurons correspond to multiple functions.
Techniques like sparse dictionary learning can decompose artificial neurons into 'features' that exhibit 'monosemanticity', making the models more interpretable.
Reproducing research on model interpretability shows promise for breakthroughs and indicates a shift towards engineering challenges over scientific barriers.

January Newsletter

RSS DS+AI Section • 35 implied HN points • 02 Jan 24

🕹 Technology Data science

Continuing work on expanding accreditation for data science professionals
Hot topics include bias, ethics, and regulation in data science and AI
Exciting developments in research, generative AI, and real world applications

Hunyuan-Large, AI model for open-world games, X-Portrait 2 for realistic character animations, FLUX1.1 [pro] Ultra and Raw, Magentic-One, Hume AI App, action model for GUI agents and More

AI Brews • 15 implied HN points • 08 Nov 24

🕹 Technology Data science

Tencent has released Hunyuan-Large, a powerful AI model with lots of parameters that can outperform some existing models. It's good news for open-source projects in AI.
Decart and Etched introduced Oasis, a unique AI that can generate open-world games in real-time. It uses keyboard and mouse inputs instead of just text to create gameplay.
Microsoft's Magentic-One is a new system that helps solve complex tasks online. It's aimed at improving how we manage jobs across different domains.

AI Architecture #1: Unleashing the Scalability with SageMaker

Cloud Weekly • 52 implied HN points • 24 Jun 23

🕹 Technology Data science

Well-designed ML systems are essential because models in production need to be dynamic, adaptive, and constantly monitored
SageMaker offers tools for both model training and model deployment
SageMaker provides various options for inference including real-time, serverless, async, and batch transform

Why Tech Giants Are Paying Millions for AI Training Data

Intuitive AI • 19 implied HN points • 22 Aug 24

🕹 Technology Data science

Tech companies are paying a lot for training data because it helps them improve their AI models. As AI use grows, high-quality data has become very valuable.
Having diverse and rich training data is crucial for AI to learn well. Just like a student needs various books to understand different subjects, AI needs various data to perform better.
Quality of the data matters even more than quantity. Rich, informative data leads to better AI outcomes, which is why companies are willing to spend big bucks on it.

What's happening at the intersection of ML and Engineering.

Arkid’s Newsletter • 17 HN points • 30 Sep 24

🕹 Technology Data science

AI and machine learning are creating a lot of hype, but it's important to separate the noise from the real value. Just like in the dot-com boom, there will be winners, but it won't be easy to find them.
Many companies are wasting money on consultants who offer little help and deliver no real results. To succeed in AI, businesses need to focus on building intelligent products that can learn and iterate based on user feedback.
There's concern about AI taking over jobs in software and machine learning, but skilled professionals will still be needed. It’s crucial for entry-level workers to build solid expertise in their field and adapt to new developments in AI.

Self-Adapting LMs, Deep Research Agents, and Music-Aware Benchmarks

HackerPulse Dispatch • 5 implied HN points • 20 Jun 25

🕹 Technology Data science

Language models can now learn on their own by creating their own training data, which means they get better without needing human help.
There are new benchmarks to measure how well models understand music, making it easier to compare their performance on different tasks.
A new method allows for better code translation between different programming languages, outpacing older systems in speed and accuracy.

You are not a method

Counting Stuff • 54 implied HN points • 02 May 23

🕹 Technology Data science

Teams are often created to fill niche use cases, leading to specialized roles and organizational politics.
Being type-cast into a specific role can limit opportunities for growth and variety in work tasks.
To break out of being type-cast, showcase your ability to do different kinds of work and actively seek out diverse opportunities.