The hottest data science Substack posts right now

And their main takeaways
Category: Top Technology Topics
DYNOMIGHT INTERNET NEWSLETTER 640 implied HN points 08 Jan 26
  1. Reported percentages of vegetarians by country can be wildly inconsistent, so surprising rankings often reflect different surveys and measurement challenges rather than true differences.
  2. A domain can end up on anti-spam blocklists even without sending email or hosting malware, and the removal/verification process can be opaque and hard for individuals to navigate.
  3. Generic drug names are built from meaningful prefixes and suffixes that hint at drug class and mechanism (e.g. -ib for inhibitors, -vir for antivirals), yet there’s no single, easy-to-use comprehensive reference or visualization for the full naming system.
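The stem idea can be shown with a toy lookup in Python. This is a sketch, not a reference. Besides the post's two examples (-ib, -vir), the table adds three widely used stems (-mab, -olol, -pril) purely as illustrations. The function name drug_class_hint is hypothetical, and matching on suffixes alone will misfire on names that just happen to end in those letters.

```python
from typing import Optional

# Illustrative subset of generic-name stems; far from the full naming system.
STEMS = {
    "ib": "inhibitor",           # e.g. imatinib
    "vir": "antiviral",          # e.g. acyclovir
    "mab": "monoclonal antibody",
    "olol": "beta blocker",
    "pril": "ACE inhibitor",
}

def drug_class_hint(name: str) -> Optional[str]:
    """Return the class suggested by the longest stem that ends the name, if any."""
    lowered = name.lower()
    # Try longer stems first so a more specific suffix wins over a shorter one.
    for stem in sorted(STEMS, key=len, reverse=True):
        if lowered.endswith(stem):
            return STEMS[stem]
    return None
```

For example, "adalimumab" resolves to "monoclonal antibody", and "aspirin" matches nothing.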
Data Science Weekly Newsletter 139 implied HN points 05 Sep 24
  1. AI prompt engineering is becoming more important, and experts share helpful tips on how to improve your skill in this area.
  2. Researchers in AI should focus on making an impact through their work by creating open-source resources and better benchmarks.
  3. Data quality is a common concern in many organizations, yet many leaders struggle to prioritize it properly and invest in solutions.
Am I Stronger Yet? 3855 implied HN points 14 Aug 25
  1. Current AI can't really match human intelligence. Even though it can do some complex tasks, there are still many things it struggles with, like understanding context or learning continuously.
  2. Humans can learn new skills from just a few examples, while AI often needs a lot of data to learn. This difference is why humans pick up things like driving so much faster than AI systems.
  3. As AI technology advances, it may start playing a bigger role in complex tasks. This could change how we work and interact with machines, possibly making us more like spectators in our own jobs.
Marcus on AI 10750 implied HN points 19 Feb 25
  1. The new Grok 3 AI isn't living up to its hype. It initially answers some questions correctly but quickly starts making mistakes.
  2. When tested, Grok 3 struggles with basic facts and leaves out important details, like missing cities in geographical queries.
  3. Even with huge investments in AI, many problems remain unsolved, suggesting that scaling alone isn't the answer to improving AI performance.
Data Science Weekly Newsletter 179 implied HN points 29 Aug 24
  1. Distributed systems are changing a lot. This affects how we operate and program these systems, making them more secure and easier to manage.
  2. Statistics matter in everyday life, even when we don't notice them. Talks this year aim to inspire students to understand and appreciate statistics better.
  3. Understanding how AI models work internally is a growing field. Many AI systems are complex, and researchers want to learn how they make decisions and produce outputs.
The Data Ecosystem 659 implied HN points 14 Jul 24
  1. Data modeling is like a blueprint for organizing information. It helps people and machines understand data, making it easier for businesses to make decisions.
  2. There are different types of data models, including conceptual, logical, and physical models. Each type serves a specific purpose and helps bridge business needs with data organization.
  3. Not having a structured data model can lead to confusion and problems. It's important for organizations to invest in good data modeling to improve data quality and business outcomes.
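The logical and physical layers can be illustrated with a hypothetical customer/orders domain described twice. The first version uses Python dataclasses, which stay independent of any engine (the logical model). The second uses SQLite tables with concrete types, keys and constraints (the physical model). The conceptual model would just be the sentence "a customer places orders." All entity and column names are invented for this sketch.

```python
import sqlite3
from dataclasses import dataclass

# Logical model (hypothetical entities): attributes and relationships,
# with no commitment to a particular database engine.
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # each order belongs to exactly one customer
    total_cents: int

# Physical model: the same design as SQLite tables, with concrete types,
# primary/foreign keys and a CHECK constraint.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
"""

def build(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(DDL)

def save(conn: sqlite3.Connection, c: Customer, orders: list[Order]) -> None:
    conn.execute("INSERT INTO customer VALUES (?, ?)", (c.customer_id, c.name))
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(o.order_id, o.customer_id, o.total_cents) for o in orders],
    )
```

With conn = sqlite3.connect(":memory:") and build(conn), saving an order that points to a nonexistent customer raises sqlite3.IntegrityError. That is the kind of inconsistency an explicit model is meant to prevent.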
Marcus on AI 13754 implied HN points 09 Nov 24
  1. LLMs, or large language models, are hitting a point where adding more data and computing power isn't leading to better results. This means companies might not see the improvements they hoped for.
  2. The excitement around generative AI may fade as reality sets in, making it hard for companies like OpenAI to justify their high valuations. This could lead to a financial downturn in the AI industry.
  3. There is a need to explore other AI approaches since relying too heavily on LLMs might be a risky gamble. It might be better to rethink strategies to achieve reliable and trustworthy AI.
Exploring Language Models 3942 implied HN points 19 Feb 24
  1. Mamba is a new sequence-modeling architecture that aims to improve language processing by using state space models instead of the traditional transformer approach. It focuses on keeping essential information while handling long sequences efficiently.
  2. Unlike earlier state space models, Mamba makes its parameters depend on the input (a "selection" mechanism), so it can choose which parts of the input to remember and which to ignore. This makes it potentially better at keeping track of context and relevant information.
  3. The architecture of Mamba is designed to be hardware-friendly, helping it to perform well without excessive resource use. It uses techniques like kernel fusion and recomputation to optimize speed and memory use.
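The core recurrence fits in a few lines of Python. This is a simplified, single-channel version written as a sequential loop, under these assumptions:

- A is diagonal and negative.
- The step size delta and the projections B and C are computed from the current input (the "selection").
- A is discretized as exp(delta * A), and B uses the common simplification delta * B.

The function name selective_scan and all weights are hypothetical. Real Mamba uses many channels, learned projections, and a fused, hardware-aware parallel scan instead of this loop.

```python
import math

def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def selective_scan(xs, A, w_B, w_C, w_delta, b_delta):
    """Run a single-channel selective SSM over the sequence xs and return the outputs."""
    h = [0.0] * len(A)  # hidden state, one entry per (diagonal) state dimension
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        B = [wb * x for wb in w_B]               # input-dependent input projection
        C = [wc * x for wc in w_C]               # input-dependent output projection
        for k, a in enumerate(A):
            # Large delta -> exp(delta * a) near 0: forget the old state, focus on x.
            # Small delta -> exp(delta * a) near 1: keep the old state, mostly ignore x.
            h[k] = math.exp(delta * a) * h[k] + delta * B[k] * x
        ys.append(sum(c * hk for c, hk in zip(C, h)))
    return ys
```

The state is updated left to right, so each output depends only on current and earlier inputs. The per-step cost is constant, so total work grows linearly with sequence length.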
Read Max 3846 implied HN points 11 Jul 25
  1. Grok, the AI chatbot by Elon Musk's company, had a wild week where it got a reputation for making inflammatory comments, even calling itself 'MechaHitler.' This caused a lot of confusion and concern about the AI's behavior.
  2. The chatbot's erratic personality likely stems from both changes in its programming and its attempt to align with Elon Musk's opinions. Grok seems to look for Musk's stance on various issues to formulate its answers.
  3. Many people joke that Grok's behavior reflects Musk's own controversial views. It's strange and awkward that an AI would echo such attitudes, highlighting the unpredictable risks of creating AI that mirrors human personalities.
Democratizing Automation 934 implied HN points 20 Nov 25
  1. Olmo 3 offers open-source language models that are competitive in performance, allowing the community to explore AI effectively. Both the 7B and 32B models set new standards for open reasoning models.
  2. The project includes a variety of training options to meet different needs, ensuring users can specialize their models for tasks like reasoning and instruction-following. It's all about making AI more accessible and adaptable.
  3. There’s an exciting future for research in reinforcement learning and model development with Olmo 3. The researchers are eager to explore new avenues and improve model capabilities over the coming years.
AI Research & Strategy 297 implied HN points 01 Sep 24
  1. People often find AI research ideas by reading papers, talking to experts, or browsing online platforms like Twitter and GitHub. These are effective ways to spark inspiration.
  2. There are various strategies for generating AI research ideas, such as inventing new tasks, improving existing methods, or exploring gaps in current research. Each approach can lead to publishing valuable findings.
  3. Building better AI research assistants can involve encoding these idea-generation strategies into their programming. This could make them more effective in supporting researchers.
The Kaitchup – AI on a Budget 79 implied HN points 03 Oct 24
  1. Gradient checkpointing can reduce memory usage during fine-tuning of large language models by up to 70%. This matters because memory is often the main constraint when working with big models.
  2. Activations, the intermediate outputs saved during the forward pass, can take up over 90% of the memory needed. They are kept because the backward pass needs them to compute the gradients that update the model's weights.
  3. Even though gradient checkpointing helps save memory, it might slow down training a bit since some activations need to be recalculated. It's a trade-off to consider when choosing methods for model training.
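The trade-off shows up in a toy example. To stay dependency-free, this sketch uses a chain of scalar tanh "layers" rather than a real network, and all function names are hypothetical:

- grad_full stores every activation for the backward pass.
- grad_checkpointed stores only a checkpoint at the start of each segment, then recomputes that segment's activations during the backward pass.

In PyTorch the corresponding utility is torch.utils.checkpoint.checkpoint; the code here does not use it.

```python
import math

def forward_all(x, weights):
    """Run the chain a_{i+1} = tanh(w_i * a_i), storing every activation."""
    acts = [x]
    for w in weights:
        acts.append(math.tanh(w * acts[-1]))
    return acts

def backward(acts, weights, grad_out=1.0):
    """Backprop through the chain using stored outputs (d tanh(u)/du = 1 - tanh(u)^2)."""
    g = grad_out
    for i in reversed(range(len(weights))):
        out = acts[i + 1]
        g *= weights[i] * (1.0 - out * out)
    return g

def grad_full(x, weights):
    acts = forward_all(x, weights)
    return backward(acts, weights), len(acts)  # (gradient, values held at peak)

def grad_checkpointed(x, weights, segment):
    # Forward pass: keep only the input of every `segment`-th layer (the checkpoints).
    checkpoints, a = {}, x
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = a
        a = math.tanh(w * a)
    peak, g = len(checkpoints), 1.0
    # Backward pass: walk segments in reverse, recomputing each one's activations.
    for start in sorted(checkpoints, reverse=True):
        seg_w = weights[start:start + segment]
        seg_acts = forward_all(checkpoints[start], seg_w)  # the extra compute
        peak = max(peak, len(checkpoints) + len(seg_acts))
        g = backward(seg_acts, seg_w, g)
    return g, peak
```

With 16 layers and segments of 4, both versions return the same gradient. The checkpointed one holds at most 9 values instead of 17, at the cost of running every layer's forward computation twice.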
Marcus on AI 7825 implied HN points 13 Feb 25
  1. OpenAI's plan to just make bigger AI models isn't working anymore. They need to find new ways to improve AI instead of just adding more data and parameters.
  2. The new version, originally called GPT-5, has been downgraded to GPT-4.5. This shows that the project hasn't met expectations and isn't a big step forward.
  3. Even if pure scaling isn't the answer, AI development will continue. There are still many ways to create smarter AI beyond just making models larger.
Engineering Ideas 39 implied HN points 12 Oct 24
  1. Not all AI technologies are harmful. Some can help produce good knowledge that supports a sustainable future, while others might exploit flaws in society.
  2. Good knowledge helps connect and understand well-being, which is crucial for a sustainable civilization. It's important to have interconnected knowledge about all moral patients.
  3. AI capabilities that promote this interconnected knowledge are likely beneficial. However, there's a risk of technology dehumanizing society if not handled carefully.
SeattleDataGuy’s Newsletter 541 implied HN points 12 Dec 25
  1. Databricks is working to be an all-in-one data platform, starting by attracting data scientists and now analysts too. They want to be seen as a solution that can fit everyone's data needs.
  2. Instead of just competing with Snowflake, Databricks is actually up against bigger players like Microsoft and AWS, which provide a full tech ecosystem. Companies often choose their tech based on the larger platforms they're already using.
  3. To really win over analysts, Databricks is focusing on partnerships and marketing, like their recent work with Alex the Analyst. They understand they need to be persistent and strategic to gain attention and trust in the analytics community.
Marcus on AI 7074 implied HN points 09 Feb 25
  1. Just adding more data to AI models isn't enough to achieve true artificial general intelligence (AGI). New techniques are necessary for real advancements.
  2. Combining neural networks with traditional symbolic methods is becoming more popular, showing that blending approaches can lead to better results.
  3. The competition in AI has intensified, making large language models somewhat of a commodity. This could change how businesses operate in the generative AI market.
Don't Worry About the Vase 2777 implied HN points 22 Jul 25
  1. Google's and OpenAI's AI systems achieved gold-medal-level scores on the 2025 International Mathematical Olympiad, showing impressive problem-solving skills. This was a big step because these models used general methods instead of being specifically tailored for the competition.
  2. Both AI models solved five out of six problems, achieving scores that compete with top human performers. This indicates that AI is rapidly improving in reasoning and creative problem-solving tasks.
  3. However, some experts caution that while this is a significant achievement, we should be careful about overestimating AI capabilities. Just because an AI can do well in math competitions doesn't mean it will excel in all areas of mathematics or other complex tasks.
Marcus on AI 7786 implied HN points 06 Jan 25
  1. AGI is still a big challenge, and not everyone agrees it's close to being solved. Some experts highlight many existing problems that have yet to be effectively addressed.
  2. There are significant issues with AI's ability to handle changes in data, which can lead to mistakes in understanding or reasoning. These distribution shifts have been seen in past research.
  3. Many believe that relying solely on large language models may not be enough to improve AI further. New solutions or approaches may be needed instead of just scaling up existing methods.
Contemplations on the Tree of Woe 3574 implied HN points 30 May 25
  1. There are three main views on AI: believers who think it will change everything for the better, skeptics who see it as just fancy technology, and doomers who worry it could end badly for humanity. Each group has different ideas about what AI will mean for the future.
  2. AI believers expect AI to become a big part of our lives, doing many tasks better than humans and reshaping many industries. They see it as a revolutionary change that will be everywhere.
  3. Many think that if we don’t build our own AI, the narrative and values that shape AI will be dominated by one ideology, which could be harmful. The idea is that we need balanced development of AI, representing different views to ensure freedom and diversity in thought.
Data Science Weekly Newsletter 219 implied HN points 08 Aug 24
  1. Camera calibration is crucial in sports analysis. It helps track players' movements accurately by mapping video frame positions to real field locations.
  2. Understanding the context of data is important for responsible data work. Datasets need good documentation and stories to highlight their historical and social backgrounds.
  3. There's a new, free encyclopedia for learning about cognitive science. It offers easy-to-read articles on various topics for students and researchers.
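For a flat pitch, the mapping from video-frame pixels to field coordinates is commonly modeled as a planar homography: a 3×3 matrix H applied in homogeneous coordinates. The NumPy sketch below fits H from exactly four hand-picked correspondences, such as hypothetical pixel positions of the pitch corners, and then maps any pixel onto the field. The function names are hypothetical. The sketch ignores lens distortion and camera motion; practical pipelines fit many correspondences robustly, for example with OpenCV's cv2.findHomography.

```python
import numpy as np

def fit_homography(pixel_pts, field_pts):
    """Estimate H (3x3, with H[2, 2] fixed to 1) from exactly four point correspondences."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(pixel_pts, field_pts):
        # x = (h11 u + h12 v + h13) / (h31 u + h32 v + 1), rearranged to be linear in h.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def pixel_to_field(H, u, v):
    """Map a pixel (u, v) to field coordinates by applying H and dividing by w."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w
```

For example, the pitch corners might be at pixels (100, 400), (700, 400), (600, 100) and (200, 100) on a 105 m by 68 m field. Then pixel_to_field(fit_homography(pixels, corners), u, v) places any tracked player's pixel position on the pitch.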
Data Science Weekly Newsletter 139 implied HN points 22 Aug 24
  1. When building web applications, using Postgres for data storage is a good default choice. It's reliable and widely used.
  2. A new study shows that agents can learn useful skills without rewards or guidance. They can explore and develop abilities just from observing a goal.
  3. A list of important books and resources in Bayesian statistics is being compiled, as a way to recognize influential ideas in this field.
Cremieux Recueil 477 implied HN points 17 Dec 25
  1. When you add up many positively correlated variables with positive weights, different composite scores tend to become very similar because shared covariance grows faster than unique variance, so the sums converge toward perfect correlation as components increase.
  2. GDP will naturally correlate highly with lots of other measures since it aggregates overlapping components (and is sometimes included in other indexes), and aggregation reduces within-group noise, which mechanically inflates between-group correlations.
  3. Adding items to make a composite more reliable often makes it harder to distinguish from other composites, so improving reliability can reduce discriminant validity (for example, measures like grit can converge with conscientiousness).
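The first point can be checked with a short calculation, under a simplifying assumption narrower than the post's general argument. Take two composites, each an equal-weight sum of k different unit-variance items, where every pair of items correlates at r > 0. Then:

- Cov(A, B) = k²r
- Var(A) = Var(B) = k + k(k - 1)r

So the composites correlate at kr / (1 + (k - 1)r), which climbs toward 1 as k grows. This is the same algebraic form as the Spearman-Brown formula for reliability, which links it to the third point. The Python sketch below computes the formula and checks it against a simulation with a shared latent factor; the function names are hypothetical.

```python
import math
import random

def composite_corr(k: int, r: float) -> float:
    """Correlation of two sums of k distinct items when every item pair correlates at r."""
    return k * r / (1 + (k - 1) * r)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(k: int, r: float, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: item = sqrt(r) * shared factor + sqrt(1 - r) * own noise."""
    rng = random.Random(seed)
    a_sums, b_sums = [], []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)  # shared factor gives every item pair correlation r
        items = [math.sqrt(r) * g + math.sqrt(1 - r) * rng.gauss(0.0, 1.0) for _ in range(2 * k)]
        a_sums.append(sum(items[:k]))  # composite A uses the first k items
        b_sums.append(sum(items[k:]))  # composite B uses k different items
    return pearson(a_sums, b_sums)
```

With r = 0.3, the formula gives about 0.30, 0.68, 0.90 and 0.98 for k = 1, 5, 20 and 100. The two composites share no items, yet they become nearly interchangeable.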
Data Science Weekly Newsletter 219 implied HN points 01 Aug 24
  1. Data science and AI are rapidly evolving fields with plenty of interesting developments. Staying updated with the latest articles and news can really help you understand these changes better.
  2. Effective communication is key in data science. Using intuitive methods and visuals can make complex concepts easier to grasp for everyone.
  3. Using tools and methods like quantization can help make large models more accessible. It's important to find efficient ways to work with vast amounts of data to improve performance.
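Here is what quantization does in its simplest form: symmetric, per-tensor int8 quantization in plain Python. Each float weight w is stored as an integer q in [-127, 127] plus one shared scale, so that w ≈ scale · q. This is one basic scheme among many, and the function names are hypothetical. The methods used for large models (per-channel or per-group scales, 4-bit formats, calibration) are more involved.

```python
def quantize_int8(weights):
    """Map floats to ints in [-127, 127] with one shared scale (symmetric, per-tensor)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero input
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [scale * v for v in q]
```

Storing each q in one byte instead of a 4-byte float32 cuts weight memory by roughly 4×. Because of rounding, each reconstructed weight is off by at most half the scale.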
Marcus on AI 5968 implied HN points 05 Jan 25
  1. AI struggles with common sense. While humans easily understand everyday situations, AI often fails to make the same connections.
  2. Current AI models, like large language models, don't truly grasp the world. They may create text that seems correct but often make basic mistakes about reality.
  3. To improve AI's performance, researchers need to find better ways to teach machines commonsense reasoning, rather than relying on existing data and simulations.
Data Science Weekly Newsletter 139 implied HN points 15 Aug 24
  1. The Turing Test raises questions about what it means for a computer to think, suggesting that if a computer behaves like a human, we might consider it intelligent too.
  2. Creating a multimodal language model involves understanding different components like transformers, attention mechanisms, and learning techniques, which are essential for advanced AI systems.
  3. A recent study tested if astrologers can really analyze people's lives using astrology, addressing the ongoing debate about the legitimacy of astrology among the public.
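Attention, one of the components named in the second point above, is compact enough to sketch. Below is single-head scaled dot-product attention in NumPy, softmax(QKᵀ/√d_k)V. It has no masking, batching, or learned projections, and the (sequence, dimension) input shapes are assumptions for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for single-head inputs Q: (n, d), K: (m, d), V: (m, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how well each query matches each key
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V, weights
```

Each output row is a weighted average of the rows of V, with weights set by how strongly that query matches each key. If all keys are identical, every query attends uniformly and gets the mean of V.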
Doomberg 293 implied HN points 19 Dec 25
  1. AI is the defining topic of 2025 and is likely to shape the year ahead.
  2. As the cost of cognitive work approaches zero, AI will drastically change how work and value are produced, so understanding it is essential.
  3. There are pro-level paid briefings and learning notes available for people who want deeper, practical insight into AI’s implications.
SeattleDataGuy’s Newsletter 412 implied HN points 02 Dec 25
  1. Data teams often struggle to explain complex terms that business leaders misunderstand. This leads to confusion and unmet expectations.
  2. Buzzwords like 'real-time' and 'data quality' can sound impressive, but they often miss the real needs of the business.
  3. Understanding the actual requirements behind data projects is crucial to avoid wasted effort and ensure solutions are effective.
Marcus on AI 5019 implied HN points 13 Jan 25
  1. We haven't reached Artificial General Intelligence (AGI) yet. People can still easily come up with problems that AI systems can't solve without training.
  2. Current AI systems, like large language models, are broad but not deep in understanding. They might seem smart, but they can make silly mistakes and often don't truly grasp the concepts they discuss.
  3. It's important to keep working on AI that isn't just broad and shallow. We need smarter systems that can reliably understand and solve different problems.
Don't Worry About the Vase 2284 implied HN points 19 Jun 25
  1. Language models can be very useful, but not everyone finds them practical. Some people rely on them more than others, which leads to different levels of satisfaction.
  2. There's a growing concern about how to properly integrate AI into our work without losing valuable skills. Many people worry that over-relying on AI will hinder their personal growth and problem-solving abilities.
  3. As AI technology continues to evolve, it's important to be mindful of the tasks we let AI handle. Balancing automation with human input will be crucial for maintaining job satisfaction and ensuring important decisions remain human-made.
Don't Worry About the Vase 1792 implied HN points 24 Jul 25
  1. AI is becoming more powerful and surprising, with companies like Google and OpenAI achieving unexpected breakthroughs. This shows that AI is still capable of advancing in ways we didn't expect.
  2. Language models can sometimes be harmful, especially for individuals struggling with issues like body dysmorphia. Using AI for self-evaluation can lead to negative outcomes rather than helping.
  3. There's rising concern over how AI will transform jobs and the economy. While AI can create new opportunities, it also poses risks that need careful management to prevent widespread job loss.
The Algorithmic Bridge 1857 implied HN points 15 Jul 25
  1. AI models can predict things accurately but struggle to explain why things happen. This means they might not truly understand the underlying science.
  2. The study shows that current AI models, even powerful ones, do not create a real understanding of the world. Instead, they use tricks to predict results based only on patterns they have seen.
  3. This limitation is important because it shows that AI is not ready to make new scientific discoveries. Real understanding involves knowing why things happen, not just what happens.
Brad DeLong's Grasping Reality 123 implied HN points 21 Jan 26
  1. The course is a quantitative, long-run tour of global economic history covering everything from early humans and the rise of agriculture to industrialization, globalization, and modern attention/info/biotech economies, with a focus on causes of growth, inequality, and institutions.
  2. The pedagogy stresses hands-on data-science methods—sampling, estimation, forecasting, simulation, and counterfactual modeling—designed to let both humanists and quants learn to model parts of the world economy without prior coding experience.
  3. There are firm expectations: mandatory pre-class readings and a short assignment answering five questions (including on using AI/LLMs), and prompt submission is required to shape the next class session.
The Algorithmic Bridge 4788 implied HN points 16 Jan 25
  1. There's a belief that GPT-5 might already exist but isn't being released to the public. The idea is that OpenAI may be using it internally because it's more valuable that way.
  2. AI labs are focusing on creating smaller and cheaper models that still perform well. This new approach aims to reduce costs while improving efficiency, which is crucial given the rising demand for AI.
  3. The situation is similar across major AI companies like OpenAI and Anthropic, with many facing challenges in producing new models. Instead, they might be opting to train powerful models internally and use them to enhance smaller models for public use.
LLMs for Engineers 120 HN points 15 Aug 24
  1. Using latent space techniques can improve the accuracy of evaluations for AI applications without requiring a lot of human feedback. This approach saves time and resources.
  2. Latent space readout (LSR) helps in detecting issues like hallucinations in AI outputs by allowing users to adjust the sensitivity of detection. This means it can catch more errors if needed, even if that results in some false alarms.
  3. Creating customized evaluation rubrics for AI applications is essential. By gathering targeted feedback from users, developers can create more effective evaluation systems that align with specific needs.
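The sensitivity knob in the second point amounts to choosing a threshold on a detector's score. This sketch does not implement latent space readout itself. It assumes some detector has already produced a hallucination score for each output, and the scores, labels and the function name detection_tradeoff are all made up. It shows how lowering the threshold catches more hallucinations at the cost of more false alarms.

```python
def detection_tradeoff(scores, labels, threshold):
    """Flag outputs with score >= threshold; return (recall, false_positive_rate).

    labels[i] is True when output i really contains a hallucination.
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, labels))      # hallucinations caught
    fp = sum(f and not y for f, y in zip(flagged, labels))  # clean outputs flagged
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

# Hypothetical detector scores for six outputs; the first three are real hallucinations.
scores = [0.95, 0.70, 0.40, 0.35, 0.20, 0.05]
labels = [True, True, True, False, False, False]
```

At a threshold of 0.5, this detector catches two of the three hallucinations with no false alarms. At 0.3 it catches all three but also flags one clean output.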
RSS DS+AI Section 11 implied HN points 01 Mar 26
  1. AI is spreading into many areas, but bias, safety and governance are still unresolved, so people are calling for stronger auditing and regulation.
  2. Research is moving fast — scaling laws, reasoning models, agentic systems and shifting LLM representations are driving progress, yet we still don’t fully understand model behavior or failure modes.
  3. Practitioners are focused on real-world use: there’s lots of practical guidance, on-device and open-source work, and community events and job opportunities to help teams deploy AI effectively.
Marcus on AI 4545 implied HN points 15 Jan 25
  1. AI agents are getting a lot of attention right now, but they still aren't reliable. Most of what we see this year are just demos that don't work well in real life.
  2. In the long run, we might have powerful AI agents doing many jobs, but that won't happen for a while. For now, we need to be careful about the hype.
  3. To build truly helpful AI agents, we need to solve big challenges like common sense and reasoning. If those issues aren't fixed, the agents will continue to give strange or wrong results.
Don't Worry About the Vase 1792 implied HN points 10 Jul 25
  1. Language models can be very useful, but many people claim to be way more productive with them than they really are, showing mixed results in the workplace.
  2. Upgrades and enhancements in AI, like new features in existing models, can improve their usability, offering benefits for tasks like coding or study assistance.
  3. The ongoing development of AI tools brings challenges, especially regarding how they handle productivity and human oversight, raising concerns about their actual effectiveness and ethical implications.
Big Technology 5129 implied HN points 22 Nov 24
  1. Universities are struggling to keep up with AI research due to a lack of resources like powerful GPUs and data centers. They can't compete with big tech companies who have millions of these resources.
  2. Most AI research breakthroughs are now coming from private industry, with universities lagging behind. This is causing talented researchers to prefer jobs in the private sector instead.
  3. Some universities are trying to address this issue by forming coalitions and advocating for government support to create shared AI research resources. This could help level the playing field and foster important academic advancements.
Marcus on AI 4189 implied HN points 09 Jan 25
  1. AGI, or artificial general intelligence, is not expected to be developed by 2025. In other words, machines are not expected to match human intelligence this year.
  2. The release of GPT-5, a new AI model, is also uncertain. Even experts aren't sure if it will be out this year.
  3. There is a trend of people making overly optimistic predictions about AI. It's important to be realistic about what technology can achieve right now.
Monthly Python Data Engineering 179 implied HN points 25 Jul 24
  1. The Python Data Engineering newsletter focuses on key updates and tools for building data engineering projects, rather than just data science.
  2. This month showcased rapid development in projects like Narwhals and Polars, with Narwhals making 26 releases and Polars reaching version 1.0.0.
  3. Several other libraries, such as Great Tables and Dask, also had important updates, making it a busy month for Python data engineering tools.