The hottest data science Substack posts right now

And their main takeaways
Data Analysis Journal 353 implied HN points 22 Mar 23
  1. Analytics engineers bridge the gap between data engineers and data analysts by focusing on producing high-quality data.
  2. Analytics engineers use tools like dbt to streamline data modeling, testing, and documentation.
  3. Data quality is crucial in decision-making, making analytics engineering more important than ever.
Gonzo ML 504 implied HN points 02 Jan 25
  1. In 2024, AI research focused on test-time compute, which improves model performance by spending more computation at inference. This is changing how AI works and interacts with data.
  2. State Space Models are becoming more common in AI, showing improvements in processing complex tasks. People are excited about new tools like Bamba and Falcon3-Mamba that use these models.
  3. There's a growing competition among different AI models now, with many companies like OpenAI, Anthropic, and Google joining in. This means more choices for users and developers.
Data Science Weekly Newsletter 399 implied HN points 25 Aug 23
  1. Each week, a newsletter shares important links and articles about data science, machine learning, and AI. It's a good way to keep updated on new happenings in the field.
  2. The newsletter features articles on various topics, including programming, AI forecasting, and data management practices. These articles are meant to help both newcomers and experienced professionals.
  3. Job listings and training resources are also provided, helping readers find opportunities and learn new skills beneficial for their careers in data science.
The Algorithmic Bridge 573 implied HN points 22 Nov 24
  1. OpenAI has spent a lot of money trying to fix an issue with counting the letter R in the word 'strawberry.' This problem has caused a lot of confusion among users.
  2. The CEO of OpenAI thinks the problem is silly but feels it's important to address because users are concerned. They are also looking into redesigning how their models handle letter counting.
  3. Some employees joked about extreme solutions like eliminating red fruits to avoid the R issue. They are also thinking of patches to improve letter counting, but it's clear they have more work to do.
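The letter-counting task itself is trivial for ordinary code, which is part of why the failure draws so much attention. A minimal Python sketch for comparison (the helper name `count_letter` is illustrative and does not come from the post):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3
```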
Rod’s Blog 238 implied HN points 15 Dec 23
  1. Generative AI is a rapidly evolving field creating novel content like images, text, music, etc., with real-world applications from enhancing creativity to helping solve problems.
  2. To succeed in generative AI, you need skills like mathematics and statistics, programming, data science, knowledge of generative AI methods, and creativity in your specific domain.
  3. To learn generative AI in 2024, leverage online courses, books, blogs, tools, and engage in communities and events dedicated to this field.
Data Science Weekly Newsletter 339 implied HN points 29 Sep 23
  1. Data science involves a mix of techniques for analyzing and visualizing data which can help make informed decisions.
  2. Learning about advanced customer segmentation methods can enhance how businesses understand and target their customers.
  3. There are various roles in data-related careers beyond just being a data scientist, so it's good to explore different paths.
Data Science Weekly Newsletter 299 implied HN points 03 Nov 23
  1. Companies are increasingly sharing their advanced AI models openly, which can help them improve and build better products. This open sharing can lead to a more cooperative tech environment.
  2. Data science job applications are extremely competitive, with many positions receiving thousands of applicants within a day. This shows strong interest in the data science field.
  3. Exploring advanced tools and frameworks in AI can be complex, but understanding how they work can help in building effective applications, especially in question-answering systems.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 99 implied HN points 08 Apr 24
  1. RAG implementations are changing to become more like agents, which means they can make better decisions and adapt to different situations.
  2. The structure of prompts is really important now; it’s not just about adding data, but about crafting the prompts to improve how they perform.
  3. Agentic RAG allows for complex tasks by using multiple tools together, making it capable of handling detailed questions that standard RAG cannot.
Democratizing Automation 562 implied HN points 14 Nov 24
  1. Scaling in AI is technically effective, but the improvements visible to users are slowing down.
  2. There is a need for more specialized AI models, as bigger models may not always be the solution for current limits.
  3. There's still a lot of potential for new AI products and capabilities, which could unlock significant value in the future.
Data Science Weekly Newsletter 259 implied HN points 23 Nov 23
  1. This newsletter shares weekly interesting links and updates in data science, AI, and machine learning. It's a great way to stay informed about new developments in these fields.
  2. There's a focus on practical tools and techniques for improving data science work, like using cloud processing for large datasets and methods for fine-tuning AI models effectively.
  3. The newsletter also highlights job opportunities and resources for those looking to enter or advance in the data science industry. It's beneficial for anyone looking to grow their career in this area.
Data Science Weekly Newsletter 379 implied HN points 18 Aug 23
  1. Writing clear and effective research papers is essential, and there are tips specifically for NLP papers that can help improve your writing skills.
  2. The job market for data-related roles has changed over the years, and analyzing hiring trends can provide insights into what skills and positions are in demand.
  3. Understanding AI hardware is important because it forms the backbone of many AI models, and knowing how it works can help in making better tech decisions.
The Future, Now and Then 162 implied HN points 16 Jul 25
  1. Generative AI is really about doing what's good enough for certain tasks. It's useful when perfection isn't needed, like for basic reports or planning a simple trip.
  2. The way generative AI is used often depends on the interests of investors, not users. Those making decisions may prioritize profit over quality, affecting how useful AI can be in fields like journalism and medicine.
  3. We need to be careful with how we talk about AI, as calling it 'intelligent' can lead to misunderstandings and conspiracy theories. This can have real-world consequences if people start believing silly claims.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 39 implied HN points 19 Jun 24
  1. Phi-3 is a small language model that can run directly on your phone, making it accessible for local use instead of needing cloud connections. This means you can use it anywhere without relying on internet speed.
  2. Small language models like Phi-3 are good for specific tasks and regulated industries where data privacy is important. They can provide quick and accurate responses while keeping your data secure.
  3. Training for Phi-3 uses high-quality data to improve its language understanding and reasoning, allowing it to perform on par with larger models despite its smaller size.
Data Science Weekly Newsletter 399 implied HN points 04 Aug 23
  1. Integrating large language models into systems can be done using seven key patterns that balance performance and cost.
  2. Ethics in AI isn't just about explainability and fairness; we need a deeper understanding to prevent overall harm from AI systems.
  3. New approaches in robotics focus on current challenges and opportunities while advancing understanding of AI's role in planning tasks.
Brad DeLong's Grasping Reality 176 implied HN points 29 Jun 25
  1. Understanding complexity and emergence is crucial for grasping advanced artificial intelligence concepts. It's not just about scaling up technology but comprehending how simple rules can create complex behaviors.
  2. Human intelligence is a result of both evolution and shared knowledge as a species. We are already a network of minds working together, which influences how we create and interact with machines.
  3. The future of AI should focus on enhancing human capabilities rather than mimicking intelligence. We need to consider if we're creating true understanding or just sophisticated imitation.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 79 implied HN points 25 Apr 24
  1. Large Language Models (LLMs) are evolving with more functionality, combining various tasks into fewer models. This helps in making them more efficient for users.
  2. There are different zones in the LLM landscape, each focusing on specific uses, tools, and applications, ranging from available models to user interfaces.
  3. Tech advancements like prompt engineering and data-centric tools are making it easier to harness the power of LLMs, opening up new opportunities for businesses.
jonstokes.com 164 implied HN points 05 Jul 25
  1. LLMs have limits when it comes to reasoning. If a problem is too complex or involves too many moving parts, the model can struggle to find a solution.
  2. The size of a language model's 'latent state window' matters. This window limits how much information the model can hold while trying to reason, and it is distinct from the number of tokens the model can handle.
  3. To get good results from LLMs, it's best to keep tasks simple and broken down into manageable pieces. If you give the model too much to juggle at once, it won't perform well.
Top of the Lyne 314 implied HN points 29 Apr 23
  1. Net Revenue Retention is a science, not an art, and can be engineered.
  2. Successful subscription businesses have at least 20% of revenue driven by expansion, with some as high as 40%.
  3. Slack's segmentation engine is a complex but well-crafted marvel of data science and engineering.
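For reference, net revenue retention is commonly computed as the recurring revenue kept from an existing customer cohort (starting revenue plus expansion, minus contraction and churn) divided by that cohort's starting recurring revenue. The sketch below uses that common definition, which is an assumption here rather than the post's own formula; the function name and all figures are hypothetical.

```python
def net_revenue_retention(
    starting_mrr: float,
    expansion: float,
    contraction: float,
    churn: float,
) -> float:
    """Return NRR as a ratio (1.0 means the cohort's revenue was exactly retained)."""
    if starting_mrr <= 0:
        raise ValueError("starting_mrr must be positive")
    return (starting_mrr + expansion - contraction - churn) / starting_mrr


# Hypothetical cohort: $100k starting MRR, $25k expansion,
# $5k in downgrades, $10k lost to cancelled accounts.
nrr = net_revenue_retention(100_000, 25_000, 5_000, 10_000)
print(f"{nrr:.0%}")  # 110%
```

A ratio above 1.0 means expansion from existing customers more than offset downgrades and cancellations over the period.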
SeattleDataGuy’s Newsletter 400 implied HN points 17 Jan 25
  1. The data tools market is seeing a lot of consolidation lately, with companies merging or getting acquired. This means there are fewer companies competing, but it can lead to better tools overall.
  2. Acquisitions can be a mixed bag for customers. While some products improve after being bought, others might lose their features or support, making it risky for users.
  3. There's a push for bundled data solutions where customers want fewer, but more comprehensive tools. This could change how data companies operate and how startups survive in the future.
Data Science Weekly Newsletter 299 implied HN points 13 Oct 23
  1. The newsletter is deciding whether to publish twice a week, but will stick to one issue for now to review feedback from readers.
  2. There's a focus on providing useful resources for data science, including articles and job opportunities in the field.
  3. New tools and methods in AI and data engineering are highlighted, addressing challenges like data integration and AI model training.
Data Science Weekly Newsletter 319 implied HN points 07 Sep 23
  1. AI startups can receive significant support through programs like AI Grant, offering up to $250,000 for development.
  2. Recent studies have shown that large language models can learn from just one example, which challenges previous beliefs about their efficiency.
  3. Using advanced tools like the Semantic Layer and LLMs can greatly improve data accuracy and speed for businesses, making analytics much easier.
Data Science Weekly Newsletter 299 implied HN points 06 Oct 23
  1. There's a lot happening in data science right now. The team is considering adding a second newsletter each week to cover more exciting content.
  2. High-performing data scientists have specific traits that set them apart from others. Companies are researching these traits to help improve their teams.
  3. Art institutions can greatly benefit from data and analytics. Collaborating with leaders can help them use data to improve their operations and strategies.
SwirlAI Newsletter 294 implied HN points 18 Mar 23
  1. Learning to decompose a data system is crucial for better reasoning and understanding of large infrastructure.
  2. Decomposing a data system allows for scalability, identification of bottlenecks, and optimization of total event-processing latency.
  3. The different layers in a data system include data ingestion, transformation, and serving layers, each with specific functions and technologies.
Confessions of a Code Addict 529 implied HN points 29 Oct 24
  1. Clustering algorithms can never be perfect and always require trade-offs. You can't have everything, so you have to choose what matters most for your project.
  2. There are three key properties that clustering should ideally have: scale-invariance, richness, and consistency, but no algorithm can achieve all three simultaneously.
  3. Understanding these sacrifices helps in making better decisions when using clustering methods. Knowing what to prioritize can lead to more effective data analysis.
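The three properties named above match those in Kleinberg's impossibility theorem for clustering. As a rough illustration of one such trade-off, the sketch below uses a simple rule that links any two points closer than a fixed threshold `r` (single linkage with a distance cutoff). The function name, the points, and the threshold are all hypothetical. Because the cutoff is fixed, rescaling the data changes the grouping, so this rule gives up scale-invariance.

```python
def threshold_clusters(points, r):
    """Group 1-D points by linking every pair closer than r (single linkage)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) < r:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())


points = [0.0, 1.0, 10.0, 11.0]
scaled = [3 * p for p in points]  # same data, measured in different units

print(threshold_clusters(points, r=2.0))  # [[0.0, 1.0], [10.0, 11.0]]
print(threshold_clusters(scaled, r=2.0))  # [[0.0], [3.0], [30.0], [33.0]]
```

The same four points form two clusters at one scale and four at another, which is the kind of sacrifice the takeaways describe.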
Sunday Letters 59 implied HN points 12 May 24
  1. Modern AI systems have a random element, making them sometimes unpredictable or unreliable. This means they can give different answers even to the same question, which is a challenge for creating consistent outputs.
  2. Just like the early cloud systems, we need to use smart software solutions to make our current AI technologies more reliable. Instead of relying solely on the AI itself, we should layer software to handle and fix errors.
  3. To build better AI systems, it’s important to explore structured approaches, like guided conversations or iterative processes. This way, we can combine the strengths of AI with reliable system design.
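One way to picture the software layer described above is a wrapper that checks each model output and retries when the check fails. This is a minimal sketch under stated assumptions, not the post's implementation: `generate_validated` and `is_json_object` are hypothetical helpers, and the model call is replaced by a canned sequence of replies so the example runs on its own.

```python
import json
from typing import Callable, Optional


def generate_validated(
    generate: Callable[[], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call generate() until validate() accepts the output or attempts run out."""
    for _ in range(max_attempts):
        output = generate()
        if validate(output):
            return output
    return None  # the caller decides how to handle repeated failure


def is_json_object(text: str) -> bool:
    """Accept only text that parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


# Stand-in for a non-deterministic model: the first reply is unusable, the second is valid.
replies = iter(["Sure! Here you go", '{"answer": 42}'])
result = generate_validated(lambda: next(replies), is_json_object)
print(result)  # {"answer": 42}
```

Keeping validation outside the model call means the reliability logic can be tested and tightened without changing the model itself.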
Data Science Weekly Newsletter 299 implied HN points 14 Sep 23
  1. Nvidia has been a leader in AI technology, but its dominance might not last. Changes in the market and technology could shift the competitive landscape soon.
  2. For those who know R and want to learn Python, there are resources available to help make the transition easier. These resources provide advice and tips catered to R users.
  3. Reinforcement Learning from Human Feedback (RLHF) is an important part of training large language models. It's essential for improving how these models understand and respond to human preferences.
The Algorithmic Bridge 424 implied HN points 23 Dec 24
  1. OpenAI's new model, o3, has demonstrated impressive abilities in math, coding, and science, surpassing even specialists. This is a rare and significant leap in AI capability.
  2. There are many questions about the implications of o3, including its impact on jobs and AI accessibility. Understanding these questions is crucial for navigating the future of AI.
  3. The landscape of AI is shifting, with some competitors likely to catch up, while many will struggle. It's important to stay informed to see where things are headed.
The Future of Life 19 implied HN points 21 Jul 24
  1. AI improvement has slowed down in terms of new abilities since GPT-4 came out, but other factors like cost and speed have gotten much better.
  2. The focus now is on practical changes and making AI more valuable, which will help set the stage for bigger breakthroughs in the future.
  3. Reaching human-level skills in tests doesn't mean AI will be truly intelligent. Future development will need to incorporate more complex abilities like planning and learning from experiences.
Democratizing Automation 427 implied HN points 11 Dec 24
  1. Reinforcement Finetuning (RFT) allows developers to fine-tune AI models using their own data, improving performance with just a few training samples. This can help the models learn to give correct answers more effectively.
  2. RFT aims to solve the stability issues that have limited the use of reinforcement learning in AI. With a reliable API, users can now train models without the fear of them crashing or behaving unpredictably.
  3. This new method could change how AI models are trained, making it easier for anyone to use reinforcement learning techniques, not just experts. This means more engineers will need to become familiar with these concepts in their work.
Data Science Weekly Newsletter 239 implied HN points 10 Nov 23
  1. Data scientists share interesting links and news weekly about AI, machine learning, and data visualization. It's a great way to stay updated on trends and tools in the field.
  2. Learning about the basics of deep learning and mathematical foundations is important for anyone starting in machine learning. Understanding key concepts helps you tackle complex problems more effectively.
  3. There are many job opportunities in data science and related fields. Keeping an eye on openings can lead to exciting career advancements and collaborations.
Democratizing Automation 435 implied HN points 04 Dec 24
  1. OpenAI's o1 models may not actually use traditional search methods as people think. Instead, they might rely more on reinforcement learning, which is a different way of optimizing their performance.
  2. The success of OpenAI's models seems to come from using clear, measurable outcomes for training. This includes learning from mistakes and refining their approach based on feedback.
  3. OpenAI's approach focuses on scaling up the computation and training process without needing complex external search strategies. This can lead to better results by simply using the model's internal methods effectively.
Top Carbon Chauvinist 19 implied HN points 20 Jul 24
  1. Machines don't really learn like humans do. They can take in data and improve performance, but they don't understand or experience learning in the same way we do.
  2. The term 'machine learning' can be misleading. It's more about machines mimicking learning processes rather than actually experiencing them.
  3. Understanding how machines operate helps clarify their limitations. They can process large amounts of information but lack conscious experience or true comprehension.
Gonzo ML 126 implied HN points 28 Jul 25
  1. The recent ICML 2025 Outstanding Papers show a huge amount of important research in machine learning, but many people feel overwhelmed and can't read everything in depth.
  2. It's okay to admit that you can't keep up with all the new papers. Using AI tools can help manage the load and ensure you're still getting the important insights you need.
  3. Some of the papers focus on practical issues, like improving predictions and making AI more collaborative, which are vital for real-world applications.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 19 implied HN points 18 Jul 24
  1. GPT-4o mini is a new language model that's cheaper and faster than older models. It handles text and images and is great for tasks requiring quick responses.
  2. Small Language Models (SLMs) like GPT-4o mini can run efficiently on devices without relying on the cloud. This helps with costs, privacy, and gives users more control over the technology.
  3. SLMs are designed to be flexible and customizable. They can learn from various types of inputs and can adapt more easily to specific needs.
ChinaTalk 400 implied HN points 16 Dec 24
  1. China aims to become a top producer of humanoid robots by 2027, planning to use them in various industries like manufacturing and services. This is partly because they face labor shortages and believe humanoids can do many tough jobs.
  2. Humanoid robots need advanced technology in hardware and AI to work well. This includes making them mimic human movements and learning from real-world experiences, which is still a big challenge.
  3. The automotive industry could be key for testing and improving humanoid robots. Car factories have structured environments that help robots learn new tasks safely while addressing labor shortages in that sector.
The Data Ecosystem 59 implied HN points 05 May 24
  1. Data is generated and used everywhere now, thanks to smart devices and cheaper storage. This means businesses can use data for many purposes, but not all those uses are helpful.
  2. Processing data has become much easier over the years. Small companies can now use tools to analyze data without needing a team of experts, although some guidance is still necessary.
  3. Analytics has shifted from just looking at past data to predicting future trends. This helps companies make better decisions, and AI is starting to take over some of these tasks.