The hottest Data Science Substack posts right now

And their main takeaways
Democratizing Automation 332 implied HN points 27 May 25
  1. Claude 4 is a strong AI model from Anthropic, focused on coding and software tasks. It has a unique personality and improved performance over its predecessors.
  2. The benchmarks for Claude 4 might not look impressive compared to others like ChatGPT and Gemini, which could affect its market position. It's crucial for Anthropic to show real-world utility beyond just numbers.
  3. Anthropic aims to lead in software development, but they fall behind in general benchmarks. This may limit their ability to compete with bigger players like OpenAI and Google in the race for advanced AI.
Democratizing Automation 554 implied HN points 18 Feb 25
  1. Grok 3 is a new AI model that's designed to compete with existing top models. It aims to improve quickly, with updates happening daily.
  2. There's increasing competition in the AI field, which is pushing companies to release their models faster, leading to more powerful AI becoming available to users sooner.
  3. Current evaluations of AI models might not be very practical or useful for everyday life. It's important for companies to share more about their evaluation processes to help users understand AI advancements.
TheSequence 56 implied HN points 07 Dec 25
  1. AI model development is changing focus from just making models bigger to making them smarter and more specialized. It's now about using different tools for specific tasks instead of one model for everything.
  2. Google's Gemini 3 Deep Think is a significant release that uses a new way of thinking to solve problems. It focuses on careful reasoning rather than quick responses, leading to much better problem-solving skills.
  3. Amazon's Nova 2 and Mistral's Large 3 provide new options for businesses by focusing on efficiency and privacy. These models allow companies to create tailored solutions without relying on large, generic AI models.
Data Analysis Journal 235 implied HN points 07 Feb 24
  1. Data quality metrics are essential for measuring data governance and analytics success.
  2. There is no industry standard for defining poor-quality data; it varies based on context.
  3. Having specific KPIs for data quality is crucial to scale data governance initiatives and improve the state of data quality.
DYNOMIGHT INTERNET NEWSLETTER 796 implied HN points 21 Nov 24
  1. LLMs like `gpt-3.5-turbo-instruct` can play chess well, but most other models struggle. Using specific prompts can improve their performance.
  2. Providing legal moves to LLMs can actually confuse them. Instead, repeating the game before making a move helps them make better decisions.
  3. Fine-tuning and giving examples both improve chess performance for LLMs, but combining them may not always yield the best results.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 39 implied HN points 10 Jul 24
  1. Using Chain-Of-Thought prompting helps large language models think through problems step by step, which makes them more accurate in their answers.
  2. Smaller language models struggle with Chain-Of-Thought prompting and often get confused because they lack the knowledge and understanding of the bigger models.
  3. Google Research has a method to teach smaller models by learning from larger ones. This involves using the bigger models to create helpful examples that the smaller models can then learn from.
The Data Ecosystem 119 implied HN points 21 Apr 24
  1. Data can be really complicated, and it's easy to miss how everything connects. People often focus on their own area and forget about the bigger picture of the data ecosystem.
  2. Chief Data Officers (CDOs) are important but can only do so much to fix data issues. They deal with many challenges, including limited power, lack of experience, and politics within the organization.
  3. To improve in the data field, we need to recognize the gaps in our knowledge, prioritize what to focus on, and continuously educate ourselves in both our own areas and related data domains.
Democratizing Automation 182 implied HN points 11 Aug 25
  1. The open-weight AI ecosystem has become a competitive market with many quality releases over the past year. This means there's a lot more choice and better options available now.
  2. Open models are gaining popularity because they are trusted, low-cost, and often better than closed models. Many users are starting with them instead of going for expensive alternatives.
  3. While text-based models are commonly discussed, there are also many valuable multimodal and specialized models that show the strength of the open AI ecosystem. It's exciting to see growth in these areas too.
Abstraction 29 implied HN points 05 Jan 26
  1. A structured, reproducible forecasting pipeline models how strong human forecasters think so methods can be tested and refined systematically.
  2. Huge cost cuts made iteration affordable: per-question cost dropped from $0.109 to $0.004 (about 27×), enabling many more experiments across the tournament.
  3. The team accepts a likely short-term performance hit by using cheaper models and fewer tokens because the priority is learning which pipeline parts truly matter using the tournament as a feedback loop.
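The cost ratio quoted above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are illustrative, and the two dollar figures are the ones given in the takeaway.

```python
# Per-question costs quoted in the takeaway above (USD).
old_cost = 0.109
new_cost = 0.004

# Ratio of old to new cost: how many times cheaper each question became.
reduction = old_cost / new_cost

# Share of the original per-question cost that was removed.
savings_fraction = 1 - new_cost / old_cost

print(f"Cost reduction factor: {reduction:.2f}")
print(f"Share of cost removed: {savings_fraction:.1%}")
```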
Data Analysis Journal 452 implied HN points 26 Jul 23
  1. The author reflects on three years of writing a newsletter about analytics, thanking supporters and subscribers.
  2. The author's newsletter aims to document their journey, bridge the gap between academics and industry, and encourage classic data analysis.
  3. The author shares insights on their writing strategy, the power of being small and independent, and future plans for the newsletter.
Faster, Please! 639 implied HN points 06 Jan 25
  1. In a few years, we might see AI agents start working alongside humans, which could really change how companies function.
  2. Tech leaders believe that powerful AI could lead to huge advances in science and medicine, speeding up progress significantly.
  3. While there is excitement about AI's potential, it's also important to manage the risks to make sure it benefits everyone.
Data Science Weekly Newsletter 339 implied HN points 01 Dec 23
  1. Data science is evolving quickly, and it's important to stay updated with new advances and tools. Courses and reading lists can help you catch up and enhance your skills.
  2. Using machine learning to solve real-world problems, like correctly attributing quotes, shows the practical applications of data science. Collaboration between universities and organizations can lead to innovative solutions.
  3. The job market for data scientists is challenging right now. Many applicants are competing for limited positions, so if you're looking for a job, patience is key.
SeattleDataGuy’s Newsletter 494 implied HN points 19 Feb 25
  1. Always focus on the real problem behind a request, not just what is being asked. This helps you deliver better solutions that actually meet the business needs.
  2. Using clear frameworks can help organize your thoughts and make complex investigations easier. A structured approach leads to clearer communication and better results.
  3. Keep your communication simple and focused on what matters to your stakeholders. This helps everyone stay on the same page and reduces confusion.
Data Science Weekly Newsletter 179 implied HN points 01 Mar 24
  1. The DSPy framework makes working with large language models easier by focusing on programming instead of complex prompting techniques. This helps reduce errors and improves usability.
  2. A new sequence model approach shows better performance than traditional Transformers, especially for long data sequences. It also works faster, making it a promising development in the field.
  3. Learning resources like online courses and free books on deep learning and causal ML can help deepen understanding of data science. They provide structured material that is great for both beginners and advanced learners.
VuTrinh. 59 implied HN points 11 Jun 24
  1. Meta has developed a serverless Jupyter Notebook platform that runs directly in web browsers, making data analysis more accessible.
  2. Airflow is being used to manage over 2,000 dbt models, which helps teams create and maintain their own data models effectively.
  3. Building a data platform from scratch can be a valuable learning experience, revealing important lessons about data structure and management.
Faster, Please! 639 implied HN points 23 Dec 24
  1. OpenAI has released a new AI model called o3, which is designed to improve skills in math, science, and programming. This could help advance research in various scientific fields.
  2. The o3 model performs much better than the previous model, o1, and other AI systems on important tests. This shows significant progress in AI performance.
  3. There's a feeling of optimism about AGI technology as these advancements might bring us closer to achieving more intelligent and capable AI systems.
Mindful Modeler 419 implied HN points 19 Sep 23
  1. For imbalanced classification tasks, 'Do Nothing' should be the default approach, especially when dealing with calibration, strong classifiers, and class-based metrics.
  2. Addressing imbalanced data should be considered in scenarios where misclassification costs vary, metrics are impacted by imbalance, or weaker classifiers are used.
  3. Rather than oversampling with methods like SMOTE, it is more effective to adjust data weighting, use cost-sensitive machine learning, or tune the decision threshold.
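Threshold tuning with unequal misclassification costs, as mentioned above, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's own method: the labels, scores, cost values, and function names (`expected_cost`, `tune_threshold`) are all hypothetical.

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost when predicting positive for every score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative
    return cost


def tune_threshold(y_true, scores, cost_fp=1.0, cost_fn=5.0):
    """Pick the candidate threshold with the lowest total cost."""
    # Every observed score is a candidate; infinity means "never predict positive".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn),
    )


# Toy imbalanced data (assumed): two positives among ten examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.80]

best = tune_threshold(y_true, scores)
default_cost = expected_cost(y_true, scores, 0.5, 1.0, 5.0)
tuned_cost = expected_cost(y_true, scores, best, 1.0, 5.0)
print(f"threshold={best}, cost at 0.5={default_cost}, tuned cost={tuned_cost}")
```

The model and the training data are left untouched; only the decision rule changes. In practice the threshold would be chosen on held-out data rather than on the examples being scored.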
TheSequence 70 implied HN points 12 Nov 25
  1. Kimi K2 Thinking is a new AI model that thinks in a more advanced way than just giving one answer at a time. It can plan and act over longer periods while staying on track.
  2. This model is built on a powerful trillion-parameter system designed to improve how it learns and uses data efficiently. It makes the most of its resources when solving problems.
  3. Kimi K2 also uses smart training methods, like reinforcement learning, to help it use tools better and think through problems in a layered way.
TheSequence 546 implied HN points 26 Jan 25
  1. DeepSeek-R1 is a new AI model that shows it can perform as well or better than big-name AI models but at a much lower cost. This means smaller companies can now compete in AI innovation without needing huge budgets.
  2. The way DeepSeek-R1 is trained is different from traditional methods. It leans heavily on reinforcement learning, which helps the model learn smarter reasoning skills without needing a ton of supervised data.
  3. The open-source nature of DeepSeek-R1 means anyone can access and use the code for free. This encourages collaboration and allows more people to innovate in AI, making technology more accessible to everyone.
Democratizing Automation 277 implied HN points 29 May 25
  1. There is a rise in Chinese AI models that use more open licenses, influencing other models to adopt similar practices. This pressure is especially affecting Western companies like Meta and Google.
  2. Qwen models are becoming more popular for fine-tuning compared to Llama models, with smaller American startups favoring Qwen. These trends show a shift in preferences in the AI community.
  3. The focus in AI is shifting from just model development to creating tools that leverage these models. This means future releases will often be tool-based rather than just about the AI models themselves.
Data Science Weekly Newsletter 339 implied HN points 17 Nov 23
  1. JAX is becoming popular for its speed and capabilities, and learning it may be essential for those familiar with PyTorch. It does have a steeper learning curve, but there are resources to help ease the transition.
  2. The demand for GPUs is skyrocketing, driven by various market factors. Understanding these dynamics can help anticipate the future of technology and resource availability in industries reliant on powerful computing.
  3. Freelancing in data science can lead to an overwhelming number of job offers. Tips on finding clients on platforms like Upwork and LinkedIn can help navigate this new freelance landscape.
Data Science Weekly Newsletter 379 implied HN points 27 Oct 23
  1. Web development is evolving with the use of local models and technologies for building applications, moving beyond just Python-based machine learning.
  2. It's becoming increasingly important for developers to understand GPUs since they're widely used in deep learning and can greatly enhance performance.
  3. Companies are exploring various use cases for generative AI that provide real value, focusing on practical implementations that drive return on investment.
Data Science Weekly Newsletter 219 implied HN points 26 Jan 24
  1. AI often gets criticized for the quality of its output, but that might not be the real issue people have with it. If quality is fixed, the conversation about AI could change significantly.
  2. Common sense is tricky to define and measure, but researchers are developing ways to quantify it both individually and collectively. This could help clarify how we understand common sense in different contexts.
  3. Large language models (LLMs) can transform education by encouraging hands-on learning. They offer opportunities for more interactive and engaging learning experiences.
Data Science Weekly Newsletter 299 implied HN points 08 Dec 23
  1. Data engineering is evolving with new design patterns that help improve efficiency in handling data. A new book dives into these patterns and their importance.
  2. Machine learning is being used to understand and control the movement of silicon atoms in materials, which could lead to advancements in technology like better electronics.
  3. A new model called PoseGPT can estimate 3D human poses from images and text, linking physical movements to broader concepts about humans, showing the capabilities of large language models.
Neurelo Engineering’s Substack 1 HN point 27 Sep 24
  1. Mock data is super useful for testing software, but it hasn't really improved much over the years. It needs to be more flexible and easier to generate high-quality data.
  2. Using LLMs (large language models) can be tricky for creating mock data. Instead of trying to generate everything, it’s often better to use techniques like topological sorting to keep relationships correct between data entries.
  3. A new approach is turning to strategies like the Genesis Point Strategy, which helps create unique mock data efficiently. It shows that you can simplify processes to get good results without overcomplicating things.
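The role of topological sorting in mock data generation can be sketched as follows. This is an illustrative example, not Neurelo's implementation: the schema, table names, and row-generation logic are hypothetical, and only the standard-library `graphlib.TopologicalSorter` (Python 3.9+) is assumed.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

# static_order() yields each table only after all tables it depends on.
insert_order = list(TopologicalSorter(dependencies).static_order())

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = {}
for table in insert_order:
    rows[table] = []
    for row_id in range(1, 4):
        row = {"id": row_id}
        # Parent tables are already populated, so every foreign key is valid.
        for parent in sorted(dependencies[table]):
            row[f"{parent}_id"] = rng.choice(rows[parent])["id"]
        rows[table].append(row)

print(insert_order)
print(rows["order_items"])
```

Generating tables in dependency order means a child row can only ever point at a parent row that exists, which keeps relationships correct without asking a model to produce the whole dataset at once.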
Enterprise AI Trends 168 implied HN points 06 Aug 25
  1. OpenAI has released two new open-weight models called gpt-oss-120b and gpt-oss-20b. This means people can run these powerful models on their own computers without needing an internet connection.
  2. The gpt-oss-120b model is very cost-effective and performs well, even better than some existing models, making advanced AI more accessible.
  3. It's been six years since OpenAI released an open-weight model, so this move shows they are serious about reclaiming their position in the open-source AI community.
Democratizing Automation 182 implied HN points 22 Jul 25
  1. Chinese AI models are gaining attention in the market, especially with new releases and better collaborations happening all the time.
  2. The quality of the AI models available is improving quickly, with more reliable options for various tasks compared to earlier versions.
  3. Model developers like Alibaba's Qwen team are innovating and making strides in AI technology, which is reshaping the landscape of available tools and resources.
Briefly Bio 198 implied HN points 23 Feb 24
  1. Creating 96-well plate maps is important for organizing samples and tracking metadata during scientific experiments. This helps scientists during pipetting and later data analysis.
  2. Current methods for making plate maps, like using spreadsheets, can be clunky and error-prone as they often require managing multiple tables that are not linked.
  3. A new visual plate mapper allows for easy creation and editing of plate maps. It synchronizes the visual layout with a data table, making it simpler to manage and analyze experiment data.
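One way to keep a plate layout and its data table in sync is to derive both from a single structure. The sketch below assumes a standard 8 x 12 plate (rows A-H, columns 1-12); the function names, sample names, and dose values are hypothetical and do not describe Briefly Bio's tool.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]  # plate rows A-H
COLUMNS = range(1, 13)      # plate columns 1-12


def empty_plate():
    """Return a dict with one entry per well (A1..H12), all unassigned."""
    return {f"{row}{col}": None for row in ROWS for col in COLUMNS}


def assign(plate, wells, **metadata):
    """Attach the same metadata to every listed well."""
    for well in wells:
        if well not in plate:
            raise KeyError(f"Unknown well: {well}")
        plate[well] = dict(metadata)


def as_table(plate):
    """Flatten assigned wells into rows suitable for later analysis."""
    return [{"well": well, **meta} for well, meta in plate.items() if meta]


def as_grid(plate, field):
    """Render one metadata field in the 8 x 12 plate layout."""
    lines = []
    for row in ROWS:
        cells = [(plate[f"{row}{col}"] or {}).get(field, ".") for col in COLUMNS]
        lines.append(row + " " + " ".join(f"{str(c):>8}" for c in cells))
    return "\n".join(lines)


plate = empty_plate()
assign(plate, ["A1", "A2", "A3"], sample="control", dose_uM=0)
assign(plate, ["B1", "B2", "B3"], sample="treated", dose_uM=10)

print(as_grid(plate, "sample"))
print(as_table(plate)[:2])
```

Because the grid and the table are both views of the same dictionary, an edit to one well shows up in both, which avoids the unlinked-spreadsheet problem described above.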
Gradient Flow 519 implied HN points 06 Apr 23
  1. Developers can now create AI-powered applications without deep machine learning knowledge, opening up opportunities for rapid experimentation and innovation.
  2. Building custom large language models (LLMs) is becoming more accessible through startups offering resources for model fine-tuning or training from scratch.
  3. Integration of custom LLMs with third-party services, utilizing knowledge bases, and serving models efficiently are key areas of focus for developers in the AI application space.
Sector 6 | The Newsletter of AIM 99 implied HN points 18 Apr 24
  1. Meta has introduced MEGALODON, a new neural architecture that allows for infinite context length in AI, making it more efficient than previous models.
  2. With developments from Microsoft, Google, and Meta, the focus will shift away from which model has the highest context length, as all will likely have infinite capabilities soon.
  3. The upcoming Llama-3 model is expected to continue this trend by also supporting infinite context length, enhancing its utility in various applications.
The Algorithmic Bridge 647 implied HN points 11 Nov 24
  1. AI companies are hitting limits with current models. Simply making AI bigger isn't creating better results like it used to.
  2. The upcoming models, like Orion, may not meet the high expectations set by previous versions. Users want more dramatic improvements and are getting frustrated.
  3. A new approach in AI may focus on real-time thinking, allowing models to give better answers by taking a bit more time, though this could test users' patience.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 39 implied HN points 27 Jun 24
  1. Retrieval-Augmented Generation (RAG) mixes retrieval methods with learning systems to help large language models use real-time data.
  2. RAG can enhance the accuracy of language models by incorporating current information, avoiding wrong answers that might come from outdated knowledge.
  3. The framework of RAG includes steps like pre-retrieval, retrieval, post-retrieval, and generation, each contributing to better outputs in language processing tasks.
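The four stages listed above can be illustrated with a toy pipeline. This is a sketch under loud assumptions: retrieval here is simple keyword overlap rather than a real retriever, `generation` only assembles a prompt instead of calling a language model, and every function name, the stopword list, and the corpus are hypothetical.

```python
STOPWORDS = {"the", "on", "does", "when", "at", "is", "a"}


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def pre_retrieval(query):
    """Pre-retrieval: clean the query into a set of search terms."""
    return tokenize(query) - STOPWORDS


def retrieval(terms, corpus):
    """Retrieval: score each document by the number of shared terms."""
    return [(len(terms & tokenize(doc)), doc) for doc in corpus]


def post_retrieval(scored, top_k=2):
    """Post-retrieval: drop non-matching documents, keep the best top_k."""
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def generation(query, context):
    """Generation: stand-in for an LLM call; it only builds the prompt."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


corpus = [
    "The library opens at 9 am on weekdays.",
    "Parking is free after 6 pm.",
    "The library closes at 5 pm on Saturdays.",
]
query = "When does the library open on weekdays?"

terms = pre_retrieval(query)
context = post_retrieval(retrieval(terms, corpus))
prompt = generation(query, context)
print(prompt)
```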
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 39 implied HN points 26 Jun 24
  1. Phi-3 is a small language model that uses a special dataset called TinyStories. This dataset was designed to help the model create more varied and engaging stories.
  2. TinyStories uses simple vocabulary suitable for young children, focusing on quality over quantity. The stories generated are meant to be both understandable and entertaining.
  3. Training the Phi-3 model with TinyStories can be done quickly and allows for easier fine-tuning. This helps smaller organizations use advanced language models without needing huge resources.
Data Science Weekly Newsletter 359 implied HN points 21 Sep 23
  1. There's a new newsletter focusing on AI safety in China, showing that the country is more invested in AI safety than many think.
  2. A podcast discusses how startups can run better AI models without needing to upgrade their hardware, which is a big challenge in the field.
  3. An online event is coming up for those looking to secure data science jobs in big tech, focusing on interview strategies and market insights.
Technically 43 implied HN points 04 Dec 25
  1. Understanding how AI works is crucial to using it effectively. If you learn the basics, you can make AI a powerful tool instead of letting it take over your job.
  2. Many people use AI tools lazily and don’t take the time to understand how they work. This can lead to getting replaced if you’re not careful with your AI usage.
  3. There are resources available to help you learn about AI, and it's important to use them. The more you know, the better you can leverage AI in your work.
Human Capitalist 99 implied HN points 07 May 24
  1. There are a lot of unanswered questions about the workforce that data can help with. This could give businesses valuable insights into hiring trends and job market changes.
  2. A partnership with Seek.ai will allow people to ask real-time questions about workforce data. This means anyone can get important answers quickly, helping them make better decisions.
  3. The team is looking for creative questions to test their new analytics tool. People can submit their questions, and the most interesting ones will be selected for special insights.
Data at Depth 79 implied HN points 05 May 24
  1. Start by defining the function you want the audience to perform with the presented data before creating visualizations that support it.
  2. Implement aspects like affordances, accessibility, and aesthetics to ensure your visualizations are clear, usable, and visually appealing for the audience.
  3. Achieving acceptance of your data visualization involves following established design principles like direct labeling, thoughtful use of color, alignment, and the data-ink principle.
Data Science Weekly Newsletter 139 implied HN points 07 Mar 24
  1. The newsletter shares valuable links about Data Science, AI, and Machine Learning each week. It's a great way to keep updated on the latest in the field.
  2. There are interesting articles highlighting statistical analyses and practical guides, like building GPU clusters at home. These resources help both beginners and experienced practitioners learn more.
  3. The newsletter also encourages people to participate in AI-related events and offers resources for job seekers. This can help you connect with others and grow your career.
Data Science Weekly Newsletter 339 implied HN points 19 Oct 23
  1. Data science, AI, and ML are rapidly evolving fields, with new technologies and techniques emerging frequently. Staying updated through news and articles can help professionals keep their skills relevant.
  2. Fine-tuning large language models (LLMs) is a growing demand in the job market. Many companies are now looking for experience with LLMs alongside traditional skills like Python and SQL.
  3. Understanding different data visualization goals, like storytelling versus exploration, is important for effectively communicating data insights. This can improve how data is presented in reports and analyses.