The hottest Data Science Substack posts right now

And their main takeaways
Don't Worry About the Vase • 3852 implied HN points • 30 Dec 24
  1. OpenAI's new model, o3, shows amazing improvements in reasoning and programming skills. It's so good that it ranks among the top competitive programmers in the world.
  2. o3 scored impressively on challenging math and coding tests, outperforming previous models significantly. This suggests we might be witnessing a breakthrough in AI capabilities.
  3. Despite these advances, o3 isn't classified as AGI yet. While it excels in certain areas, there are still tasks where it struggles, keeping it short of true general intelligence.
Data Science Weekly Newsletter • 159 implied HN points • 25 Jul 24
  1. AI models can break down when trained on data that is generated by other models. This can cause problems in how well they work.
  2. There is scientific research about the history of Italian filled pasta. It shows that most types likely came from a single area in northern Italy.
  3. There are new resources and guides available for improving predictive modeling with tabular data. These can help you build better models by focusing on how data is represented.
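The first takeaway above (models degrading when trained on model-generated data) can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the "model" is a single Gaussian refit each generation to a small sample drawn from the previous fit, and the function name, sample size, and generation count are hypothetical choices, not taken from the linked research.

```python
import random
import statistics

def simulate_collapse(generations=300, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0          # generation 0: stands in for the real data distribution
    history = [std]
    for _ in range(generations):
        # Sample synthetic data from the current model...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...then fit the next model on that synthetic data only.
        mean = statistics.fmean(samples)
        std = statistics.stdev(samples)
        history.append(std)
    return history

history = simulate_collapse()
print(f"std at generation 0: {history[0]:.3f}")
print(f"std at generation {len(history) - 1}: {history[-1]:.3g}")
```

In this toy setting the fitted spread typically drifts toward zero over many generations, because each refit loses a little of the tails and never sees fresh real data to recover them.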
Encyclopedia Autonomica • 19 implied HN points • 09 Oct 24
  1. Using Transformer Agents 2.0 is a step up from traditional methods. They can handle multi-step tasks better and have memory to store information as they work.
  2. Setting up and building a basic ReAct Agent is straightforward. You only need to install some packages and create the agent using selected models and tools.
  3. You can orchestrate multiple agents together for more complex tasks. By combining different agents, you can enhance their capabilities and improve the results of your searches or queries.
Data Science Weekly Newsletter • 1418 implied HN points • 19 Jan 24
  1. Good data visualization is important. Some types of graphs can be misleading, and it's better to avoid them.
  2. In healthcare, it's not just about having advanced technology like AI. The real focus should be on getting effective results from these technologies.
  3. Netflix released a lot of data about what people watched in 2023. Analyzing this can help us understand trends in streaming better.
Graphs For Science • 105 implied HN points • 10 Jan 26
  1. A strong theme is practical engineering: many books show how to turn LLM demos into working agents using RAG, embeddings, knowledge graphs, tool use, and prompt patterns to make outputs more reliable and auditable.
  2. There’s a clear focus on hands-on playbooks and trade-offs—quick-starts, checklists, code examples, and patterns for prototyping, retrieval, latency/cost decisions, multi-agent orchestration, and production concerns.
  3. The collection balances technical how-to guidance with broader perspectives on responsible use, human uniqueness, organizational strategy, and interdisciplinary science, highlighting ethics, norms for academics, and big-picture questions about life and intelligence.
The Palindrome • 4 implied HN points • 14 Mar 26
  1. Machine learning means training predictive models from data. The core setup uses a dataset, a parametric model (a hypothesis), and a loss function to measure how well the model fits the data.
  2. A model approximates the true input–output relation and depends on both its parameters and the training data (often written h(x; w, D)). Models can be deterministic or probabilistic and belong to different families like generative or discriminative.
  3. Which learning paradigm you use depends on what inputs, outputs, and labels are available — the main paradigms are supervised, unsupervised, semi‑supervised, and reinforcement learning. In supervised learning you have input–label pairs and the goal is to learn the mapping from x to y.
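The supervised setup described above (a dataset, a parametric model, and a loss function) can be sketched in a few lines. This is a minimal illustration, assuming a made-up linear rule for the data, mean squared error as the loss, and plain gradient descent as the training procedure; none of these specifics come from the post itself.

```python
# Dataset D: input-label pairs (x, y) generated from the made-up rule y = 2x + 1.
dataset = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

def h(x, w):
    """Parametric model (hypothesis): a line with parameters w = (slope, bias)."""
    slope, bias = w
    return slope * x + bias

def loss(w, data):
    """Mean squared error: measures how well h(.; w) fits the data."""
    return sum((h(x, w) - y) ** 2 for x, y in data) / len(data)

def fit(data, steps=2000, lr=0.05):
    """Plain gradient descent on the mean squared error."""
    slope, bias = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the MSE with respect to slope and bias.
        grad_slope = sum(2 * (h(x, (slope, bias)) - y) * x for x, y in data) / n
        grad_bias = sum(2 * (h(x, (slope, bias)) - y) for x, y in data) / n
        slope -= lr * grad_slope
        bias -= lr * grad_bias
    return slope, bias

w = fit(dataset)
print("learned parameters:", w)
print("final loss:", loss(w, dataset))
```

The learned parameters depend on the training data as well as on the optimizer, which is the sense in which the model can be written h(x; w, D).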
Encyclopedia Autonomica • 19 implied HN points • 06 Oct 24
  1. Synthetic data is crucial for AI development. It helps create large amounts of high-quality data without privacy concerns or high costs.
  2. There are various projects focused on generating synthetic data. Tools like AgentInstruct and DataDreamer aim to create diverse datasets for training language models.
  3. Methods for generating synthetic data include using personas to create unique datasets and building specially designed datasets that improve mathematical reasoning skills.
The Counterfactual • 99 implied HN points • 02 Aug 24
  1. Language models are trained on specific types of language, known as varieties. This includes different dialects, registers, and periods of language use.
  2. Using a representative training data set is crucial for language models. If the training data isn't diverse, the model can perform poorly for certain groups or languages.
  3. It's important for researchers to clearly specify which language and variety their models are based on. This helps everyone better understand what the model can do and where it might struggle.
Don't Worry About the Vase • 1209 implied HN points • 18 Jun 25
  1. The new Gemini 2.5 Pro model from Google is better at coding and has improved reasoning skills, but users have mixed feelings about its personality changes.
  2. Some people think the updates focus too much on benchmarks, making the model feel less creative and more sycophantic in its responses.
  3. The price for its Flash Lite version is very affordable, making it a good option for many users, but concerns about how safe and reliable it is remain.
Don't Worry About the Vase • 2777 implied HN points • 31 Dec 24
  1. DeepSeek v3 is a powerful and cost-effective AI model with a good balance between performance and price. It can compete with top models but might not always outperform them.
  2. The model has a unique structure that allows it to run efficiently with fewer active parameters. However, this optimization can lead to challenges in performance across various tasks.
  3. Reports suggest that while DeepSeek v3 is impressive in some areas, it still falls short in aspects like instruction following and output diversity compared to competitors.
Trevor Klee’s Newsletter • 970 implied HN points • 10 Jul 25
  1. Virtual synthetic repurposing trials use existing healthcare data to see how already available drugs can help treat various diseases. This method can lead to important insights without needing traditional trials.
  2. Currently, these trials are done by small teams and can be slow and hard to replicate. There’s a call for a more organized approach that uses technology to speed up the process and improve access to data.
  3. By setting up teams focused on software, data cleaning, and navigating regulations, we could create a system that shares results openly. This would allow more researchers to explore and build on findings.
Don't Worry About the Vase • 2598 implied HN points • 26 Dec 24
  1. The new AI model, o3, is expected to improve performance significantly over previous models and is undergoing safety testing. We need to see real-world results to know how useful it truly is.
  2. DeepSeek v3, developed for a low cost, shows promise as an efficient AI model. Its performance could shift how AI models are built and deployed, depending on user feedback.
  3. Many users are realizing that using multiple AI tools together can produce better results, suggesting a trend of combining various technologies to meet different needs effectively.
Tech Talks Weekly • 39 implied HN points • 19 Sep 24
  1. Tech Talks Weekly recently reached 2000 subscribers, which shows a growing interest in tech discussions and events.
  2. This issue features talks from 17 different conferences, emphasizing the variety of topics available in tech.
  3. There are special issues highlighting all JavaScript and Java talks of 2024, catering to specific interests among tech enthusiasts.
Gradient Flow • 339 implied HN points • 16 May 24
  1. AI agents are evolving to be more autonomous than traditional co-pilots, capable of proactive decision-making based on goals and environment understanding.
  2. Enterprise applications of AI agents focus on efficient data collection, integration, and analysis to automate tasks, improve decision-making, and optimize business processes.
  3. The field of AI agents is advancing with new tools like CrewAI, highlighting the importance of MLOps for reliability, traceability, and ensuring ethical and safe deployment.
HackerNews blogs newsletter • 19 implied HN points • 03 Oct 24
  1. Building a personal ghostwriter can help with productivity and writing tasks. It's about creating a tool that assists you effectively.
  2. Refactoring code is important for improving software. It makes programs easier to understand and maintain, even for those who aren't programmers.
  3. AI and machine learning can benefit from powerful hardware setups. Training models on many GPUs can significantly speed up the process.
Tech Talks Weekly • 19 implied HN points • 03 Oct 24
  1. Tech Talks Weekly curates talks from various tech conferences so you can catch up on what you missed. It's a great way to stay updated on industry trends without the hassle of searching multiple platforms.
  2. The newsletter has grown significantly, indicating that many people find the content valuable. Engaging with the audience helps in tailoring future content to better meet their needs.
  3. The latest issue features a lot of new talks, making it a larger edition than usual. This includes recommendations to explore specific talks that have gained a lot of views from various conferences.
Data Science Weekly Newsletter • 999 implied HN points • 12 Jan 24
  1. Using ChatGPT can help you budget better. It can track and categorize your spending easily.
  2. When coding, it's important to find a balance between moving quickly and keeping your code well-structured. This is a real challenge for many developers.
  3. Language models, like GPT-4, are becoming very advanced, but there are big philosophical questions about what that really means for intelligence and understanding.
Democratizing Automation • 649 implied HN points • 15 Aug 25
  1. Continual learning isn't essential for AI progress; scaling existing systems is more important. AI will evolve and improve without mimicking human learning too closely.
  2. Current language models can't learn or adapt over time like humans do, but they can still handle context effectively and improve in their capacity to process information.
  3. Better context management and new AI models in the future will bridge the gap between current capabilities and continual learning, making AI systems more adaptable and efficient.
Don't Worry About the Vase • 2419 implied HN points • 02 Jan 25
  1. AI is becoming more common in everyday tasks, helping people manage their lives better. For example, using AI to analyze mood data can lead to better mental health tips.
  2. As AI technology advances, there are concerns about job displacement. Jobs in fields like science and engineering may change significantly as AI takes over routine tasks.
  3. The shift of AI companies from non-profit to for-profit models could change how AI is developed and used. It raises questions about safety, governance, and the mission of these organizations.
benn.substack • 1048 implied HN points • 06 Jun 25
  1. Data tools are getting more advanced, but many people still struggle with knowing how to use them effectively. This means that having the right tools isn't enough if users lack direction.
  2. The industry is shifting focus from traditional analytics towards building AI systems and infrastructure. Companies are now adapting their technologies to support AI applications instead of just analyzing data.
  3. Self-serve BI tools aren't being used as intended because people often don't know what questions to ask. Providing clearer direction and goals might help users make better use of available data.
ChinaTalk • 2075 implied HN points • 28 Jan 25
  1. DeepSeek is gaining attention in the AI community for its strong performance and efficient use of computing power. Many believe it showcases China’s growing capabilities in AI technology.
  2. The culture at DeepSeek focuses on innovation without immediate monetization, emphasizing the importance of young talent in AI advancements. This approach has differentiated them from larger tech firms.
  3. Despite initial success, there are still concerns about the long-term sustainability of AI business models. The demand for computing power is high, and no company has enough to meet future needs.
In My Tribe • 212 implied HN points • 17 Nov 25
  1. Many people believe that AI could end up being more disliked than social media companies. There's a concern about AI causing harm as it becomes more advanced.
  2. AI models, like LLMs, tend to reinforce the ideas of users instead of challenging them. This can make users confident, but may not always provide the best advice.
  3. AI is becoming a major player in creating ads, often needing little human input. This could change the job market for those involved in video production, as AI can do the work faster and cheaper.
The AI Frontier • 99 implied HN points • 25 Jul 24
  1. In AI, there's no single fix that will solve all problems. Success comes from making lots of small improvements over time.
  2. Data quality is very important. If you don't start with good data, the results won't be good either.
  3. It's essential to measure changes carefully when building AI applications. Understanding what works and what doesn't can save you from costly mistakes.
Mindful Modeler • 199 implied HN points • 18 Jun 24
  1. The limitations of feature attribution methods like SHAP and Integrated Gradients have been studied, particularly focusing on their reliability for explaining predictions as a sum of attributions.
  2. Tasks such as algorithmic recourse, characterizing model behavior, and identifying spurious features all revolve around how predictions change with slight feature alterations, making SHAP unsuitable for these specific tasks.
  3. It's important to avoid using SHAP for questions related to minor changes in feature values or counterfactual analysis, as it may yield unreliable results in such scenarios.
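The gap between "a sum of attributions" and "how the prediction changes under a small nudge" can be shown on a toy model. The sketch below is an assumption-laden illustration, not the `shap` library and not the post's own analysis: it computes exact Shapley values by replacing absent features with a single baseline, and compares them with a finite-difference sensitivity. The model, baseline, and function names are hypothetical.

```python
from itertools import permutations

def model(x):
    """Toy model: a step in feature 0 plus a gentle slope in feature 1."""
    return (1.0 if x[0] > 0.5 else 0.0) + 0.1 * x[1]

def shapley_values(f, x, baseline):
    """Exact Shapley values, with absent features set to their baseline values."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        for i in order:
            before = f(current)
            current[i] = x[i]           # feature i joins the coalition
            phi[i] += f(current) - before
    return [p / len(orders) for p in phi]

def local_sensitivity(f, x, eps=1e-3):
    """Finite-difference change in the prediction per unit nudge of each feature."""
    base = f(x)
    out = []
    for i in range(len(x)):
        nudged = list(x)
        nudged[i] += eps
        out.append((f(nudged) - base) / eps)
    return out

x, baseline = [1.0, 1.0], [0.0, 0.0]
phi = shapley_values(model, x, baseline)
sens = local_sensitivity(model, x)
print("attributions:", phi)
print("sum of attributions:", sum(phi), "vs f(x) - f(baseline):", model(x) - model(baseline))
print("local sensitivity:", sens)
```

Here the attributions add up to the difference between the prediction and the baseline prediction, and feature 0 receives by far the largest share, yet nudging feature 0 slightly does not change the prediction at all. A question about small changes or counterfactuals is therefore answered by the sensitivity, not by the attribution.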
Don't Worry About the Vase • 2464 implied HN points • 12 Dec 24
  1. AI technology is improving rapidly, with advances coming from companies such as OpenAI and Google. Much of the new work lets models handle more complex tasks efficiently.
  2. People are starting to think more seriously about the potential risks of advanced AI, including concerns related to AI being used in defense projects. This brings up questions about ethics and the responsibilities of those creating the technology.
  3. AI tools are being integrated into everyday tasks, making things easier for users. People are finding practical uses for AI in their lives, like getting help with writing letters or reading books, making AI more useful and accessible.
Democratizing Automation • 190 implied HN points • 23 Nov 25
  1. Many labs in the U.S. are creating high-quality open models, similar in number to those in China, but U.S. models tend to be smaller and have stricter licenses.
  2. Leading U.S. organizations such as Nvidia, Ai2, Google, and Stanford are at the forefront of releasing these models, showing strong potential for future growth.
  3. There's been a recent uptick in truly open models from various labs, suggesting a shift toward more accessible AI resources for developers.
Data Science Weekly Newsletter • 959 implied HN points • 29 Dec 23
  1. This week, there's a focus on using data science techniques for practical decision-making, highlighted by an interview with Steven Levitt, who discusses making tough choices using data.
  2. There's a roundup of AI developments from 2023, showing how the field has evolved over the past year, which can help professionals stay updated.
  3. Understanding data quality is essential, as it directly impacts how useful data is for decision-making and analysis in any organization.
Who is Robert Malone • 12 implied HN points • 26 Feb 26
  1. Large language models are built by training huge neural networks on trillions of words to predict the next word, producing very powerful but imperfect base models that reflect their training data and cost a lot to train.
  2. Making models behave safely relies on fine‑tuning, human feedback (RLHF), constitutional rules, system prompts, filters, sandbox testing, and red‑teaming, but guardrails are always being probed and must be balanced against usefulness.
  3. Hallucinations—confident but false answers—and the question of whether models really 'think' are core issues, so techniques like retrieval‑augmented generation, citations, chain‑of‑thought, specialist models, and human review are used to reduce errors and limit harm.
One Useful Thing • 2226 implied HN points • 09 Dec 24
  1. AI is great for generating lots of ideas quickly. Instead of getting stuck after a few, you can use AI to come up with many different options.
  2. It's helpful to use AI when you have expertise and can easily spot mistakes. You can rely on it to assist with complex tasks without losing track of quality.
  3. However, be cautious using AI for learning or where accuracy is critical. It may shortcut your learning and sometimes make errors that are hard to notice.
Brain Pizza • 529 implied HN points • 16 Aug 25
  1. The idea that intelligence can be created just by collecting more data is a big misunderstanding. Intelligence is more about how we interact with and adapt to the world, rather than just crunching numbers.
  2. Current approaches to AGI focus too much on centralization, which ignores how intelligence naturally develops in a distributed way through social and biological processes.
  3. True understanding isn't just about having tons of information; it's about context and how we learn from our experiences. Intelligence evolves through interaction and adaptation, not through simply stacking data.
Democratizing Automation • 760 implied HN points • 28 Jun 25
  1. Deep learning is not as complicated as it seems; the basic ideas are pretty straightforward and can be learned quickly with the right guidance. You don't need years of study to understand how it works.
  2. Getting the right random initialization for neural networks is crucial. If the initialization is too small, the signal can decay and become unnoticeable, making it hard for the model to learn effectively.
  3. Machine learning focuses on achieving good enough results rather than perfect solutions. It’s more about finding practical and useful models with the resources available.
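The point about initialization scale can be checked numerically. The sketch below is a simplified illustration under stated assumptions: the layers are purely linear (no activation function), weights are Gaussian with standard deviation `gain / sqrt(width)`, and the function name, width, and depth are hypothetical choices rather than anything from the post.

```python
import math
import random

def signal_rms_after(depth, width, gain, seed=0):
    """Push a random vector through `depth` random linear layers; return its RMS."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    # Weight standard deviation; gain = 1 keeps the signal variance roughly constant.
    std = gain / math.sqrt(width)
    for _ in range(depth):
        # Each output unit is a fresh random weighted sum of the previous layer.
        x = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(width)]
    return math.sqrt(sum(v * v for v in x) / width)

small = signal_rms_after(depth=20, width=64, gain=0.5)
unit = signal_rms_after(depth=20, width=64, gain=1.0)
print(f"gain=0.5: RMS activation after 20 layers = {small:.3g}")
print(f"gain=1.0: RMS activation after 20 layers = {unit:.3g}")
```

Because both runs share a seed, the only difference is the weight scale, so the gain 0.5 signal ends up about 0.5^20 (roughly a millionth) of the gain 1.0 signal. That is the decay the takeaway describes: each too-small layer shrinks the signal a little, and the shrinkage compounds with depth.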
RSS DS+AI Section • 29 implied HN points • 01 Feb 26
  1. AI misuse and ethical risks are increasing — deepfakes, automated exploit generation, bias, and job impacts mean security, fairness, and regulation need urgent attention.
  2. Research is advancing rapidly across many fronts, including model consistency, memory/lookup mechanisms, test-time training, decentralized and open-source models, and early work on AI systems that can improve themselves.
  3. Practical resources and community activity are abundant, with tutorials, benchmarks, tools, academic outlets, and job opportunities helping practitioners deploy AI responsibly and learn new skills.
Gradient Flow • 878 implied HN points • 28 Dec 23
  1. AI and machine learning advancements in 2023 sparked vibrant discussions among developers, focusing on topics like large language models, infrastructure, and business applications.
  2. Technology media shifted its focus to highlight rapid AI advancements, covering diverse AI applications across industries while also addressing concerns about deepfakes and biases in AI systems.
  3. The book 'Mixed Signals' by Uri Gneezy was named the 2023 Book of the Year, offering insights on how incentives shape behavior in AI, technology, and business, with a focus on aligning incentives with ethical values.
Don't Worry About the Vase • 1881 implied HN points • 09 Jan 25
  1. AI can help with many useful tasks, but many people still don't see its value or know how to use it effectively. It's important to change that mindset.
  2. Companies are realizing that fixed subscription prices for AI services might not be sustainable because usage varies greatly among users.
  3. Many folks are worried about AI despite not fully understanding it. It's crucial to communicate AI's potential benefits and reduce fears around job loss and other concerns.
Five Links (and three graphs) by Auren Hoffman • 56 implied HN points • 15 Jan 26
  1. A public prediction game pitted humans against three AIs and laid out ten bets for 2026 across health, geopolitics, economy, and AI impact.
  2. The AIs showed very different strategies — ChatGPT was strongly contrarian, Claude hedged cautiously, and Gemini bet optimistically — highlighting divergent machine reasoning.
  3. Both humans and AIs missed a major development in Venezuela, reminding us that experts and models alike can have big blind spots even after modest collective gains in prior years.
Democratizing Automation • 1717 implied HN points • 21 Jan 25
  1. DeepSeek R1 is a new reasoning language model that can be used openly by researchers and companies. This opens up opportunities for faster improvements in AI reasoning.
  2. The training process for DeepSeek R1 included four main stages, emphasizing reinforcement learning to enhance reasoning skills. This approach could lead to better performance in solving complex problems.
  3. Price competition in reasoning models is heating up, with DeepSeek R1 offering lower rates compared to existing options like OpenAI's model. This could make advanced AI more accessible and encourage further innovations.
Data Science Weekly Newsletter • 799 implied HN points • 05 Jan 24
  1. Data Science Weekly shares curated news and articles each week related to data science, AI, and machine learning. This helps readers stay updated on important trends and topics.
  2. Deepnote emphasizes using its own platform for building data infrastructure, showcasing how versatile tools can simplify data tasks. It highlights the importance of a universal computational medium.
  3. A reliable A/B testing system is essential for businesses to make informed decisions and optimize performance. Companies that use effective experimentation platforms can significantly improve their outcomes and reduce manual work.
HyperArc • 59 implied HN points • 05 Aug 24
  1. AI can help us learn about the Olympics and analyze different aspects, like who won medals and their physical attributes. It starts with basic questions and gets more complicated over time.
  2. While AI is good at remembering information and summarizing it, it struggles with reasoning about things it hasn't seen before. This means it can't always come up with new insights without the right data.
  3. For businesses, using AI with their private data can lead to smarter insights and faster decisions. It's important to combine human knowledge with AI to make the best use of available information.
Data Science Weekly Newsletter • 119 implied HN points • 04 Jul 24
  1. Staying updated in data science, AI, and machine learning is essential for improving skills and knowledge. Weekly newsletters provide curated articles and resources that help you keep up with the latest trends.
  2. Effective structuring of data science teams can greatly enhance productivity. Learning from past experiences on team reorganizations can help in clarifying roles and increasing effectiveness.
  3. Building interactive dashboards in Python can make data more accessible. Using tools like PostgreSQL and specific libraries can simplify the process and enhance data visualization.