The hottest data science Substack posts right now

And their main takeaways
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 19 implied HN points 22 Nov 23
  1. Chain-Of-Knowledge (CoK) prompting is a useful technique for complex reasoning tasks. It helps make AI responses more accurate by using structured facts.
  2. Creating effective prompts using CoK requires careful construction of evidence and may involve human input. This is important for ensuring the quality and reliability of the information AI generates.
  3. The CoK approach aims to reduce errors or 'hallucinations' in AI responses. It offers a more transparent way to build prompts and enhances the overall reasoning ability of AI systems.
TheSequence 28 implied HN points 09 Feb 25
  1. AlphaGeometry2 has become a top performer in solving geometry problems, even surpassing human math Olympiad gold medalists. It can handle tough geometry concepts and has a better understanding of different math problems compared to its predecessor.
  2. The latest improvements in AlphaGeometry2 include an enhanced symbolic engine and a wider range of mathematical language features. This allows it to solve more complex geometry problems efficiently.
  3. AI is getting closer to matching or even exceeding human capabilities in competitive mathematics. This success in geometry could lead to similar advancements in other scientific fields like physics and chemistry.
The Long Game by Mehdi Yacoubi 3 implied HN points 19 Nov 25
  1. Longevity works best when you focus on basics—build muscle, move often, eat and sleep reasonably well—and avoid turning health into constant self-surveillance that makes you feel fragile.
  2. The AI app market is unstable because foundational model providers can rapidly absorb app features, so most startups either need to generate quick cash, aim to be acquired, or specialize in niches with unique atom-level data, hardware, or heavy enterprise integration.
  3. Real competitive advantage comes from controlling the full loop: huge, cleaned datasets, continent-scale multimodal models, and cheap execution that ties AI to real-world testing, and founders should build from conviction rather than chasing what’s currently fundable.
Sector 6 | The Newsletter of AIM 39 implied HN points 19 Mar 23
  1. Alpaca 7B is a new AI model introduced by Stanford that performs well, similar to OpenAI's models, but is smaller and cheaper to use.
  2. The AI landscape is buzzing with exciting developments and new models, making it an interesting time for AI enthusiasts.
  3. The week highlights a range of impressive AI technologies, signaling that there's much more innovation to come in this field.
Recommender systems 16 implied HN points 25 May 25
  1. Self-attention helps summarize a list of information, making it easier to find what's most relevant, like recent videos you watched.
  2. Graph attention looks at how items in a network relate to each other, like understanding social connections in a network.
  3. Target-aware attention checks how relevant certain items are based on your past choices or queries, helping improve recommendations.
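The target-aware attention idea in the takeaways above can be sketched in a few lines. This is a minimal illustration, not the post's implementation: the function names, the scaled dot-product scoring, and the three-dimensional item vectors are all assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def target_aware_attention(history, target):
    """Weight each history item by its scaled dot-product similarity to the target."""
    scores = history @ target / np.sqrt(target.shape[0])
    weights = softmax(scores)
    # The summary is a weighted average of the history items.
    summary = weights @ history
    return weights, summary

# Hypothetical embeddings of three previously watched items (one per row).
history = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
])
# Hypothetical embedding of the candidate item being scored.
target = np.array([1.0, 0.0, 0.0])

weights, summary = target_aware_attention(history, target)
```

Items that resemble the candidate receive the largest weights, so the summary vector leans toward the part of the history that is relevant to that candidate.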
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 19 implied HN points 06 Nov 23
  1. When evaluating large language models (LLMs), it's important to define what you're trying to achieve. Know the problems you're solving so you can measure success and failure.
  2. Choosing the right data is crucial for evaluating LLMs. You'll need to think about what data to use and how it will be delivered in your application.
  3. The process of evaluation can be automated or involve human input. Deciding how to implement this process is key to building effective LLM applications.
The Works in Progress Newsletter 11 implied HN points 16 Jul 25
  1. Scientists estimate that a major earthquake could strike the American West Coast, causing massive destruction and loss of life. Planning for these events is crucial, given how many people live in these areas today.
  2. Funding for earthquake prediction is very limited, focusing mostly on understanding where earthquakes might happen rather than when. There is a big need for more resources to develop better warning systems.
  3. Using advanced technology and data sharing can significantly improve earthquake prediction. A centralized lab focusing on research and collaboration could potentially provide better warning times and save lives.
Year 2049 11 implied HN points 17 Jul 25
  1. Reasoning models take time to think through problems step-by-step, unlike standard LLMs that give quick answers. This helps them break down complex questions and find better solutions.
  2. While reasoning models can work better for complex problems, they might fail on simpler ones and can overthink. Sometimes, basic LLMs are faster and more accurate.
  3. Choosing the right AI model for your task is important. Not every problem needs a reasoning model, so understanding their strengths and limitations can help set realistic expectations.
From the New World 26 implied HN points 06 Feb 25
  1. AI hardware has evolved significantly, from early specialized chips to powerful GPUs and TPUs. These advancements make training AI models much faster and more efficient.
  2. The design of algorithms, especially with transformers, has greatly improved AI's ability to understand and generate language. These models can now learn complex patterns that were hard to capture before.
  3. Building and maintaining large AI systems requires careful planning and practices. Companies need efficient workflows and monitoring systems to manage data, hardware, and software effectively.
inexactscience 39 implied HN points 14 Mar 23
  1. One big mistake in data science interviews is jumping to solutions too quickly. It's important to first understand the problem before trying to solve it.
  2. Asking questions during the interview can show your insight and help you gather essential information. It helps to clarify the business context and what needs to be addressed.
  3. Finding a balance is key. You want to ask enough questions to understand the issue without getting stuck in overthinking. A good candidate knows when to seek clarification and when to respond directly.
RSS DS+AI Section 11 implied HN points 01 Jul 25
  1. Data science and AI are constantly evolving, with new research and developments happening every month. It's important to stay updated on these changes.
  2. Ethical considerations like bias and privacy are ongoing challenges in the AI field. Engaging in discussions about these topics is crucial for responsible technology use.
  3. There are many practical applications and resources available for those wanting to enhance their skills in data science and AI. Exploring tutorials and job opportunities can help grow your knowledge and career.
The Novice 19 implied HN points 26 Oct 23
  1. AI is based on statistics and massive data processing, not magic.
  2. AI mimics human-like thought processes through algorithms and machine learning techniques.
  3. Understanding AI involves complex details and processes beyond human perception.
Data Thoughts 59 implied HN points 25 Nov 22
  1. The dbt meta tag helps document important info about data models. It's a simple way to keep track of data governance like ownership and sensitivity.
  2. Many companies have used the dbt meta tag to enhance their products. Some of these companies have received significant venture capital funding because of these improvements.
  3. Documenting tools and their funding related to the dbt meta tag can inspire others. It shows how small features can lead to big opportunities.
do clouds feel vertigo? 39 implied HN points 25 Mar 23
  1. Microsoft claims that GPT-4 shows potential for Artificial General Intelligence, but some critics doubt its transparency and reliability, feeling it's more of a marketing claim than factual science.
  2. Generative AI models can produce creative outputs but shouldn't be judged like traditional knowledge tools. They often generate believable yet false information, showcasing a need for a different evaluation standard.
  3. As AI technology evolves, the cost to create content is decreasing, which raises questions about who will really profit from it and how existing knowledge can be effectively leveraged in this new landscape.
The Counterfactual 59 implied HN points 04 Oct 22
  1. Recommendation systems can help us find new favorites but also risk making our choices repetitive. If we're only shown what we already like, we might miss out on discovering exciting new things.
  2. There's a balance between exploring new options and sticking to what we know. Too much of either can lead to boredom or discomfort, so it’s important to mix both approaches in our choices.
  3. Serendipity, or those happy accidents that lead to great moments, can be lost with strict recommendation systems. Sometimes the best experiences come from unexpected encounters, not just from things we already enjoy.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 19 implied HN points 18 Oct 23
  1. Large Language Models (LLMs) rely on both input and output data that are unstructured and conversational. This means they process language in a natural, free-flowing manner.
  2. Fine-tuning LLMs has become less popular because it requires a lot of specific training and can get outdated. Using contextual prompts at the right time is a better way to improve their accuracy.
  3. New tools are emerging that test different LLMs against prompts instead of just tweaking prompts for one LLM. This helps in finding the best model suited for different tasks.
TheSequence 35 implied HN points 05 Nov 24
  1. Knowledge distillation helps make large AI models smaller and cheaper. This is important for using AI on devices like smartphones.
  2. A key goal of this process is to keep the accuracy of the original model while reducing its size.
  3. The series will include reviews of research papers and discussions on frameworks like Google's Data Commons that support factual knowledge in AI.
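One common way to train a smaller model to keep the larger model's accuracy is to match temperature-softened output distributions. The sketch below assumes that classic soft-target formulation; the series may cover other variants, and the function names, temperature value, and logits are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # A higher temperature flattens the distribution and exposes the
    # teacher's relative preferences among the non-top classes.
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(temperature ** 2 * kl.mean())

# Hypothetical logits for a batch of two examples and three classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
good_student = teacher.copy()
poor_student = np.array([[0.5, 1.0, 4.0], [3.0, 0.2, 0.1]])

loss_good = distillation_loss(good_student, teacher)
loss_poor = distillation_loss(poor_student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice this term is usually combined with an ordinary loss on the true labels.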
The Jolly Contrarian 19 implied HN points 14 Aug 23
  1. Premium JC update includes progress on premiumizing ISDA and Equity Derivatives Definitions
  2. Consolidated anatomy of emissions trading documentation is in the works under ISDA, EFET, and IETA
  3. JC Essays explore themes like form versus substance, system redundancy, and pace layering
The Long Game by Mehdi Yacoubi 2 implied HN points 04 Dec 25
  1. Embryo selection is extremely high-stakes, so companies must have honest marketing and solid science. If you see fake reviews, copied research, or basic methodological errors, be very skeptical and don't trust them with decisions about future children.
  2. Set deliberately low expectations so small improvements feel like wins and bad news feels normal. Controlling your expectations reduces unnecessary suffering and helps you appreciate progress.
  3. Stop waiting for life to happen and take yourself seriously by choosing a direction and acting on it. Real progress comes from responsibility, risk, and doing more than what feels safe.
The Palindrome 1 implied HN point 12 Jan 26
  1. The camel principle is the idea that you can add zero in clever ways to transform problems, and that tiny trick can unlock big simplifications.
  2. Adding zero is essential because it helps rewrite expressions, simplify derivations, and connect different methods across mathematics and machine learning.
  3. A practical workshop can teach these foundations by building linear regression from scratch, covering vectors, vectorized code, optimization, and gradient descent with notebooks and recordings for practice.
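Two standard instances of the adding-zero trick, given here as illustrations rather than as the post's own examples: completing the square, and the bias-variance decomposition of mean squared error for an estimator $\hat\theta$ of $\theta$.

```latex
% Completing the square: add and subtract 9.
x^2 + 6x = x^2 + 6x + (9 - 9) = (x + 3)^2 - 9

% Bias-variance decomposition: add and subtract E[\hat\theta].
\mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta)^2\big]
  = \underbrace{\mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]}_{\text{variance}}
  + \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{bias}^2}
```

In the second derivation the cross term vanishes because $\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] = 0$.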
Sector 6 | The Newsletter of AIM 19 implied HN points 25 Jul 23
  1. Andrej Karpathy worked on a fun project to create a smaller version of the Llama 2 model called Baby Llama. It's designed to run on a single computer.
  2. The Baby Llama can load and use the models released by Meta, making it more accessible for users.
  3. Karpathy shared that the performance is promising, with potential for faster processing speeds on a cloud setup.
Luminotes 28 implied HN points 15 Dec 24
  1. The CIA has a unique Python style guide, focusing on clarity and readability, with special rules for exceptions, globals, and list comprehensions.
  2. They use specific tools like PyCharm for development and have a custom setup for installing Python and managing packages within secure environments.
  3. There are no strict rules governing coding practices; instead, individuals make choices based on their preferences and the limitations of their working conditions.
Perspective Agents 24 implied HN points 15 Jan 25
  1. AI is changing how we work and learn. Jobs will focus more on things like emotional intelligence and problem-solving instead of routine tasks.
  2. There is a big gap between those who understand and use AI effectively and those who don't. This gap can lead to businesses being left behind if they don't adapt.
  3. Whether it's through simulations or understanding people's feelings, human touch will always matter. Genuine moments of connection can outshine machines, even if they seem perfect.
Technically 34 implied HN points 21 Oct 24
  1. A vector database is a specialized store for data used in AI. It holds vectors of numbers (embeddings) that represent different types of information, such as text or images.
  2. To make AI models smarter, they need to use unique data from companies. This helps tailor responses and improve accuracy.
  3. There are ways to enhance AI models with unique data, like fine-tuning them or using a method called Retrieval Augmented Generation (RAG) to include important information in prompts.
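The retrieval step behind RAG can be sketched with a tiny in-memory store. Everything here is an assumption made for illustration: the `TinyVectorStore` class, the hand-made three-dimensional embeddings, and the sample documents are hypothetical, and a real system would use an embedding model and a dedicated vector database.

```python
import numpy as np

class TinyVectorStore:
    """Toy in-memory store: keeps unit-length vectors and searches by cosine similarity."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text, vector):
        vec = np.asarray(vector, dtype=float)
        self.texts.append(text)
        self.vectors.append(vec / np.linalg.norm(vec))

    def search(self, query_vector, k=1):
        q = np.asarray(query_vector, dtype=float)
        q = q / np.linalg.norm(q)
        # Dot products of unit vectors are cosine similarities.
        sims = np.stack(self.vectors) @ q
        top = np.argsort(-sims)[:k]
        return [(self.texts[i], float(sims[i])) for i in top]

def build_prompt(question, passages):
    # Retrieved passages are placed in the prompt as context for the model.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {question}"

store = TinyVectorStore()
# Hypothetical company documents with hand-made embeddings.
store.add("Refunds are processed within 5 business days.", [0.9, 0.1, 0.0])
store.add("Our office is closed on public holidays.", [0.0, 0.2, 0.9])
store.add("Premium plans include priority support.", [0.1, 0.9, 0.1])

query_vector = [0.8, 0.2, 0.1]  # hand-made embedding of the question
hits = store.search(query_vector, k=1)
prompt = build_prompt("How long do refunds take?", [text for text, _ in hits])
```

The model never needs retraining in this setup. The company-specific fact reaches it through the prompt, which is the main contrast with fine-tuning.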
Gradient Flow 99 implied HN points 04 Nov 21
  1. Data scientists should become social scientists as well as computer scientists.
  2. The report presents insights from a global online survey of 372 respondents on data engineering trends and challenges.
  3. Information on improvements in large language models, modernizing data integration, and the importance of data quality is shared in the podcast.
TheSequence 28 implied HN points 03 Dec 24
  1. Cross-modal distillation allows one model to teach another model that works with a different type of data. This means you can share knowledge even if the models are processing images, text, or something else entirely.
  2. This method can be really helpful when there's not much paired data available. It helps improve the learning process in situations where gathering data might be difficult.
  3. Hugging Face’s Gradio lets developers create AI applications for the web easily. It's a neat tool that helps bring AI to everyday use in a user-friendly way.
Year 2049 22 implied HN points 28 Jan 25
  1. The actual cost to train DeepSeek R1 is unknown, but it’s likely higher than the reported $5.6 million for its base model, DeepSeek V3.
  2. DeepSeek used a different training method called Reinforcement Learning, which lets the model improve itself based on rewards, unlike OpenAI's supervised learning approach.
  3. DeepSeek R1 is open-source and much cheaper to use for developers and businesses, challenging the idea that expensive hardware is necessary for AI model training.
Artificial Ignorance 29 implied HN points 15 Nov 24
  1. Big AI companies are realizing that just making their models bigger doesn't always improve their performance. They're facing challenges because the quality of training data is more important than simply using more computing power.
  2. AI companies need to create new ways to measure performance since the old benchmarks are outdated. This lack of standard testing makes it hard for people to compare how different AI models stack up against each other.
  3. AI-generated art is becoming more popular and accepted in the market. A recent artwork sold for a lot of money, showing that people are starting to appreciate creations made by AI, even though it raises questions about what creativity really means.
Vesuvius Challenge 21 implied HN points 24 Jan 25
  1. Two teams won awards for their work on automating scroll segmentation. Their methods needed only a few hours of human help to get impressive results.
  2. The new methods focus on breaking down the task into smaller parts, like surface prediction and fitting, making it easier and faster to recover lost texts from ancient scrolls.
  3. Even though there are still challenges, the community is excited about the progress and future plans, like getting better at detecting ink on more scrolls.
RSS DS+AI Section 29 implied HN points 01 Nov 24
  1. Data science and AI are constantly evolving, with new research and developments being released regularly. It's important to stay updated on these changes to understand their implications.
  2. Ethics, bias, and regulation in AI continue to be hot topics. Discussions around how to handle these challenges are crucial for the responsible use of AI technologies.
  3. There are many practical applications and resources available for those interested in implementing AI. Tips and how-to guides can help individuals and organizations make better use of these technologies.
Sector 6 | The Newsletter of AIM 19 implied HN points 30 Jun 23
  1. GPT-4 is seen as disappointing compared to expectations. People hoped for more detailed information, but it was not provided.
  2. OpenAI's decision to keep model specifics secret may have led to letdowns. Transparency could have changed many opinions about its performance.
  3. The head of OpenAI hinted that users should prepare for disappointment, which matched how many felt after experiencing GPT-4.
Vesuvius Challenge 9 implied HN points 13 Jun 25
  1. The Vesuvius Challenge team is improving their tools for handling scroll data. They're making it easier for people to process large datasets without needing advanced tech skills.
  2. Philip Allgaier made significant updates to the VC3D tool, including fixing memory issues and making it easier to install and use. This will help users have a smoother experience.
  3. New features like freehand drawing and better options for data analysis have been added, which will boost productivity for those working with the VC3D tool.
Decoding Coding 19 implied HN points 25 May 23
  1. StructGPT helps large language models (LLMs) work better with structured data like graphs and databases. It converts this complex data into a simpler format that LLMs can understand.
  2. There are three key tasks that StructGPT can do: answer questions based on knowledge graphs, process data tables, and perform text-to-SQL queries. Each task has its own specific steps.
  3. The method focuses on linearizing raw data so that LLMs can process it more effectively. This allows LLMs to handle a wider variety of tasks more efficiently.
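Linearization, as described in the takeaways above, means flattening structured records into plain text that a language model can read. The sketch below shows the general idea only. The output formats and function names are assumptions, not StructGPT's actual serialization.

```python
def linearize_table(columns, rows):
    """Turn each table row into a line such as 'row 1: city is Paris, ...'."""
    lines = []
    for i, row in enumerate(rows, start=1):
        cells = ", ".join(f"{col} is {val}" for col, val in zip(columns, row))
        lines.append(f"row {i}: {cells}")
    return "\n".join(lines)

def linearize_triples(triples):
    """Turn knowledge-graph triples into one '(subject, relation, object)' line each."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)

# Hypothetical table and knowledge-graph fragment.
columns = ["city", "country"]
rows = [["Paris", "France"], ["Lima", "Peru"]]

table_text = linearize_table(columns, rows)
triples_text = linearize_triples([("Paris", "capital_of", "France")])
```

The resulting strings can be pasted into a prompt alongside the user's question.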
The Kahneman Bot 19 implied HN points 13 Feb 23
  1. To get into tech as a behavioral scientist, consider starting in a junior PM role, transferring internally, working at a startup, or starting your own company.
  2. Before transitioning into tech, make sure you enjoy building software and understand how tech teams work.
  3. Experienced behavioral scientists can enter tech by joining a big tech company as a researcher, rebranding as a data scientist, or joining a tech company that values behavioral science as part of its IP.
Decoding Coding 1 HN point 19 Jul 24
  1. Understanding the 'keepdims' parameter in tensor operations is important for getting correct results in PyTorch. If you set 'keepdims' to True, the dimensions are preserved, which helps with broadcasting correctly.
  2. When summing tensors, if 'keepdims' is False, it can lead to incorrect calculations because the tensor's shape changes. This can result in dividing values incorrectly, leading to unexpected outputs.
  3. It's crucial to be careful with tensor shapes and broadcasting rules in machine learning models. Even a small oversight can cause models to produce wrong predictions, so always double-check these details.
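The failure mode described above is easy to reproduce. This sketch uses NumPy, whose `keepdims` argument plays the same role as the `keepdim` argument on PyTorch reductions. The matrix is a made-up example, chosen to be square so that the wrong version broadcasts silently instead of raising an error.

```python
import numpy as np

x = np.array([[1.0, 1.0],
              [2.0, 6.0]])

# Goal: divide every row by its own sum so that each row sums to 1.

# keepdims=True keeps the reduced axis, giving shape (2, 1).
# Broadcasting then divides row i by the sum of row i.
row_sums_kept = x.sum(axis=1, keepdims=True)
right = x / row_sums_kept

# Without keepdims the result has shape (2,), which broadcasts along
# the last axis: column j is divided by the sum of row j.
row_sums_dropped = x.sum(axis=1)
wrong = x / row_sums_dropped
```

Both divisions run without error and return arrays of the same shape, but only `right` has rows that sum to 1. With a non-square matrix the second version would fail with a shape error, which is the easier case to catch.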
Decoding Coding 19 implied HN points 18 May 23
  1. Airbnb uses a special tool called Zipline for feature engineering in their Customer Lifetime Value model, which helps them pick and create over 150 features needed for predictions.
  2. Chicisimo built a recommendation system based on user data, which includes both objective and subjective features, to give personalized fashion advice using their Social Fashion Graph.
  3. Case studies provide valuable lessons in applying frameworks to real-world projects, showing that you need both a good framework and experience from past projects to succeed.