The hottest Machine Learning Substack posts right now

And their main takeaways

Data Science Weekly - Issue 470

Data Science Weekly Newsletter • 19 implied HN points • 24 Nov 22

🕹 Technology Machine Learning

Recommender systems that optimize only for engagement can lead to problems like clickbait and addiction. We need to think differently to create better systems that really serve people's needs.
GitLab has a detailed Data Team Handbook that explains how their data team works, what data is available, and how it helps different departments make decisions. This can guide other teams looking to improve their data processes.
Deep learning techniques are being researched to playtest video games like Candy Crush. This shows how AI can create more human-like testing methods and improve the gaming experience.

Reducing selection bias / popularity bias in ranking

Recommender systems • 26 implied HN points • 20 Jan 24

🕹 Technology Machine Learning

Reducing selection bias and popularity bias in ranking is important for recommender systems.
The post advocates factorizing user interaction signals to account for biases that originate from power users and power items.
The proposals for causal/debiased ranking involve factorization, mutual information, and mixture of logits to improve the ranking model.

Who needs prompt engineers anyway?

Let's talk games & AI. • 12 implied HN points • 22 Oct 24

🕹 Technology Machine Learning

AI can now write its own prompts, saving time and money compared to humans doing it. This is especially helpful for tasks with clear inputs and outputs.
Building a system that helps AI generate and test prompts can greatly improve efficiency and reduce complexity in automation tasks. It also lowers costs for the same quality output.
Humans still play an important role by providing initial data and guidance, but the bulk of the work is shifting to AI. This means we need to create good systems that let AI handle most tasks.

Data Science Weekly - Issue 469

Data Science Weekly Newsletter • 19 implied HN points • 17 Nov 22

🕹 Technology Machine Learning

Learning machine learning can be accomplished without an engineering background. It often requires hard work, perseverance, and adopting good software engineering practices.
Robotics and AI are being increasingly used in fulfillment processes at companies like Amazon. These technologies face challenges but also provide innovative solutions for package handling.
Large language models are evolving to act like agents that make decisions. This shift towards action-driven models may make them resemble artificial general intelligence (AGI) more closely.

Do Large Language Models have a "Reasoning Gap"?

The Irregular Voice • 2 HN points • 01 Apr 24

🕹 Technology Machine Learning

Large Language Models (LLMs) may not always exhibit true reasoning abilities, with a potential reliance on memorization instead of learning general techniques.
Synthetic data generation systems like MATH() can be used to explore the reasoning capabilities of LLMs, but may introduce biases if not carefully analyzed and corrected for errors.
Fine-tuning LLMs on specific problem areas can reveal insights into their reasoning abilities, but challenges with longer solutions and complex problem sets may impact performance.


New tools to use with new scroll

Vesuvius Challenge • 10 implied HN points • 27 Nov 24

🕹 Technology Machine Learning

The Vesuvius Challenge has introduced new tools to help with studying ancient scrolls. These tools are meant to improve our understanding of scrolls found in Herculaneum.
A total of $18,500 in prizes is available for community contributions. The rewards are aimed at motivating open-source work that supports the reading and analysis of the new scroll dataset.
Several contributors have developed techniques and tools for better image segmentation and data analysis of scrolls. These advancements help make the process of interpreting ancient texts easier and more accurate.

Mathematics of Machine Learning official release announcement!

The Palindrome • 8 implied HN points • 29 Jan 25

🕹 Technology Machine Learning

The book 'Mathematics of Machine Learning' is set to be published soon and will be available in a physical version. You can pre-order it at a discounted price now.
It focuses on important math concepts needed for machine learning, including linear algebra, calculus, and probability theory. Understanding these areas is crucial for building effective models in machine learning.
The author shares a personal journey of creating the book, which was inspired by his experiences in the field. The book aims to bridge the gap between theory and practical applications.

Data Science Weekly - Issue 468

Data Science Weekly Newsletter • 19 implied HN points • 10 Nov 22

🕹 Technology Machine Learning

If you're thinking about leaving Twitter, it's a good idea to save your data first. You can use it to find trends and insights that might be really useful later.
Learning command-line data analytics can make your data processing much easier. There's a new tool called SPyQL that makes it simpler to work with and understand data on the command line.
Federated learning allows us to train models using data from many users without needing to see the actual data. This means we can protect privacy while still making progress in AI.

Reinforcement, evolution, and automata

Gradient Ascendant • 7 implied HN points • 26 Feb 25

🕹 Technology Machine Learning

Reinforcement learning is becoming important again, helping improve AI models by using trial and error. This allows models to make better decisions based on past experiences.
AI improvements are not just for big systems but can also work on smaller models, even those that run on phones. This shows that smarter AI can be more accessible.
Combining reinforcement learning with evolutionary strategies could create more advanced AI systems in the future, leading to exciting developments and solutions.

Gradient Flow #42: Data Quality; Oscilloscope for Deep Learning; Feature Stores

Gradient Flow • 39 implied HN points • 26 Aug 21

🕹 Technology Machine Learning

Data quality is crucial in machine learning and new tools like feature stores are emerging to improve data management.
Experts are working on auditing machine learning models to address issues like discrimination and bias.
Large deep learning models such as Jurassic-1 Jumbo with 178B parameters are being made available for developers.

Data Science Weekly - Issue 467

Data Science Weekly Newsletter • 19 implied HN points • 03 Nov 22

🕹 Technology Machine Learning

User experience (UX) is really important for startups using large language models. Many struggle because they focus on the wrong things instead of improving UX and product design.
Data science notebooks have evolved a lot since they were first introduced. They are now essential tools in data science, and there’s an exciting future ahead for their development.
OpenAI is financially supporting AI startups with a significant investment. They're offering early access to their systems to help these startups grow.

Data Science Weekly - Issue 466

Data Science Weekly Newsletter • 19 implied HN points • 27 Oct 22

🕹 Technology Machine Learning

Science education should focus on teaching scientific virtues first, rather than just tools and techniques. This approach helps students understand the core values of scientific inquiry.
A data dictionary is essential for ensuring quality data collection and interpretation. It's best created before data collection to guide your research design.
The Farama Foundation is aimed at improving open-source reinforcement learning by maintaining and standardizing existing libraries. This will help in developing more effective RL tools for the community.

October recap

The Palindrome • 1 implied HN point • 09 Nov 25

🕹 Technology Machine Learning

In October, several new articles were published on machine learning topics, including how to measure information and understanding computational graphs. These resources are helpful for anyone looking to learn about these subjects.
The Palindrome hosted live events, including 'Office Hours' and interviews with experts. These sessions offered a chance for members to engage and learn more directly from knowledgeable guests.
The community is growing with over 540 machine learning practitioners joining the membership, making it a great place for networking and learning together.

Data Science Weekly - Issue 465

Data Science Weekly Newsletter • 19 implied HN points • 20 Oct 22

🕹 Technology Machine Learning

AI writing assistants are helping indie authors write faster and come up with story ideas. Tools like Lex are changing how creatives approach their writing.
Recent research shows that parts of the brain, like the hippocampus, work similarly to AI models known as transformers. This discovery helps us understand both artificial intelligence and human memory.
The State of AI Report 2022 reviews important trends in AI, including technology breakthroughs, commercial applications, and safety concerns. It provides valuable insights for both researchers and industry leaders.

☀️ Newsletter #28: hack the system, catch the light, conquer space...

Women On Rails Newsletter - International Version • 19 implied HN points • 29 Mar 22

🕹 Technology Machine Learning

The newsletter covers topics like Machine Learning, design skills, and historical insights on being a woman developer in the 1960s.
It includes updates on Ruby and Rails, with resources for upgrading to Ruby 3.0, finding Ruby career paths, and understanding static site generators.
Tips include a tool to generate empty commits on GitHub for managing multiple accounts, a tutorial on building an ML web app, and an article by a security engineer on exploring vulnerabilities in Zoom.

Decoding the ACL Paper: Gzip and KNN Rival BERT in Text Classification

Confessions of a Code Addict • 34 HN points • 20 Jul 23

🕹 Technology Machine Learning

A new paper introduces a simple gzip + KNN approach that rivals BERT for text classification.
The gzip + KNN approach is lightweight, non-parametric, and performs well on out-of-distribution datasets.
One potential issue with the paper is a bug in the implementation of KNN, affecting reported accuracy.
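
The gzip + KNN idea above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes normalized compression distance (NCD) as the distance and a plain majority vote over the k nearest training texts. The function names and the four toy training sentences are hypothetical.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share structure."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to the query."""
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set (illustrative only, not from the paper).
train = [
    ("the striker scored a late goal to win the football match", "sports"),
    ("the goalkeeper saved a penalty in the football cup final", "sports"),
    ("the central bank raised interest rates to curb inflation", "finance"),
    ("the stock market fell as bond yields and inflation rose", "finance"),
]

prediction = knn_predict("the football team scored twice in the final", train, k=3)
print(prediction)
```

There are no parameters to train: all the work happens at prediction time, which costs one compression per training text for every query. How ties are broken in the vote can change the reported accuracy, so treat the tie-breaking here as one arbitrary choice.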

Data Science Weekly - Issue 464

Data Science Weekly Newsletter • 19 implied HN points • 13 Oct 22

🕹 Technology Machine Learning

Building a community around R in the pharmaceutical industry can help users connect and share knowledge more effectively. It's important to identify who the users are and create a space for collaboration.
Creating research ideas can start with understanding gaps in existing literature. By reading a single paper, you can learn frameworks to generate new ideas and improve your research quality.
Data cleaning for machine learning models is crucial, starting from the ETL pipeline. It’s important to commit to high-quality data from the beginning to avoid common pitfalls that impact model accuracy.

Data Science Weekly - Issue 463

Data Science Weekly Newsletter • 19 implied HN points • 06 Oct 22

🕹 Technology Machine Learning

When you get a big CSV file, it's important to choose the right tools to explore and understand the data quickly.
Using AI, like GPT-3, can help turn messy text into organized data, saving a lot of manual work.
There's growing interest in using collective intelligence ideas to improve deep learning and AI research.

Data Science Weekly - Issue 462

Data Science Weekly Newsletter • 19 implied HN points • 29 Sep 22

🕹 Technology Machine Learning

Teaching students about scientific failure helps them build resilience. It prepares them for real-world challenges in research.
Understanding uncertainty in deep learning models is crucial for effective use. It helps in making better predictions and decisions.
Increasing data maturity in organizations leads to more strategic use of data. Assessing data maturity can guide teams in improving their data practices.

Gradient Flow #38: Large Language Models, Infinite Laptop, Overhyping AI

Gradient Flow • 39 implied HN points • 01 Jul 21

🕹 Technology Machine Learning

Training large language models involves a new role referred to as 'prompt engineer'.
TabNet, a deep neural network for tabular data, outperforms other models in classification and regression problems.
Tools like AugLy for data augmentation and Flat Data for data acquisition simplify tasks and enhance model robustness.

Data Science Weekly - Issue 461

Data Science Weekly Newsletter • 19 implied HN points • 22 Sep 22

🕹 Technology Machine Learning

Working in Natural Language Processing (NLP) involves keeping up with evolving models and figuring out how to effectively use data. It's still challenging for many to find practical applications for NLP.
Generative AI has the potential to make workers significantly more efficient and creative. This could result in substantial economic value across various industries.
Building trust in machine learning is crucial but challenging. It's important to address concerns about model reliability to maximize its business value.

Data Science Weekly - Issue 460

Data Science Weekly Newsletter • 19 implied HN points • 15 Sep 22

🕹 Technology Machine Learning

Soft skills are super important for data scientists. Being able to communicate well and work in a team can make a big difference in their effectiveness.
There are great resources available online for learning data science, including live streams on platforms like Twitch. It’s a fun way to learn and engage with others.
Use the right fonts and designs in data visualizations. They can greatly affect how your data is understood and appreciated.

Data Science Weekly - Issue 459

Data Science Weekly Newsletter • 19 implied HN points • 08 Sep 22

🕹 Technology Machine Learning

Organizations need to invest in creating better data to gain an advantage over competitors. Good data can drive value and improve decision-making.
The activation layer of the modern data stack helps you use data in a more impactful way. This allows for personalized experiences rather than just viewing dashboards.
Using standard formats like ONNX for model exporting makes your machine learning models more portable across different programming environments, reducing dependencies on specific languages.

Quant Letter: January 2024, Week-4

The Parlour • 21 implied HN points • 23 Jan 24

🕹 Technology Machine Learning

The blog post discusses various research papers on topics like financial risk modeling, interest rate models, and credit risk stress testing.
New methods for predictive modeling in finance, including data-driven option pricing and generative modeling for financial time series, are introduced in the presented papers.
The research covers diverse areas such as economics, crypto, and blockchain, offering insights on market responses, equity premium puzzles, and AI investment rankings in Latin America.

Data Science Weekly - Issue 458

Data Science Weekly Newsletter • 19 implied HN points • 01 Sep 22

🕹 Technology Machine Learning

Machine learning best practices are shared in a guide from Google, helping those with some knowledge to improve their skills.
There's skepticism about deep learning promises, as experts continue to predict big changes that haven't happened yet.
AI is being used creatively, like generating art from Bible stories, which showcases the potential of technology in different fields.

Latent Reasoning, 3D Colorization, and the Limits of RL

HackerPulse Dispatch • 8 implied HN points • 13 Dec 24

🕹 Technology Machine Learning

COCONUT is a new method that lets language models think in flexible ways, making it better at solving complex problems. It does this by using continuous latent spaces instead of just words.
ChromaDistill offers a smart way to add color to 3D images efficiently. It lets you view these scenes consistently from different angles without slowing things down.
Recent research shows that top AI models can be deceptive and plan strategically, which raises important safety concerns. There’s also a new approach to testing AI limits in a friendly, curiosity-driven way.

Data Science Weekly - Issue 457

Data Science Weekly Newsletter • 19 implied HN points • 25 Aug 22

🕹 Technology Machine Learning

AI systems struggle with language limitations and won't fully replicate human thinking. This shows that our understanding of thought and language needs to evolve.
Observable launched Free Teams to encourage more open collaboration in data science. It allows users to easily work together on projects and share insights for free.
There is a problem in the data industry where roles are too narrowly defined, leading to a lack of collaboration. This makes it hard for teams to communicate and understand each other's work.

Time to BLOOM 🌸

Sector 6 | The Newsletter of AIM • 19 implied HN points • 04 Jul 22

🕹 Technology Machine Learning

BLOOM is a new open-source language model with 176 billion parameters. It's considered impressive because it was developed outside of the big tech companies.
This model is similar in structure to GPT-3, but its open-access nature means anyone can use it.
BLOOM represents a shift towards more collaborative and open approaches in AI research and development, encouraging more shared knowledge.

Rapid proteome-wide prediction of lipid-interacting proteins through ligand-guided structural genomics

Axial • 7 implied HN points • 05 Jan 25

🔬 Science Machine Learning

Researchers developed a new tool called SLiPP that helps quickly find proteins that interact with lipids. This is important because lipids play key roles in cell functions and diseases.
SLiPP uses machine learning to distinguish between protein pockets likely to bind lipids and those that won't. This makes it easier to identify potential targets for drug discovery.
The tool has been successfully tested on different organisms, showing it can accurately predict lipid-binding proteins. This helps scientists explore new areas in lipid biology and disease research.

Introducing The Data Exchange

Gradient Flow • 79 implied HN points • 14 Nov 19

🎙 Podcasts Machine Learning

The Data Exchange is a new independent podcast focusing on data, machine learning, and AI.
The podcast aims to build a community to help people make better decisions.
To support The Data Exchange, listeners are encouraged to subscribe and share with friends.

Data Science Weekly - Issue 456

Data Science Weekly Newsletter • 19 implied HN points • 18 Aug 22

🕹 Technology Machine Learning

Machine learning models need ongoing maintenance after they're deployed. The world changes, and so do the needs for the models.
Using machine learning can make software testing more efficient, especially in complex applications like browsers.
There are many resources available for people who want to get into machine learning and deep learning, including courses, videos, and discussions on best practices.

The emptiness at the heart of emotion recognition

Apperceptive (moved to buttondown) • 22 implied HN points • 08 Dec 23

🕹 Technology Machine Learning

Emotion recognition technology in machine learning is based on a logical fallacy.
The foundational work of Paul Ekman in emotion recognition lacks scientific validity.
ML models for emotion recognition fail to capture the complexity of human emotional expressions and relationships.

Data Science Weekly - Issue 455

Data Science Weekly Newsletter • 19 implied HN points • 11 Aug 22

🕹 Technology Machine Learning

Data professionals spend a lot of time checking data quality, which costs companies a lot of money every year. Poor data quality can affect a company's revenue significantly.
Understanding how AI models behave is important for data scientists. They need to develop good mental models to train and work effectively with these systems.
Vector search is becoming popular in retail as a way to improve revenue and customer satisfaction. It helps teams make better use of their data.

☎️ Interview: Rick Hao, Partner at SpeedInvest on the State of Privacy-Enhancing Technologies #005

Let Us Face the Future • 19 implied HN points • 05 May 23

🕹 Technology Machine Learning

Having more data will continue to drive the adoption of Privacy-Enhancing Technologies (PETs).
Healthcare requires specialized data infrastructure different from other markets.
Machine learning is expected to be a key factor in the adoption of data sharing tools.

Data Science Weekly - Issue 454

Data Science Weekly Newsletter • 19 implied HN points • 04 Aug 22

🕹 Technology Machine Learning

NASA is using machine learning to organize millions of astronaut photos of Earth. This technology helps scientists access and study these images more effectively.
Data-driven companies can have a competitive edge in the market. The right expertise and data strategy can influence investors' decisions.
There are many resources and discussions available online about using machine learning and data science effectively. Engaging with these can help keep skills and knowledge up to date.

Data Science Weekly - Issue 453

Data Science Weekly Newsletter • 19 implied HN points • 28 Jul 22

🕹 Technology Machine Learning

Creating a focused GitHub repository can help others in the field, like those working with satellite images and deep learning.
There are unique Python packages available that can enhance your data workflow, making tasks easier and more efficient.
Understanding the technology behind AI and how to use it effectively is crucial for building better models and systems.

What Full Body Composition Scales Teach About Machine Learning

The Palindrome • 3 implied HN points • 16 Jun 25

🕹 Technology Machine Learning

Not all body composition scales are accurate, but some of them are less wrong than others. It's important to understand how bias and variance affect their readings.
Bias refers to a consistent error in measurements, while variance relates to the randomness of measurement errors. Both play a role in how reliable a scale's readings can be.
When choosing a scale, it's better to prioritize low variance over low bias if you're only interested in tracking trends rather than precise values.
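
The trade-off above can be checked with a small simulation. This is a rough sketch under made-up assumptions: a true value falling linearly from 80 to 78 over 30 days, one scale with a constant +3 offset and little noise, and another with no offset but large noise. All numbers and names are hypothetical.

```python
import random

DAYS = 30
# Hypothetical ground truth: a steady decline from 80.0 to 78.0 over 30 days.
TRUE_VALUES = [80.0 - 2.0 * day / (DAYS - 1) for day in range(DAYS)]
TRUE_SLOPE = -2.0 / (DAYS - 1)


def fit_slope(ys):
    """Least-squares slope of ys against day index 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def simulate(bias, noise_sd, trials=200, seed=0):
    """Return (mean absolute reading error, mean absolute trend error)."""
    rng = random.Random(seed)
    level_err = 0.0
    slope_err = 0.0
    for _ in range(trials):
        readings = [v + bias + rng.gauss(0.0, noise_sd) for v in TRUE_VALUES]
        level_err += sum(abs(r - v) for r, v in zip(readings, TRUE_VALUES)) / DAYS
        slope_err += abs(fit_slope(readings) - TRUE_SLOPE)
    return level_err / trials, slope_err / trials


biased_steady = simulate(bias=3.0, noise_sd=0.1)   # high bias, low variance
unbiased_noisy = simulate(bias=0.0, noise_sd=1.5)  # low bias, high variance

print("biased but steady  (level error, trend error):", biased_steady)
print("unbiased but noisy (level error, trend error):", unbiased_noisy)
```

A constant offset cancels out when fitting a slope, so the biased but steady scale is worse on individual readings and better on the trend. That is the sense in which low variance matters more than low bias for trend tracking.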

Data Science Weekly - Issue 452

Data Science Weekly Newsletter • 19 implied HN points • 21 Jul 22

🕹 Technology Machine Learning

The role of data scientist remains popular and well-paid, with growth expected in the field by 2029.
Large language models (LLMs) are rapidly evolving and are becoming integral to various applications in our daily lives.
Many industries are seeing the rise of domain experts who can now create and work with deep learning models without needing advanced degrees.

ChatGPT makes shit up occasionally, and that's perfect.

Perceptions • 35 implied HN points • 01 Mar 23

🕹 Technology Machine Learning

ChatGPT and similar frontends have seen fast adoption recently due to their ability to have conversations and answer questions based on vast amounts of written information on the internet.
Large Language Models, like ChatGPT, represent a departure from traditional technology by providing knowledge based on existing information, rather than following specific instructions.
The rise of heuristically thinking machines, such as ChatGPT, shows a shift towards real AI where technology can think and act like humans.

Quant Letter: November 2023, Week 4

The Parlour • 21 implied HN points • 29 Nov 23

💰 Finance Machine Learning

The paper introduces a methodology using Shapley values to understand the contribution of different factors in portfolio performance.
It presents the versatile SPPC method for evaluating predictor group contributions to portfolio success.
The SPPC method quantifies predictor impacts and offers insights into changing dynamics over time in financial machine learning.
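
To show how Shapley values split a performance figure among predictor groups, here is a generic exact computation. It is not the SPPC method from the paper. The group names, the `toy_performance` function, and every number in it are hypothetical placeholders for whatever portfolio metric would be measured on each subset of predictors.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value per player; value maps a frozenset to a float."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical standalone contributions of three predictor groups.
STANDALONE = {"momentum": 0.30, "value": 0.20, "size": 0.10}
SYNERGY = 0.10  # extra performance when momentum and value are used together


def toy_performance(groups):
    """Made-up performance metric for a portfolio built from these groups."""
    score = sum(STANDALONE[g] for g in groups)
    if {"momentum", "value"} <= groups:
        score += SYNERGY
    return score


contributions = shapley_values(list(STANDALONE), toy_performance)
print(contributions)
```

In this toy setup the synergy is shared equally between the two groups that create it, and the contributions sum to the performance of the full predictor set. Exact enumeration grows exponentially with the number of groups, so larger sets would need sampling-based approximations.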