The hottest Machine Learning Substack posts right now

And their main takeaways

Data Science Weekly - Issue 371

Data Science Weekly Newsletter • 19 implied HN points • 31 Dec 20

🕹 Technology Machine Learning

Real-time machine learning is becoming important for many companies. Some have invested heavily in the right infrastructure and are seeing good results.
There are many new tools for machine learning and MLOps. Keeping track of these tools can help in improving workflow and project success.
Understanding concepts like Markov models can help in planning routines, such as workouts, based on previous choices. This helps in making smart decisions about what to do next.
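As a rough illustration of the Markov-model idea in the last takeaway, the sketch below picks the next workout from a probability table that depends only on the previous workout. The workout names and transition probabilities are hypothetical, chosen for the example and not taken from the linked issue.

```python
import random

# Hypothetical transition table: P(next workout | previous workout).
# Each row sums to 1; "rest" never follows "rest" in this made-up example.
TRANSITIONS = {
    "run":  {"run": 0.1, "lift": 0.6, "rest": 0.3},
    "lift": {"run": 0.5, "lift": 0.1, "rest": 0.4},
    "rest": {"run": 0.5, "lift": 0.5, "rest": 0.0},
}

def next_workout(previous, rng):
    """Sample the next workout using only the previous one (first-order Markov)."""
    options = TRANSITIONS[previous]
    names = list(options)
    weights = [options[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

def plan(start, days, seed=0):
    """Build a schedule of `days` workouts, starting from `start`."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    schedule = [start]
    for _ in range(days - 1):
        schedule.append(next_workout(schedule[-1], rng))
    return schedule

if __name__ == "__main__":
    print(plan("run", 7))
```

The only state the planner carries is the previous workout, which is what makes this a first-order Markov chain.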

Data Science Weekly - Issue 370

Data Science Weekly Newsletter • 19 implied HN points • 24 Dec 20

🕹 Technology Machine Learning

NeRF technology made big waves in 2020, changing how we render 3D images with neural networks. It’s a cool new area in data science that’s just starting to grow.
DeepMind's MuZero AI is impressive because it learns the rules of games by itself, improving how we analyze videos. This could lead to cost cuts for platforms like YouTube.
If you're looking to start a career in data science, there are practical guides available. These can help you with everything from filling knowledge gaps to creating a strong portfolio.

Super Weights in LLMs - How Pruning Them Destroys a LLM's Ability to Generate Text ?

Machine Learning Diaries • 3 implied HN points • 18 Nov 24

🕹 Technology Machine Learning

Super weights are very important for how well large language models (LLMs) perform. Even though they're a tiny part of the model, they can greatly affect the results.
If a super weight is removed, it can ruin the model's ability to generate clear text and make predictions. Just taking out one of these weights can cause a huge drop in performance.
Removing regular outlier weights doesn't harm performance much, but losing just one super weight is much worse than taking out a lot of other weights combined.

Data Science Weekly - Issue 369

Data Science Weekly Newsletter • 19 implied HN points • 17 Dec 20

🕹 Technology Machine Learning

Companies are changing how they share information because of AI. They're making their reports easier for machines to read, which can influence market behavior.
Monitoring machine learning models is essential for maintaining accuracy. It's important to detect issues like outliers and changes in data patterns in real-time.
Deep learning research often helps engineers tackle real-world problems effectively. Insights from recent research can guide better practices in building and deploying models.
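The monitoring takeaway above mentions detecting outliers and shifts in data patterns. One simple way to sketch that, assuming a single numeric feature and a trusted reference sample, is a z-score check. The function names and the threshold of 3 standard deviations are illustrative assumptions, not something the linked issue prescribes.

```python
import statistics

def zscore_outliers(reference, batch, threshold=3.0):
    """Return values in `batch` that lie more than `threshold` reference
    standard deviations away from the reference mean."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return [x for x in batch if abs(x - mu) / sigma > threshold]

def mean_shift(reference, batch):
    """Shift of the batch mean, measured in reference standard deviations."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return (statistics.mean(batch) - mu) / sigma

if __name__ == "__main__":
    reference = [10, 11, 9, 10, 10, 11, 9, 10]  # made-up training-time values
    print(zscore_outliers(reference, [10, 10.5, 15]))
    print(mean_shift(reference, [12, 12, 12]))
```

Production monitoring typically uses richer tests over many features; this only shows the shape of the check.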

Data Science Weekly - Issue 368

Data Science Weekly Newsletter • 19 implied HN points • 10 Dec 20

🕹 Technology Machine Learning

Machine learning needs systematic approaches to create strong systems for real-world use. This means looking beyond just algorithms to see the bigger picture.
Deep neural networks are powerful, but understanding how they work can be tricky. Tools like network dissection can help us figure out what these networks are really doing.
Feature stores are becoming important for machine learning. They allow teams to share and manage data better for creating and deploying models quickly.

Introduction to Language Learning Models (LLMs): An Informative and Approachable Guide

ScaleDown • 11 implied HN points • 07 Jun 23

🕹 Technology Machine Learning

Before the Transformer model, RNNs and CNNs were commonly used for sequence data but had their limitations.
Tokenization is a crucial step in processing data for models like LLMs, breaking down sentences into tokens for analysis.
The introduction of the Transformer model in 2017 revolutionized NLP with its attention mechanism, impacting how tokens are weighted in context.
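To make the attention takeaway concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the weighting step introduced with the Transformer. It handles a single head with no masking or learned projections, and the tiny vectors in the usage example are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, weight every value by softmax(q . k / sqrt(d))."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0], [20.0]]
    print(attention(q, k, v))  # output leans toward the first value
```

The query matches the first key more closely, so the first token's value receives the larger weight.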

Introducing Etalon: How we choose a LLM with optimal Runtime Performance ?

Machine Learning Diaries • 3 implied HN points • 11 Nov 24

🕹 Technology Machine Learning

Evaluating large language models (LLMs) is important for ensuring a good user experience. Existing metrics like Time to First Token (TTFT) and Time Between Tokens (TBT) don't fully capture how these models perform in real-time applications.
The proposed 'Etalon' framework offers a new way to measure LLMs using a 'fluidity-index' that helps track how well the model meets deadlines. This ensures smoother and more responsive interactions.
Current metrics can hide issues like delays and jitters during token generation. The new approach aims to provide a clearer picture of performance by considering these factors, leading to better user satisfaction.
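The two existing metrics named above can be computed directly from token arrival timestamps, as sketched below. The `deadline_hit_rate` helper is a simplified, hypothetical stand-in for the deadline-based idea. It is not Etalon's actual fluidity-index definition, and its parameter names are assumptions made for this example.

```python
def ttft(request_time, token_times):
    """Time to First Token: delay between the request and the first token."""
    return token_times[0] - request_time

def tbt(token_times):
    """Time Between Tokens: gaps between consecutive token arrivals."""
    return [b - a for a, b in zip(token_times, token_times[1:])]

def deadline_hit_rate(request_time, token_times, first_deadline, per_token_budget):
    """Fraction of tokens that arrive by their deadline.

    Hypothetical scheme: token i is due at
    request_time + first_deadline + i * per_token_budget.
    """
    hits = 0
    for i, t in enumerate(token_times):
        deadline = request_time + first_deadline + i * per_token_budget
        if t <= deadline:
            hits += 1
    return hits / len(token_times)

if __name__ == "__main__":
    # Made-up timestamps (seconds) with one long stall before the fourth token.
    times = [0.5, 0.6, 0.7, 1.9, 2.0]
    gaps = tbt(times)
    print("TTFT:", ttft(0.0, times))
    print("mean TBT:", sum(gaps) / len(gaps))
    print("hit rate:", deadline_hit_rate(0.0, times, 0.5, 0.2))
```

In this example the mean gap looks acceptable even though one stall pushes the last two tokens past their deadlines, which is the kind of issue the takeaways say averaged metrics can hide.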

Data Science Weekly - Issue 367

Data Science Weekly Newsletter • 19 implied HN points • 03 Dec 20

🕹 Technology Machine Learning

AlphaFold is a huge breakthrough in biology that helps solve the protein folding problem, which has puzzled scientists for 50 years. It shows how AI can speed up scientific discovery.
Spotify needs good tools to make sense of its massive data from millions of users. Designing user-friendly data tools is key for them to understand and improve their services.
Having high-quality data is essential for companies. New technologies can help businesses maintain data quality without spending huge amounts of money.

Data Science Weekly - Issue 366

Data Science Weekly Newsletter • 19 implied HN points • 26 Nov 20

🕹 Technology Machine Learning

Pinterest improved its machine learning signals by updating its data infrastructure. They moved from a Lambda architecture to a Kappa architecture for better real-time performance.
DoorDash built a feature store to handle the massive amounts of data needed for its machine learning models. This helps them manage costs and maintain fast performance when retrieving data.
When choosing between a data lake, warehouse, or lakehouse, it's important to consider the specific needs of your data platform. The right choice depends on the tools that best fit your project requirements.

How to implement a decision tree

The Palindrome • 3 implied HN points • 08 Nov 24

🕹 Technology Machine Learning

A decision tree splits data based on features and thresholds, which helps in making predictions by creating branches. Each split leads to two outcomes based on whether the condition is met or not.
Gini impurity is a key measure for evaluating how 'pure' the labels are in each leaf of the tree. A lower Gini impurity means better predictability for a leaf's classification.
You can create both classification and regression trees by changing how you score the splits and define the predictions in the leaves. This flexibility allows for various applications in data analysis.
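The Gini impurity and threshold split described above can be sketched in a few lines. This is a minimal illustration for a single numeric feature, not the implementation from the linked post, and the helper names are chosen for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_score(xs, ys, threshold):
    """Size-weighted Gini impurity of the two sides of `x <= threshold`."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(xs, ys):
    """Try each observed feature value as a threshold; keep the purest split."""
    candidates = sorted(set(xs))
    return min(candidates, key=lambda t: split_score(xs, ys, t))

if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ys = [0, 0, 1, 1]
    print(best_threshold(xs, ys))  # 2: splits the labels perfectly
```

Swapping `gini` for a variance-based score and predicting the mean label in each leaf would turn the same search into a regression tree split.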

Data Science Weekly - Issue 365

Data Science Weekly Newsletter • 19 implied HN points • 19 Nov 20

🕹 Technology Machine Learning

It's important to connect with AI researchers as people, not just through their work. Personal stories can give better insights into their lives and motivations.
Dynamic data testing is crucial for effective data analysis. Unlike software testing, data needs flexible tests that can adjust as it changes.
Creating open datasets for sound events helps improve research in machine learning. These datasets can provide valuable resources for training models.

Will LLMs Make NLP Scientists Jobless?

Pratik’s Pakodas 🍿 • 12 implied HN points • 21 Mar 23

🕹 Technology Machine Learning

Technological progress leads to job displacement but also creates new opportunities.
Understanding when and where to use LLMs is crucial for NLP engineers to deliver value.
Advances in LLM technology may shift demand in NLP away from researchers and toward full-stack engineers.

Data Science Weekly - Issue 364

Data Science Weekly Newsletter • 19 implied HN points • 12 Nov 20

🕹 Technology Machine Learning

Organizing data in spreadsheets can help prevent errors and make analysis easier. It's important to keep a consistent format and to avoid leaving any empty cells.
AI is being used to create music that sounds like famous artists, which could change the music industry. This technology raises questions about copyright and authenticity.
Monitoring tools are becoming essential for data scientists to track their models for performance and integrity. These tools help ensure that models are accurate and reliable over time.

Becoming One with the Machine

Malt Liquidity • 6 implied HN points • 13 Mar 24

🕹 Technology Machine Learning

Our brain is exceptional at pattern recognition, and merging with technology can enhance our abilities.
Visual processing is faster than auditory processing, like in chess where seeing the board is more efficient than listening to a game.
Technology, like AI, can help turbocharge our skills by providing new perspectives and automating processes, leading to more creative problem-solving.

Data Science Weekly - Issue 363

Data Science Weekly Newsletter • 19 implied HN points • 05 Nov 20

🕹 Technology Machine Learning

Synthetic biology has gained a lot of attention over the past decade, and it's been evolving to deliver real technologies and breakthroughs.
Data poisoning is a serious concern in machine learning, as bad data can manipulate model predictions, especially with NLP models.
Managing data for machine learning projects is challenging, but using version control tools can help keep things organized and prevent unexpected issues.

Data Science Weekly - Issue 362

Data Science Weekly Newsletter • 19 implied HN points • 29 Oct 20

🕹 Technology Machine Learning

Form extraction using AI can help important fields like journalism and medicine by accurately pulling data from documents. This can significantly improve research and decision-making.
Data engineering is crucial and involves gathering, cleaning, and shaping data before it's analyzed. It's just as important as data science, which builds on that data to create insights and models.
Dealing with data imbalance can be tricky, but using semi-supervised and self-supervised learning techniques can improve model performance. These methods help when some categories have much less data than others.

In-Context Learning, In Context + Author Q&As

The Gradient • 11 implied HN points • 29 Apr 23

🕹 Technology Machine Learning

In-context learning involves large language models learning new tasks at inference time with prompts.
Authors Hattie Zhou and Sewon Min share insights on in-context learning in Q&A sessions.
In-context learning helps models better infer concepts learned during pretraining without gradient updates.

Smarter Retrieval, Safer RAG, and Autonomous AI

HackerPulse Dispatch • 2 implied HN points • 07 Feb 25

🕹 Technology Machine Learning

DeepRAG improves how AI retrieves information, making it 22% more accurate than old methods. It helps AI decide when to use outside knowledge and when to rely on what it already knows.
Heima's new idea, hidden thinking, speeds up AI reasoning without losing clarity. It helps the AI think more efficiently by using compact representations of its thought process.
SafeRAG looks at the security of AI systems that use retrieval methods. It finds weaknesses that can be attacked, showing that even advanced systems need better protection.

Data Science Weekly - Issue 361

Data Science Weekly Newsletter • 19 implied HN points • 22 Oct 20

🕹 Technology Machine Learning

Modern data infrastructure is becoming crucial for businesses, as they need better ways to analyze data for value. Companies are confused about the best technologies to use.
Many businesses are investing in AI, but few are actually seeing big returns on that investment. About 11% of companies report gaining significant financial benefits from AI.
There are new learning techniques in AI that allow models to learn from very few examples. This could make machine learning more accessible and reduce costs.

Data Science Weekly - Issue 360

Data Science Weekly Newsletter • 19 implied HN points • 15 Oct 20

🕹 Technology Machine Learning

Improving performance on GPUs is crucial for machine learning. It helps speed up both research and development, which leads to better results overall.
BMW is working on ethical guidelines for AI usage. This aims to ensure that as AI evolves, it remains focused on benefiting people.
Data discovery can be a challenge for companies. Facebook built a tool called Nemo to make it easier for engineers to find the information they need quickly.

How I think about LLM prompt engineering

Sparks in the Wind • 8 HN points • 09 Oct 23

🕹 Technology Machine Learning

LLMs are like databases of vector programs.
Prompting an LLM is like querying the database.
Prompt engineering is crucial to find the best program.

Data Science Weekly - Issue 359

Data Science Weekly Newsletter • 19 implied HN points • 08 Oct 20

🕹 Technology Machine Learning

Arduino is making machine learning easier for everyone by integrating TensorFlow Lite, which lets people run neural networks on Arduino boards to understand simple voice commands.
Papers with Code is now working with arXiv to connect research papers to related code, making it easier for people to see how studies are applied in practice.
Research shows that machine learning models can help automate tasks like counting craters on Mars, which saves human researchers time and effort, allowing them to focus on more complex questions.

Data Science Weekly - Issue 358

Data Science Weekly Newsletter • 19 implied HN points • 01 Oct 20

🕹 Technology Machine Learning

Data quality is very important for machine learning (ML) operations. It helps ensure that ML systems produce reliable results and builds trust with stakeholders.
The State of AI Report highlights recent developments in AI, focusing on research breakthroughs, talent supply, industry applications, and future predictions.
Diversity in AI and supporting applied statistics students are crucial for improving representation and effectiveness in data science and machine learning fields.

"AI Fans" Have No Imagination

Load-bearing Tomato • 12 implied HN points • 16 Feb 23

🕹 Technology Machine Learning

The popular AI art generators succeed because they cater to people's self-interest.
Claiming AI is the future of game development is flawed; AI lacks the understanding required for complex tasks like concept art.
Developers are already effectively using AI technology in areas like animation to enhance games.

Data Science Weekly - Issue 357

Data Science Weekly Newsletter • 19 implied HN points • 24 Sep 20

🕹 Technology Machine Learning

Good communication techniques are key for data and engineering teams to solve technical problems effectively. By improving how they express ideas, teams can reach better solutions faster.
Competitions like the C3.ai COVID-19 Grand Challenge encourage teams to use data science for social good. It's a great chance to make a positive impact during tough times by tackling significant challenges like the pandemic.
New tools like TensorFlow Recommenders make it easier for people to build and serve recommendation models. These tools help users get personalized suggestions for things like movies and restaurants quickly.

Gradient Flow #11: Dark Data, AI Talent, and Reinforcement Learning

Gradient Flow • 19 implied HN points • 18 Jun 20

🕹 Technology Machine Learning

The newsletter covers topics like Dark Data, AI Talent, and Reinforcement Learning.
It discusses important research and applications in machine learning and data science.
There is a focus on virtual conferences, work and hiring trends, and book and project recommendations.

Data Science Weekly - Issue 356

Data Science Weekly Newsletter • 19 implied HN points • 17 Sep 20

🕹 Technology Machine Learning

ICML is an important conference for those in machine learning, catering to various professionals like researchers and engineers. It's a great place to learn and share knowledge about advancements in the field.
NumPy is a key tool for scientific programming in Python, helping organize and analyze data efficiently. It's widely used and supports various other libraries for data science tasks.
The emergence of generative AI technology is changing the entertainment industry rapidly. Soon, creating movies or shows could be done at a fraction of today's production costs.

Data Science Weekly - Issue 355

Data Science Weekly Newsletter • 19 implied HN points • 10 Sep 20

🕹 Technology Machine Learning

DeepMind and Google Maps are using advanced Graph Neural Networks to improve the accuracy of travel time predictions, making them even more reliable in cities around the world.
AI is now being used to detect deepfake videos by identifying unique signals from the videos, which can help spot how they were made.
There are resources available to help people get started in data science, build their portfolios, and improve their resumes to land jobs in this field.

Gradient Flow: ML in Finance, Disinformation, AI Superpowers

Gradient Flow • 19 implied HN points • 04 Jun 20

🕹 Technology Machine Learning

Collaboration between lawyers and technologists is crucial for identifying and mitigating risks associated with AI deployment in various industries.
Responsible ML tools from Microsoft focus on explainability, privacy & security, and governance & reproducibility, providing comprehensive support for ethical AI development.
China and the US are considered AI superpowers, with strong research interest in Data and AI, along with vibrant startup ecosystems focused on applying these technologies.

Smarter Agents, Self-Aware LLMs, and Knowledge from Videos

HackerPulse Dispatch • 2 implied HN points • 24 Jan 25

🕹 Technology Machine Learning

New techniques can shrink the size of data storage without losing accuracy, which helps in finding information faster.
Language models are getting better at learning from their own mistakes, making them smarter and more self-aware.
AI can now learn complex skills just by watching videos, which shows that reading text isn't always necessary for advanced learning.

Data Science Weekly - Issue 354

Data Science Weekly Newsletter • 19 implied HN points • 03 Sep 20

🕹 Technology Machine Learning

A machine learning algorithm recently helped discover 50 new planets from old NASA data, showing how AI can unlock new discoveries.
There has been a noticeable drop in deep learning job postings in the past six months, revealing that many companies are reassessing the importance of this technology.
Apple has introduced a residency program for AI and machine learning, offering training and hands-on experience for those with relevant backgrounds.

I made a web app to get better at adding half-steps to notes

Excited Technology Rambles • 1 HN point • 27 Dec 23

🕹 Technology Machine Learning

The web app was created to practice adding half-steps to notes.
The goals included making the app usable on both computers and phones, focusing on fast iteration speed, and keeping the design visually appealing.
Using tools like ChatGPT helped enhance the design and user experience of the app.

Data Science Weekly - Issue 353

Data Science Weekly Newsletter • 19 implied HN points • 27 Aug 20

🕹 Technology Machine Learning

Effective testing is crucial for machine learning systems. It's important to understand that these systems require different testing strategies compared to traditional software.
There are hidden challenges in becoming a machine learning engineer. Many of these insights come from the experiences of those already in the field, beyond what you learn in books.
New resources and courses are constantly being developed in data science. For example, fast.ai just released a new deep learning course and libraries, which can help beginners get started.

Data Science Weekly - Issue 352

Data Science Weekly Newsletter • 19 implied HN points • 20 Aug 20

🕹 Technology Machine Learning

minGPT is a smaller version of the GPT model that aims to be simple and easy to understand. It’s only about 300 lines of code, which makes it a good resource for learning.
Biased training data, like the CoNLL-2003 dataset, can lead AI models to perform poorly on diverse names and future data. This can cause ongoing issues with how these models recognize different groups.
Reinforcement learning has challenges in real-world applications due to assumptions that often don't hold up. Researchers need to address these challenges to make RL more practical and effective.

Data Science Weekly - Issue 351

Data Science Weekly Newsletter • 19 implied HN points • 13 Aug 20

🕹 Technology Machine Learning

Machine learning models need regular maintenance after deployment. It's important to monitor data and model behavior to avoid problems and improve performance.
Collaboration and good understanding of problems are key in AI development. This helps teams create better applications and make profits.
New tools and resources are becoming available for data science, like access to research papers on Kaggle. These can help improve machine learning techniques and open up new possibilities.

Deep Learning Platform, TinyML, Privacy ↔ Contact Tracing

Gradient Flow • 19 implied HN points • 07 May 20

🕹 Technology Machine Learning

Deep learning models are being implemented in tiny devices with tools like TinyML for ultra-low-power systems.
Distributed training for deep learning models is made simpler and cheaper with libraries like RaySGD.
Technology like facial recognition for contact tracing can also raise concerns about privacy and mass surveillance.

Data Science Weekly - Issue 350

Data Science Weekly Newsletter • 19 implied HN points • 06 Aug 20

🕹 Technology Machine Learning

Language models like GPT-3 can do amazing things, such as creating human-like text and writing code, but there's still curiosity about their ability to make analogies.
Data science is increasingly being applied to many fields, like health through biomedical NLP or analyzing complex problems with graph technologies.
As companies build their data tools, there’s a trend toward developing unique solutions tailored to their specific needs, highlighting the importance of data discovery.

Making Peace with LLM Non-determinism

The Finest Tuners • 5 HN points • 07 Apr 24

🕹 Technology Machine Learning

Non-determinism in language models can be frustrating because you can't always expect the same output each time you input the same prompt. This unpredictability often stems from the way language itself works.
You can reduce some of this unpredictability by using techniques like seeding and selecting better models. These methods help control how outputs are generated and make them more consistent.
Understanding that language is inherently complex can help you see the random outputs as part of the model's nature, not just flaws. Embracing this chaos can lead to surprising and interesting results.
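The seeding technique mentioned above can be illustrated with a toy sampler. The sketch assumes a hypothetical dictionary of next-token logits. Real inference stacks expose seeding through their own settings, which are not shown here.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token from `logits` (token -> score).

    Temperature 0 is treated as greedy decoding (always the top token).
    """
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    # Unnormalised softmax weights; rng.choices normalises them internally.
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

def generate(logits, steps, temperature, seed):
    """Draw `steps` tokens using a random generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(steps)]

if __name__ == "__main__":
    logits = {"cat": 2.0, "dog": 1.5, "fish": 0.5}  # made-up scores
    print(generate(logits, 5, temperature=1.0, seed=42))
    print(generate(logits, 5, temperature=1.0, seed=42))  # same as above
    print(generate(logits, 5, temperature=0, seed=7))     # always "cat"
```

Fixing the seed makes this toy sampler repeatable, and lowering the temperature concentrates probability on the top token.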

Nibble #14

The Nibble • 9 implied HN points • 30 Jun 23

🕹 Technology Machine Learning

Wimbledon is introducing AI-powered commentary for highlight clips.
Google is working on AI-powered dubbing for YouTube videos.
GitHub released a tool for analyzing and setting permissions for actions.

Post-Transformers - Hyena Hierarchy

Why Now • 8 implied HN points • 04 Sep 23

🕹 Technology Machine Learning

Hyena clans have a linear dominance hierarchy with a one-to-one chain of command.
Transformer-based LLMs face scaling limitations because of their attention mechanisms.
Hyena proposes a sub-quadratic alternative to attention via long convolutions and data-controlled gating.