Jake Ward's Blog • 2 HN points • 30 Apr 24
- Large language models like ChatGPT have complex, learned internal logic that is difficult to interpret because of 'superposition', where a single neuron participates in representing multiple unrelated concepts.
- Techniques like sparse dictionary learning can decompose neuron activations into 'features' that exhibit 'monosemanticity', each corresponding to one interpretable concept and making the model easier to understand (see the sketch after this list).
- Reproducing this interpretability research shows promise for further breakthroughs and suggests that progress is now limited more by engineering effort than by scientific barriers.
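A minimal sketch of the sparse-dictionary-learning idea, assuming PyTorch: a sparse autoencoder is trained on a model's MLP activations, with an overcomplete feature dictionary and an L1 penalty encouraging each input to activate only a few features. Layer sizes, the L1 coefficient, and all names here are illustrative assumptions, not values from the post.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Overcomplete dictionary: many more features than neurons.
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps feature activations non-negative and mostly zero.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction


def loss_fn(activations, features, reconstruction, l1_coeff=1e-3):
    # Reconstruction term keeps the features faithful to the activations;
    # the L1 penalty pushes most features to zero, which is what tends to
    # produce monosemantic, interpretable directions.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().sum(dim=-1).mean()
    return mse + l1_coeff * sparsity


# Hypothetical usage: d_model is the width of the MLP layer being decomposed;
# d_features is typically several times larger (an overcomplete basis).
sae = SparseAutoencoder(d_model=512, d_features=4096)
acts = torch.randn(64, 512)  # stand-in for a batch of recorded activations
feats, recon = sae(acts)
loss = loss_fn(acts, feats, recon)
loss.backward()
```

After training, individual learned features (columns of the decoder) can be inspected by finding the inputs that activate them most strongly, which is how the monosemanticity claim is usually checked.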