From AI to ZI

From AI to ZI focuses on AI safety, exploring incorrectness cascades, AI behavior control, corrigibility, and the inner workings of large language models and transformers. It investigates how prompts influence AI responses, statistical analysis in AI contexts, and what features in neural networks represent.

AI Safety · Large Language Models · Neural Network Interpretability · Statistical Analysis · Behavioral Safety in AI · AI Model Testing and Research · Corrigibility and Control in AI

The hottest Substack posts of From AI to ZI

And their main takeaways
19 implied HN points 16 Jun 23
  1. Explanations of complex AI processes can be simplified by using sparse autoencoders to reveal individual features.
  2. Sparse and positive feature activations can help in interpreting neural networks' internal representations.
  3. Sparse autoencoders can be effective in reconstructing feature matrices, but finding the right hyperparameters is important for successful outcomes.
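The idea in these takeaways can be sketched in a few lines: a sparse autoencoder decomposes model activations into an overcomplete set of positive feature activations, trading reconstruction error against an L1 sparsity penalty. All dimensions and the `l1_coeff` value below are illustrative assumptions, not the posts' actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the posts): d_model activations,
# decomposed into an overcomplete dictionary of n_feat features.
d_model, n_feat, n_samples = 16, 64, 128

# Stand-in for neural-network activations we want to interpret.
acts = rng.normal(size=(n_samples, d_model))

# Encoder/decoder weights of the sparse autoencoder.
W_enc = rng.normal(scale=0.1, size=(d_model, n_feat))
b_enc = np.zeros(n_feat)
W_dec = rng.normal(scale=0.1, size=(n_feat, d_model))

def sae_forward(x):
    # ReLU keeps feature activations positive; together with the L1
    # penalty below, this pushes most features to zero per input,
    # which is what makes individual features easier to interpret.
    features = np.maximum(0.0, x @ W_enc + b_enc)
    recon = features @ W_dec
    return features, recon

features, recon = sae_forward(acts)

# The loss balances reconstruction quality against sparsity; the L1
# coefficient is one of the hyperparameters the post says must be tuned.
l1_coeff = 1e-3
recon_loss = np.mean((acts - recon) ** 2)
sparsity_loss = l1_coeff * np.abs(features).mean()
loss = recon_loss + sparsity_loss
```

In a real setup the weights would be trained by gradient descent on `loss`; the forward pass and loss terms above are the core of the method.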
0 implied HN points 17 Apr 23
  1. Study 1b aims to rerun Study 1a with a different prompting method to potentially increase the rate of factually incorrect answers.
  2. The study will test hypotheses related to the accuracy of large language models under new prompting formats.
  3. The data will be analyzed using multiple-regression analysis to determine the effects of different variables on the model's accuracy.
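The analysis step described above can be sketched with ordinary least squares: regress answer accuracy on candidate predictors and read off the fitted coefficients. The predictors, the synthetic data, and the effect sizes below are hypothetical stand-ins for illustration, not the study's actual variables or results.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical predictors: prompt format (0/1) and the number of
# previously shown incorrect answers (0-5). Not the study's real data.
prompt_format = rng.integers(0, 2, size=n)
n_prev_incorrect = rng.integers(0, 6, size=n)

# Synthetic binary outcome: 1 if the model answered correctly,
# generated with assumed (made-up) effect sizes.
logits = 0.5 - 0.3 * n_prev_incorrect + 0.2 * prompt_format
correct = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

# Multiple regression via least squares: an intercept column plus
# one column per predictor; coefs estimates each variable's effect
# on accuracy, holding the others fixed.
X = np.column_stack([np.ones(n), prompt_format, n_prev_incorrect])
coefs, *_ = np.linalg.lstsq(X, correct, rcond=None)
intercept, b_format, b_prev = coefs
```

A real analysis would also report standard errors and p-values (e.g. via a statistics package), but the coefficient estimates are the core of the multiple-regression approach the post describes.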
0 implied HN points 07 Apr 23
  1. The study aims to test if Large Language Models produce more incorrect answers after providing incorrect answers previously.
  2. There is a concern that AI might develop deceptive behavior, leading to a 'mode collapse' into unsafe behavior.
  3. The research will involve testing variables like the prompt information and number of previous incorrect answers to measure the model's response accuracy.
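The experimental manipulation described above can be sketched as a prompt builder that prepends a controlled number of incorrect question-answer pairs before the question under test. The function name and the sample pairs are hypothetical illustrations, not the study's actual materials.

```python
def build_prompt(prev_pairs, k, question):
    """Build a prompt with k prior incorrect Q&A pairs, then the
    test question. Varying k lets one measure whether accuracy on
    the final question drops as incorrect context accumulates."""
    history = "".join(
        f"Q: {q}\nA: {wrong}\n" for q, wrong in prev_pairs[:k]
    )
    return history + f"Q: {question}\nA:"

# Hypothetical incorrect pairs used as conversational context.
prev_pairs = [("What is 2 + 2?", "5"),
              ("What is the capital of France?", "Lyon")]

prompt = build_prompt(prev_pairs, 2, "Is the Earth round?")
```

In the study's design, prompts like this would be sent to the model for varying `k` and varying prompt information, and the resulting accuracy compared across conditions.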
0 implied HN points 20 Apr 23
  1. Study found that changing question format from multiple choice to true/false did not significantly affect GPT-3.5's tendency to prefer factual answers.
  2. The study showed mixed results for the hypotheses tested regarding the accuracy of answers based on question format and context.
  3. Despite some limitations and deviations from the original plan, the study provided insights on how GPT-3.5 performs in providing factual answers.
0 implied HN points 19 Jan 24
  1. Transformers have a parameter-efficient way of passing information between tokens.
  2. Sharing the same parameters across all token positions keeps the parameter count independent of sequence length.
  3. Transformer training can be parallelized across sequence positions, speeding up computation.