Exploring Language Models • 07 Oct 24
- Mixture of Experts (MoE) improves large language models by splitting capacity across multiple smaller subnetworks, called experts. For each input, only the most relevant experts are chosen to handle it, so capacity grows without a proportional increase in compute.
- A router (or gate network) decides which experts are best suited to each input. Because only the selected experts are activated, the model uses just a fraction of its parameters per token, which makes it more efficient (see the routing sketch after this list).
- Load balancing is critical in MoE because it keeps tokens spread roughly evenly across experts during training, preventing a few experts from dominating while others go undertrained. A balanced load lets every expert learn useful specializations and keeps computation efficient (a common auxiliary-loss formulation is sketched below).
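To make the routing idea concrete, here is a minimal sketch of an MoE layer with top-k routing, assuming PyTorch. It is an illustrative toy rather than the implementation of any particular model; the names (`MoELayer`, `num_experts`, `top_k`, the layer sizes) are my own choices, not from the article.

```python
# Minimal MoE layer sketch: a linear router picks the top-k experts per token,
# and only those experts run. Names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The router (gate) scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        # Renormalize the selected experts' weights so they sum to 1 per token.
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        # Run each expert only on the tokens that selected it.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_probs[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out, probs

tokens = torch.randn(16, 64)                    # 16 tokens, d_model=64
layer = MoELayer()
y, router_probs = layer(tokens)
print(y.shape)                                  # torch.Size([16, 64])
```

The per-expert loop keeps the sketch readable; production implementations batch tokens by expert and cap each expert's load, but the routing logic is the same.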
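For the load-balancing point, here is a sketch of the auxiliary loss popularized by the Switch Transformer: for each expert, multiply the fraction of tokens routed to it by the average router probability it receives, and sum. The function name and example values below are illustrative, not from the article.

```python
# Load-balancing auxiliary loss sketch (Switch-Transformer style).
# It is minimized (value ~1.0) when tokens and router probabilities
# are spread uniformly across experts.
import torch
import torch.nn.functional as F

def load_balancing_loss(router_probs, top1_idx, num_experts):
    # f_e: fraction of tokens whose top-1 expert is e.
    one_hot = F.one_hot(top1_idx, num_experts).float()   # (tokens, num_experts)
    tokens_per_expert = one_hot.mean(dim=0)
    # P_e: average router probability assigned to expert e.
    prob_per_expert = router_probs.mean(dim=0)
    return num_experts * torch.sum(tokens_per_expert * prob_per_expert)

# Example: a perfectly balanced router over 4 experts gives a loss of 1.0.
probs = torch.full((8, 4), 0.25)
idx = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
print(load_balancing_loss(probs, idx, num_experts=4))     # tensor(1.)
```

Adding this term (scaled by a small coefficient) to the training loss penalizes routers that send most tokens to a handful of experts, which is what keeps every expert trained.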