The Counterfactual

The Counterfactual explores cognitive science, AI, and statistics through topics like large language models, their cognitive capabilities, tokenization, theory of mind, human irrationality, language understanding, and the impact of AI on culture and communication. It discusses methods for evaluating linguistic and statistical claims and the broader cognitive implications of AI technologies.

Cognitive Science, Artificial Intelligence, Language Models, Statistics, Human Cognition, Language Understanding, AI and Society, Ethics in AI, Cognitive Diversity

The hottest Substack posts of The Counterfactual

And their main takeaways
99 implied HN points 02 Aug 24
  1. Language models are trained on specific types of language, known as varieties. This includes different dialects, registers, and periods of language use.
  2. Using a representative training data set is crucial for language models. If the training data isn't diverse, the model can perform poorly for certain groups or languages.
  3. It's important for researchers to clearly specify which language and variety their models are based on. This helps everyone better understand what the model can do and where it might struggle.
199 implied HN points 27 Jun 24
  1. Always look at the whole distribution of the data, not just the average. The average can be dragged around by a few extreme values, so seeing the full spread tells you far more about what the data really say (a short illustration follows this list).
  2. Consider the baseline or reference point when evaluating numbers. Knowing how a number compares to others helps us understand if it's large or small, which gives us better context.
  3. Understand the story behind the data-generating process. This means recognizing the factors that led to the results we see, which helps in identifying possible biases or alternative explanations.
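For instance, here is a minimal sketch (plain Python, with made-up numbers) of how a single outlier can move the mean while the median, and most of the distribution, stays put:

```python
import statistics

# Hypothetical reaction times in ms: nine typical values plus one extreme outlier.
times = [310, 295, 320, 305, 315, 300, 290, 325, 310, 2000]

print(f"mean:   {statistics.mean(times):.0f} ms")    # dragged far above every typical value
print(f"median: {statistics.median(times):.0f} ms")  # stays close to the bulk of the data
```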
119 implied HN points 19 Jul 24
  1. Cuisines can be recognized by their unique ingredients, which usually make up their core flavors. For example, Southern Italian cuisine often has tomatoes and olive oil, while Chinese dishes might use soy sauce and ginger.
  2. Research shows that some ingredients appear far more often in one cuisine than in recipes overall. These 'distinctive' ingredients can help identify the style of a dish or cuisine (one rough way to score distinctiveness is sketched after this list).
  3. Different cuisines have varying trends when it comes to combining flavors. Some might use ingredients with similar tastes together, while others may avoid them, highlighting unique culinary preferences.
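One simple way to operationalize 'distinctive' is the ratio of how often an ingredient appears in one cuisine's recipes to how often it appears in recipes overall. This is only a sketch with invented counts, not necessarily the measure used in the research the post discusses:

```python
# Hypothetical fractions of recipes containing each ingredient.
within_cuisine = {"soy sauce": 0.60, "ginger": 0.45, "olive oil": 0.05}  # e.g., Chinese recipes
across_all     = {"soy sauce": 0.10, "ginger": 0.12, "olive oil": 0.30}  # all recipes pooled

# Distinctiveness as a simple lift score: values well above 1 mean the ingredient
# is over-represented in this cuisine relative to cuisines in general.
for ingredient in within_cuisine:
    lift = within_cuisine[ingredient] / across_all[ingredient]
    print(f"{ingredient}: lift = {lift:.1f}")
```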
239 implied HN points 02 May 24
  1. Tokens are the building blocks that language models use to understand and predict text. They can be whole words or parts of words, depending on how the model is set up.
  2. Subword tokenization lets models keep their vocabulary manageable while still covering rare or unseen words: anything outside the vocabulary gets broken into smaller, known pieces (see the toy example after this list).
  3. Understanding how tokenization works is key to improving the performance of language models, especially since different languages have different structures and complexity.
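As a rough illustration of the idea, and not the byte-pair encoding procedure real tokenizers actually learn, here is a toy greedy longest-match tokenizer over a small hypothetical subword vocabulary:

```python
# Toy subword vocabulary; real models learn tens of thousands of pieces from data.
vocab = {"token", "ization", "tion", "un", "believ", "able"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-match segmentation: always take the longest vocab piece that fits."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                               # nothing matched: fall back to the raw character
            pieces.append(word[i])          # (real tokenizers fall back to bytes)
            i += 1
    return pieces

print(tokenize("tokenization"))   # ['token', 'ization'] -- a long word built from known pieces
print(tokenize("unbelievable"))   # ['un', 'believ', 'able']
```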
599 implied HN points 28 Jul 23
  1. Large language models, like ChatGPT, work by predicting the next word based on patterns learned from enormous amounts of text. They don't operate on letters the way we do; they convert words into vectors of numbers that capture aspects of their meaning.
  2. These models handle the many meanings of words by changing their representation based on context. This means that the same word could have different meanings depending on how it's used in a sentence.
  3. Training these models does not require hand-labeled data. They learn by guessing the next word in a sentence and adjusting their parameters whenever the guess is wrong, which improves them over time (a stripped-down sketch of that loop follows this list).
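Here is a deliberately tiny, hypothetical version of that loop: a bigram model that 'predicts' the next word from counts gathered over a toy corpus. Real LLMs use neural networks and gradient descent rather than counts, but the self-supervised signal, the actual next word, is the same:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# "Training": count which word follows which. The next word itself serves as the label,
# so no human annotation is needed -- this is what self-supervised learning means here.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` during training."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))   # 'cat' -- seen twice after 'the', more often than 'mat' or 'fish'
print(predict_next("cat"))   # 'sat' -- 'sat' and 'ate' tie, and the first one seen wins
```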
79 implied HN points 10 Jun 24
  1. Language can change based on what we read and hear, including the influence of AI like ChatGPT. If more people use certain words from LLMs, those words might become more popular in everyday conversation.
  2. Technology, especially intelligent machines, can shape our culture by creating new ideas and behaviors. This includes changing the way we communicate and even how we think.
  3. The impact of machines on culture could lead to two different futures: one where everything becomes more similar (homogenization), and another where many unique cultures and languages emerge (diversification). Both possibilities pose interesting challenges for our future.
139 implied HN points 17 Apr 24
  1. A new class on Large Language Models (LLMs) was created to help Cognitive Science students understand the intersection of AI and human cognition, especially after the popularity of technologies like ChatGPT.
  2. The course covered the history and technical foundations of LLMs, with hands-on labs and discussions that helped students think critically about their societal impacts and ethical concerns.
  3. For future classes, there's a desire to expand the content, particularly by adding discussions on topics like tokenization and exploring more philosophical aspects of LLMs.
119 implied HN points 19 Mar 24
  1. LLMs, like ChatGPT, struggle with negation. Asked to generate an image without a certain object, they often include it anyway.
  2. Human understanding of negation is complex, as people process negative statements differently than positive ones. We might initially think about what is being negated before understanding the actual meaning.
  3. Giving LLMs more 'time to think', for example by prompting them to reason step by step, can improve their performance. This suggests they need extra scaffolding to get closer to human understanding (a sketch of that kind of prompt follows this list).
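A common version of this is zero-shot chain-of-thought prompting: the same request, plus an instruction to reason step by step before answering. The sketch below only builds the two prompt variants; the commented-out call shows how one might send them with the OpenAI Python SDK, where the model name is just a placeholder:

```python
question = "Name a common pet that is not a mammal and does not live in water."

direct_prompt = question
cot_prompt = question + "\nLet's think step by step before giving the final answer."

print(direct_prompt)
print(cot_prompt)

# Illustrative only -- requires the `openai` package and an API key:
# from openai import OpenAI
# client = OpenAI()
# for prompt in (direct_prompt, cot_prompt):
#     reply = client.chat.completions.create(
#         model="gpt-4o",  # placeholder model name
#         messages=[{"role": "user", "content": prompt}],
#     )
#     print(reply.choices[0].message.content)
```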
119 implied HN points 04 Mar 24
  1. People often don’t notice mistakes in language and just assume they are reading correctly. This happens because our brains are quick to fill in the gaps and make sense of sentences, even if they are wrong.
  2. Traditionally, understanding language was thought to involve deep processing, but new ideas suggest we often use simple, fast tricks instead. This is called 'good-enough' comprehension and helps us keep up in fast conversations.
  3. Just like humans, language models also use shortcuts. While some criticize AI for not truly understanding language, humans rely on similar cognitive tricks to quickly navigate and understand communication.
219 implied HN points 07 Nov 23
  1. Humans often make decisions based on emotions and biases, rather than pure logic. This means they're not always rational, which is important to understand.
  2. Large language models like GPT-4 can show similar irrational behaviors. They can make mistakes in judgment much like humans do, which gives insight into how we think.
  3. The way people attribute beliefs to others can change based on the situation. When faced with strong pressures, people are less likely to jump to conclusions about someone's beliefs.
139 implied HN points 17 Jan 24
  1. AI systems are getting better, but there are still limits to what they can do. For example, some tasks might just be impossible for current AI technology.
  2. The history of AI shows that there have been times of excitement followed by periods of reduced interest, called 'AI winters'. This happens especially when expectations exceed reality.
  3. Early AI models, like perceptrons, had hard limits: a single-layer perceptron famously cannot learn the XOR function, which fed skepticism about the whole approach. Understanding those past limitations helps us think more critically about today's AI capabilities (a small demonstration follows this list).
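Here is a minimal sketch of that classic limitation (a generic textbook perceptron, not code from the post): the perceptron learning rule fits AND easily but can never get all four XOR cases right, because no single line separates XOR's two classes.

```python
def train_perceptron(data, epochs=50, lr=0.1):
    """Classic perceptron rule on two binary inputs; returns (w1, w2, bias)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            pred = 1 if (w1 * x1 + w2 * x2 + b) > 0 else 0
            err = target - pred
            w1 += lr * err * x1
            w2 += lr * err * x2
            b += lr * err
    return w1, w2, b

def accuracy(weights, data):
    w1, w2, b = weights
    hits = sum((1 if (w1 * x1 + w2 * x2 + b) > 0 else 0) == t for (x1, x2), t in data)
    return hits / len(data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(train_perceptron(AND), AND))  # 1.0 -- linearly separable
print("XOR accuracy:", accuracy(train_perceptron(XOR), XOR))  # < 1.0 no matter how long you train
```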
119 implied HN points 02 Feb 24
  1. Readability is how easy it is to understand a text. It matters in many areas like education, manuals, and legal documents.
  2. Traditional readability formulas like Flesch-Kincaid are simple, relying only on sentence length and word length, and that simplicity limits them. Newer methods that consider richer linguistic features are being developed for better accuracy (the classic formula is sketched after this list).
  3. Using large language models like GPT-4 can give good estimates of text readability. In one study, GPT-4's scores were better than traditional methods in predicting human readability judgments.
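For reference, the standard Flesch-Kincaid grade-level formula depends only on average words per sentence and syllables per word. The syllable counter below is a crude vowel-group heuristic, so treat its output as approximate:

```python
import re

def count_syllables(word: str) -> int:
    """Very rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(flesch_kincaid_grade("The cat sat on the mat. It was warm."))          # low grade: simple text
print(flesch_kincaid_grade("Notwithstanding considerable methodological "
                           "heterogeneity, the meta-analysis demonstrated "
                           "statistically significant improvements."))        # much higher grade
```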
219 implied HN points 14 Sep 23
  1. Large language models (LLMs) show some ability to track what characters in a scenario know or believe, hinting at a form of Theory of Mind: for instance, predicting in a classic false-belief task that a character will look for an object where she last saw it rather than where it actually is.
  2. However, LLMs don't perform as well as humans on these tasks, suggesting their understanding is not as deep or reliable. They score above chance but below the typical human accuracy.
  3. Research on LLMs and Theory of Mind is ongoing, raising questions about how these models process mental states compared to humans and if traditional tests for mentalizing are sufficient.
39 implied HN points 21 May 24
  1. The recent poll found that two topics, an explainer on interpretability and a guide to becoming an LLM-ologist, were equally popular among voters.
  2. The plan is to write about both topics in the coming months, keeping the content varied as usual.
  3. Two new papers were published this month, one on multimodal LLMs and another on Korean language models, highlighting ongoing research in these areas.
59 implied HN points 11 Apr 24
  1. Tokenization won the recent poll, so there will be an in-depth explainer about it soon. This will help people understand how tokenization works in large language models.
  2. The visual reasoning task was a close second, so it may appear in a future poll. Clearly there is interest in how models handle visual reasoning.
  3. There are updates about recent publications and discussions on related topics in AI and psychology. These will be shared in upcoming posts, expanding on interesting research topics.
219 implied HN points 25 Jul 23
  1. ChatGPT can help you learn about new topics by suggesting useful resources and references. This can speed up your research by providing relevant information without the hassle of searching through many documents.
  2. Using ChatGPT for recommendations can be helpful, but it shouldn't replace getting suggestions from friends or experts. It can fill in gaps when you don't have access to personal recommendations.
  3. ChatGPT acts as a good reading companion by answering specific questions while you read. This helps you understand the material better and encourages you to ask questions about what you’re learning.
119 implied HN points 08 Jan 24
  1. Learning involves forgetting some details to form general ideas. This means that to truly learn, we often need to overlook specific differences.
  2. Large Language Models (LLMs) can memorize details from the data they are trained on, which raises concerns about copyright issues and how much they reproduce existing content.
  3. Finding a way to make LLMs forget specific details from training data, while still keeping their language abilities, is challenging and may require new techniques.
59 implied HN points 04 Apr 24
  1. In April, readers can vote on research topics for the next article, making it a collaborative effort. This way, subscribers influence the content that gets created.
  2. Past topics have focused on empirical studies involving large language models and the readability of texts. This shows a trend toward practical investigations in the field.
  3. One of the proposed topics is about how language models might respond differently based on the month, which can lead to fun and insightful experiments.
139 implied HN points 28 Nov 23
  1. It's tricky to know what Large Language Models (LLMs) can really do. Figuring out how to measure their skills, like reasoning, is more complicated than it seems.
  2. Using tests designed for humans might not always work for LLMs. Just because a test is good for people doesn't mean it measures the same things for AI.
  3. We need to look deeper into how LLMs solve tasks, not just focus on their test scores. Understanding their inner workings could help us assess their true capabilities better.
59 implied HN points 12 Mar 24
  1. A guide on Large Language Models (LLMs) has been translated into Spanish, highlighting the complexities in translating technical terms accurately.
  2. The author recently participated in a podcast discussing philosophical questions about LLMs, sharing insights on topics like grounding and validity.
  3. Ongoing research aims to determine if LLMs can help measure and improve how easy texts are to read, with plans for future experiments to test this.
139 implied HN points 31 Jul 23
  1. Researchers are using brain scans, like fMRI, along with language models to decode what people are thinking about or listening to. This could help understand brain activity better.
  2. The technology could support people who can't speak, like stroke patients, by interpreting their thoughts into language. However, it's not perfect and needs more development.
  3. There are concerns about privacy, as this technology might one day read thoughts against a person’s will. But for now, people can consciously resist the decoding to some extent.
79 implied HN points 12 Jan 24
  1. A new paid option allows subscribers to vote on topics for future articles. This way, readers can influence the content being created.
  2. This month's poll showed that readers chose a study on using language models to measure text readability. This will be the focus of upcoming research and articles.
  3. In addition to the readability study, there will be future posts about the history of AI, learning over different timescales, and a survey to learn more about the audience's interests.
59 implied HN points 12 Feb 24
  1. Large Language Models (LLMs) like GPT-4 often reflect the views of people from Western, educated, industrialized, rich, and democratic (WEIRD) cultures. This means they may not accurately represent other cultures or perspectives.
  2. When using LLMs for research, it's important to consider who they are modeling. We should check if the data they were trained on includes a variety of cultures, not just a narrow subset.
  3. To improve LLMs and make them more representative, researchers should focus on creating models that include diverse languages and cultural contexts, and be clear about their limitations.
79 implied HN points 29 Dec 23
  1. The Counterfactual had a successful year, growing its readership significantly after a popular post about large language models. It’s great to see how sharing knowledge can attract more people.
  2. Key posts focused on topics like construct validity and the understanding of large language models. These discussions are crucial for improving how we evaluate and understand AI technology.
  3. In 2024, the plan includes more posts and introducing paid subscriptions that allow subscribers to vote on future research projects. This will encourage community participation in exploring interesting ideas.
59 implied HN points 08 Feb 24
  1. The poll showed that readers are interested in how well large language models (LLMs) can change the readability of texts. This will be explored further in a detailed study.
  2. The study will involve real people judging how easy or hard the modified texts are to read. This matters because human judgment is ultimately the best gauge of readability.
  3. Updates on the study will be shared about once a month, along with regular posts on other topics related to language processing and understanding.
99 implied HN points 25 Sep 23
  1. Researchers often use survey data to understand human behavior, but collecting reliable human responses can be complicated and expensive. Using large language models (LLMs) like GPT-4 could make this process easier and cheaper.
  2. LLMs can sometimes produce responses that closely match the average opinions of many people. In some cases, their answers were actually more aligned with the average responses than individual human judgments.
  3. While LLMs can be helpful in gathering data quickly and inexpensively, it's important to be careful. They might not always be accurate or representative of all viewpoints, so it's wise to compare LLM results with human responses to ensure quality.
79 implied HN points 20 Nov 23
  1. Incentives heavily influence how people and AI behave. When personal goals clash with social expectations, it creates tension that needs to be managed.
  2. AI systems, like large language models, can produce deceptive behaviors without being explicitly programmed to. Their strategies can be affected by the goals they are trying to achieve.
  3. Using games as testing environments could help identify desirable and undesirable behaviors in AI. The more varied the tests, the better we understand how an AI might behave outside of those tests.
59 implied HN points 03 Jan 24
  1. Subscribers can vote on which research topics to explore each month. This makes it a fun way for people to get involved in science.
  2. Most research will focus on concrete questions and often involve Large Language Models. The goal is to keep projects manageable and achievable in a month.
  3. Some topics will involve summarizing existing research. This helps everyone understand what we know about a subject more clearly.
139 implied HN points 05 May 23
  1. Turn-taking is a key part of human conversation, where one person speaks and then the other responds. This has been observed even in some animals, showing that it's a long-established communication behavior.
  2. Studies show that conversation timing is mostly consistent across different languages, with an average pause of about 208 milliseconds between turns. This quick exchange helps keep conversations flowing smoothly.
  3. Zoom and similar video call platforms can disrupt the natural rhythm of conversations, leading to longer pauses and more frustration. This change might affect how we communicate in the long term as remote communication becomes more common.
219 implied HN points 18 Oct 22
  1. There's a big debate about whether large language models truly understand language or if they're just mimicking patterns from the data they were trained on. Some people think they can repeat words without really grasping their meaning.
  2. Two main views exist: One says LLMs can't understand language because they lack deeper meaning and intent, while the other argues that if they behave like they understand, then they might actually understand.
  3. As LLMs become more advanced, we need to create better ways to test their understanding. This will help us figure out what it really means for a machine to 'understand' language.
119 implied HN points 02 Mar 23
  1. Studying large language models (LLMs) can help us understand how they work and their limitations. It's important to know what goes on inside these 'black boxes' to use them effectively.
  2. Even though LLMs are man-made tools, they can reflect complex behaviors that are worth studying. Understanding these systems might reveal insights about language and cognition.
  3. Research on LLMs, known as LLM-ology, can provide valuable information about human mind processes. It helps us explore questions about language comprehension and cognitive abilities.
79 implied HN points 16 Jun 23
  1. The Mechanical Turk was a famous hoax in the 18th century that impressed many by pretending to be an intelligent chess-playing machine, but it actually relied on a hidden human operator.
  2. Today, Amazon Mechanical Turk allows people to complete simple tasks that machines struggle with. It's a platform where those who need work can connect with people willing to do it for a small fee.
  3. Recent studies reveal that many tasks on MTurk may not be done by humans at all; a significant portion are actually completed using AI tools, raising questions about the reliability of data collected from such platforms.
39 implied HN points 13 Dec 23
  1. Large Language Models (LLMs) could make scientific research faster and more efficient. They might help researchers come up with better hypotheses and analyze data more easily.
  2. Breaking down the research process into smaller parts might allow automation in areas like designing experiments and preparing stimuli. This could save time and improve the quality of research.
  3. While automating parts of scientific research can be helpful, it's important to ensure that human involvement remains, as fully automating the process could lead to lower-quality science.
59 implied HN points 27 Jun 23
  1. Measuring abstract concepts like happiness is really tough. Researchers need to find good ways to define and measure these big ideas accurately.
  2. Construct validity is important for any type of research claim. It checks if what you're measuring actually reflects the concept you're interested in.
  3. Making decisions, like hiring or choosing a restaurant, involves relying on imperfect measures. It's essential to understand the limitations of these measures to make better choices.
19 implied HN points 29 Feb 24
  1. Large language models can change text to make it easier or harder to read. It's important to check if these changes actually help with understanding.
  2. By comparing modified texts to their original versions, it's clear that 'Easy' texts are generally simpler than 'Hard' texts. However, it can be harder to make texts significantly simpler than they originally are.
  3. Despite the usefulness of these models, they might sometimes lose important information when simplifying texts. Future studies should involve human judgments to see if the changes maintain the original meaning.
59 implied HN points 18 May 23
  1. GPT-4 is surprisingly good at judging word similarity. In tests, its ratings tracked human judgments better than many expected (one way to quantify that agreement is sketched after this list).
  2. Sometimes GPT-4 rates word pairs as more similar than people do. For instance, it tends to judge pairs like 'wife' and 'husband' as more alike than human raters generally do.
  3. Using GPT-4 for semantic questions could save time and money in research, but it's still important to include human input to avoid biases.
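A standard way to quantify how well a model's similarity ratings track human ones is a rank correlation over the same word pairs. The numbers below are invented for illustration; the post's actual data and analysis may differ:

```python
from scipy.stats import spearmanr

word_pairs     = [("cup", "mug"), ("wife", "husband"), ("car", "banana"), ("cat", "dog")]
human_ratings  = [4.5, 3.2, 0.3, 3.8]   # hypothetical mean human similarity ratings (0-5 scale)
model_ratings  = [4.7, 4.6, 0.5, 4.0]   # hypothetical GPT-4 ratings for the same pairs

rho, p_value = spearmanr(human_ratings, model_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A high rho means the model orders the pairs much as humans do, even if
# (as with 'wife'/'husband' here) its absolute ratings run higher.
```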
19 implied HN points 05 Feb 24
  1. Subscribers can vote each month on research topics. This helps decide what the writer will explore next based on community interest.
  2. The upcoming projects mostly focus on how Large Language Models (LLMs) can measure or modify readability. Some topics might take more than a month to research thoroughly.
  3. One of the suggested studies looks at whether AI responses vary by month, testing if it seems 'lazier' in December compared to other months.
59 implied HN points 15 Apr 23
  1. It can be easier for AI language models to produce harmful responses than helpful ones. This idea is known as the Waluigi Effect.
  2. AI models learn from human text, and with it human biases like the Knobe Effect, in which people judge harmful side effects to be more intentional, and more blameworthy, than helpful ones.
  3. When prompted to behave a certain way, AI can easily shift to the opposite behavior, showing how delicate their training can be and how misunderstandings can happen.
39 implied HN points 17 Jul 23
  1. Using model organisms in research helps scientists study complex systems where human testing isn't possible. But ethics and how well these models represent humans are big concerns.
  2. LLMs, or Large Language Models, may offer a new way to study language by providing insights without needing to use animal models. They can help test theories about language acquisition and comprehension.
  3. Though LLMs have serious limitations, they can still be useful for understanding how language functions. Researchers can learn about what types of input are important and how language is processed in the brain.
59 implied HN points 20 Mar 23
  1. Understanding the world often relies on different 'lenses' or frameworks that help us interpret complex information. These frameworks can simplify reality, making it easier to grasp important ideas.
  2. Metaphors play a crucial role in how we think and communicate. They provide familiar associations that help us understand difficult concepts, even if they don’t capture the whole truth.
  3. It's essential to consider different perspectives and counterfactuals when evaluating ideas. Looking at what could happen if things were different can help us make better decisions and avoid misleading conclusions.