The hottest Model Testing Substack posts right now

Measuring AI performance is tricky because common tests can be flawed and sometimes don't really show how smart the AI is. We're often left uncertain about what these benchmarks actually mean.
Using a more personal approach, like creating fun and unique tests, can help people understand how different AI models work. This way, you get a feel for the AI's strengths and weaknesses in a more relatable way.
When companies choose AI tools, it's important to do thorough testing based on real tasks instead of just relying on average performance scores. Understanding specifically how well an AI can perform your unique tasks is key.

The AI Day 2023 event will feature lessons learned from red teaming AI models and systems.
Thai Duong will present online from California due to logistical issues.
The presentation at AI Day 2023 will include demos and aims to be fun and engaging.

Starting a series on fine-tuning a general-purpose Stable Diffusion model.
Outlined steps for choosing training images, cleaning data, and preparing captions.
Shared details on training protocol, model fine-tuning, testing results, and next steps.

Sora's deepfake potential raises concerns about public access and misuse, and poses challenges for safety and fine-tuning.
Long-context models like Gemini 1.5 open up possibilities such as analyzing code bases and processing DNA, showing potential across various domains.
Inference costs for models like Sora are substantial: estimates point to potentially high costs for generating videos, which highlights challenges in scalability and cost-effectiveness.

Large language models may produce more incorrect answers when their previous answers were incorrect.
The effect of incorrect previous answers on later answers is small and varies with the prompt given to the AI.
Prompts that explicitly ask the AI to match its previous behavior can lead to more incorrect answers.
