Amgad’s Substack • 79 implied HN points • 21 Jan 24
- Whisper's focus was on scaling training to massive amounts of data, deliberately reusing a proven encoder-decoder Transformer so that architectural novelties would not confound the findings.
- The architecture consists of an encoder built from a convolutional stem followed by Transformer blocks, a decoder whose blocks add cross-attention over the encoder output, and an audio processor that turns fixed-length 30-second audio segments into log-Mel spectrogram input features.
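The fixed-window preprocessing can be sketched in a few lines. This is an illustrative NumPy reimplementation, not Whisper's own code: the constants (16 kHz sample rate, 30-second window, 160-sample hop) match the published model, but `pad_or_trim` here is a hypothetical stand-in for the library's preprocessing step.

```python
import numpy as np

SAMPLE_RATE = 16_000                      # Whisper resamples all audio to 16 kHz
CHUNK_SECONDS = 30                        # fixed-length input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480_000 samples per segment
HOP_LENGTH = 160                          # 10 ms hop for the log-Mel spectrogram

def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Force every segment to exactly 30 s: trim long clips, zero-pad short ones."""
    if audio.shape[-1] > length:
        return audio[..., :length]
    return np.pad(audio, (0, length - audio.shape[-1]))

# A 7-second clip gets zero-padded out to the full window.
clip = np.random.randn(7 * SAMPLE_RATE).astype(np.float32)
padded = pad_or_trim(clip)
print(padded.shape)  # (480000,)

# The spectrogram then has N_SAMPLES // HOP_LENGTH = 3000 frames; the
# encoder's strided conv stem halves that to 1500 positions for the blocks.
n_frames = N_SAMPLES // HOP_LENGTH
print(n_frames, n_frames // 2)  # 3000 1500
```

The fixed window is what lets every training example share one input shape, trading some padding waste for a much simpler large-scale training pipeline.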
- Whisper's gains in accuracy and robustness came primarily from the scale and quality of the training data, underscoring that careful data processing mattered more than novel architectural choices.