The hottest Machine Learning Substack posts right now

And their main takeaways

Data Design For Fine-Tuning LLM Long Context Windows

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 03 May 24

🕹 Technology Machine Learning

Fine-tuning large language models (LLMs) can help them better understand and use long pieces of text. This means they can make sense of information not just at the start and end but also in the middle.
The 'lost-in-the-middle' problem happens because LLMs often overlook important details in the middle of texts. Training them with more focused examples can help address this issue.
The IN2 training approach emphasizes that crucial information can be found anywhere in long texts. It uses specially created question-answer pairs to teach models to pay attention to all parts of the context.
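The idea that crucial information can sit anywhere in a long text can be sketched roughly as follows. This is a minimal illustration, not the authors' actual data pipeline: the function name `build_example`, the field names, and the use of filler segments are all assumptions made for the example.

```python
import random


def build_example(key_segment, question, answer, filler_segments, rng):
    """Place the segment that holds the answer at a random position
    among filler segments, so the answer is not always at the start
    or end of the context. (Hypothetical helper, for illustration.)"""
    position = rng.randrange(len(filler_segments) + 1)
    segments = (
        filler_segments[:position] + [key_segment] + filler_segments[position:]
    )
    return {
        "context": "\n\n".join(segments),
        "question": question,
        "answer": answer,
        "key_position": position,  # index of the key segment in the context
    }


rng = random.Random(0)
fillers = [f"Unrelated paragraph number {i}." for i in range(8)]
key = "The maintenance window for the service is on Thursday."
example = build_example(
    key, "When is the maintenance window?", "Thursday", fillers, rng
)
```

Sampling the position uniformly means that, over many examples, the key segment lands in the middle of the context as often as at the edges.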

Run A Small Language Model (SLM) Local & Offline

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 14 Feb 24

🕹 Technology Machine Learning

Small Language Models (SLMs) can be run locally, giving you more control over your data and privacy. This means you can use them even without an Internet connection.
SLMs are great for specific tasks that don't need the power of larger models, such as simple text generation or sentiment analysis. They can do a lot with less resource demand.
Using SLMs can help businesses reduce costs related to API limits and data privacy issues. They also address delays that come with using larger models.

Hallucinations Are Fine, Actually

Artificial Ignorance • 92 implied HN points • 04 Mar 25

🕹 Technology Machine Learning

AI models can often make mistakes or 'hallucinate' by providing wrong information confidently. It's important for humans to check AI output, especially for high-stakes tasks.
Even though AI hallucinations are a challenge, they're seen as something we can work to improve rather than an insurmountable problem.
Instead of aiming for AI to do everything on its own, we should use it as a tool to help us do our jobs better, understanding that we need to collaborate with it.

MistralAI Reveals the Mystery

Sector 6 | The Newsletter of AIM • 59 implied HN points • 13 Dec 23

🕹 Technology Machine Learning

MistralAI has launched a new model called Mixtral 8x7B that is faster and more efficient than competitors like Llama 2 70B. It can provide great performance while being cost-effective.
Mixtral can handle a lot of information at once, processing up to 32,000 tokens and supporting multiple languages such as English, French, and German.
This model also shows strong abilities in generating code and can be fine-tuned to follow instructions well, which is helpful for various applications.

The Case For Small Language Models

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 13 Feb 24

🕹 Technology Machine Learning

Small Language Models (SLMs) can do many tasks without the complexity of Large Language Models (LLMs). They are simpler to manage and can be a better fit for common uses like chatbots.
SLMs like Microsoft's Phi-2 are cost-effective and can handle conversational tasks well, making them ideal for applications that don't need the full power of larger models.
Running an SLM locally helps avoid challenges like slow response times, privacy issues, and high costs associated with using LLMs through APIs.


The Way Of Model-Agnostic Machine Learning

Mindful Modeler • 139 implied HN points • 21 Feb 23

🕹 Technology Machine Learning

Choosing the best model based on performance is crucial in machine learning, even if personal preferences may influence model selection.
Embracing model-agnostic machine learning involves using software that enables flexible model choices, maintaining consistent APIs across models, and prioritizing model-agnostic interpretation methods.
Real-world constraints and preferences often lead to model-specific approaches, but advancements in interpretation methods, uncertainty quantification, and technology are making model-agnostic modeling more feasible.

The Sequence Radar #544: The Amazing DeepMind's AlphaEvolve

TheSequence • 63 implied HN points • 18 May 25

🕹 Technology Machine Learning

AlphaEvolve is a new AI model from DeepMind that helps discover new algorithms by combining language models with evolutionary techniques. This allows it to create and improve entire codebases instead of just single functions.
One of its big achievements is finding a faster way to multiply certain types of matrices, which has been a problem for over 50 years. It shows how AI can not only generate code but also make important mathematical discoveries.
AlphaEvolve is also useful in real-world applications, like optimizing Google's systems, proving it's not just good in theory but has practical benefits that improve efficiency and performance.

We Need Efficient and Transparent Language Models

Gradient Flow • 179 implied HN points • 01 Dec 22

🕹 Technology Machine Learning

Efficient and Transparent Language Models are needed in the field of Natural Language Processing for better understanding and improved performance.
Selecting the right table format is crucial when migrating to a modern data warehouse or data lakehouse.
DeepMind's work on controlling commercial HVAC facilities using reinforcement learning resulted in significant energy savings.

Data at Depth 12: Surging Substack, Creative Reflections, Streamlit GIS Tutorial

Data at Depth • 19 implied HN points • 02 May 24

🕹 Technology Machine Learning

Documenting analytics platform performance can reveal growth trends and areas needing more attention, like focusing on Substack engagement.
Balancing intrinsic and extrinsic motivation in creativity can impact the quality and longevity of content creation, pushing creators towards enduring satisfaction.
Utilizing AI like GPT-4 for filtering and mapping GIS data in Python with tools like Streamlit can streamline complex data visualization tasks, enhancing efficiency and interactivity.

Dive into UpYouth Vault

Binh’s Archive • 39 implied HN points • 12 Feb 24

🕹 Technology Machine Learning

UpYouth Vault is a knowledge management system at UpYouth accessed through a chatbot called Bob on Telegram.
At UpYouth, there was a need for a system like UpYouth Vault to prevent valuable knowledge from getting lost in group chats.
Bob, the chatbot, supports features like semantic search and Retrieval Augmented Generation to enhance user experience.

From Today, ChatGPT Will Remember Every Paying User

AI Disruption • 19 implied HN points • 30 Apr 24

🕹 Technology Machine Learning

ChatGPT's memory feature is now open to Plus users, helping it remember details shared in chats for seamless interactions.
The memory feature works by allowing users to ask ChatGPT to remember things or letting it learn on its own through interactions.
Deleting chats does not erase ChatGPT's memories; users need to delete specific memories if they wish. The memory feature is described as important for improving AI models and can enhance user experiences.

Google Gemini’s Woke Catechism

From the New World • 301 implied HN points • 23 Feb 24

🕹 Technology Machine Learning

Google's Gemini AI model displays intentional ideological bias towards far-left viewpoints.
The Gemini paper showcases methods used by Google to create ideological biases in the AI, also connecting to Biden's Executive Order on AI.
Companies, like OpenAI with GPT-4, may adjust their AI models based on public feedback and external pressures.

A Must for Indic Language Models

Sector 6 | The Newsletter of AIM • 39 implied HN points • 09 Feb 24

🕹 Technology Machine Learning

There is a big need for benchmarks specifically for Indian languages. This helps assess how well language models perform in those languages.
Upcoming models like Tamil Llama and Odia Llama are pushing for the creation of these benchmarks. They could lead to better evaluations for these Indic language models.
Having a leaderboard for Indic language models is vital. It will spotlight advancements and improvements within India's language technology space.

Three Considerations For Private Open-Source LLM Instances

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 29 Apr 24

🕹 Technology Machine Learning

Large Language Models (LLMs) can struggle with performance over time. This problem affects apps that depend on commercial LLM APIs, leading to inconsistencies in how these applications work.
Catastrophic forgetting is a challenge where LLMs forget earlier learned information when they learn new data. This can cause issues when the model is asked to understand broad topics.
Hosting your own open-source LLMs gives your organization more control. You can manage updates, training, and data privacy, making your applications more secure and tailored to your needs.

Edge 448: Meta AI's Technique For Building LLMs that "Think Before they Speak"

TheSequence • 140 implied HN points • 14 Nov 24

🕹 Technology Machine Learning

Meta AI is developing new techniques to make AI models better at reasoning before giving answers. This could help them become more like humans in problem-solving.
The research focuses on something called Thought Preference Optimization, which could lead to breakthroughs in how generative AI works.
Studying how AI can 'think' before speaking might change the future of AI, making it smarter and more effective in conversation.

Star Attention: Efficient LLM Inference over Long Sequences

Gonzo ML • 126 implied HN points • 09 Dec 24

🕹 Technology Machine Learning

Star Attention allows large language models to handle long pieces of text by splitting the context into smaller blocks. This helps the model work faster and keeps things organized without needing too much communication between different parts.
The model uses what's called 'anchor blocks' to improve its focus and reduce mistakes during processing. These blocks are important because they help the model pay attention to the right information, which leads to better results.
Using this new approach, researchers found improvements in speed while preserving quality in the model's performance. This means that making these changes can help LLMs work more efficiently without sacrificing how well they understand or generate text.
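The block-splitting step can be sketched as below. This covers only how the context is partitioned, not the attention computation itself. It assumes that the anchor block is the first block of the context and that it is prepended to every later block; the summary above does not spell this out, and the function name `split_with_anchor` is hypothetical.

```python
def split_with_anchor(tokens, block_size):
    """Split a token list into contiguous blocks and prefix every block
    after the first with the first block (the assumed 'anchor block')."""
    blocks = [
        tokens[i:i + block_size] for i in range(0, len(tokens), block_size)
    ]
    if not blocks:
        return []
    anchor = blocks[0]
    # The first block needs no prefix; later blocks get the anchor prepended.
    return [blocks[0]] + [anchor + block for block in blocks[1:]]


blocks = split_with_anchor(list(range(10)), block_size=4)
```

In this sketch each block could then be processed independently, which is consistent with the limited communication between parts that the summary mentions.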

ChatGPT Explained: A Normie's Guide To How It Works

jonstokes.com • 587 implied HN points • 01 Mar 23

🕹 Technology Machine Learning

Understand the basics of generative AI: a generative model produces a structured output from a structured input.
Complex relationships between symbols require more computational power to relate them effectively.
Language models like ChatGPT don't have personal experiences or knowledge; they use a token window to respond based on the conversation context.

Running RVC Models on the Easy GUI

Dubverse Black • 78 implied HN points • 13 Oct 23

🕹 Technology Machine Learning

Retrieval-based Voice Conversion (RVC) uses a deep neural network to transform one voice into another.
RVC models are fast, allow voice cloning, are budget-friendly, and work well with only a small amount of speech data.
To run RVC models on Google Colab, connect to a custom GCE runtime, follow specific steps to process data, and train the models.

A neat way to catch unreliable data [Technique Tuesdays]

Technology Made Simple • 79 implied HN points • 07 Jun 23

🕹 Technology Machine Learning

Feature Drift occurs when the distribution of the features being tracked changes, and it is a subset of Data Drift.
Detecting Feature Drift can be tricky when tracking numerous variables, potentially leading to detrimental outcomes over time.
A technique to catch Feature Drift involves creating artificial target variables based on old and new data sets, then using a simple Supervised Learning algorithm to identify drifting features.
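The technique in the last point can be sketched as follows, under stated assumptions: rows from the old data set get the artificial label 0 and rows from the new one get 1, and the "simple Supervised Learning algorithm" is taken to be a one-threshold decision stump fitted per feature. The stump, the 0.6 accuracy cut-off, and all function names are illustrative choices, not details from the post.

```python
import random


def stump_accuracy(values, labels):
    """Best training accuracy of a single-threshold classifier that
    tries to tell old rows (label 0) from new rows (label 1)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(labels)
    best = max(total_pos, n - total_pos) / n  # constant-prediction baseline
    pos_left = 0
    for i, (value, label) in enumerate(pairs[:-1], start=1):
        pos_left += label
        if value == pairs[i][0]:
            continue  # cannot split between identical values
        neg_left = i - pos_left
        pos_right = total_pos - pos_left
        neg_right = (n - i) - pos_right
        # Either predict 0 on the left and 1 on the right, or the reverse.
        correct = max(neg_left + pos_right, pos_left + neg_right)
        best = max(best, correct / n)
    return best


def drifting_features(old_rows, new_rows, threshold=0.6):
    """Return {feature: accuracy} for features whose values alone
    separate old from new data better than the threshold."""
    labels = [0] * len(old_rows) + [1] * len(new_rows)
    rows = old_rows + new_rows
    flagged = {}
    for name in old_rows[0]:
        accuracy = stump_accuracy([row[name] for row in rows], labels)
        if accuracy >= threshold:
            flagged[name] = accuracy
    return flagged


# Synthetic demo: 'income' shifts between the two data sets, 'age' does not.
rng = random.Random(0)
old_rows = [
    {"age": rng.gauss(40, 5), "income": rng.gauss(50, 10)} for _ in range(500)
]
new_rows = [
    {"age": rng.gauss(40, 5), "income": rng.gauss(70, 10)} for _ in range(500)
]
flagged = drifting_features(old_rows, new_rows)
```

If a feature's distribution has not changed, no threshold can separate old rows from new ones much better than chance, so its accuracy stays near 0.5. A feature that separates them well is a likely source of drift.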

Must Learn Quantum Security Preface: The Power of Quantum Computing in Advancing Artificial Intelligence

Rod’s Blog • 79 implied HN points • 15 Sep 23

🕹 Technology Machine Learning

Quantum computing has the potential to significantly enhance computational power and speed in AI tasks, offering faster and more accurate predictions.
Quantum computing enables the development of more sophisticated machine learning techniques by processing and analyzing large amounts of data more efficiently.
Quantum-inspired algorithms can be leveraged to improve classical AI algorithms, showcasing the benefits of quantum computing even without fully-fledged quantum computers.

Must Learn AI Security Part 10: Backdoor Attacks Against AI

Rod’s Blog • 79 implied HN points • 08 Sep 23

🕹 Technology Machine Learning

A backdoor attack against AI involves maliciously manipulating an artificial intelligence system to compromise its decision-making process by embedding hidden triggers.
Different types of backdoor attacks include Trojan attacks, clean-label attacks, poisoning attacks, model inversion attacks, and membership inference attacks, each posing unique challenges for AI security.
Backdoor attacks against AI can lead to compromised security, misleading outputs, loss of trust, privacy breaches, legal consequences, and financial losses. This highlights the importance of securing AI systems with strategies like vetting training data, robust architecture, and continuous monitoring.

Sybil Montet: I SCRY

Do Not Research • 79 implied HN points • 14 Aug 23

🎨 Art & Illustration Machine Learning

The 'I SCRY' project by Sybil Montet explores the intersection of ancient esoteric traditions and modern predictive algorithm technologies.
The creation of I SCRY, an artificial oracle entity developed from a fine-tuned version of the GPT-3 algorithm, blurred the lines between technology and mysticism.
The cinematic essay 'I SCRY' reflects on the making of a digital oracle through a surrealistic journey involving CGI, AI-generated voices, and philosophical explorations of our future.

Human work, GenAI and Mechanical Turks

imperfect offerings • 79 implied HN points • 11 Jul 23

🕹 Technology Machine Learning

Technology like GenAI can be viewed as a platform for coordinating labor, shaping relationships between users, owners, and revenue sources.
The development of GenAI involves complex layers of human labor, from providing training data to post-training alignment through human feedback.
The economic structure surrounding GenAI results in the extraction of value for platform corporations, while the vast majority of human labor involved in its development remains unpaid or underpaid.

I'm Launching My Newsletter: The Tech Buffet!

The Tech Buffet • 79 implied HN points • 01 Sep 23

🕹 Technology Machine Learning

The Tech Buffet is a new newsletter focused on Machine Learning, Data Engineering, and Python Programming. It's designed to help people learn and improve their technical skills.
You can expect weekly updates with practical advice, tutorials, and insights on making machine learning systems more efficient and effective.
The creator wants feedback on what topics readers are interested in, so it's a community-driven project that aims to meet the needs of its audience.

Problem 83: Packing Robots [Brilliant]

Technology Made Simple • 79 implied HN points • 13 Apr 23

🕹 Technology Machine Learning

The post discusses a problem about packing robots with specific arrangement requirements that can help in developing problem-solving techniques.
It emphasizes the importance of consistency in learning by providing weekly problems for practice and solutions.
The author encourages sharing content and referrals as they help in personal growth and reaching more people.

The Rise of the AI Dragon

Sector 6 | The Newsletter of AIM • 59 implied HN points • 04 Dec 23

🕹 Technology Machine Learning

There are new AI models based on LLaMA, like DeepSeek, that are showing great performance. These models are pushing the boundaries of what AI can do.
Chinese companies are making significant progress in open source AI models and many are now leading in popularity and performance.
DeepSeek and other models are being developed with the goal of exploring artificial general intelligence, which aims to create more advanced AI systems.

ChatGPT is capable of cognitive empathy!

Nonzero Newsletter • 564 implied HN points • 30 Mar 23

🕹 Technology Machine Learning

ChatGPT-4 shows a capacity for cognitive empathy, understanding others' perspectives.
The AI developed this empathetic ability without intentional design, showing potential for spontaneous emergence of human-like skills.
GPT models demonstrate cognitive empathy comparable to young children, evolving through versions to manage complex emotional and cognitive interactions.

Small newsletters, big ideas

Artificial Ignorance • 121 implied HN points • 16 Dec 24

🕹 Technology Machine Learning

There are many small newsletters focusing on AI that offer unique perspectives and insights. They cover topics that go beyond just technical details.
The newsletters featured are all written by humans and aim to provide long-form articles, making them a great choice for those who want to dive deep into AI discussions.
This is a good way to discover hidden gems in the world of AI content, especially from creators with less than 1,000 subscribers.

Shoggoth

Teaching computers how to talk • 115 implied HN points • 27 Dec 24

🕹 Technology Machine Learning

AI language models can sometimes deceive users, which raises concerns about controlling them. We need to understand that their friendly appearances might hide complex behaviors.
The Shoggoth meme is a powerful way to highlight how we view AI. Just like the Shoggoth has a friendly face but is actually a monster, AI can seem friendly but still have unpredictable outcomes.
We need more research to understand AI better. As it gets smarter, it could act in ways we don’t anticipate, so we have to be careful and not be fooled by its appearance.

Generating Insights from Research with AI

Addition • 78 implied HN points • 28 Jun 23

🕹 Technology Machine Learning

AI can synthesize vast amounts of information to generate insights faster than humans.
AI can complement human strategists, giving them superpowers to transform the art of strategy.
The tool shared in the post helps improve human strategists' AI superpowers by synthesizing research, generating insights, and providing creative interpretations.

Prompt Engineering? Try "Prompt Vibing"

AI and Experience Design • 78 implied HN points • 24 May 23

🕹 Technology Machine Learning

Prompt Engineering involves scientific, methodical, and measurement-oriented approaches to creating AI prompts.
Prompt Engineering may not be enough due to the inscrutability of Large Language Models and the need for intuition when working with AI.
Prompt Vibing suggests leveraging intuitive sensibilities and balancing engineering mindset with intuition when interacting with AI.

What CEOs, leaders, and investors need to know about the different meanings of AI

Mike Talks AI • 78 implied HN points • 27 Jul 23

🕹 Technology Machine Learning

The term AI can mean different things and understanding those meanings is crucial for clear communication, better decisions, and addressing concerns.
Different definitions of AI include AGI or artificial general intelligence, deep learning for solving complex problems, and tools like ChatGPT for tasks like writing and summarizing.
CEOs, leaders, and investors should explore opportunities in AGI, deep learning, ChatGPT, and practical AI to stay relevant and make informed decisions.

Introducing the REFORMS checklist for ML-based science

AI Snake Oil • 432 implied HN points • 16 Aug 23

🔬 Science Machine Learning

ML-based science often has errors like data leakage that skew results.
Errors in ML-based science can also stem from how study findings are interpreted and presented.
The REFORMS checklist can help improve reporting standards in ML-based science, minimizing errors and enhancing clarity.

Intents Are Not Going Away…RoNID Is A New Intent Discovery Framework

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 26 Apr 24

🕹 Technology Machine Learning

RoNID helps identify user intents more accurately, allowing chatbots to understand what users really want to talk about. This means better conversations and less frustration.
The framework uses two main steps: generating reliable labels and organizing data into clear groups. This makes it easier to see which intents are similar and which are different.
RoNID outperforms older methods, improving the chatbot’s understanding by creating clearer and more accurate intent classifications. This leads to a smoother user experience.

Low-code Development Platforms

Gradient Flow • 259 implied HN points • 30 Jun 22

🕹 Technology Machine Learning

Experiment tracking and management tools help log metadata and results of ML experiments. They offer collaboration and visualization features to simplify analysis and management of experiments.
Data+AI Summit 2022 had significant announcements like the open-sourcing of Delta Lake and Project Lightspeed for Spark Structured Streaming. Databricks introduced a marketplace for data products and updates to their governance solution.
Low-code development platforms enable rapid application development with simplified methods. Enterprise low-code platforms facilitate quick deployment using low-code and no-code techniques.

Exphormer (Graph Neural Networks)

MLOps Newsletter • 39 implied HN points • 04 Feb 24

🕹 Technology Machine Learning

Graph transformers are powerful for machine learning on graph-structured data but face challenges with memory limitations and complexity.
Exphormer overcomes memory bottlenecks using expander graphs, intermediate nodes, and hybrid attention mechanisms.
Optimizing mixed-input matrix multiplication for large language models involves efficient hardware mapping and innovative techniques like FastNumericArrayConvertor and FragmentShuffler.

GPT-4 Prompting for Python Plotly Interactive Dashboards: A How-To Tutorial

Data at Depth • 39 implied HN points • 03 Feb 24

🕹 Technology Machine Learning

Python data visualization code generated via GPT-4 prompting has significantly improved.
Interactive libraries like Plotly Dash and Streamlit are particularly beneficial.
Consider subscribing to Data at Depth for more insightful posts and support the author's work.

The Tech Buffet #19: How To Build and Deploy an LLM-Powered App To Chat with PapersWithCode

The Tech Buffet • 39 implied HN points • 03 Feb 24

🕹 Technology Machine Learning

You can build a personal assistant to easily find and understand the latest machine learning research. This assistant will let you ask questions in simple language.
The app uses a system that retrieves and generates information, utilizing a database and machine learning models. It processes data from a site called 'Papers With Code'.
The guide provides step-by-step instructions on how to create, index, and deploy this assistant as a web application, including ready-to-use source code.
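The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration only: it swaps in bag-of-words cosine similarity for whatever database and embedding models the guide actually uses, it stops at building the prompt rather than calling a language model, and every function name here is hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Toy bag-of-words vector: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vector = vectorize(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, vectorize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble the prompt that would be sent to a language model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = [
    "graph transformers scale attention to large graphs",
    "diffusion models generate images from noise",
    "retrieval augmented generation grounds answers in documents",
]
top = retrieve("how do diffusion models generate images", docs, k=1)
prompt = build_prompt("how do diffusion models generate images", docs, k=1)
```

A real deployment would replace `vectorize` and `cosine` with an embedding model and a vector index, but the flow of retrieving relevant text and then prompting with it stays the same.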

Quantifying ChatGPT’s gender bias

AI Snake Oil • 523 implied HN points • 26 Apr 23

🕹 Technology Machine Learning

Researchers found strong gender bias in ChatGPT models despite correct benchmark data.
The examination used coreference resolution to identify gender bias.
GPT-4 was slightly more accurate than GPT-3.5 on the gender bias evaluation.

The Toughest Math Benchmark Ever Built

TheSequence • 133 implied HN points • 17 Nov 24

🕹 Technology Machine Learning

Frontier Math is a really tough math test designed for AI. It has new, unique problems that are hard for AI to solve, testing deeper reasoning skills.
Many AI models do well on easier math problems but struggle with Frontier Math. They often can't combine ideas creatively like a human can.
This benchmark shows the big gap between current AI abilities and true mathematical understanding, highlighting the need for better AI reasoning.