The hottest Machine Learning Substack posts right now

And their main takeaways

Data Science Weekly - Issue 559

Data Science Weekly Newsletter • 219 implied HN points • 08 Aug 24

🕹 Technology Data science AI Machine Learning Software Development Statistics

Camera calibration is crucial in sports analysis. It helps track players' movements accurately by mapping video frame positions to real field locations.
Understanding the context of data is important for responsible data work. Datasets need good documentation and stories to highlight their historical and social backgrounds.
There's a new, free encyclopedia for learning about cognitive science. It offers easy-to-read articles on various topics for students and researchers.

In The Context Of Long Context

Adjacent Possible • 553 implied HN points • 21 Nov 24

🕹 Technology AI Machine Learning Software Development Innovation Digital Media

A new AI feature can turn a whole book into a fun audio conversation, making learning more engaging. This feature has caught a lot of attention online and even received media coverage.
The AI can handle large amounts of text, up to 1.5 million words, which makes it much more useful and allows for better, more detailed interactions.
Long context models can help organizations make better decisions by recalling important documents and past experiences, adding a new kind of intelligence to team discussions.

Data Science Weekly - Issue 561

Data Science Weekly Newsletter • 139 implied HN points • 22 Aug 24

🕹 Technology Data science AI Machine Learning Data Engineering Visualization

When building web applications, using Postgres for data storage is a good default choice. It's reliable and widely used.
A new study shows that agents can learn useful skills without rewards or guidance. They can explore and develop abilities just from observing a goal.
A list of important books and resources in Bayesian statistics is being compiled. It's a way to recognize influential ideas in this field.

LCM: Large Concept Model

Gonzo ML • 189 implied HN points • 04 Jan 25

🕹 Technology Artificial Intelligence Machine Learning Natural Language Processing Data science Computational Models

The Large Concept Model (LCM) aims to improve how we understand and process language by focusing on concepts instead of just individual words. This means thinking at a higher level about what ideas and meanings are being conveyed.
LCM uses a system called SONAR to convert sentences into a stable representation that can be processed and then translated back into different languages or forms without losing the original meaning. This creates flexibility in how we communicate.
This approach can handle long documents more efficiently because it represents ideas as concepts, making processing easier. This could improve applications like summarization and translation, making them more effective.

($) DeepSeek's Three Idiosyncratic Advantages

Interconnected • 138 implied HN points • 03 Jan 25

🕹 Technology AI Open Source Machine Learning Global Competition Data science

DeepSeek-V3 is an AI model that performs as well as or better than other top models while costing much less to train. Its developers are getting great results without spending a lot of money.
The AI community is buzzing about DeepSeek's advancements, but there seems to be less excitement about it in China than abroad. This might show a difference in how AI news is perceived globally.
DeepSeek has a few unique advantages that set it apart from other AI labs. Understanding these can help clarify what their success means for the broader AI competition between the US and China.

Machine Learning-Assisted Directed Evolution with Bruce Wittmann

Lever • 19 implied HN points • 16 Oct 24

🔬 Science Biotechnology Machine Learning Research Methods Protein engineering

Bruce Wittmann's journey in science started from pre-med and led him to research at notable institutes like Caltech.
He worked on machine learning to improve protein engineering, building tools that can help many people in the field.
His collaboration with renowned scientists and contributions to published research highlight the exciting potential in protein design and computational biology.

Tülu 3: The next era in open post-training

Democratizing Automation • 404 implied HN points • 21 Nov 24

🕹 Technology AI Machine Learning Open Source Data science Software Development

Tülu 3 introduces an open-source approach to post-training models, allowing anyone to improve large language models like Llama 3.1 and reach performance similar to advanced models like GPT-4.
Recent advances in preference tuning and reinforcement learning help achieve better results with well-structured techniques and new synthetic datasets, making open post-training more effective.
The development of these models is pushing the boundaries of what can be done in language model training, indicating a shift in focus towards more innovative training methods.

Five Lessons for Building Robust AI Agents from Coding Agents

Tanay’s Newsletter • 56 implied HN points • 22 Jan 25

🕹 Technology AI Development Software Engineering Data Management Machine Learning

Having clear rules and structured frameworks helps AI work better. By defining specific inputs and outputs, AI can understand what to do more easily.
Using well-organized and detailed data helps AI learn faster. The more context and reasoning behind data points, the better AI can make decisions.
Measuring how well AI performs with clear goals and regular tests is important. This allows AI to keep improving and adapting to different situations.

🤘ACDC (not that one)

Gonzo ML • 63 implied HN points • 29 Jan 25

🕹 Technology Artificial Intelligence Machine Learning Neural Networks Data Analysis Automation

The paper introduces a method called ACDC that automates the process of finding important circuits in neural networks. This can help us better understand how these networks work.
Researchers follow a three-step workflow to study model behavior. ACDC fully automates the last step, which identifies the connections that matter for a specific task.
While ACDC shows promise, it isn't perfect. It may miss some important connections and needs adjustments for different tasks to improve its accuracy.

The ghosts that live in my garage

Basta’s Notes • 122 implied HN points • 13 Jan 25

🕹 Technology Machine Learning Artificial Intelligence Self-driving cars Data science Software Development

Machine learning models are good at spotting patterns that humans might miss. This means they can make predictions and organize data in ways that are impressive and often very useful.
However, machine learning can struggle with unclear or messy data. This fuzziness can lead to mistakes, like misidentifying objects or giving unexpected results.
Not every problem needs a machine learning solution, and sometimes simpler methods work better and are more effective. It's important to think carefully about whether machine learning is truly the best tool for the job.

Data Science Weekly - Issue 558

Data Science Weekly Newsletter • 219 implied HN points • 01 Aug 24

🕹 Technology Data science Machine Learning AI Data Visualization Statistical Methods

Data science and AI are rapidly evolving fields with plenty of interesting developments. Staying updated with the latest articles and news can really help you understand these changes better.
Effective communication is key in data science. Using intuitive methods and visuals can make complex concepts easier to grasp for everyone.
Using tools and methods like quantization can help make large models more accessible. It's important to find efficient ways to work with vast amounts of data to improve performance.
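
As a rough illustration of the quantization idea in the last takeaway, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is not taken from the newsletter issue; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real toolchains typically use per-channel scales and more careful calibration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored value differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as a small integer plus one shared float, which is where the memory saving comes from; the price is a rounding error bounded by half the scale.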

Transformer^2: Self-adaptive LLMs

Gonzo ML • 63 implied HN points • 27 Jan 25

🕹 Technology Artificial Intelligence Machine Learning Data science Computing Software Development

Transformer^2 uses a new method for adapting language models that makes it simpler and more efficient than fine-tuning. Instead of retraining the whole model, it adjusts specific parts, which saves time and resources.
The approach breaks down weight matrices through a process called Singular Value Decomposition (SVD), allowing the model to identify and enhance its existing strengths for various tasks.
At test time, Transformer^2 can adapt to new tasks in two passes, first assessing the situation and then applying the best adjustments. This method shows improvements over existing techniques like LoRA in both performance and parameter efficiency.
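
The SVD-based adaptation described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the factors `U`, `S`, and `V` are constructed directly from 2x2 rotations rather than computed by an SVD routine, and the per-task vector `z` that rescales the singular values is a hypothetical stand-in for whatever the method learns.

```python
import math


def rotation(theta):
    """Return a 2x2 rotation matrix, which is orthogonal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def compose(u, singular_values, v):
    """Rebuild W = U diag(s) V^T from its factors."""
    n = len(singular_values)
    diag = [[singular_values[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]
    return matmul(matmul(u, diag), transpose(v))


def frobenius_sq(m):
    """Squared Frobenius norm; equals the sum of squared singular values."""
    return sum(x * x for row in m for x in row)


# Assumed factors of a frozen weight matrix (illustrative values only).
U, V = rotation(0.3), rotation(1.1)
S = [3.0, 1.0]
W = compose(U, S, V)

# Hypothetical task vector: amplify the first direction, damp the second.
z = [2.0, 0.5]
W_adapted = compose(U, [s * zi for s, zi in zip(S, z)], V)
```

Only the entries of `z` change per task, one number per singular value, while `U` and `V` stay fixed. That is the sense in which this kind of adaptation touches far fewer parameters than retraining the full matrix.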

Data Science Weekly - Issue 560

Data Science Weekly Newsletter • 139 implied HN points • 15 Aug 24

🕹 Technology Data science AI Machine Learning Software Development Programming

The Turing Test raises questions about what it means for a computer to think, suggesting that if a computer behaves like a human, we might consider it intelligent too.
Creating a multimodal language model involves understanding different components like transformers, attention mechanisms, and learning techniques, which are essential for advanced AI systems.
A recent study tested if astrologers can really analyze people's lives using astrology, addressing the ongoing debate about the legitimacy of astrology among the public.

What Did You Think Getting Closer to AGI Would Be Like?

The Algorithmic Bridge • 318 implied HN points • 07 Dec 24

🕹 Technology Artificial Intelligence Machine Learning Software Development Computing Data science

OpenAI's new model, o1, is not AGI; it's just another step in AI development that might not lead us closer to true general intelligence.
AGI should have consistent intelligence across tasks, unlike current AI, which can sometimes perform poorly on simple tasks and excel on complex ones.
As we approach AGI, we might feel smaller or less significant, reflecting how humans will react to advanced AI like o1, even if it isn’t AGI itself.

JAX things to watch for in 2025

Gonzo ML • 378 implied HN points • 26 Nov 24

🕹 Technology AI Software Programming Data science Machine Learning

The new NNX API is set to replace the older Linen API for building neural networks with JAX. It simplifies the coding process and offers better performance options.
The shard_map feature improves multi-device computation by allowing better handling of data. It’s a helpful evolution for developers looking for precise control over their parallel computing tasks.
Pallas is a new JAX tool that lets users write custom kernels for GPUs and TPUs. This allows for more specialized and efficient computation, particularly for advanced tasks like training large models.

OpenAI Announces o1 Model And ChatGPT Pro ($200/Mo)

The Algorithmic Bridge • 329 implied HN points • 05 Dec 24

🕹 Technology AI Models Machine Learning Software Development Data science Innovation

OpenAI has launched a new AI model called o1, which is designed to think and reason better than previous models. It can now solve questions more accurately and is faster at responding to simpler problems.
ChatGPT Pro is a new subscription tier that costs $200 a month. It provides unlimited access to advanced models and special features, although it might not be worth it for average users.
o1 is not just focused on math and coding; it's also designed for everyday tasks like writing. OpenAI claims it's safer and more compliant with their policies than earlier models.

Vesuvius Challenge Progress Prizes: December Edition

Vesuvius Challenge • 31 implied HN points • 24 Jan 25

🕹 Technology Data science Machine Learning Computer Vision Community Engagement Open Source

The community is focused on improving data quality, like using better labels and refining how they categorize information. This will help them create automated tools for analyzing scrolls more effectively.
Several contributors have made significant advancements in developing new segmentation models and tools, which will help in analyzing scroll data. These innovations are key for understanding ancient texts.
2024 has been a great year for teamwork and progress as everyone shares their findings. The hard work from many people is leading to quick improvements in technology for studying historical scrolls.

Import AI 357: Facebook's open source AGI plan; Google beats humans at geometry problems; and Intel makes its GPUs better

Import AI • 2076 implied HN points • 22 Jan 24

🕹 Technology AI Research Machine Learning Machine Translation

Facebook aims to develop artificial general intelligence (AGI) and make it open-source, marking a significant shift in focus and possibly accelerating AGI development.
Google's AlphaGeometry, an AI for solving geometry problems, demonstrates the power of combining traditional symbolic engines with language models to achieve algorithmic mastery and creativity.
Intel is enhancing its GPUs for large language models, a necessary step towards creating a competitive GPU offering compared to NVIDIA, although the benchmarks provided are not directly comparable to industry standards.

Thinking Time

New World Same Humans • 42 implied HN points • 26 Jan 25

🕹 Technology Artificial Intelligence Machine Learning Emerging Tech Digital Trends Innovation

Giving AI more time to think can greatly improve its performance, just like it helps humans think better. This 'thinking time' could be key in advancing artificial intelligence.
Being busy doesn't always mean you're being productive; it's important to take breaks and allow space for creative thinking. Sometimes the best ideas come when you're not actively working.
To truly innovate, focus on depth and originality instead of just producing a lot of work. It's about finding valuable insights that add to the conversation, rather than just adding to the noise.

Compound AI is AGI

Generating Conversation • 233 implied HN points • 13 Dec 24

🕹 Technology AI Machine Learning Software Engineering Philosophy Emerging Tech

The debate about whether we've achieved AGI (Artificial General Intelligence) is ongoing. Many people don't agree on what AGI really means, making it hard to know if we've reached it.
The argument is that current AI models can work together to perform tasks at a human-like level. This teamwork, or 'compound AI,' could be seen as a form of general intelligence, even if it's not from a single AI model.
Not all forms of intelligence are the same, and AI systems can do things that humans can’t, but that doesn't mean they can't be considered intelligent. The future potential of AI isn't just about mimicking human intellect; it may also involve different types of skills and knowledge.

Diffusion Models are Evolutionary Algorithms

Gonzo ML • 441 implied HN points • 09 Nov 24

🕹 Technology AI Algorithms Evolution Machine Learning Data science

Diffusion models and evolutionary algorithms both involve changing data over time through processes like selection and mutation, which can lead to new and improved results.
The new algorithm called Diffusion Evolution can find multiple good solutions at once, unlike traditional methods that often focus on one single best solution.
There are exciting connections between learning and evolution, hinting that they may fundamentally operate in similar ways, which opens up many questions about future AI developments.

LLMs Know More Than What They Say

LLMs for Engineers • 120 HN points • 15 Aug 24

🕹 Technology AI Machine Learning Data science Software Development Computing

Using latent space techniques can improve the accuracy of evaluations for AI applications without requiring a lot of human feedback. This approach saves time and resources.
Latent space readout (LSR) helps in detecting issues like hallucinations in AI outputs by allowing users to adjust the sensitivity of detection. This means it can catch more errors if needed, even if that results in some false alarms.
Creating customized evaluation rubrics for AI applications is essential. By gathering targeted feedback from users, developers can create more effective evaluation systems that align with specific needs.

After AI beat them, professional go players got better and more creative

Escaping Flatland • 1867 implied HN points • 23 Jan 24

🕹 Technology AI Games Creativity Open Source Machine Learning

Professional Go players improved after being beaten by AI
Creativity in Go gameplay increased post-AI, with more novel moves
Open source AI tools like Leela Zero facilitated player improvement in Go

OLMo 2 and building effective teams for training language models

Democratizing Automation • 245 implied HN points • 26 Nov 24

🕹 Technology AI Machine Learning Software Development Data science Open Source

Effective language model training needs attention to detail and technical skills. Small issues can have complex causes that require deep understanding to fix.
As teams grow, strong management becomes essential. Good managers can prioritize the right tasks and keep everyone on track for better outcomes.
Long-term improvements in language models come from consistent effort. It’s important to avoid getting distracted by short-term goals and instead focus on sustainable progress.

The Sequence Knowledge #545 : Beyond Language, Learning About Multimodal Benchmarks

TheSequence • 28 implied HN points • 20 May 25

🕹 Technology AI Machine Learning Computer Vision Data science Benchmarking

Multimodal benchmarks are tools to evaluate AI systems that use different types of data like text, images, and audio. They help ensure that AI can handle complex tasks that combine these inputs effectively.
One important benchmark in this area is called MMMU, which tests AI on 11,500 questions across various subjects. This benchmark needs AI to work with text and visuals together, promoting deeper understanding rather than just shortcuts.
The design of these benchmarks, like MMMU, helps reveal how well AI understands different topics and where it may struggle. This can lead to improvements in AI technology.

How to succeed as a Machine Learning Engineer

The ML Engineer Insights • 359 implied HN points • 22 Jun 24

🕹 Technology Machine Learning Career development Professional Skills Networking Mentorship

Building a strong foundation in machine learning fundamentals and staying updated with the latest research are crucial for success as a Machine Learning Engineer.
Playing to your strengths, such as data and feature engineering, modeling, and deployment scalability, is key. Seek help in areas where you're less experienced.
Focus on aligning your work with business goals, understanding trade-offs, ROI, and embracing experimentation. Continuous learning, networking, and mentorship are invaluable.

Belief in magic may be declining

Marcus on AI • 3122 implied HN points • 22 Feb 24

🕹 Technology AI Machine Learning Technology News

Belief in magic may be declining among the public.
There are doubts surrounding the effectiveness and promises of LLMs in the industry.
Concerns exist about the capability and reliability of AI technologies in handling basic tasks.

The Sequence Radar #477: The R1 Moment

TheSequence • 546 implied HN points • 26 Jan 25

🕹 Technology AI Machine Learning Open Source Innovation Data science Research

DeepSeek-R1 is a new AI model that performs as well as or better than big-name AI models but at a much lower cost. This means smaller companies can now compete in AI innovation without needing huge budgets.
DeepSeek-R1 is trained differently from traditional methods. It relies on reinforcement learning, which helps the model learn smarter reasoning skills without needing a ton of supervised data.
The open-source nature of DeepSeek-R1 means anyone can access and use the code for free. This encourages collaboration and allows more people to innovate in AI, making technology more accessible to everyone.

Qualcomm’s Cloud AI 100 PCIe: Now For All

More Than Moore • 93 implied HN points • 06 Jan 25

🕹 Technology AI hardware Cloud Computing Machine Learning Embedded Systems Data processing

Qualcomm's Cloud AI 100 PCIe card is now available for the wider embedded market, making it easier to use for edge AI applications. This means businesses can run AI locally without relying heavily on cloud services.
There are different models of the Cloud AI 100, offering various compute powers and memory capacities to suit different business needs. This flexibility helps businesses select the right fit based on how much AI processing they require.
Qualcomm is keen to support partnerships with OEMs to build appliances that use their AI technology, but they are not actively marketing it widely. Interested users are encouraged to reach out directly for collaboration opportunities.

Has Sam Altman gone full Gary Marcus?

Marcus on AI • 4624 implied HN points • 16 Nov 23

🕹 Technology AI Deep Learning Artificial General Intelligence Machine Learning

In the midst of an AI boom, scale isn't everything, and there are still unresolved issues.
Recognition is growing that scoring well on benchmarks doesn't mean true foundational progress.
Tech leaders like Sam Altman are acknowledging the limitations of deep learning and considering new paradigms.

Open Thread 311

Astral Codex Ten • 5574 implied HN points • 15 Jan 24

🕹 Technology AI AI Alignment Machine Learning

Weekly open thread for discussions and questions on various topics.
AI art generators still have room for improvement in handling tough compositionality requests.
Reminder about the PIBBSS Fellowship, a fully-funded program in AI alignment for PhDs and postdocs from diverse fields.

Quant Letter: January 2025, Week-4

The Parlour • 34 implied HN points • 23 Jan 25

💰 Finance Quantitative Machine Learning Portfolio Management Market Analysis Investment Strategies

Advanced models like the MDQR help understand market dependencies, which can make it easier for traders to create effective strategies.
New methods for portfolio optimization can handle many assets at once, moving beyond the traditional limits that were previously in place.
Research shows AI can effectively forecast financial risks and rewards, highlighting the growing importance of technology in finance.

Attention Explained: When to use Self, Graph, and Target-Aware Attention

Recommender systems • 16 implied HN points • 25 May 25

🕹 Technology AI Machine Learning Data science Computer Science

Self-attention helps summarize a list of information, making it easier to find what's most relevant, like recent videos you watched.
Graph attention looks at how items in a network relate to each other, like understanding social connections in a network.
Target-aware attention checks how relevant certain items are based on your past choices or queries, helping improve recommendations.
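
The three variants above differ mainly in what plays the role of the query and which items it is allowed to look at. The following is a minimal sketch in plain Python, assuming scaled dot-product scoring with no learned projections; the function names and the `neighbors` mapping are hypothetical, and real graph attention layers typically use a learned scoring function instead of a raw dot product.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Return (weights, pooled): softmax weights over items and their weighted average."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, item)) / scale for item in items]
    weights = softmax(scores)
    pooled = [sum(w * item[d] for w, item in zip(weights, items))
              for d in range(len(query))]
    return weights, pooled


def self_attention(items):
    """Every item queries the whole list, itself included."""
    return [attend(item, items)[1] for item in items]


def graph_attention(items, neighbors):
    """Each node queries only the nodes listed for it in `neighbors`."""
    return [attend(items[i], [items[j] for j in neighbors[i]])[1]
            for i in range(len(items))]


def target_aware_attention(target, history):
    """A candidate item queries the user's history."""
    return attend(target, history)


# Toy 2-d embeddings: two similar watched items and one different item.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
weights, summary = target_aware_attention([1.0, 0.0], history)
```

In this toy run the two history items that match the target receive equal weight, and each receives more weight than the unrelated item, which is the behavior a target-aware recommender relies on.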

“Math is hard” — if you are an LLM – and why that matters

Marcus on AI • 4782 implied HN points • 19 Oct 23

🔬 Science Mathematics Artificial Intelligence Machine Learning

Even with massive data training, AI models struggle to truly understand multiplication.
LLMs perform better in arithmetic tasks than smaller models like GPT but still fall short compared to a simple pocket calculator.
LLM-based systems generalize based on similarity and do not develop a complete, abstract, reliable understanding of multiplication.

What Does Hitting Scaling Law Limit Mean for US-China AI Competition

Interconnected • 246 implied HN points • 18 Nov 24

🕹 Technology AI Machine Learning International relations Global Competition Data science

The scaling law for AI models might be losing effectiveness, meaning that simply using more data and compute power may not lead to significant improvements like it did before.
US export controls on AI technology may become less impactful over time, as diminishing returns on AI model scaling could lessen the advantages of having the most advanced hardware.
If AI development slows down, the urgency for a potential 'AI doomsday' scenario may decrease, allowing for a more balanced competition between the US and China in AI advancements.

Data Science Weekly - Issue 557

Data Science Weekly Newsletter • 159 implied HN points • 25 Jul 24

🕹 Technology Data science AI Machine Learning Data Visualization Engineering

AI models can break down when trained on data generated by other models, which degrades how well they work.
There is scientific research about the history of Italian filled pasta. It shows that most types likely came from a single area in northern Italy.
There are new resources and guides available for improving predictive modeling with tabular data. These can help you build better models by focusing on how data is represented.

Analog Chip Design is an Art. Can AI Help?

The Asianometry Newsletter • 2707 implied HN points • 12 Feb 24

🕹 Technology AI Machine Learning Neural Networks

Analog chip design is a complex art form that often takes up a significant portion of the total design cost of an integrated circuit.
Analog design involves working with continuous signals from the real world and manipulating them to create desired outputs.
Automating analog chip design with AI is a challenging task that involves using machine learning models to assist in tasks like circuit sizing and layout.

Code Clinic | Orchestrating Transformers Agents 2.0 for Internet Search

Encyclopedia Autonomica • 19 implied HN points • 09 Oct 24

🕹 Technology AI Software Machine Learning Data science Programming

Transformers Agents 2.0 is a step up from traditional methods. These agents handle multi-step tasks better and have memory to store information as they work.
Setting up and building a basic ReAct Agent is straightforward. You only need to install some packages and create the agent using selected models and tools.
You can orchestrate multiple agents together for more complex tasks. By combining different agents, you can enhance their capabilities and improve the results of your searches or queries.

Has Google gone too woke? Why even the biggest models still struggle with guardrails

Marcus on AI • 2608 implied HN points • 21 Feb 24

🕹 Technology AI Ethics Data Bias Machine Learning

Google's large models struggle with implementing proper guardrails, despite ongoing investments and cultural criticisms.
Issues such as presenting fictional characters as historical figures and a lack of cultural and historical accuracy persist in AI systems like Gemini.
Current AI lacks the ability to understand and balance cultural sensitivity with historical accuracy, showing the need for more nuanced and intelligent systems in the future.

Amazon Anthropic: Poison Pill or Empire Strikes Back

SemiAnalysis • 6667 implied HN points • 02 Oct 23

🕹 Technology AI Cloud Computing Machine Learning Artificial Intelligence

Amazon and Anthropic signed a significant deal, with Amazon investing in Anthropic, which could impact the future of AI infrastructure.
Amazon has faced challenges in generative AI due to lack of direct access to data and issues with internal model development.
The collaboration between Anthropic and Amazon could accelerate Anthropic's ability to build foundation models but also poses risks and challenges.