The hottest Data science Substack posts right now

And their main takeaways

Data Science Weekly - Issue 524

Data Science Weekly Newsletter • 299 implied HN points • 08 Dec 23

Data engineering is evolving with new design patterns that help improve efficiency in handling data. A new book dives into these patterns and their importance.
Machine learning is being used to understand and control the movement of silicon atoms in materials, which could lead to advancements in technology like better electronics.
A new model called PoseGPT can estimate 3D human poses from images and text, linking physical movements to broader concepts about humans, showing the capabilities of large language models.

The Sequence Knowledge #468: A New Series About RAG

TheSequence • 84 implied HN points • 13 Jan 25

🕹 Technology Artificial Intelligence Machine Learning Data science Natural Language Processing Research

Retrieval Augmented Generation, or RAG, helps AI models use outside information to improve their answers. This makes the responses more accurate and relevant.
RAG works in two steps: first, it finds useful information, and then it uses that information to create better responses. This method is great for applications that need quick and correct answers.
A key paper introduced RAG and showed that combining different types of memory can lead to better results in language tasks, like answering questions or generating text.
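
The two steps described above (find useful information, then use it to build the response) can be sketched in a few lines. This is only a toy illustration: the word-overlap scorer, the `retrieve` and `build_prompt` helpers, and the sample documents are assumptions made for the example, not anything taken from the post, and a real system would pass the resulting prompt to a language model.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Step 1: rank documents by word overlap with the query (toy scorer)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Step 2: place the retrieved passages in the prompt a generator would receive."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical mini-corpus, for illustration only.
DOCS = [
    "RAG retrieves passages before generating an answer.",
    "Plate maps track samples in 96-well plates.",
    "Topological sorting orders nodes so dependencies come first.",
]

QUESTION = "How does RAG generate an answer?"
top = retrieve(QUESTION, DOCS, k=1)
prompt = build_prompt(QUESTION, top)
print(prompt)
```

Keeping retrieval and prompt construction as separate functions mirrors the two-step description and makes it easy to swap the toy scorer for an embedding-based search.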

In the land of LLMs, can we do better mock data generation?

Neurelo Engineering’s Substack • 1 HN point • 27 Sep 24

🕹 Technology Software Development Data science Programming Languages Artificial Intelligence Machine Learning

Mock data is very useful for testing software, but the tooling hasn't improved much over the years. Generating it needs to be more flexible, and producing high-quality data needs to be easier.
Using LLMs (large language models) can be tricky for creating mock data. Instead of trying to generate everything, it’s often better to use techniques like topological sorting to keep relationships correct between data entries.
A new approach is turning to strategies like the Genesis Point Strategy, which helps create unique mock data efficiently. It shows that you can simplify processes to get good results without overcomplicating things.
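
To illustrate the topological sorting idea mentioned above, here is a minimal sketch that generates rows table by table so that every foreign key points at a row that already exists. The schema, the `DEPENDS_ON` mapping, and the `generate` helper are hypothetical, and this is not the post's Genesis Point Strategy, only the ordering step. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
import random
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
DEPENDS_ON = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

def generation_order(depends_on):
    """Return the tables ordered so that parents come before their children."""
    return list(TopologicalSorter(depends_on).static_order())

def generate(depends_on, rows_per_table=3, seed=0):
    """Create mock rows in dependency order so foreign keys are always valid."""
    rng = random.Random(seed)
    data = {}
    for table in generation_order(depends_on):
        rows = []
        for i in range(1, rows_per_table + 1):
            row = {"id": i}
            for parent in sorted(depends_on[table]):
                # The parent table was generated earlier, so its rows exist.
                row[f"{parent}_id"] = rng.choice(data[parent])["id"]
            rows.append(row)
        data[table] = rows
    return data

order = generation_order(DEPENDS_ON)
mock = generate(DEPENDS_ON)
print(order)
```

An LLM, if used at all, could then be limited to filling in realistic field values, while the ordering and key relationships stay deterministic.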

Why Today Is The Perfect Time to Learn Data | Seattle Data Guy

Data Analysis Journal • 373 implied HN points • 25 Oct 23

🕹 Technology Data science Analytics Data Engineering Learning Resources Career Advice

Learning data skills is more accessible now than in past years, and the learning resources are better.
For transitioning into data engineering, focus on SQL, programming, data warehousing, and data pipelines.
Analysts should focus on understanding the business problem, building maintainable systems, and following a data analytics process.

The Sequence Radar #472: Remember this Name: Ndea

TheSequence • 77 implied HN points • 19 Jan 25

🕹 Technology AI Research Startups Innovation Machine Learning Data science

Ndea is a new AI lab aiming to create artificial general intelligence (AGI) with a unique approach called guided program synthesis. This approach allows models to learn efficiently from fewer examples.
Francois Chollet, a well-known AI expert, is leading Ndea. He believes current deep learning methods have limitations and wants to explore new ideas for better AI development.
The goal of Ndea is to drive quick scientific advancements by combining program synthesis with deep learning, aiming to tackle tough challenges and possibly discover new scientific frontiers.

To Use Or Not To Use [HTC #69]

Hold the code • 4 implied HN points • 30 May 25

🕹 Technology AI Mental health Innovation Data science

Tech buzzwords are often just fancy terms that can make simple ideas sound more complex. It's easy to use these words to impress people but they can confuse others.
AI is increasingly being used as a therapist because it's accessible and can provide immediate support, but it should not replace real human therapists, who understand emotions better.
The term 'artificial intelligence' is becoming vague and companies often use it to make their products sound smarter, even if they aren't truly intelligent. This can mislead the public about what AI can really do.

A Pain in the Plate Maps

Briefly Bio • 198 implied HN points • 23 Feb 24

🕹 Technology Biotechnology Data science Software Development Research Methodology

Creating 96-well plate maps is important for organizing samples and tracking metadata during scientific experiments. This helps scientists during pipetting and later data analysis.
Current methods for making plate maps, like using spreadsheets, can be clunky and error-prone as they often require managing multiple tables that are not linked.
A new visual plate mapper allows for easy creation and editing of plate maps. It synchronizes the visual layout with a data table, making it simpler to manage and analyze experiment data.
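
The idea of keeping a visual layout and a data table in sync can be sketched by deriving both views from one source of truth. This is an assumption-laden illustration, not the tool described in the post: the helper names are hypothetical, and the only fixed fact is the standard 96-well geometry of 8 rows (A to H) by 12 columns.

```python
import string

ROWS = string.ascii_uppercase[:8]  # "A" through "H"
COLS = range(1, 13)                # columns 1 through 12

def empty_plate():
    """Single source of truth: one entry per well, keyed by well ID."""
    return {f"{r}{c}": None for r in ROWS for c in COLS}

def assign(plate, wells, sample):
    """Record a sample name for each listed well, rejecting unknown well IDs."""
    for well in wells:
        if well not in plate:
            raise KeyError(f"unknown well: {well}")
        plate[well] = sample

def as_grid(plate):
    """Visual view: 8 rows of 12 cells, in plate order."""
    return [[plate[f"{r}{c}"] for c in COLS] for r in ROWS]

def as_table(plate):
    """Tabular view: one record per filled well, ready for analysis."""
    return [{"well": w, "sample": s} for w, s in plate.items() if s is not None]

plate = empty_plate()
assign(plate, ["A1", "H12"], "control")
grid = as_grid(plate)
table = as_table(plate)
print(table)
```

Because both views are computed from the same dictionary, an edit made through `assign` shows up in the grid and the table at once, which avoids the unlinked spreadsheets the post complains about.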

The true information is in the words, not the numbers

Klement on Investing • 4 implied HN points • 29 May 25

💰 Finance Investing Market Analysis Data science AI Applications

Analyst recommendations are often seen as unreliable, especially when a 'Hold' is viewed like a 'Sell'. People are starting to see more value in the actual words analysts use rather than just the numbers they give.
AI has been used to analyze over a million analyst reports, revealing that most discussions focus on profitability. However, during tough times, there's less talk about profitability and more on financial stability.
It turns out that the specific language analysts use can help predict changes in earnings and stock prices, showing that understanding their words might be more valuable than just following their price forecasts.

Building LLM-powered Apps: What You Need to Know

Gradient Flow • 519 implied HN points • 06 Apr 23

🕹 Technology AI Machine Learning Data science Applications Models

Developers can now create AI-powered applications without deep machine learning knowledge, opening up opportunities for rapid experimentation and innovation.
Building custom large language models (LLMs) is becoming more accessible through startups offering resources for model fine-tuning or training from scratch.
Integration of custom LLMs with third-party services, utilizing knowledge bases, and serving models efficiently are key areas of focus for developers in the AI application space.

No sacred masterpieces

Basta’s Notes • 753 HN points • 15 Sep 23

🕹 Technology Software Development Data science Engineering Project management Web Development

Sometimes, valuable projects end abruptly without much recognition or lasting impact.
It's important to focus on creating business value with your work, rather than building impressive but ultimately unnecessary solutions.
Every piece of code you write as an engineer is legacy and may not last forever, so focus on learning from each project's outcome.

What are embeddings?

Normcore Tech • 1353 implied HN points • 07 Jun 23

🕹 Technology Deep Learning Neural Networks NLP Research Data science

The author delved deep into the concept of embeddings in deep learning.
The author's journey in understanding embeddings involved a significant amount of research and work.
The author hopes that others can benefit from their learning about embeddings as well.

Infinite Context Length 🤯

Sector 6 | The Newsletter of AIM • 99 implied HN points • 18 Apr 24

🕹 Technology AI Neural Networks Software Development Data science Machine Learning

Meta has introduced MEGALODON, a new neural architecture that allows for infinite context length in AI, making it more efficient than previous models.
With developments from Microsoft, Google, and Meta, the focus will shift away from which model has the highest context length, as all will likely have infinite capabilities soon.
The upcoming Llama-3 model is expected to continue this trend by also supporting infinite context length, enhancing its utility in various applications.

2024 Favorites, WebGPU Crash Course, and Speculative Size

Generative Arts Collective • 92 implied HN points • 05 Jan 25

🕹 Technology Creative Coding Generative Arts Web Development Data science

Keep learning and being creative. Even small habits can make a big difference in your life.
Explore new tools and techniques in creative coding, like WebGPU, to enhance your projects.
Participate in community events related to generative arts to connect with others and gain new insights.

RAG Survey & Available Research

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 27 Jun 24

🕹 Technology AI Machine Learning Natural Language Data science Deep Learning

Retrieval-Augmented Generation (RAG) mixes retrieval methods with learning systems to help large language models use real-time data.
RAG can enhance the accuracy of language models by incorporating current information, avoiding wrong answers that might come from outdated knowledge.
The framework of RAG includes steps like pre-retrieval, retrieval, post-retrieval, and generation, each contributing to better outputs in language processing tasks.
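
The four stages named above can be read as a chain of small functions. The sketch below is a toy under stated assumptions: each stage body is a stand-in (query normalization for pre-retrieval, word matching for retrieval, deduplication for post-retrieval, prompt assembly for generation), and all function names are hypothetical rather than taken from the survey.

```python
def pre_retrieval(query):
    """Normalize the query (stand-in for query rewriting or expansion)."""
    return " ".join(query.lower().split())

def retrieval(query, corpus):
    """Keep passages that share at least one word with the query."""
    words = set(query.split())
    return [p for p in corpus if words & set(p.lower().split())]

def post_retrieval(passages, limit=2):
    """Deduplicate and truncate (stand-in for re-ranking or filtering)."""
    unique = list(dict.fromkeys(passages))
    return unique[:limit]

def generation(query, passages):
    """Stand-in for the language model call: returns the assembled prompt."""
    return f"Q: {query} | context: {' / '.join(passages)}"

def rag_pipeline(query, corpus):
    """Run the four stages in order."""
    q = pre_retrieval(query)
    return generation(q, post_retrieval(retrieval(q, corpus)))

# Hypothetical corpus with a duplicate passage and an irrelevant one.
CORPUS = ["rag uses retrieval", "rag uses retrieval", "cats sleep"]
result = rag_pipeline("  What is RAG ", CORPUS)
print(result)
```

Separating the stages this way makes it possible to improve one of them, for example swapping in a real re-ranker, without touching the others.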

Stable Point Aware 3D, Cosmos, Autonomous game characters and Digits by Nvidia, Qwen Chat, Hailuo's Subject Reference, rStar-Math, Text-to-Video gen with Transparency, Cohere's North, STAR, & more

AI Brews • 12 implied HN points • 10 Jan 25

🕹 Technology AI Software Game Development Open Source Data science

Stability AI has released a new tool called Stable Point Aware 3D, which lets you edit 3D objects from just one image really quickly. It's free to use for everyone.
Microsoft has made its Phi-4 model open-source and introduced rStar-Math, a new technique that improves math solving in smaller language models.
Qwen Chat is a new web app allowing users to interact with various Qwen models, making it easy to compare their capabilities all in one place.

TinyStories

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 26 Jun 24

🕹 Technology AI Machine Learning Natural Language Data science User Experience

Phi-3 is a small language model that uses a special dataset called TinyStories. This dataset was designed to help the model create more varied and engaging stories.
TinyStories uses simple vocabulary suitable for young children, focusing on quality over quantity. The stories generated are meant to be both understandable and entertaining.
Training the Phi-3 model with TinyStories can be done quickly and allows for easier fine-tuning. This helps smaller organizations use advanced language models without needing huge resources.

Data Science Weekly - Issue 513

Data Science Weekly Newsletter • 359 implied HN points • 21 Sep 23

🕹 Technology AI Data science Machine Learning Software Engineering Tech startups

There's a new newsletter focusing on AI safety in China, showing that the country is more invested in AI safety than many think.
A podcast discusses how startups can run better AI models without needing to upgrade their hardware—a big challenge in the field.
An online event is coming up for those looking to secure data science jobs in big tech, focusing on interview strategies and market insights.

Call for Questions

Human Capitalist • 99 implied HN points • 07 May 24

🕹 Technology Data science Artificial Intelligence Job Market Business Intelligence

There are a lot of unanswered questions about the workforce that data can help with. This could give businesses valuable insights into hiring trends and job market changes.
A partnership with Seek.ai will allow people to ask real-time questions about workforce data. This means anyone can get important answers quickly, helping them make better decisions.
The team is looking for creative questions to test their new analytics tool. People can submit their questions, and the most interesting ones will be selected for special insights.

Better Data Science: How To Design Visualizations that Work

Data at Depth • 79 implied HN points • 05 May 24

🕹 Technology Data science Visualization Accessibility Aesthetics Design principles

Start by defining the function you want the audience to perform with the presented data, then create visualizations that support it.
Implement aspects like affordances, accessibility, and aesthetics to ensure your visualizations are clear, usable, and visually appealing for the audience.
Achieving acceptance of your data visualization involves following established design principles like direct labeling, thoughtful use of color, alignment, and the data-ink principle.

Data Science Weekly - Issue 537

Data Science Weekly Newsletter • 139 implied HN points • 07 Mar 24

🕹 Technology Data science AI Machine Learning Data Engineering Data Visualization

The newsletter shares valuable links about Data Science, AI, and Machine Learning each week. It's a great way to keep updated on the latest in the field.
There are interesting articles highlighting statistical analyses and practical guides, like building GPU clusters at home. These resources help both beginners and experienced practitioners learn more.
The newsletter also encourages people to participate in AI-related events and offers resources for job seekers. This can help you connect with others and grow your career.

Data Science Weekly - Issue 517

Data Science Weekly Newsletter • 339 implied HN points • 19 Oct 23

🕹 Technology Data science AI Machine Learning Data Visualization Engineering

Data science, AI, and ML are rapidly evolving fields, with new technologies and techniques emerging frequently. Staying updated through news and articles can help professionals keep their skills relevant.
Fine-tuning large language models (LLMs) is in growing demand in the job market. Many companies are now looking for experience with LLMs alongside traditional skills like Python and SQL.
Understanding different data visualization goals, like storytelling versus exploration, is important for effectively communicating data insights. This can improve how data is presented in reports and analyses.

Is AI Capex Worth the Money? (ft. Goldman Sachs)

Enterprise AI Trends • 337 implied HN points • 11 Jul 24

🕹 Technology AI Cloud Computing Data science Enterprise Software Investment

AI spending is still worth it because it can help big cloud providers move data to their services. This could open up a big opportunity for revenue, making the investment seem less risky.
Most of the useful AI work happens behind the scenes and isn't visible to the public. This means many people might underestimate how much AI is actually helping businesses already.
Companies are really committed to using generative AI and are treating it as a top priority. This commitment means we'll likely see more successful projects in the future.

Introduction To Analytics Engineering

Data Analysis Journal • 353 implied HN points • 22 Mar 23

🕹 Technology Data Analysis Data science SQL Data Engineering

Analytics engineers bridge the gap between data engineers and data analysts by focusing on producing high-quality data.
Analytics engineers use tools like dbt to streamline data modeling, testing, and documentation.
Data quality is crucial in decision-making, making analytics engineering more important than ever.

Edge 455: Building Smaller Foundation Models Using Graph-Based Distillation

TheSequence • 105 implied HN points • 10 Dec 24

🕹 Technology Artificial Intelligence Machine Learning Data science Software Development Graph Theory

Graph-based distillation helps smaller models learn better by using the connections between data points. Instead of just focusing on individual data, it looks at how they relate to one another.
This technique uses attention networks to improve how student models understand data, making them more effective in learning.
There’s a new framework called Hugging Face Autotrain that allows for easier training of foundation models without needing too much coding knowledge.

Data Science Weekly - Issue 509

Data Science Weekly Newsletter • 399 implied HN points • 25 Aug 23

🕹 Technology Data science Machine Learning Artificial Intelligence Data Engineering Data Visualization Software Development

Each week, a newsletter shares important links and articles about data science, machine learning, and AI. It's a good way to keep updated on new happenings in the field.
The newsletter features articles on various topics, including programming, AI forecasting, and data management practices. These articles are meant to help both newcomers and experienced professionals.
Job listings and training resources are also provided, helping readers find opportunities and learn new skills beneficial for their careers in data science.

The Sequence Knowledge #487: A RAG that Assesses Itself

TheSequence • 49 implied HN points • 11 Feb 25

🕹 Technology AI Software Machine Learning Data science Innovation

Self-RAG is a new method that helps improve how retrieval-augmented generation works by letting models check their own work.
It uses special tokens that help the model decide when it should look for information and how to review its own answers.
This technique aims to make the process more deliberate than regular methods, which retrieve information indiscriminately whether or not it is needed.

Weekly Dose of Optimism #120

Not Boring by Packy McCormick • 137 implied HN points • 15 Nov 24

🕹 Technology AI Genomics Energy Nanotechnology Data science

The U.S. is planning to triple its nuclear power capacity by 2050, aiming for 200 gigawatts through new reactors and upgrades. This is a big move to meet rising energy demands in a safe and efficient way.
Molecular nanotechnology could revolutionize production, possibly outpacing past technological shifts like the Industrial Revolution. It's an exciting frontier that stands to vastly increase our capabilities in various fields.
Evo, a new AI model, shows promise in predicting and designing genomes, potentially creating new life forms. This technology could push the boundaries of biological science and genetic engineering significantly.

The Best Skillsets to Learn in 2024 for Generative AI

Rod’s Blog • 238 implied HN points • 15 Dec 23

🕹 Technology AI Programming Machine Learning Data science Resources

Generative AI is a rapidly evolving field that creates novel content such as images, text, and music, with real-world applications ranging from enhancing creativity to helping solve problems.
To succeed in generative AI, you need skills like mathematics and statistics, programming, data science, knowledge of generative AI methods, and creativity in your specific domain.
To learn generative AI in 2024, use online courses, books, blogs, and tools, and engage with communities and events dedicated to this field.

Data Science Weekly - Issue 514

Data Science Weekly Newsletter • 339 implied HN points • 29 Sep 23

🕹 Technology Data science Machine Learning Artificial Intelligence Data Visualization Data Engineering

Data science involves a mix of techniques for analyzing and visualizing data, which can help people make informed decisions.
Learning about advanced customer segmentation methods can enhance how businesses understand and target their customers.
There are various roles in data-related careers beyond just being a data scientist, so it's good to explore different paths.

Data Science Weekly - Issue 519

Data Science Weekly Newsletter • 299 implied HN points • 03 Nov 23

🕹 Technology Data science AI Machine Learning Data Engineering Tech news Data Visualization

Companies are increasingly sharing their advanced AI models openly, which can help them improve and build better products. This open sharing can lead to a more cooperative tech environment.
Data science job applications are extremely competitive, with many positions receiving thousands of applicants within a day. This shows a high interest and demand in the data science field.
Exploring advanced tools and frameworks in AI can be complex, but understanding how they work can help in building effective applications, especially in question-answering systems.

RAG Implementations Are Becoming More Agent-Like

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 99 implied HN points • 08 Apr 24

🕹 Technology AI Development Software Engineering Data science Machine Learning Automation

RAG implementations are changing to become more like agents, which means they can make better decisions and adapt to different situations.
The structure of prompts is really important now; it’s not just about adding data, but about crafting the prompts to improve how they perform.
Agentic RAG allows for complex tasks by using multiple tools together, making it capable of handling detailed questions that standard RAG cannot.

Data Science Weekly - Issue 522

Data Science Weekly Newsletter • 259 implied HN points • 23 Nov 23

🕹 Technology Data science Artificial Intelligence Machine Learning Software Development Cloud Computing

This newsletter shares weekly interesting links and updates in data science, AI, and machine learning. It's a great way to stay informed about new developments in these fields.
There's a focus on practical tools and techniques for improving data science work, like using cloud processing for large datasets and methods for fine-tuning AI models effectively.
The newsletter also highlights job opportunities and resources for those looking to enter or advance in the data science industry. It's beneficial for anyone looking to grow their career in this area.

The Sequence Research #466: Small but Migthy, Diving Into Microsoft Phi-4

TheSequence • 70 implied HN points • 10 Jan 25

🕹 Technology AI Software Data science Research Innovation

Microsoft's Phi-4 is a new language model that's smaller in size but powerful in performance. It shows that high-quality data can make a big difference in AI.
Phi-4 has 14 billion parameters, which means it can handle complex language tasks effectively. This model builds on the success of earlier Phi models.
The innovations in Phi-4 come from its unique approach to training, focusing on pre-training, mid-training, and post-training stages to enhance its capabilities.

Data Science Weekly - Issue 508

Data Science Weekly Newsletter • 379 implied HN points • 18 Aug 23

🕹 Technology Data science Machine Learning Artificial Intelligence Data Visualization Data Engineering

Writing clear and effective research papers is essential, and there are tips specifically for NLP papers that can help improve your writing skills.
The job market for data-related roles has changed over the years, and analyzing hiring trends can provide insights into what skills and positions are in demand.
Understanding AI hardware is important because it forms the backbone of many AI models, and knowing how it works can help in making better tech decisions.

Alibaba QwQ Really Impresses at GPT-o1 Levels

TheSequence • 105 implied HN points • 01 Dec 24

🕹 Technology AI Models Machine Learning Data science Generative AI Open Source

Alibaba's new AI model called QwQ is doing really well in reasoning tasks, even better than some existing models like GPT-o1. This shows that it's becoming a strong competitor in the AI field.
QwQ is designed to think carefully and explain its reasoning step by step, making it easier for people to understand how it reaches its conclusions. This transparency is a big deal in AI development.
The rise of models like QwQ indicates a shift towards focusing on reasoning abilities, rather than just making models bigger. This could lead to smarter AI that can learn and solve problems more effectively.

Phi-3 Is A Small Language Model Which Can Run On Your Phone

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 19 Jun 24

🕹 Technology AI NLP Machine Learning Data science Software Development

Phi-3 is a small language model that can run directly on your phone, making it accessible for local use instead of needing cloud connections. This means you can use it anywhere without relying on internet speed.
Small language models like Phi-3 are good for specific tasks and regulated industries where data privacy is important. They can provide quick and accurate responses while keeping your data secure.
Training for Phi-3 involves using high-quality data to improve its understanding of language and its reasoning skills, allowing it to perform on par with larger models despite its smaller size.

Data Science Weekly - Issue 506

Data Science Weekly Newsletter • 399 implied HN points • 04 Aug 23

🕹 Technology Data science Machine Learning Artificial Intelligence Robotics Data Engineering

Integrating large language models into systems can be done using seven key patterns that balance performance and cost.
Ethics in AI isn't just about explainability and fairness; we need a deeper understanding to prevent overall harm from AI systems.
New approaches in robotics focus on current challenges and opportunities while advancing understanding of AI's role in planning tasks.

The Large Language Model Landscape — Version 5

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 79 implied HN points • 25 Apr 24

🕹 Technology AI Machine Learning Software Development Natural Language Processing Data science

Large Language Models (LLMs) are evolving with more functionality, combining various tasks into fewer models. This helps in making them more efficient for users.
There are different zones in the LLM landscape, each focusing on specific uses, tools, and applications, ranging from available models to user interfaces.
Tech advancements like prompt engineering and data-centric tools are making it easier to harness the power of LLMs, opening up new opportunities for businesses.

Google announces AI system for diagnostic medical reasoning and conversation

MLOps Newsletter • 176 implied HN points • 20 Jan 24

🕹 Technology AI Machine Learning Data science Software Development

Google announced an AI system for medical diagnosis and conversation called AMIE.
AMIE's architecture includes multi-turn dialogue management, a hierarchical reasoning model, and a modular design.
AMIE showed promising performance in simulated diagnostic conversations, outperforming primary care physicians (PCPs) and matching specialist physicians.

The Roundup of Blogs and Newsletters About Analytics - Issue 133

Data Analysis Journal • 314 implied HN points • 22 Feb 23

🕹 Technology Data Analysis Data science Product Analytics SaaS

The post is a roundup of blogs and newsletters about analytics.
It highlights key articles on measuring adjacent users, ML in product analytics, and SQL CASE statements.
Various expert blogs and newsletters are recommended for analysts, data practitioners, and anyone interested in data and analytics.