The hottest Data science Substack posts right now

And their main takeaways

Better Data Science: How To Design Visualizations that Work

Data at Depth • 79 implied HN points • 05 May 24

Start by defining what you want the audience to do with the presented data, then create visualizations that support that function
Apply principles such as affordances, accessibility, and aesthetics so your visualizations are clear, usable, and visually appealing to the audience
Getting your audience to accept a data visualization depends on established design principles such as direct labeling, thoughtful use of color, alignment, and the data-ink principle

The Sequence Chat: Arjun Sethi on Venture Investing in Generative AI

TheSequence • 1211 implied HN points • 10 Jan 24

🕹 Technology AI Venture Capital Generative AI Data science

Tribe Capital uses data science and AI to drive its venture capital performance.
Successful investments in generative AI focus on product-market fit and distribution advantages.
In the future of generative AI, open-source and closed-source distribution models will coexist.

Weekly Top Picks #67

The Algorithmic Bridge • 116 implied HN points • 18 Mar 24

🕹 Technology AI Tech news Artificial Intelligence Software Engineering Data science

The post discusses the Nvidia GTC keynote, BaaS in science, Apple's potential collaboration with Google on Gemini, and other key AI topics of the week.
It covers Sam Altman's conversation with Lex Fridman, touches on jobs in the AI era, and examines the NYT's response to OpenAI.
It also asks whether OpenAI's Sora model was trained on YouTube videos, among other intriguing topics.

Do Not Spend too Much Time "Getting Good" at Dealing with Current AML GPT LLMs

Brad DeLong's Grasping Reality • 169 implied HN points • 04 Mar 24

🕹 Technology Machine Learning Artificial Intelligence Data science Internet Software

It's uncertain how current AML GPT LLMs will be most useful in the future, so spending too much time trying to master them may not be the best approach.
Proper prompting is crucial when working with AML GPT LLMs, as they may be capable of more than is initially apparent. Good prompts can turn seemingly impossible tasks into achievable ones.
Understanding how these models respond, and how to prompt them effectively, is essential, as their output can vary with subtle changes in wording or with inadequate prompting.

How To Measure Data Quality - Issue 185

Data Analysis Journal • 235 implied HN points • 07 Feb 24

🕹 Technology Data science Analytics Data Quality Data Governance Metrics

Data quality metrics are essential for measuring data governance and analytics success.
There is no industry standard for defining poor-quality data; it varies based on context.
Having specific KPIs for data quality is crucial to scale data governance initiatives and improve the state of data quality.
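The post does not prescribe formulas here, but as a minimal sketch under our own assumptions, two common data-quality KPIs, completeness (the share of non-missing values) and validity (the share of values that pass a business rule), can be computed as simple rates. The `orders` records, the `amount` field, and the non-negative rule are hypothetical examples.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], field: str) -> float:
    """Share of records where `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def validity(records: List[Record], field: str, is_valid: Callable[[Any], bool]) -> float:
    """Share of non-missing values of `field` that pass the `is_valid` rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical sample data; the field name and the rule are illustrative assumptions.
orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": -4.0},
    {"order_id": 4, "amount": 12.5},
]
print(completeness(orders, "amount"))                    # 3 of 4 records filled
print(validity(orders, "amount", lambda a: a >= 0))      # 2 of 3 values valid
```

Tracking such rates per table over time is one way to turn data quality into KPIs; what counts as "valid" depends on context, as the post notes.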


Behind The Screens Of Data Science At DoorDash | Daniel Parris

Data Analysis Journal • 294 implied HN points • 24 Jan 24

🕹 Technology Data science Analytics AI Consultancy Newsletter

AI will fundamentally change data science by automating tasks and shifting the emphasis toward building AI models
Consider the end user when launching a project to avoid overlooking usability issues
Start with quality data; building fancy models isn't as important as clean, workable data for end users

What AI can do with a toolbox... Getting started with Code Interpreter

One Useful Thing • 1338 implied HN points • 07 Jul 23

🕹 Technology AI Data Analysis Automation Machine Learning Data science

Code Interpreter by OpenAI democratizes data analysis with advanced AI tools
Code Interpreter decreases errors by working directly with Python code
Code Interpreter allows for versatile problem-solving with AI writing Python code

14 Charts That Tell the Story of AI Right Now

Newcomer • 1474 implied HN points • 05 Jun 23

🕹 Technology AI Data science GitHub Venture Capital

OpenAI and Anthropic are leading in large language model rankings.
Anthropic's models offer a larger context window (more tokens of memory) than OpenAI's, which helps sustain longer conversations.
Auto-GPT is the most popular repository on GitHub for AI projects.

No sacred masterpieces

Basta’s Notes • 753 HN points • 15 Sep 23

🕹 Technology Software Development Data science Engineering Project management Web Development

Sometimes, valuable projects end abruptly without much recognition or lasting impact.
It's important to focus on creating business value with your work, rather than building impressive but ultimately unnecessary solutions.
Every piece of code you write as an engineer is legacy and may not last forever, so focus on learning from each project's outcome.

What are embeddings?

Normcore Tech • 1342 implied HN points • 07 Jun 23

🕹 Technology Deep Learning Neural Networks NLP Research Data science

The author delved deep into the concept of embeddings in deep learning.
The author's journey in understanding embeddings involved a significant amount of research and work.
The author hopes that others can benefit from their learning about embeddings as well.

The Great Business Dying: Why AI Threatens Half Of All Businesses & What To Do About It.

High ROI Data Science • 157 implied HN points • 30 Jan 24

💼 Business AI Digital Transformation Data science Business strategy

Businesses need to move fast in adapting to AI or risk being disrupted.
Data and AI strategies must focus on getting buy-in and overcoming resistance from business leaders.
Businesses must generate incremental value from technology investments to avoid becoming cost centers.

Google announces AI system for diagnostic medical reasoning and conversation

MLOps Newsletter • 176 implied HN points • 20 Jan 24

🕹 Technology AI Machine Learning Data science Software Development

Google announced an AI system for medical diagnosis and conversation called AMIE.
AMIE's architecture includes multi-turn dialogue management, a hierarchical reasoning model, and a modular design.
In simulated diagnostic conversations, AMIE showed promising performance, outperforming primary care physicians (PCPs) and matching specialist physicians.

Edge 372: Learn About CALM, Google DeepMind's Method to Augment LLMs with Other LLMs

TheSequence • 98 implied HN points • 22 Feb 24

🕹 Technology Artificial Intelligence Machine Learning Data science Research

Knowledge augmentation is crucial in LLM-based applications with new techniques constantly evolving to enhance LLMs by providing access to external tools or data.
Exploring the concept of augmenting LLMs with other LLMs involves merging general-purpose anchor models with specialized ones to unlock new capabilities, such as combining code understanding with language generation.
Combining different LLMs may require additional training or fine-tuning, which can be hindered by computational costs and data privacy concerns.

How do transformers work? + Design a Multi-class Sentiment Analysis for Customer Reviews

The ZenMode • 134 HN points • 04 Feb 24

🕹 Technology AI NLP Machine Learning Coding Data science

Transformers are crucial in AI for tasks like natural language processing.
The encoder processes the input text and captures the relationships between its tokens, while the decoder generates the output.
Transformers employ layers like self-attention, multi-head attention, and masked self-attention for processing text.
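A minimal pure-Python sketch of scaled dot-product self-attention, the building block behind the layers listed above: it computes softmax(QK^T / sqrt(d_k))V, with an optional causal mask for masked self-attention. For brevity it assumes identity projections (Q = K = V = the input), whereas a real transformer learns separate projection matrices; the toy input `x` is made up.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row max for numerical stability; exp(-inf) evaluates to 0.0.
    top = max(row)
    exps = [math.exp(s - top) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    if causal:
        # Masked self-attention: position i may only attend to positions 0..i.
        scores = [[s if j <= i else -math.inf for j, s in enumerate(row)]
                  for i, row in enumerate(scores)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Toy 3-token sequence with 2-dimensional embeddings (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(x, x, x, causal=True)
print(attn[0])  # the first token can only attend to itself: [1.0, 0.0, 0.0]
```

Multi-head attention runs several such attention computations in parallel on different learned projections and concatenates the results.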

The Best Skillsets to Learn in 2024 for Generative AI

Rod’s Blog • 238 implied HN points • 15 Dec 23

🕹 Technology AI Programming Machine Learning Data science Resources

Generative AI is a rapidly evolving field that creates novel content such as images, text, and music, with real-world applications ranging from enhancing creativity to solving problems.
To succeed in generative AI, you need skills in mathematics and statistics, programming, and data science, knowledge of generative AI methods, and creativity in your specific domain.
To learn generative AI in 2024, use online courses, books, blogs, and tools, and engage with communities and events dedicated to the field.

Edge 364: About COSP and USP: Two New LLM Reasoning Methods Built by Google Research

TheSequence • 133 implied HN points • 25 Jan 24

🕹 Technology AI Language Models Research Machine Learning Data science

Two new LLM reasoning methods, COSP and USP, have been developed by Google Research to enhance common sense reasoning capabilities in language models.
Prompt generation is crucial for LLM-based applications, and techniques like few-shot prompting have reduced the amount of data needed to fine-tune models.
Models with robust zero-shot performance can eliminate the need for manual prompt generation, but may produce weaker results because they operate without task-specific guidance.

Why Today Is The Perfect Time to Learn Data | Seattle Data Guy

Data Analysis Journal • 373 implied HN points • 25 Oct 23

🕹 Technology Data science Analytics Data Engineering Learning Resources Career Advice

Learning data skills is more accessible now than in past years, and the resources are better.
For transitioning into data engineering, focus on SQL, programming, data warehousing, and data pipelines.
Analysts should focus on understanding the business problem, building maintainable systems, and following a data analytics process.

Where do we go from here

Normcore Tech • 1145 implied HN points • 28 Feb 23

🕹 Technology Social media Artificial Intelligence Data science Tech Trends Personal Projects

The landscape of social media is changing with platforms like Twitter and Facebook losing users to newer platforms like TikTok
Users are moving to private, fragmented social media landscapes with platforms like Discord and Mastodon
Creators face challenges in standing out amid the mass creation of art enabled by tools like ChatGPT and Stable Diffusion

How To Run An A/B Testing On Low Traffic - Issue 181

Data Analysis Journal • 137 implied HN points • 10 Jan 24

🕹 Technology Data science Analytics A/B Testing Experimentation Statistical Analysis

There are no specific rules for when to start A/B testing or for the minimum number of users required.
Consider adjusting thresholds when experimenting with small sample sizes.
Address factors like confidence levels and test timelines for effective decision-making.
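One common way to act on the "adjust thresholds" advice, sketched here as an assumption rather than the post's own method, is a two-sided, two-proportion z-test whose significance level `alpha` is chosen deliberately. The conversion counts below are made up.

```python
import math
from typing import Tuple


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates, using a pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


def decide(p_value: float, alpha: float) -> str:
    # alpha is the false-positive rate you are willing to accept.
    return "significant" if p_value < alpha else "not significant"


# Made-up low-traffic test: 400 users per arm.
z, p = two_proportion_z_test(conv_a=40, n_a=400, conv_b=58, n_b=400)
print(f"z = {z:.2f}, p = {p:.3f}")
for alpha in (0.05, 0.10):
    print(alpha, decide(p, alpha))
```

With these numbers the p-value lands just above 0.05, so the result is not significant at alpha = 0.05 but is at alpha = 0.10. Loosening alpha lets small tests reach a decision sooner at the cost of more false positives, which is why confidence levels and test timelines should be agreed on up front.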

DBRX: Revolutionizing Language Models for the Open Community

Data Plumbers • 19 implied HN points • 04 Apr 24

🕹 Technology Artificial Intelligence Machine Learning Open Source API Integration Data science

Language models like DBRX are crucial in AI, changing how we use technology from chatbots to code generation.
DBRX is an open-source alternative to closed models, providing high performance and accessibility to developers.
DBRX stands out for its top performance, versatility in specialized domains, efficiency in training, and integration capabilities.

Saturday morning open tabs

Scott's Substack • 78 implied HN points • 10 Feb 24

🕹 Technology AI Tech news Virtual reality Data science

The post discusses the experience of switching phone carriers and the challenges faced, emphasizing the impact of not having a phone for a few days.
The post touches on upcoming summer plans including workshops in Madrid, Scotland, and potential travel to Vietnam, highlighting the diversity of travel experiences planned.
The author explores the new Apple Vision Pro product, contemplating its potential usage for work, entertainment, and travel, showcasing a mix of curiosity and skepticism.

Celebrating An Anniversary: Three Years of Writing About Analytics - Issue 154

Data Analysis Journal • 452 implied HN points • 26 Jul 23

🕹 Technology Analytics Data science Newsletter Writing Career development

The author reflects on three years of writing a newsletter about analytics, thanking supporters and subscribers.
The author's newsletter aims to document their journey, bridge the gap between academics and industry, and encourage classic data analysis.
The author shares insights on their writing strategy, the power of being small and independent, and future plans for the newsletter.

2023 Kaggle AI Report

Bojan’s Newsletter • 196 implied HN points • 10 Oct 23

🕹 Technology Data science Machine Learning AI Research Competitions

Kaggle is a valuable platform for data science and ML career development
Kaggle solutions often offer innovative insights ahead of research and industry trends
Tabular data ML remains an important area in the field of machine learning

📌 Exciting news! The speaker lineup for apply() 2024 is now live

TheSequence • 21 implied HN points • 15 Mar 24

🕹 Technology AI ML Data science Events

The speaker lineup for apply() 2024 event is now live, featuring industry leaders from companies like LangChain, Meta, Visa, and more.
The event offers actionable insights to master AI and ML in production, with sessions on topics like LangChain Keynote, Semi-Supervised Learning, and Uplift Modeling.
Attendees can register for free to join the event live on April 3rd, with the option to receive on-demand videos as well.

Monosemanticity at Home: My Attempt at Replicating Anthropic's Interpretability Research from Scratch

Jake Ward's Blog • 2 HN points • 30 Apr 24

🕹 Technology AI Data science Machine Learning Research Interpretability

Large language models like ChatGPT have complex, learned logic that is difficult to interpret due to 'superposition' - where single neurons correspond to multiple functions.
Techniques like sparse dictionary learning can decompose artificial neurons into 'features' that exhibit 'monosemanticity', making the models more interpretable.
Reproducing research on model interpretability shows promise for breakthroughs and indicates a shift towards engineering challenges over scientific barriers.
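Sparse dictionary learning in this line of research is typically done with a sparse autoencoder: it encodes an activation vector into a larger set of non-negative features, decodes them back, and is trained to minimize reconstruction error plus an L1 penalty that keeps most features at zero. The sketch below shows only the forward pass and loss in pure Python; the tiny hand-picked weights are illustrative assumptions, and training them (for example by gradient descent) is omitted.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector) -> Tuple[Vector, Vector]:
    """Encode x into non-negative features, then reconstruct it from them."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    features = relu([h + b for h, b in zip(matvec(w_enc, centered), b_enc)])
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x: Vector, features: Vector, recon: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 sparsity penalty on the features."""
    mse = sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    return mse + l1_coeff * sum(abs(f) for f in features)


# Hypothetical overcomplete dictionary: 3 features for a 2-dimensional activation.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, -0.5], [0.0, 1.0, -0.5]]
b_dec = [0.0, 0.0]

x = [2.0, 0.0]
features, recon = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
print(features, recon, sae_loss(x, features, recon, l1_coeff=0.1))
```

Here only one of the three features fires for this input, which is the kind of sparse, more interpretable representation the research aims for.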

How RLHF actually works

Democratizing Automation • 306 implied HN points • 21 Jun 23

🕹 Technology AI Machine Learning Data science Open Source Scaling

RLHF works when there is a signal that vanilla supervised learning alone cannot capture, such as pairwise preference data.
Having a capable base model is crucial for successful RLHF implementation, as imitating models or using imperfect datasets can greatly affect performance.
Preferences play a key role in the RLHF process, and collecting preference data for harmful prompts is essential for model optimization.
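Pairwise preference data is commonly turned into a reward signal by training a reward model with a Bradley-Terry style loss, -log sigmoid(r_chosen - r_rejected), which pushes the preferred response's score above the rejected one's. A minimal sketch; the reward scores are made-up numbers standing in for a reward model's outputs.

```python
import math
from typing import List, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: small when the chosen response scores higher."""
    return -log_sigmoid(r_chosen - r_rejected)


def batch_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean loss over (chosen_score, rejected_score) pairs."""
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)


# Made-up reward-model scores for two preference pairs.
pairs = [(2.0, 0.5), (0.3, 1.1)]
print(batch_loss(pairs))
```

The loss equals log 2 when the two scores tie and shrinks as the chosen response's margin grows; the resulting reward model is what the RL step then optimizes against.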

What To Do When You're Stuck At A Business That Doesn't Care About Data Science

High ROI Data Science • 353 implied HN points • 27 Feb 23

💼 Business Data science Leadership Strategic Planning

Many data scientists in companies that don't prioritize data science end up doing basic reporting and analytics.
Technical management in such companies often lacks the understanding and incentives to support data initiatives.
Navigating a lack of data culture and strategy in a company requires significant effort but can lead to valuable career opportunities.

What is the Story With Your Data? How To Make Sense of Your Stats

Data at Depth • 39 implied HN points • 31 Jan 24

🕹 Technology Data science Research Presentations

Data storytelling involves progressing from exploratory to explanatory research.
Brilliant data science researchers may not always be brilliant presenters.
It is important to make sense of data so that it can be effectively communicated to others.

Introduction To Analytics Engineering

Data Analysis Journal • 353 implied HN points • 22 Mar 23

🕹 Technology Data Analysis Data science SQL Data Engineering

Analytics engineers bridge the gap between data engineers and data analysts by focusing on producing high-quality data.
Analytics engineers use tools like dbt to streamline data modeling, testing, and documentation.
Data quality is crucial in decision-making, making analytics engineering more important than ever.

The Roundup of Blogs and Newsletters About Analytics - Issue 133

Data Analysis Journal • 314 implied HN points • 22 Feb 23

🕹 Technology Data Analysis Data science Product Analytics SaaS

The post discusses a roundup of blogs and newsletters about analytics.
It highlights key articles on adjacent users measurement, ML in product analytics, and SQL case statements.
Various expert blogs and newsletters are recommended for analysts, data practitioners, and anyone interested in data and analytics.

Slack's greatest magic trick

Top of the Lyne • 314 implied HN points • 29 Apr 23

💼 Business Startups Marketing Data science Engineering Revenue Models

Net Revenue Retention is a science, not an art, and can be engineered
Successful subscription businesses have at least 20% of revenue driven by expansion, with some as high as 40%
Slack's segmentation engine is a complex but well-crafted marvel of data science and engineering

2023 Wrap up

RSS DS+AI Section • 53 implied HN points • 31 Dec 23

🕹 Technology Data science Artificial Intelligence Events Newsletter

The focus for the year was 'Effective and Efficient Data Science' to highlight the critical aspects of the field beyond hype.
Various events and discussions were held throughout the year to promote best practices in Data Science.
Engagement with the community through events, surveys, and articles was emphasized to ensure diverse voices are heard in influencing policy.

SAI #22: Decomposing the Data System.

SwirlAI Newsletter • 294 implied HN points • 18 Mar 23

🕹 Technology Data science Data Engineering MLOps Machine Learning Data Systems

Learning to decompose a data system is crucial for better reasoning and understanding of large infrastructure
Decomposing a data system helps with scalability, identifying bottlenecks, and optimizing total event-processing latency
The different layers in a data system include data ingestion, transformation, and serving layers, each with specific functions and technologies

Embracing the New Era of Accelerated Testing - Issue 150

Data Analysis Journal • 235 implied HN points • 28 Jun 23

🕹 Technology Data science Analytics A/B Testing Experimentation Tooling

Embracing accelerated testing in the modern data analysis landscape is essential for success.
The traditional academic workflow for A/B testing may not suit the evolving landscape of experimentation.
To thrive in the era of rapid feature flagging and A/B testing, teams need to adapt by automating statistical checks, simplifying documentation, and eliminating bias.

Top 10 Events 2024

Bytewax • 39 implied HN points • 18 Jan 24

🕹 Technology Data science Machine Learning Python AI

Top tech conferences in 2024 focus on AI, data science, ML, and Python.
Events offer opportunities to learn, connect with peers, and expand skills.
Attendees benefit from valuable insights, networking, and community engagement.

How To Pass A SQL Interview For A Data Scientist Position - Issue 140

Data Analysis Journal • 275 implied HN points • 19 Apr 23

🕹 Technology Data science SQL

Data science job interviews may test candidates on Python and SQL proficiency.
Technical coding interview questions for data science positions can include SQL challenges.
Being proficient in SQL and data analysis is essential for succeeding in a data scientist position.

Synthetic Data In A Nutshell

Three Data Point Thursday • 39 implied HN points • 11 Jan 24

🕹 Technology Data science AI Data Engineering Machine Learning Software Development

Synthetic data is fake data that is becoming increasingly practical and valuable.
Generative AI and the growing gap between data demand and availability are driving forces for the usefulness of synthetic data.
Synthetic data is beneficial in various fields beyond just machine learning, offering opportunities for innovation and improvement.

How Would I Break Into Data Science If I Had To Do It All Over Again?

High ROI Data Science • 216 implied HN points • 20 Jun 23

🕹 Technology Data science

The author wouldn't change anything about their career path in data science.
Lessons learned from mistakes were valuable for the author's growth.
The author believes there was no other way to reach their current level of expertise.

How to Go to BBQ Hell via the Use of Data

Think Future • 79 implied HN points • 02 Nov 23

🎭️ Culture Food & Drink Data science Critical Thinking Expertise Futurism

The importance of expertise in interpreting data findings - data can sometimes lead to nonsensical conclusions without proper expertise to guide the analysis.
Be cautious of drawing conclusions solely from data - critical thinking is essential to avoid errors in analysis, as in the case of TripAdvisor's BBQ city rankings.
Consulting with longtime experts is crucial before accepting data-driven findings as 'rock-solid' - having seasoned professionals review results can help prevent misinterpretations and errors.

AI in 2024 - Scandals, Security, and Sustainability!

aidaily • 39 implied HN points • 01 Jan 24

🕹 Technology AI Data science Medicine Music E-commerce

AI experts predict unpredictable future in 2024
The New York Times in legal battle over AI usage
AI revolutionizing medicine with new antibiotics