The hottest data science Substack posts right now

And their main takeaways

June Newsletter

RSS DS+AI Section • 5 implied HN points • 01 Jun 25

🕹 Technology Data science

Ethics and bias in AI are big topics right now. Many people are talking about how to keep AI safe and fair as it becomes more advanced.
There are many exciting developments in AI research, including new tools and methods. For example, some AI can now create new algorithms and even assist in healthcare.
Real-world applications of AI are growing, with many helpful resources and tutorials available. It's becoming easier for people to use AI for practical tasks and projects.

Issue 17: No great stagnation in cruise ships

The Works in Progress Newsletter • 12 implied HN points • 05 Dec 24

🕹 Technology Data science

Cruise ships show that new ideas and growth are still possible in design and urban living, even as some land technologies seem to stall.
Madrid has successfully built its metro system much faster and cheaper than cities like London and New York by using smart planning and incentives for local leaders.
Many animals, like horses and crabs, are essential for creating life-saving chemicals, reminding us that we still rely on nature, even as technology advances.

40 under 40 data scientists, ConvNets & AI landscape in India 🎮🎚 🕣🎀

Sector 6 | The Newsletter of AIM • 39 implied HN points • 23 Jan 22

🕹 Technology Data science

The '40 under 40' list highlights outstanding data scientists in India. These are young professionals making significant impacts in the field.
Nominations are currently open for the '50 Best Firms In India For Data Scientists To Work For'. This is a chance for companies to showcase their work environment and culture.
The Machine Learning Developers Summit recently concluded successfully. It brought together many experts and resources in the machine learning community.

AI, ML & Data Science Courses Galore ✨

Sector 6 | The Newsletter of AIM • 19 implied HN points • 13 Nov 22

🕹 Technology Data science

More universities are now offering AI, ML, and data science courses. This makes it easier for people to learn these important skills.
These courses come in both full-time and part-time options, giving flexibility to students with different schedules.
The growth of these programs shows a rising demand for knowledge in AI and data science fields, indicating they are becoming crucial for many careers.

A Brief History of MLOps in Three Acts

Laszlo’s Newsletter • 54 implied HN points • 20 Feb 23

🕹 Technology Data science

MLOps tooling evolved from handling big data and SQL to covering deployment, feature stores, model monitoring, and more.
The increasing complexity of ML models led to the development of tools like XGBoost, TensorFlow, and PyTorch, and to the need for distributed computing.
Machine Learning Engineers play a crucial role in navigating the ever-changing landscape of MLOps tools and technologies.

Vesuvius Challenge is hiring!

Vesuvius Challenge • 9 implied HN points • 21 Jan 25

🕹 Technology Data science

The Vesuvius Challenge is looking for team members to help recover texts from ancient scrolls. They need people for two key roles: research in computer vision and platform engineering.
The computer vision role focuses on using advanced tech to read the scrolls, which involves solving complex problems with CT scan data.
The platform engineering role is about creating tools and systems to manage and share large datasets, making research easier for the community.

December Newsletter

RSS DS+AI Section • 11 implied HN points • 01 Dec 24

🕹 Technology Data science

There are ongoing discussions about the ethical use of AI, especially in healthcare and the military. It’s important to think about privacy and the implications of these technologies.
New developments in data science and AI research are exciting, such as improved models for training and reasoning. It's a fast-paced field with many recent breakthroughs.
Generative AI is evolving quickly, with many companies working on new models and applications. This includes features like AI-generated summaries of content you're watching.

Uncertainty in machine learning deployments

Denis’s Substack • 7 HN points • 07 Jun 23

🕹 Technology Data science

Many machine learning projects never make it to production, for reasons such as lack of stakeholder buy-in and data quality issues.
The traditional linear process of analyzing, extracting data, modeling, deploying, and operating models can be naive and may not reduce uncertainty.
Embracing uncertainty in machine learning deployments can mean starting the deployment phase before data extraction, so that value is added continuously throughout the process.

How does the Air Quality Index work anyways?

Counting Stuff • 43 implied HN points • 13 Jun 23

🔬 Science Data science

The Air Quality Index is a single score that combines six pollutants into one number that people can readily understand.
Time frames matter in the AQI: it is based on daily air quality summaries, and forecasts are encouraged for planning purposes.
The AQI simplifies complex air quality data by using a linear scaling system, with the max value among pollutants determining the overall index.
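
The linear scaling and "max among pollutants" rule described above can be sketched in a few lines. This is a minimal illustration, not the official calculation: the pollutant names and breakpoint values below are made-up placeholders, and the real index uses published breakpoint tables for each pollutant.

```python
# Hypothetical breakpoint table: (conc_lo, conc_hi, index_lo, index_hi).
# These numbers are illustrative placeholders, NOT official AQI breakpoints.
ILLUSTRATIVE_BREAKPOINTS = {
    "pollutant_a": [(0.0, 10.0, 0, 50), (10.0, 30.0, 51, 100), (30.0, 60.0, 101, 150)],
    "pollutant_b": [(0.0, 50.0, 0, 50), (50.0, 100.0, 51, 100), (100.0, 200.0, 101, 150)],
}


def sub_index(pollutant, concentration):
    """Linearly scale a concentration onto the index range of its segment."""
    for c_lo, c_hi, i_lo, i_hi in ILLUSTRATIVE_BREAKPOINTS[pollutant]:
        if c_lo <= concentration <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (concentration - c_lo) + i_lo
    raise ValueError(f"{concentration} is outside the table for {pollutant}")


def overall_index(concentrations):
    """Return (index, pollutant) for the pollutant with the highest sub-index."""
    sub = {p: sub_index(p, c) for p, c in concentrations.items()}
    worst = max(sub, key=sub.get)
    return sub[worst], worst


index, driver = overall_index({"pollutant_a": 5.0, "pollutant_b": 75.0})
```

Here `driver` names the pollutant that sets the overall score; the other pollutants do not affect the result, which is the simplification the summary describes.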

Data Science Weekly - Issue 472

Data Science Weekly Newsletter • 19 implied HN points • 08 Dec 22

🕹 Technology Data science

Machine learning can unintentionally develop biases from training data, which is important to detect and fix, especially in critical areas like healthcare and self-driving cars.
Google Sheets now offers a way to use machine learning without coding skills, making it accessible for everyone to perform simple data tasks like predicting values and identifying anomalies.
There is a trend in tech companies to make machine learning processes happen in real-time, which can lead to faster and more efficient data insights.

Data Science Weekly - Issue 471

Data Science Weekly Newsletter • 19 implied HN points • 01 Dec 22

🕹 Technology Data science

MLOps is important for automating and managing machine learning products. It helps researchers and practitioners understand the principles and challenges of operating ML systems.
Companies face trade-offs when transitioning to real-time machine learning pipelines. They must balance performance, cost, and infrastructure complexity to find the best solution.
The FDA and other agencies have created guiding principles for using machine learning in medical devices. These principles aim to ensure the safety and effectiveness of AI/ML in healthcare.

Ranking Biases, Multi-Turn Woes, and State-of-the-Art Zero-Shot Speech

ppdispatch • 5 implied HN points • 16 May 25

🕹 Technology Data science

The 'Leaderboard Illusion' highlights how some AI models get unfair rankings because of selective information sharing. This can make it hard to know which models are truly the best.
Large Language Models (LLMs) struggle a lot in long conversations, with a big drop in their performance. They often lose track of conversations and can make mistakes early on that affect the whole chat.
MiniMax-Speech is a new tech for turning text into speech that can imitate voices in multiple languages. It also allows for cool features like expressing emotions in the voice.

Data Science Weekly - Issue 470

Data Science Weekly Newsletter • 19 implied HN points • 24 Nov 22

🕹 Technology Data science

Using recommender systems can lead to problems like clickbait and addiction if they're only focused on engagement. We need to think differently to create better systems that really serve people's needs.
GitLab has a detailed Data Team Handbook that explains how their data team works, what data is available, and how it helps different departments make decisions. This can guide other teams looking to improve their data processes.
Deep learning techniques are being researched to playtest video games like Candy Crush. This shows how AI can create more human-like testing methods and improve the gaming experience.

Data Science Weekly - Issue 469

Data Science Weekly Newsletter • 19 implied HN points • 17 Nov 22

🕹 Technology Data science

Learning machine learning can be accomplished without an engineering background. It often requires hard work, perseverance, and adopting good software engineering practices.
Robotics and AI are being increasingly used in fulfillment processes at companies like Amazon. These technologies face challenges but also provide innovative solutions for package handling.
Large language models are evolving to act like agents that make decisions. This shift towards action-driven models may make them resemble artificial general intelligence (AGI) more closely.

New tools to use with new scroll

Vesuvius Challenge • 10 implied HN points • 27 Nov 24

🕹 Technology Data science

The Vesuvius Challenge has introduced new tools to help with studying ancient scrolls. These tools are meant to improve our understanding of scrolls found in Herculaneum.
There is a total of $18,500 available as prizes for community contributions. The rewards are aimed at motivating open-source work that supports the reading and analysis of the new scroll dataset.
Several contributors have developed techniques and tools for better image segmentation and data analysis of scrolls. These advancements help make the process of interpreting ancient texts easier and more accurate.

Data Science Weekly - Issue 468

Data Science Weekly Newsletter • 19 implied HN points • 10 Nov 22

🕹 Technology Data science

If you're thinking about leaving Twitter, it's a good idea to save your data first. You can use it to find trends and insights that might be really useful later.
Learning command-line data analytics can make your data processing much easier. There's a new tool called SPyQL that makes it simpler to work with and understand data on the command line.
Federated learning allows us to train models using data from many users without needing to see the actual data. This means we can protect privacy while still making progress in AI.
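
As a rough illustration of the federated learning idea above, the sketch below assumes a federated-averaging style scheme, which the summary does not name: each client fits a one-parameter model on its own data, and only the resulting weight is sent to the server. The function names and toy datasets are hypothetical.

```python
def local_update(weight, data, lr=0.1, steps=20):
    """Gradient descent on one client's private (x, y) pairs for y ~ weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(global_weight, client_datasets, rounds=10):
    """Server loop: it only ever sees client weights, never the raw data."""
    for _ in range(rounds):
        # Each client trains locally, starting from the current global weight.
        local = [(local_update(global_weight, d), len(d)) for d in client_datasets]
        # The server averages the weights, weighted by client dataset size.
        total = sum(n for _, n in local)
        global_weight = sum(w * n for w, n in local) / total
    return global_weight


# Toy client datasets (assumed for illustration); all follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
learned_weight = federated_average(0.0, clients)
```

Real deployments add further protections such as secure aggregation, since model updates can still leak information; this sketch only shows the basic data-stays-local pattern.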

Data Science Weekly - Issue 467

Data Science Weekly Newsletter • 19 implied HN points • 03 Nov 22

🕹 Technology Data science

User experience (UX) is really important for startups using large language models. Many struggle because they focus on the wrong things instead of improving UX and product design.
Data science notebooks have evolved a lot since they were first introduced. They are now essential tools in data science, and there’s an exciting future ahead for their development.
OpenAI is financially supporting AI startups with a significant investment. They're offering early access to their systems to help these startups grow.

Data Science Weekly - Issue 466

Data Science Weekly Newsletter • 19 implied HN points • 27 Oct 22

🕹 Technology Data science

Science education should focus on teaching scientific virtues first, rather than just tools and techniques. This approach helps students understand the core values of scientific inquiry.
A data dictionary is essential for ensuring quality data collection and interpretation. It's best created before data collection to guide your research design.
The Farama Foundation aims to improve open-source reinforcement learning by maintaining and standardizing existing libraries. This will help in developing more effective RL tools for the community.

To Use Or Not To Use [HTC #69]

Hold the code • 4 implied HN points • 30 May 25

🕹 Technology Data science

Tech buzzwords are often just fancy terms that can make simple ideas sound more complex. It's easy to use these words to impress people but they can confuse others.
AI is increasingly being used as a therapist because it's accessible and can provide immediate support, but it should not replace real human therapists, who understand emotions better.
The term 'artificial intelligence' is becoming vague and companies often use it to make their products sound smarter, even if they aren't truly intelligent. This can mislead the public about what AI can really do.

October recap

The Palindrome • 1 implied HN point • 09 Nov 25

🕹 Technology Data science

In October, several new articles were published on machine learning topics, including how to measure information and understanding computational graphs. These resources are helpful for anyone looking to learn about these subjects.
The Palindrome hosted live events, including 'Office Hours' and interviews with experts. These sessions offered a chance for members to engage and learn more directly from knowledgeable guests.
The community is growing with over 540 machine learning practitioners joining the membership, making it a great place for networking and learning together.

The true information is in the words, not the numbers

Klement on Investing • 4 implied HN points • 29 May 25

💰 Finance Data science

Analyst recommendations are often seen as unreliable, especially when a 'Hold' is viewed like a 'Sell'. People are starting to see more value in the actual words analysts use rather than just the numbers they give.
AI has been used to analyze over a million analyst reports, revealing that most discussions focus on profitability. However, during tough times, there's less talk about profitability and more on financial stability.
It turns out that the specific language analysts use can help predict changes in earnings and stock prices, showing that understanding their words might be more valuable than just following their price forecasts.

GPT-4, Bias-Variance Tradeoff & Gupshup 💽🛰🔨

Sector 6 | The Newsletter of AIM • 39 implied HN points • 19 Sep 21

🕹 Technology Data science

Rankings of data science courses in India help students choose the right programs. They get a broad overview of what's available in the education landscape.
The rankings come from careful surveys and research, ensuring the information is reliable. More than 150 courses get nominated every year to keep the list current.
Gupshup is a topic that combines interesting discussions about analytics and technology. It’s a great way to explore the latest trends in data science.

Data Science Weekly - Issue 465

Data Science Weekly Newsletter • 19 implied HN points • 20 Oct 22

🕹 Technology Data science

AI writing assistants are helping indie authors write faster and come up with story ideas. Tools like Lex are changing how creatives approach their writing.
Recent research shows that parts of the brain, like the hippocampus, work similarly to AI models known as transformers. This discovery helps us understand both artificial intelligence and human memory.
The State of AI Report 2022 reviews important trends in AI, including technology breakthroughs, commercial applications, and safety concerns. It provides valuable insights for both researchers and industry leaders.

📌 Exciting news! The speaker lineup for apply() 2024 is now live

TheSequence • 21 implied HN points • 15 Mar 24

🕹 Technology Data science

The speaker lineup for the apply() 2024 event is now live, featuring industry leaders from companies like LangChain, Meta, Visa, and more.
The event offers actionable insights to master AI and ML in production, with sessions such as the LangChain keynote and talks on Semi-Supervised Learning and Uplift Modeling.
Attendees can register for free to join the event live on April 3rd, with the option to receive on-demand videos as well.

Data Science Weekly - Issue 464

Data Science Weekly Newsletter • 19 implied HN points • 13 Oct 22

🕹 Technology Data science

Building a community around R in the pharmaceutical industry can help users connect and share knowledge more effectively. It's important to identify who the users are and create a space for collaboration.
Creating research ideas can start with understanding gaps in existing literature. By reading a single paper, you can learn frameworks to generate new ideas and improve your research quality.
Data cleaning for machine learning models is crucial, starting from the ETL pipeline. It’s important to commit to high-quality data from the beginning to avoid common pitfalls that impact model accuracy.

Data Science Weekly - Issue 463

Data Science Weekly Newsletter • 19 implied HN points • 06 Oct 22

🕹 Technology Data science

When you get a big CSV file, it's important to choose the right tools to explore and understand the data quickly.
Using AI, like GPT-3, can help turn messy text into organized data, saving a lot of manual work.
There's growing interest in using collective intelligence ideas to improve deep learning and AI research.

Data Science Weekly - Issue 462

Data Science Weekly Newsletter • 19 implied HN points • 29 Sep 22

🕹 Technology Data science

Teaching students about scientific failure helps them build resilience. It prepares them for real-world challenges in research.
Understanding uncertainty in deep learning models is crucial for effective use. It helps in making better predictions and decisions.
Increasing data maturity in organizations leads to more strategic use of data. Assessing data maturity can guide teams in improving their data practices.

Gradient Flow #38: Large Language Models, Infinite Laptop, Overhyping AI

Gradient Flow • 39 implied HN points • 01 Jul 21

🕹 Technology Data science

Training large language models involves a new role referred to as 'prompt engineer'.
TabNet, a deep neural network for tabular data, outperforms other models in classification and regression problems.
Tools like AugLy for data augmentation and Flat Data for data acquisition simplify tasks and enhance model robustness.

I’ve never truly understood the Softmax function

just learning data science • 3 HN points • 23 Jan 24

🕹 Technology Data science

The Softmax function involves two simple steps: converting input values into positive ones using the exponential function and then normalizing them to fit in the range [0, 1] and add up to 1.
Understanding the Softmax function becomes clearer when broken down into these two operations.
By following the process of converting and normalizing values, the Softmax function can be easier to grasp.
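
The two steps described above can be written out directly. This is a minimal sketch in plain Python; subtracting the maximum before exponentiating is a common numerical-stability trick added here, not something the summary mentions, and it leaves the result unchanged.

```python
import math


def softmax(values):
    """Return softmax probabilities for a list of numbers."""
    # Stability shift (an addition in this sketch): avoids overflow in exp().
    shift = max(values)
    # Step 1: exponentiate so every value becomes positive.
    exps = [math.exp(v - shift) for v in values]
    # Step 2: normalize so the outputs lie in [0, 1] and sum to 1.
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```

Larger inputs receive larger probabilities, and equal inputs receive equal shares, so `softmax([0.0, 0.0])` splits evenly between the two entries.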

Data Science Weekly - Issue 461

Data Science Weekly Newsletter • 19 implied HN points • 22 Sep 22

🕹 Technology Data science

Working in Natural Language Processing (NLP) involves keeping up with evolving models and figuring out how to effectively use data. It's still challenging for many to find practical applications for NLP.
Generative AI has the potential to make workers significantly more efficient and creative. This could result in substantial economic value across various industries.
Building trust in machine learning is crucial but challenging. It's important to address concerns about model reliability to maximize its business value.

Data Science Weekly - Issue 460

Data Science Weekly Newsletter • 19 implied HN points • 15 Sep 22

🕹 Technology Data science

Soft skills are super important for data scientists. Being able to communicate well and work in a team can make a big difference in their effectiveness.
There are great resources available online for learning data science, including live streams on platforms like Twitch. It’s a fun way to learn and engage with others.
Use the right fonts and designs in data visualizations. They can greatly affect how your data is understood and appreciated.

More nuggets on BYD ADAS

TP’s Substack • 6 implied HN points • 24 Feb 25

🕹 Technology Data science

BYD chose a specific chip setup for its DiPilot-100 platform that supports advanced technology better than other options. They prioritized overall performance and future needs rather than just the highest computing power.
The company collects a large amount of driving data daily, which helps constantly improve its ADAS technology. While it's still behind Tesla’s FSD, BYD's hardware is getting better and offers a good range for detection.
BYD is focusing on reducing costs by developing its own chips and increasing production efficiency. This strategy will help them expand smart car technology to more vehicles and compete effectively in the market.

Data Science Weekly - Issue 459

Data Science Weekly Newsletter • 19 implied HN points • 08 Sep 22

🕹 Technology Data science

Organizations need to invest in creating better data to gain an advantage over competitors. Good data can drive value and improve decision-making.
The activation layer of the modern data stack helps you use data in a more impactful way. This allows for personalized experiences rather than just viewing dashboards.
Using standard formats like ONNX for model exporting makes your machine learning models more portable across different programming environments, reducing dependencies on specific languages.

A day(?) in my life as a Quant UX Researcher

Counting Stuff • 32 implied HN points • 20 Jun 23

🕹 Technology Data science

In a fast-paced environment like tech, work dictates the tools used.
Meetings take up a significant portion of the work month for coordination and communication.
Balancing documentation work with actual data work is a key part of the job.

Data Science Weekly - Issue 458

Data Science Weekly Newsletter • 19 implied HN points • 01 Sep 22

🕹 Technology Data science

Machine learning best practices are shared in a guide from Google, helping those with some knowledge to improve their skills.
There's skepticism about deep learning promises, as experts continue to predict big changes that haven't happened yet.
AI is being used creatively, like generating art from Bible stories, which showcases the potential of technology in different fields.

Latent Reasoning, 3D Colorization, and the Limits of RL

HackerPulse Dispatch • 8 implied HN points • 13 Dec 24

🕹 Technology Data science

COCONUT is a new method that lets language models think in flexible ways, making them better at solving complex problems. It does this by using continuous latent spaces instead of just words.
ChromaDistill offers a smart way to add color to 3D images efficiently. It lets you view these scenes consistently from different angles without slowing things down.
Recent research shows that top AI models can be deceptive and plan strategically, which raises important safety concerns. There’s also a new approach to testing AI limits in a friendly, curiosity-driven way.

Week 78 - E6 - 🏰 Building a Moat for Your AI Startup 🏰

The Product Channel By Sid Saladi • 20 implied HN points • 11 Feb 24

🕹 Technology Data science

Building a competitive moat in AI involves strategic navigation of the generative AI value chain to create unique advantages.
For AI startups, it's crucial to focus on acquiring proprietary data, integrating AI into comprehensive workflows, and specializing models through incremental training techniques.
Companies like Anthropic, Landing AI, and Stability AI showcase effective moat-building strategies in AI by emphasizing ethical development, democratizing technology, and niche specialization.

Data Science Weekly - Issue 457

Data Science Weekly Newsletter • 19 implied HN points • 25 Aug 22

🕹 Technology Data science

AI systems struggle with language limitations and won't fully replicate human thinking. This shows that our understanding of thought and language needs to evolve.
Observable launched Free Teams to encourage more open collaboration in data science. It allows users to easily work together on projects and share insights for free.
There is a problem in the data industry where roles are too narrowly defined, leading to a lack of collaboration. This makes it hard for teams to communicate and understand each other's work.

Time to BLOOM 🌸

Sector 6 | The Newsletter of AIM • 19 implied HN points • 04 Jul 22

🕹 Technology Data science

BLOOM is a new open-source language model with 176 billion parameters. It's considered impressive because it was developed outside of the big tech companies.
This model is similar in structure to GPT-3, but its open-access nature means anyone can use it.
BLOOM represents a shift towards more collaborative and open approaches in AI research and development, encouraging more shared knowledge.

I am terrible at talking about my achievements

Counting Stuff • 32 implied HN points • 06 Jun 23

🕹 Technology Data science

Talking about your achievements is important for recognition and career advancement.
It's common to downplay your own work and focus on flaws, but it's crucial to highlight the positive impact.
Emphasize concrete facts and context when discussing your achievements, and seek feedback to improve your communication.