The hottest Data Science Substack posts right now

And their main takeaways
Category: Top Technology Topics
RSS DS+AI Section 5 implied HN points 01 Jun 25
  1. Ethics and bias in AI are big topics right now. Many people are talking about how to keep AI safe and fair as it becomes more advanced.
  2. There are many exciting developments in AI research, including new tools and methods. For example, some AI systems can now create new algorithms and even assist in healthcare.
  3. Real-world applications of AI are growing, with many helpful resources and tutorials available. It's becoming easier for people to use AI for practical tasks and projects.
The Works in Progress Newsletter 12 implied HN points 05 Dec 24
  1. Cruise ships show that new ideas and growth are still possible in design and urban living, even as some land-based technologies seem to stall.
  2. Madrid has successfully built its metro system much faster and cheaper than cities like London and New York by using smart planning and incentives for local leaders.
  3. Many animals, like horses and crabs, are essential for creating life-saving chemicals, reminding us that we still rely on nature, even as technology advances.
Sector 6 | The Newsletter of AIM 39 implied HN points 23 Jan 22
  1. The '40 under 40' list highlights outstanding data scientists in India. These are young professionals making significant impacts in the field.
  2. Nominations are currently open for the '50 Best Firms In India For Data Scientists To Work For'. This is a chance for companies to showcase their work environment and culture.
  3. The Machine Learning Developers Summit recently concluded successfully. It brought together many experts and resources in the machine learning community.
Sector 6 | The Newsletter of AIM 19 implied HN points 13 Nov 22
  1. More universities are now offering AI, ML, and data science courses. This makes it easier for people to learn these important skills.
  2. These courses come in both full-time and part-time options, giving flexibility to students with different schedules.
  3. The growth of these programs shows a rising demand for knowledge in AI and data science fields, indicating they are becoming crucial for many careers.
Laszlo’s Newsletter 54 implied HN points 20 Feb 23
  1. MLOps tools began with handling big data and SQL, then expanded to deployment, feature stores, model monitoring, and more.
  2. The growing complexity of ML models led to tools like XGBoost, TensorFlow, and PyTorch, and created a need for distributed computing.
  3. Machine Learning Engineers play a crucial role in navigating the constantly changing landscape of MLOps tools and technologies.
Vesuvius Challenge 9 implied HN points 21 Jan 25
  1. The Vesuvius Challenge is looking for team members to help recover texts from ancient scrolls. They need people for two key roles: research in computer vision and platform engineering.
  2. The computer vision role focuses on using advanced tech to read the scrolls, which involves solving complex problems with CT scan data.
  3. The platform engineering role is about creating tools and systems to manage and share large datasets, making research easier for the community.
RSS DS+AI Section 11 implied HN points 01 Dec 24
  1. There are ongoing discussions about the ethical use of AI, especially in healthcare and the military. It’s important to consider privacy and the broader implications of these technologies.
  2. New developments in data science and AI research are exciting, such as improved models for training and reasoning. It's a fast-paced field with many recent breakthroughs.
  3. Generative AI is evolving quickly, with many companies working on new models and applications. This includes features like AI-generated summaries of content you're watching.
Denis’s Substack 7 HN points 07 Jun 23
  1. Many machine learning projects never make it to production due to various reasons like lack of stakeholder buy-in and data quality issues.
  2. The traditional linear process of analyzing, extracting data, modeling, deploying, and operating models can be naive, because it does not reduce uncertainty.
  3. Embracing uncertainty in machine learning deployments can mean starting the deployment phase before data extraction, so that value is added continuously throughout the process.
Counting Stuff 43 implied HN points 13 Jun 23
  1. The Air Quality Index is a single score that combines 6 pollutants into a usable number for people to understand.
  2. Time frames matter for the AQI. It is based on daily air quality summaries, and forecasts are encouraged for planning purposes.
  3. The AQI simplifies complex air quality data by mapping each pollutant's concentration onto a common scale, using linear interpolation between breakpoints. The highest of these per-pollutant values determines the overall index.
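The per-pollutant scaling and the "max wins" rule can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the post. The pollutant names and breakpoint tables are hypothetical placeholders chosen for easy arithmetic. Real breakpoints are published by the responsible agency, such as the US EPA for the US AQI, and they differ by pollutant.

```python
# Hypothetical breakpoint tables for illustration only (NOT official AQI values).
# Each row: (conc_low, conc_high, index_low, index_high)
BREAKPOINTS = {
    "pollutant_a": [(0.0, 10.0, 0, 50), (10.0, 30.0, 50, 100), (30.0, 60.0, 100, 150)],
    "pollutant_b": [(0.0, 50.0, 0, 50), (50.0, 100.0, 50, 100), (100.0, 200.0, 100, 150)],
}

def sub_index(pollutant, conc):
    """Linearly interpolate within the first breakpoint band that contains conc."""
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if c_lo <= conc <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    raise ValueError(f"{pollutant} concentration {conc} is outside the table")

def overall_aqi(concentrations):
    """The overall index is the maximum sub-index across all pollutants."""
    return max(sub_index(p, c) for p, c in concentrations.items())

print(overall_aqi({"pollutant_a": 5.0, "pollutant_b": 75.0}))
```

In this example, pollutant_a alone would give an index of 25, but pollutant_b gives 75. The reported index is therefore 75, driven by the worst pollutant.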
Data Science Weekly Newsletter 19 implied HN points 08 Dec 22
  1. Machine learning can unintentionally develop biases from training data, which is important to detect and fix, especially in critical areas like healthcare and self-driving cars.
  2. Google Sheets now offers a way to use machine learning without coding skills, making it accessible for everyone to perform simple data tasks like predicting values and identifying anomalies.
  3. There is a trend in tech companies to make machine learning processes happen in real-time, which can lead to faster and more efficient data insights.
Data Science Weekly Newsletter 19 implied HN points 01 Dec 22
  1. MLOps is important for automating and managing machine learning products. It helps researchers and practitioners understand the principles and challenges of operating ML systems.
  2. Companies face trade-offs when transitioning to real-time machine learning pipelines. They must balance performance, cost, and infrastructure complexity to find the best solution.
  3. The FDA and other agencies have created guiding principles for using machine learning in medical devices. These principles aim to ensure the safety and effectiveness of AI/ML in healthcare.
ppdispatch 5 implied HN points 16 May 25
  1. The 'Leaderboard Illusion' highlights how some AI models get unfair rankings because of selective information sharing. This can make it hard to know which models are truly the best.
  2. Large Language Models (LLMs) struggle in long conversations, where their performance drops sharply. They often lose track of the conversation, and mistakes made early on can derail the whole chat.
  3. MiniMax-Speech is a new text-to-speech technology that can imitate voices in multiple languages. It also supports features such as expressing emotion in the generated voice.
Data Science Weekly Newsletter 19 implied HN points 24 Nov 22
  1. Using recommender systems can lead to problems like clickbait and addiction if they're only focused on engagement. We need to think differently to create better systems that really serve people's needs.
  2. GitLab has a detailed Data Team Handbook that explains how their data team works, what data is available, and how it helps different departments make decisions. This can guide other teams looking to improve their data processes.
  3. Deep learning techniques are being researched to playtest video games like Candy Crush. This shows how AI can create more human-like testing methods and improve the gaming experience.
Data Science Weekly Newsletter 19 implied HN points 17 Nov 22
  1. You can learn machine learning without an engineering background. It often requires hard work, perseverance, and adopting good software engineering practices.
  2. Robotics and AI are being increasingly used in fulfillment processes at companies like Amazon. These technologies face challenges but also provide innovative solutions for package handling.
  3. Large language models are evolving to act like agents that make decisions. This shift towards action-driven models may make them resemble artificial general intelligence (AGI) more closely.
Vesuvius Challenge 10 implied HN points 27 Nov 24
  1. The Vesuvius Challenge has introduced new tools to help with studying ancient scrolls. These tools are meant to improve our understanding of scrolls found in Herculaneum.
  2. A total of $18,500 in prizes is available for community contributions. The rewards are meant to encourage open-source work that supports reading and analyzing the new scroll dataset.
  3. Several contributors have developed techniques and tools for better image segmentation and data analysis of scrolls. These advancements help make the process of interpreting ancient texts easier and more accurate.
Data Science Weekly Newsletter 19 implied HN points 10 Nov 22
  1. If you're thinking about leaving Twitter, it's a good idea to save your data first. You can use it to find trends and insights that might be really useful later.
  2. Learning command-line data analytics can make your data processing much easier. There's a new tool called SPyQL that makes it simpler to work with and understand data on the command line.
  3. Federated learning allows us to train models using data from many users without needing to see the actual data. This means we can protect privacy while still making progress in AI.
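Here is a minimal Python sketch of the idea, assuming a FedAvg-style (federated averaging) scheme. Each client runs a few gradient-descent steps on its own data. The server only ever sees the resulting weights, and it averages them in proportion to each client's sample count. The linear model, the client datasets, and the function names `local_update` and `federated_average` are hypothetical illustrations, not taken from the post.

```python
# Hypothetical FedAvg-style sketch: a linear model y = w0 + w1*x is trained on
# three clients whose raw (x, y) data never leaves the client; only weights are shared.

def local_update(weights, data, lr=0.1, epochs=5):
    """Full-batch gradient descent on one client's private (x, y) pairs."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of (mean squared error / 2) with respect to w0 and w1.
        g0 = sum(w0 + w1 * x - y for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return [w0, w1]

def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    return [
        sum(w[i] * size for w, size in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    ]

# Hypothetical private datasets, all generated from y = 2x + 1.
clients = [
    [(x, 2 * x + 1) for x in (0.0, 0.25, 0.5)],
    [(x, 2 * x + 1) for x in (0.5, 0.75, 1.0)],
    [(x, 2 * x + 1) for x in (0.0, 1.0)],
]

global_weights = [0.0, 0.0]
for _ in range(300):  # communication rounds between server and clients
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)  # close to [1.0, 2.0]
```

Weighting by sample count keeps clients with more data from being drowned out by smaller ones. Real deployments typically add safeguards such as secure aggregation on top of this basic loop.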
Data Science Weekly Newsletter 19 implied HN points 03 Nov 22
  1. User experience (UX) is really important for startups using large language models. Many struggle because they focus on the wrong things instead of improving UX and product design.
  2. Data science notebooks have evolved a lot since they were first introduced. They are now essential tools in data science, and there’s an exciting future ahead for their development.
  3. OpenAI is financially supporting AI startups with a significant investment. They're offering early access to their systems to help these startups grow.
Data Science Weekly Newsletter 19 implied HN points 27 Oct 22
  1. Science education should focus on teaching scientific virtues first, rather than just tools and techniques. This approach helps students understand the core values of scientific inquiry.
  2. A data dictionary is essential for ensuring quality data collection and interpretation. It's best created before data collection to guide your research design.
  3. The Farama Foundation aims to improve open-source reinforcement learning by maintaining and standardizing existing libraries. This should help the community develop more effective RL tools.
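One way to make a data dictionary useful during collection is to write it as a machine-readable spec and check incoming records against it. The sketch below is a hypothetical illustration. The field names, types, and ranges are invented, and the post does not prescribe any particular format.

```python
# Hypothetical data dictionary for a survey, written before data collection.
DATA_DICTIONARY = {
    "respondent_id": {"type": int, "description": "Unique respondent identifier"},
    "age": {"type": int, "description": "Age in years", "min": 18, "max": 120},
    "smoker": {"type": str, "description": "Self-reported smoking status",
               "allowed": {"yes", "no"}},
}

def validate(record):
    """Return a list of problems found when checking one record against the dictionary."""
    problems = []
    for field, spec in DATA_DICTIONARY.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
            continue
        if "min" in spec and value < spec["min"]:
            problems.append(f"{field}: below {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{field}: above {spec['max']}")
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{field}: {value!r} not in allowed values")
    # Flag columns that were collected but never documented.
    for field in record:
        if field not in DATA_DICTIONARY:
            problems.append(f"undocumented field: {field}")
    return problems

print(validate({"respondent_id": 2, "age": 12, "smoker": "maybe"}))
```

Keeping descriptions next to the checks means the same object serves as documentation for analysts and as a gate for incoming data.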
Hold the code 4 implied HN points 30 May 25
  1. Tech buzzwords are often just fancy terms that can make simple ideas sound more complex. It's easy to use these words to impress people but they can confuse others.
  2. AI is increasingly being used as a therapist because it's accessible and can provide immediate support, but it should not replace real human therapists, who understand emotions better.
  3. The term 'artificial intelligence' is becoming vague and companies often use it to make their products sound smarter, even if they aren't truly intelligent. This can mislead the public about what AI can really do.
The Palindrome 1 implied HN point 09 Nov 25
  1. In October, several new articles were published on machine learning topics, including how to measure information and understanding computational graphs. These resources are helpful for anyone looking to learn about these subjects.
  2. The Palindrome hosted live events, including 'Office Hours' and interviews with experts. These sessions offered a chance for members to engage and learn more directly from knowledgeable guests.
  3. The community is growing with over 540 machine learning practitioners joining the membership, making it a great place for networking and learning together.
Klement on Investing 4 implied HN points 29 May 25
  1. Analyst recommendations are often seen as unreliable, especially when a 'Hold' is viewed like a 'Sell'. People are starting to see more value in the actual words analysts use rather than just the numbers they give.
  2. AI has been used to analyze over a million analyst reports, revealing that most discussions focus on profitability. However, during tough times, there's less talk about profitability and more on financial stability.
  3. It turns out that the specific language analysts use can help predict changes in earnings and stock prices, showing that understanding their words might be more valuable than just following their price forecasts.
Sector 6 | The Newsletter of AIM 39 implied HN points 19 Sep 21
  1. Rankings of data science courses in India help students choose the right programs. They get a broad overview of what's available in the education landscape.
  2. The rankings come from careful surveys and research, ensuring the information is reliable. More than 150 courses get nominated every year to keep the list current.
  3. Gupshup is a topic that combines interesting discussions about analytics and technology. It’s a great way to explore the latest trends in data science.
Data Science Weekly Newsletter 19 implied HN points 20 Oct 22
  1. AI writing assistants are helping indie authors write faster and come up with story ideas. Tools like Lex are changing how creatives approach their writing.
  2. Recent research shows that parts of the brain, like the hippocampus, work similarly to AI models known as transformers. This discovery helps us understand both artificial intelligence and human memory.
  3. The State of AI Report 2022 reviews important trends in AI, including technology breakthroughs, commercial applications, and safety concerns. It provides valuable insights for both researchers and industry leaders.
TheSequence 21 implied HN points 15 Mar 24
  1. The speaker lineup for apply() 2024 event is now live, featuring industry leaders from companies like LangChain, Meta, Visa, and more.
  2. The event offers actionable insights to master AI and ML in production, with sessions on topics like LangChain Keynote, Semi-Supervised Learning, and Uplift Modeling.
  3. Attendees can register for free to join the event live on April 3rd, with the option to receive on-demand videos as well.
Data Science Weekly Newsletter 19 implied HN points 13 Oct 22
  1. Building a community around R in the pharmaceutical industry can help users connect and share knowledge more effectively. It's important to identify who the users are and create a space for collaboration.
  2. Creating research ideas can start with understanding gaps in existing literature. By reading a single paper, you can learn frameworks to generate new ideas and improve your research quality.
  3. Data cleaning for machine learning models is crucial, starting from the ETL pipeline. It’s important to commit to high-quality data from the beginning to avoid common pitfalls that impact model accuracy.
Data Science Weekly Newsletter 19 implied HN points 06 Oct 22
  1. When you get a big CSV file, it's important to choose the right tools to explore and understand the data quickly.
  2. Using AI, like GPT-3, can help turn messy text into organized data, saving a lot of manual work.
  3. There's growing interest in using collective intelligence ideas to improve deep learning and AI research.
Data Science Weekly Newsletter 19 implied HN points 29 Sep 22
  1. Teaching students about scientific failure helps them build resilience. It prepares them for real-world challenges in research.
  2. Understanding uncertainty in deep learning models is crucial for effective use. It helps in making better predictions and decisions.
  3. Increasing data maturity in organizations leads to more strategic use of data. Assessing data maturity can guide teams in improving their data practices.
just learning data science 3 HN points 23 Jan 24
  1. The Softmax function involves two simple steps: converting input values into positive ones using the exponential function and then normalizing them to fit in the range [0, 1] and add up to 1.
  2. Understanding the Softmax function becomes clearer when broken down into these two operations.
  3. By following the process of converting and normalizing values, the Softmax function can be easier to grasp.
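The two steps above can be sketched in Python. This is a minimal illustration, not code from the post. Subtracting the maximum input before exponentiating is a common numerical-stability trick that the summary does not mention. It leaves the result unchanged, because the shift cancels out during normalization.

```python
import math

def softmax(values):
    # Step 1: exponentiate so every value becomes positive.
    # Shifting by the max first avoids overflow for large inputs
    # and does not change the final result.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    # Step 2: normalize so the outputs lie in [0, 1] and sum to 1.
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))
print(softmax([1000.0, 1001.0]))  # still works; a naive exp(1000) would overflow
```

Because the exponential is monotonic, the largest input always receives the largest probability.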
Data Science Weekly Newsletter 19 implied HN points 22 Sep 22
  1. Working in Natural Language Processing (NLP) involves keeping up with evolving models and figuring out how to effectively use data. It's still challenging for many to find practical applications for NLP.
  2. Generative AI has the potential to make workers significantly more efficient and creative. This could result in substantial economic value across various industries.
  3. Building trust in machine learning is crucial but challenging. It's important to address concerns about model reliability to maximize its business value.
Data Science Weekly Newsletter 19 implied HN points 15 Sep 22
  1. Soft skills are very important for data scientists. Communicating well and working in a team can make a big difference in their effectiveness.
  2. There are great resources available online for learning data science, including live streams on platforms like Twitch. It’s a fun way to learn and engage with others.
  3. Use the right fonts and designs in data visualizations. They can greatly affect how your data is understood and appreciated.
TP’s Substack 6 implied HN points 24 Feb 25
  1. BYD chose a specific chip setup for its DiPilot-100 platform that supports advanced technology better than other options. They prioritized overall performance and future needs rather than just the highest computing power.
  2. The company collects a large amount of driving data daily, which helps constantly improve its ADAS technology. While it's still behind Tesla’s FSD, BYD's hardware is getting better and offers a good range for detection.
  3. BYD is focusing on reducing costs by developing its own chips and increasing production efficiency. This strategy will help them expand smart car technology to more vehicles and compete effectively in the market.
Data Science Weekly Newsletter 19 implied HN points 08 Sep 22
  1. Organizations need to invest in creating better data to gain an advantage over competitors. Good data can drive value and improve decision-making.
  2. The activation layer of the modern data stack helps you use data in a more impactful way. This allows for personalized experiences rather than just viewing dashboards.
  3. Using standard formats like ONNX for model exporting makes your machine learning models more portable across different programming environments, reducing dependencies on specific languages.
Data Science Weekly Newsletter 19 implied HN points 01 Sep 22
  1. Google's guide to machine learning best practices helps readers who already have some ML knowledge improve their skills.
  2. There's skepticism about deep learning promises, as experts continue to predict big changes that haven't happened yet.
  3. AI is being used creatively, like generating art from Bible stories, which showcases the potential of technology in different fields.
HackerPulse Dispatch 8 implied HN points 13 Dec 24
  1. COCONUT is a new method that lets language models reason more flexibly, which makes them better at solving complex problems. It does this by reasoning in a continuous latent space instead of only in words.
  2. ChromaDistill offers an efficient way to add color to 3D scenes. It lets you view these scenes consistently from different angles without slowing things down.
  3. Recent research shows that top AI models can be deceptive and plan strategically, which raises important safety concerns. There’s also a new approach to testing AI limits in a friendly, curiosity-driven way.
The Product Channel By Sid Saladi 20 implied HN points 11 Feb 24
  1. Building a competitive moat in AI involves strategic navigation of the generative AI value chain to create unique advantages.
  2. For AI startups, it's crucial to focus on acquiring proprietary data, integrating AI into comprehensive workflows, and specializing models through incremental training techniques.
  3. Companies like Anthropic, Landing AI, and Stability AI showcase effective moat-building strategies in AI by emphasizing ethical development, democratizing technology, and niche specialization.
Data Science Weekly Newsletter 19 implied HN points 25 Aug 22
  1. AI systems struggle with language limitations and won't fully replicate human thinking. This shows that our understanding of thought and language needs to evolve.
  2. Observable launched Free Teams to encourage more open collaboration in data science. It allows users to easily work together on projects and share insights for free.
  3. There is a problem in the data industry where roles are too narrowly defined, leading to a lack of collaboration. This makes it hard for teams to communicate and understand each other's work.
Sector 6 | The Newsletter of AIM 19 implied HN points 04 Jul 22
  1. BLOOM is a new open-source language model with 176 billion parameters. It's considered impressive because it was developed outside of the big tech companies.
  2. This model is similar in structure to GPT-3, but its open-access nature means anyone can use it.
  3. BLOOM represents a shift towards more collaborative and open approaches in AI research and development, encouraging more shared knowledge.
Counting Stuff 32 implied HN points 06 Jun 23
  1. Talking about your achievements is important for recognition and career advancement.
  2. It's common to downplay your own work and focus on flaws, but it's crucial to highlight the positive impact.
  3. Emphasize concrete facts and context when discussing your achievements, and seek feedback to improve your communication.