The hottest Data science Substack posts right now

And their main takeaways

Reports of our death are an exaggeration Part 2

The Jolly Contrarian • 0 implied HN points • 24 Nov 23

🕹 Technology Data science

Machines are best utilized for tasks where human capabilities fall short, not to replace human intelligence entirely.
Creating a division of labor between human intelligence and machines can optimize productivity by focusing each on their strengths.
Artificial intelligence should not be used to simplify or homogenize cultural diversity, but rather to enhance human creativity and uniqueness.

Gradient Flow #20: Ethical Algorithms, Knowledge Graphs, Secure Communication

Gradient Flow • 0 implied HN points • 22 Oct 20

🕹 Technology Data science

Knowledge graphs are crucial in modern AI applications and tools are available for developers to start using them.
End-to-end machine learning platforms are essential for accelerating ML adoption and ensuring its sustainability.
Responsible AI practices are necessary to address gender and racial bias in applications like sentiment analysis and machine translation.

Gradient Flow #18: Forecasting & Groupthink, Interpreting NLP, Ray Ecosystem

Gradient Flow • 0 implied HN points • 24 Sep 20

🕹 Technology Data science

Using machine learning in medical triage and monitoring systems can greatly enhance healthcare operations and responses.
Reinforcement Learning in simulation software can enable companies to address more complex real-world scenarios.
The NLP industry survey report provides valuable insights for those using natural language technologies.

Gradient Flow #15: Technology Adoption, Bias in Speech, Fizz Buzz

Gradient Flow • 0 implied HN points • 13 Aug 20

🕹 Technology Data science

Data is power, and access to data can determine who holds power in society.
Machine learning technologies are still in early stages of adoption in the U.S.
Racial disparities in automatic speech recognition models highlight bias in machine learning applications.

A machine learning model with no input variables

just learning data science • 0 implied HN points • 29 Jan 24

🔬 Science Data science

Wikipedia may not be the best place for beginners to learn data science and machine learning, because its topics are unordered and assume a high entry level.
The Wikipedia article on the likelihood function was hard to follow at first because its examples have no input variables, a point that is crucial to understand.
Machine learning models range from deterministic ones with input variables to non-deterministic ones with none, such as a coin flip.
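
The coin-flip case above can be sketched in a few lines. This is a minimal illustration, not code from the post: the function names and the sample data are assumptions. The model has no input variables, only one parameter `p` (the probability of heads), and it is fitted by maximizing the likelihood of the observed flips.

```python
import math

def log_likelihood(p, flips):
    """Log-likelihood of a Bernoulli(p) model for a list of 0/1 outcomes."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1.0 - p)

def fit_coin(flips):
    """Maximum-likelihood estimate of p, which is simply the fraction of heads."""
    return sum(flips) / len(flips)

# Made-up observations (assumption): 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 0]
p_hat = fit_coin(flips)

print("estimated p:", p_hat)
print("log-likelihood at p_hat:", log_likelihood(p_hat, flips))
print("log-likelihood at 0.5:  ", log_likelihood(0.5, flips))
```

There is nothing to feed the fitted model at prediction time: it returns the same probability for every flip.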

3 easy ways to stay up-to-date in Data Science

just learning data science • 0 implied HN points • 23 Jan 24

🕹 Technology Data science

Follow data science accounts on Twitter for an easy source of expert insights.
Subscribe to the release notes of your favorite libraries for quick updates on the latest developments in the field.
Use a Medium.com subscription to quickly pick up basic knowledge and insights about data science.

Hello

just learning data science • 0 implied HN points • 23 Jan 24

🕹 Technology Data science

Maciej shares his journey from web app development to data science and his experiences with projects and job positions.
He aims to create a community of data scientists who can relate to his experiences and provide constructive criticism.
Despite time constraints, Maciej pursues knowledge by writing articles and sharing insights with the public.

Coming soon

just learning data science • 0 implied HN points • 23 Jan 24

🕹 Technology Data science

A new post about learning data science is coming soon on justlearningdatascience.substack.com.
The post is by Maciej Gruszczyński and will be available on January 23, 2024.
Readers are encouraged to subscribe to stay updated on the upcoming content.

Newsletter #20: PDFTriage

Decoding Coding • 0 implied HN points • 08 Nov 23

🕹 Technology Data science

PDFTriage helps AI understand the structure of documents, like research papers. By using this structure, it can give better answers to specific questions about the document.
It has three stages: first, it creates a detailed structure of the document; next, it queries data based on this structure; and finally, it answers user questions using the gathered information.
This approach shows how thinking about how humans write and organize information can improve how AI systems work. It allows the AI to pull relevant details effectively.

Newsletter #17: Textbooks are all you need!

Decoding Coding • 0 implied HN points • 29 Jun 23

🕹 Technology Data science

Using online code for training LLMs can cause problems because that code often needs extra info to be useful and includes repetition. It's not always high-quality or useful code.
The phi-1 model improves training by using a specific set of high-quality code from textbooks and exercises, making it better for learning how to code.
This approach shows that just changing the training data can lead to better results, highlighting the importance of using good resources for teaching coding.

Newsletter #16: PEARL — A LLM brain for large texts

Decoding Coding • 0 implied HN points • 22 Jun 23

🕹 Technology Data science

LLMs can act like a 'brain' for processing and understanding large texts. They help plan and execute tasks by breaking them down into smaller steps.
The process consists of three main parts: discovering the necessary actions, creating a plan using those actions, and finally executing the plan carefully to avoid mistakes.
Though this method shows promise, it still has limitations, like generating incorrect plans and being restricted by the size of information it can handle. Improvements are expected as technology advances.

Newsletter #14: Adding Memory to LLMs

Decoding Coding • 0 implied HN points • 01 Jun 23

🕹 Technology Data science

LLMs can forget information when they get too big, which makes their performance worse. Adding an internal memory can help them remember better and adapt to new tasks.
The new framework, Decision Transformers with Memory (DT-Mem), uses a special memory module to identify and store important information effectively. This helps the model improve its decision-making.
By using techniques like content-based addressing, DT-Mem can selectively add or erase information in its memory, making it smarter and more efficient in handling tasks.
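
Content-based addressing can be sketched roughly as follows. This is a generic, Neural-Turing-Machine-style illustration and an assumption on my part, not DT-Mem's actual implementation. The function names, the `sharpness` parameter, and the toy memory are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def address(memory, key, sharpness=5.0):
    """Softmax over the similarity between the key and each memory slot."""
    scores = [sharpness * cosine(slot, key) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Weighted sum of memory slots."""
    width = len(memory[0])
    return [sum(w * slot[j] for w, slot in zip(weights, memory)) for j in range(width)]

def write(memory, weights, erase, add):
    """Partly erase each slot, then add new content, in proportion to its weight."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(slot, erase, add)]
        for w, slot in zip(weights, memory)
    ]

# Toy memory with three slots (assumption).
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
key = [0.9, 0.1, 0.0]

weights = address(memory, key)
recalled = read(memory, weights)
updated = write(memory, weights, erase=[1.0, 1.0, 1.0], add=[0.5, 0.5, 0.0])

print("weights:", weights)
print("recalled:", recalled)
```

The slot most similar to the key receives the largest weight, so both reads and writes concentrate on it while other slots are left mostly untouched.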

Newsletter #11: System Design for Machine Learning - Part I

Decoding Coding • 0 implied HN points • 04 May 23

🕹 Technology Data science

Before starting on a machine learning project, it's important to define clear goals and understand how ML can help achieve them.
Setting up a data pipeline is crucial; it involves collecting, preparing, and analyzing data to see what features are useful for your model.
When deploying machine learning models, you need to consider both hardware and software needs, including how to handle real-time data for ongoing training.

Newsletter #5: Backprop from scratch

Decoding Coding • 0 implied HN points • 09 Mar 23

🕹 Technology Data science

Derivatives show how small changes in inputs affect the output of a function. This is important for understanding how neural networks adjust to improve their predictions.
In neural networks, understanding how changes in weights and inputs influence the output helps us optimize performance. By adjusting weights based on calculated gradients, we can make the network learn better.
The chain rule is key when calculating how different layers of a neural network affect the final output. It allows us to connect changes in inputs through to the overall output, helping us to fine-tune the model.
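
The chain rule described above can be illustrated on a single sigmoid neuron. This is a minimal sketch under assumed values, not the newsletter's code: the names, the squared-error loss, and the numbers are chosen for illustration. The analytic gradient is checked against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, target):
    """One neuron: z = w*x + b, y = sigmoid(z), loss = (y - target)^2."""
    z = w * x + b
    y = sigmoid(z)
    loss = (y - target) ** 2
    return z, y, loss

def backward(w, b, x, target):
    """Chain rule: dloss/dw = dloss/dy * dy/dz * dz/dw (likewise for b)."""
    _, y, _ = forward(w, b, x, target)
    dloss_dy = 2.0 * (y - target)
    dy_dz = y * (1.0 - y)   # derivative of the sigmoid
    dz_dw = x
    dz_db = 1.0
    return dloss_dy * dy_dz * dz_dw, dloss_dy * dy_dz * dz_db

def numeric_grad_w(w, b, x, target, eps=1e-6):
    """Central finite difference, used only to sanity-check the analytic gradient."""
    up = forward(w + eps, b, x, target)[2]
    down = forward(w - eps, b, x, target)[2]
    return (up - down) / (2.0 * eps)

# Illustrative values (assumption).
w, b, x, target = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backward(w, b, x, target)

# One gradient-descent step with a hypothetical learning rate.
lr = 0.1
new_w, new_b = w - lr * grad_w, b - lr * grad_b

print("analytic dloss/dw:", grad_w)
print("numeric  dloss/dw:", numeric_grad_w(w, b, x, target))
print("loss before:", forward(w, b, x, target)[2])
print("loss after: ", forward(new_w, new_b, x, target)[2])
```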

Newsletter #4: Probabilities with Python

Decoding Coding • 0 implied HN points • 02 Mar 23

🕹 Technology Data science

NumPy is a powerful tool for working with probability distributions in Python. You can easily generate data and calculate probabilities using its features.
Common probability distributions like Normal, Binomial, and Poisson can be modeled using NumPy. Each distribution has its own formula to calculate probabilities.
De Morgan's Laws help in calculating probabilities of complements in events. They show how to relate the union and intersection of events, which can be useful in probability theory.
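
A short sketch of these ideas with NumPy's random generator. The sample size, the distribution parameters, and the two events are assumptions picked for illustration. It draws samples from the three distributions, estimates a probability empirically, and checks one of De Morgan's laws on two events.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Samples from the three distributions mentioned above.
normal = rng.normal(loc=0.0, scale=1.0, size=n)
binomial = rng.binomial(n=10, p=0.5, size=n)
poisson = rng.poisson(lam=3.0, size=n)

# Empirical probability that a standard normal draw lies within one standard deviation.
p_within_one_sigma = np.mean(np.abs(normal) < 1.0)

# Two events defined on the samples (arbitrary thresholds).
a = binomial >= 6
b = poisson >= 4

# De Morgan: not (A or B) is the same event as (not A) and (not B).
lhs = ~(a | b)
rhs = ~a & ~b

print("P(|X| < 1):", p_within_one_sigma)
print("P(not (A or B)):", np.mean(lhs))
print("1 - P(A or B):  ", 1.0 - np.mean(a | b))
```

Because the two sides of De Morgan's law describe the same event, the complement probability can be computed from whichever side is easier to evaluate.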

The Week of Small Language Models

Sector 6 | The Newsletter of AIM • 0 implied HN points • 22 Jul 24

🕹 Technology Data science

Small language models are gaining popularity, with companies like Hugging Face and OpenAI participating in their development. This means we could see more accessible and efficient AI tools in the near future.
Mistral AI has launched a new model called Mistral NeMo that can handle a lot of information at once, making it useful for various applications. This could help improve how we use AI in complex tasks.
There's an increasing focus on creating smaller models that still perform well, which suggests a shift in how we think about AI technology. Smaller models could make AI more practical for everyday use.

When LLMs are Super Confident 😎 ✨

Sector 6 | The Newsletter of AIM • 0 implied HN points • 19 Jul 24

🕹 Technology Data science

OpenAI is improving LLM outputs with a new technique called Prover-Verifier Games. This helps make the answers clearer and more trustworthy for users.
Smaller LLMs are taught to check the responses of larger LLMs, similar to a student explaining their homework to a tutor. This approach ensures the solutions are easy to understand.
The focus is on making LLM outputs more legible, especially in areas like grade-school math. This makes it easier for everyone to follow the reasoning behind the answers.

OpenAI is Not Open, will Safe Superintelligence be Safe?

Sector 6 | The Newsletter of AIM • 0 implied HN points • 20 Jun 24

🕹 Technology Data science

OpenAI is not as open as it claims to be, which raises questions about transparency in AI development.
Ilya Sutskever's new company focuses on developing safe superintelligence, although some may joke that if it never happens, it will always be safe.
The conversation around AI safety and superintelligence is becoming more relevant as industry leaders express concerns and start new ventures.

Inside the World’s Largest Data + AI Gathering

Sector 6 | The Newsletter of AIM • 0 implied HN points • 17 Jun 24

🕹 Technology Data science

The Databricks Data + AI Summit 2024 attracted 60,000 attendees from around the world, showing a huge interest in data and AI. There were also 16,000 people attending in person in San Francisco.
The summit featured over 600 sessions, highlighting new ideas and sharing knowledge about innovations in data and AI. It was a big event for networking and learning.
This year's focus was on making AI and data accessible, helping leaders make smarter decisions based on their data more easily.

Inside India’s Biggest Data Engineering Summit

Sector 6 | The Newsletter of AIM • 0 implied HN points • 03 Jun 24

🕹 Technology Data science

The Data Engineering Summit in Bengaluru was a huge success, with over 1,000 attendees and more than 50 speakers from the AI and analytics community.
Key topics of discussion included software deployment architectures and frameworks for using data in business, highlighting the importance of these technologies.
Attendees showed lots of enthusiasm for the discussions and innovative ideas that were shared at the event, demonstrating a vibrant interest in data engineering.

Cheese Sticking, AI Knows 🍕🤖❓

Sector 6 | The Newsletter of AIM • 0 implied HN points • 25 May 24

🕹 Technology Data science

A recent response from Google AI about cheese sticking to pizza caused a lot of debate online. It made people question how well AI understands everyday problems.
This isn't the first time AI has given strange advice. In earlier tests, it suggested weird things like drinking light-colored urine for kidney stones.
These odd suggestions highlight the gaps in AI knowledge and make us think about how we rely on technology for information.

Unfolding AlphaFold 3 🧬✨

Sector 6 | The Newsletter of AIM • 0 implied HN points • 11 May 24

🕹 Technology Data science

AlphaFold 3 is an AI model that predicts how proteins interact with other molecules, with a reported improvement of at least 50% over existing prediction methods.
This technology goes beyond just analyzing protein structures to help design drug compounds that can bind to proteins.
The goal of this AI is to enhance drug discovery, making it easier to create effective treatments.

The Unrivalled Leader in GenAI

Sector 6 | The Newsletter of AIM • 0 implied HN points • 25 Mar 24

🕹 Technology Data science

Accenture has made a huge impact in the generative AI space, with $1.1 billion in sales, which is more than all the VC-backed startups combined. This shows it is leading the way.
Compared to Accenture, major Indian tech companies like TCS and Infosys show less confidence in generative AI. They haven't reported specific earnings in this area, which raises concerns.
The difference in performance between Accenture and these Indian companies could indicate a possible risk in the outsourcing industry as they navigate new technology trends.

When (Not) to XGBoost

Sector 6 | The Newsletter of AIM • 0 implied HN points • 12 Mar 24

🕹 Technology Data science

XGBoost is a popular tool in machine learning, but it's not always the best choice for every situation. It's important to understand when to apply it and when to use other methods.
Many people now claim to be experts in AI after the rise of large language models, but AI includes a lot more than just these models.
It's essential to know the broader landscape of AI techniques to make better decisions in data science and machine learning projects.

The Week of AI Drama

Sector 6 | The Newsletter of AIM • 0 implied HN points • 11 Mar 24

🕹 Technology Data science

OpenAI has had a busy week with a lot of drama, including Sam Altman returning to its board after being fired as CEO.
Elon Musk is suing OpenAI, which adds to the tension between him and the company.
New AI models like Claude 3 and Inflection 2.5 have been released, competing directly with OpenAI's GPT-4.

Put Some Pants On! 👖👉😳

Sector 6 | The Newsletter of AIM • 0 implied HN points • 31 Jan 24

🕹 Technology Data science

LLMs, or large language models, rely on prompts to function properly, just like people choosing to dress appropriately for work. This analogy shows the importance of setting the right context for success.
Using open-source models is different from closed ones, impacting how they are packaged and function. This means the way we interact with these models, including the prompts we use, can change significantly.
A new course on prompt engineering has been released to help users navigate these differences in LLMs. It's a way for people to learn how to effectively work with these models.

Google’s Q*?

Sector 6 | The Newsletter of AIM • 0 implied HN points • 14 Dec 23

🕹 Technology Data science

Google's AlphaCode 2 has improved significantly, performing better than the earlier version by solving many coding challenges. It shows that Google's advancements in AI are making big leaps.
AlphaCode 2 ranks in the 85th percentile among competitors, meaning it outperforms most human participants in coding competitions. This suggests that AI is becoming very capable in technical problem-solving.
Many people are focused on Google's Gemini project, but AlphaCode 2 might be a game-changer in competitive coding, indicating a shift in how powerful AI tools can be for programmers.

The Cost of Using LLMs

Sector 6 | The Newsletter of AIM • 0 implied HN points • 20 Oct 23

🕹 Technology Data science

Using large language models (LLMs) can be costly, with prices influenced by factors like the number of tokens processed. For example, GPT-4 is much more expensive than other options like Llama 2.
There are many LLMs available today, with some newer open-source models like Llama 2 and Mistral 7B performing well. These models are gradually becoming more popular.
The choice of LLM depends on your specific needs and budget, as different models offer varying costs and performance levels. It's good to explore all available options before deciding.
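
Token-based pricing is simple arithmetic, sketched below. The model names and per-1,000-token prices are HYPOTHETICAL placeholders, not real price lists, and `estimate_cost` is an illustrative helper rather than any provider's API.

```python
def estimate_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request when input and output tokens are billed per 1,000 tokens."""
    return (prompt_tokens / 1000) * price_in_per_1k + (completion_tokens / 1000) * price_out_per_1k

# HYPOTHETICAL prices in USD per 1,000 tokens: (input, output).
PRICES = {
    "large-hosted-model": (0.03, 0.06),
    "small-open-model": (0.001, 0.002),
}

# Same workload for both models: 2,000 prompt tokens and 500 completion tokens.
costs = {
    name: estimate_cost(2000, 500, price_in, price_out)
    for name, (price_in, price_out) in PRICES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:.4f} per request")
```

Multiplying the per-request figure by expected request volume gives a rough monthly budget for comparing options.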

The Real ChatGPT is Finally Here

Sector 6 | The Newsletter of AIM • 0 implied HN points • 04 Oct 23

🕹 Technology Data science

ChatGPT struggled to meet initial expectations, often giving unreliable information. Many users realized it wasn't always trustworthy after the excitement wore off.
The new GPT-4V(ision) has expanded ChatGPT's abilities, allowing it to read texts and understand images. This makes it much more versatile and useful for various tasks.
A major breakthrough is in medical science, where radiologists can now use this model to analyze images from scans better. This helps them gather important information from X-rays and other medical images.

Python Ditches GIL for AGI

Sector 6 | The Newsletter of AIM • 0 implied HN points • 01 Aug 23

🕹 Technology Data science

The Python Steering Council has announced its intent to accept PEP 703, which would make the Global Interpreter Lock (GIL) optional in CPython. Without the GIL, threads can run Python code in parallel across CPU cores, which makes the language better suited to advanced, compute-heavy projects.
Experts believe that without the GIL, Artificial General Intelligence (AGI) becomes more achievable. This could lead to significant advancements in technology.
Python's journey began without threading support, but it added this feature early on. The move away from the GIL shows how the language is evolving to meet new challenges.

Assassin GPT or Saviour GPT

Sector 6 | The Newsletter of AIM • 0 implied HN points • 12 May 23

🕹 Technology Data science

ChatGPT is impacting jobs in various fields, especially for designers, writers, and now software developers. It raises concerns about how AI might replace human roles in the workforce.
The new code interpreter plugin lets users easily get results without needing to understand complex data tools. This convenience can make it more tempting to rely solely on AI for data tasks.
The discussion around renaming ChatGPT to AssassinGPT highlights fears about its potential to disrupt industries. Some see it as a threat rather than a helpful tool.

Stop Comparing AI with A-Bomb

Sector 6 | The Newsletter of AIM • 0 implied HN points • 09 May 23

🕹 Technology Data science

Comparing AI to an atomic bomb creates unnecessary fear and limits innovation. It's important to focus on the real benefits and risks of AI without sensationalizing them.
Many critics of AI lack direct experience with machine learning, which can skew their opinions. Listening to actual AI experts is crucial for informed discussions.
Analogies like the one between AI and atomic bombs can dominate conversations and hinder progress. It's vital to steer discussions towards constructive and realistic views of AI.

Amazon Crashes the GAI Party with a Bang!

Sector 6 | The Newsletter of AIM • 0 implied HN points • 16 Apr 23

🕹 Technology Data science

Amazon was focusing on transfer learning to improve their AI, like making Alexa learn new languages. However, they recently stopped this project because it was losing a lot of money.
The company has experienced several failures in the past, showing that they are not unfamiliar with setbacks. This suggests they are trying to learn and adapt from their mistakes.
Despite their challenges, Amazon's efforts in AI and technology continue to impact the industry, making them a major player in the field.

The Quiet Storm

Sector 6 | The Newsletter of AIM • 0 implied HN points • 11 Apr 23

🕹 Technology Data science

Tech layoffs are affecting many people, and it's not just distant news; it's hitting close to home for many workers.
The economy is struggling, and signs suggest that things might get worse before they get better.
Denial won't help the situation; acknowledging the reality of layoffs and struggles is important for those affected.

LLaMA Leaked

Sector 6 | The Newsletter of AIM • 0 implied HN points • 07 Mar 23

🕹 Technology Data science

LLaMA, a new language model from Meta, has been leaked online, including its downloadable files.
The leak was first shared on 4chan and gained attention quickly on the internet.
Users can find LLaMA's models, which are smaller and efficient compared to other options, through torrent links.

Say Goodbye to Boring Data

Sector 6 | The Newsletter of AIM • 0 implied HN points • 16 Feb 23

🕹 Technology Data science

Data scarcity is a big problem for AI and machine learning. New tools like generative AI can help create more data.
Synthetic datasets can be built using techniques like Stable Diffusion. This can make data less boring and more useful for developers.
Generative AI tools can change how we approach data challenges. They offer creative solutions to improve AI development.

The AGI Blasphemy Saga Continues

Sector 6 | The Newsletter of AIM • 0 implied HN points • 15 Feb 23

🕹 Technology Data science

Yann LeCun, the Meta AI chief, prefers to go against popular trends in AI development. He does not follow the rush to create advanced chatbots like Google and Microsoft are doing.
The failure of the Galactica model has left LeCun feeling disappointed. He believes that while large language models can help with writing, they can't think or act like humans.
Despite the hype around AI models, LeCun is skeptical about their true capabilities. He highlights the gap between what these AI tools can do and what people expect from them.

Struggle Continues: Is the Cloud Giant Losing Ground?

Sector 6 | The Newsletter of AIM • 0 implied HN points • 03 Jan 23

🕹 Technology Data science

Salesforce is facing tough times with declining demand for its software. It's struggling to keep up with changes in the market.
The company's leadership is under pressure, which raises questions about its future and stability.
Investors are worried about Salesforce's valuation as it experiences a dip in performance compared to competitors.

Face-PaLM, ChatGPT 🤦

Sector 6 | The Newsletter of AIM • 0 implied HN points • 29 Dec 22

🕹 Technology Data science

Google has created a new language model called PaLM, which is much larger than OpenAI's GPT-3. PaLM has 540 billion parameters compared to GPT-3's 175 billion.
There is a growing interest in comparing who will lead the AI race, PaLM or the next versions of GPT models.
The popularity of ChatGPT is rising, creating more competition in the language model space.

AI yesterday, today, and tomorrow

Sector 6 | The Newsletter of AIM • 0 implied HN points • 27 Dec 22

🕹 Technology Data science

AI is changing fast, and businesses need to adapt quickly to keep up. It's important for companies to build their digital futures on strong AI technology.
The need for skilled AI professionals is growing, with many job opportunities in the field. Understanding AI tools and techniques can help people get ahead in their careers.
Reports like 'The State of AI in India 2022' provide valuable insights into AI trends and developments. Staying informed can help individuals and businesses navigate the evolving AI landscape.