The hottest Data Substack posts right now

And their main takeaways

🏆 What the Oscar race can teach us about AI.

New_ Public • 58 implied HN points • 05 Mar 23

🕹 Technology AI Predictions Community Data Digital Spaces

Oscar fans love predicting outcomes, leading to passion and obsession with awards.
Predictive AI tools offer joy in playing games with rules for engagement, rooted in human behavior.
Raising stakes for AI decision-making requires careful consideration and human involvement to avoid harmful consequences.

The Latest Bout of Inflation Doom-Mongering Doesn’t Add Up

Informer • 58 implied HN points • 02 Mar 23

💰 Finance Inflation Economy Federal Reserve Metrics Data

Inflation data is causing alarm and calls for stricter economic control.
Recent inflation measures have shown significant increases.
Despite concerns, questioning the severity of the current inflation situation is warranted.

Synthetic Insights With GPT-4

Addition • 58 implied HN points • 05 Apr 23

🕹 Technology AI Data Insights Automation Generative AI

Use high-quality data to ground AI in generating insights.
Show AI examples of the insights you want it to generate.
Scale the process by generating many insights and identifying the best ones.

TITAA #40.5: Everything Happens So Much in AI

Things I Think Are Awesome • 58 implied HN points • 17 Mar 23

🕹 Technology AI Data Games Art News

The post discusses the latest AI releases and developments.
The author aims to add more value for paid subscribers by providing mid-month updates.
There's a focus on reducing the length of the end-of-month roundup and staying up-to-date with news.

Column level lineage is out: AI is in

The Orchestra Data Leadership Newsletter • 39 implied HN points • 19 Dec 23

🕹 Technology AI Data Debugging Metadata Automation

Column-level lineage tools were popular in 2021 but might be replaced by AI for debugging data pipelines more efficiently.
AI models like GPT can quickly pinpoint reasons for test failures and offer actionable insights beyond what traditional lineage tools provide.
Services integrating AI with metadata can give better visibility and accurate debugging solutions for data and analytics engineers compared to column-level lineage tools.

India's LLM Moment is Here

Sector 6 | The Newsletter of AIM • 39 implied HN points • 18 Dec 23

🕹 Technology AI Startups Data Innovation Infrastructure

Indian companies are launching new large language models (LLMs) like BharatGPT and OpenHathi, showcasing exciting developments in AI.
Ola's Krutrim is unique because it's not just using existing models but creating its own LLMs and the technology to support them from scratch.
These advancements in AI technology could have a big impact on various sectors, highlighting India's growing role in the global AI landscape.

Ronin is LIVE on Dune!

Ronin’s Newsletter • 24 implied HN points • 11 Nov 24

🔮 Crypto Blockchain Analytics Data Gaming Community

Ronin is now accessible on Dune Analytics, allowing users to analyze on-chain transactions and build dashboards for various data insights.
Creating dashboards on Dune is easy; just sign up, choose Ronin, and start building your queries to visualize data.
The Dune API lets users get real-time data updates and notifications, making it simpler for developers and analysts to track important metrics.

Everything Wrong with Mouse Studies (Kinda)

Asimov Press • 180 implied HN points • 14 Mar 23

🔬 Science Research Biology Data Medicine Animals

Many scientific results from mouse studies do not translate well to humans.
Various factors like cage location, scientist's sex, and even odors can impact mouse studies.
Considerations like using more female mice or adjusting environmental factors can improve the reliability of mouse studies.

Nobody Should Write ETL

Data People Etc. • 231 implied HN points • 23 Mar 23

🕹 Technology Data Automation Software Processes Development

Consider shifting away from manual ETL processes towards automated solutions.
End-to-end ownership can lead to duplication and inefficiency in data workflows.
Asset-aware orchestration can offer a more efficient and automated approach to managing data pipelines.

The Modern ML Stack is broken, but it won't be for long

Entry Level Investing • 184 implied HN points • 20 Feb 23

🕹 Technology AI Data Tech startups

AI infrastructure is essential for organizations to participate in the AI revolution.
The current ML infrastructure landscape is messy, and there is a need for consolidated solutions.
Entrepreneurs have a huge opportunity to build enduring businesses by focusing on end-to-end ML application offerings and addressing the challenges in the AI infrastructure space.

The LLM Race No One is Talking About

Sector 6 | The Newsletter of AIM • 39 implied HN points • 01 Dec 23

🕹 Technology AI Software Networks Data Innovation

Chinese tech companies are quietly developing powerful language models while the world focuses on popular ones like GPT-4. These new models could impact the global market significantly.
Alibaba Cloud has released several language models aimed at making AI accessible for small and medium businesses. This shows a push towards democratizing technology.
Models like Qwen-7B and Qwen-1.8B are open-source and designed for different needs, highlighting that there's a growing variety of options in the AI landscape.

Attack on Titan

Sector 6 | The Newsletter of AIM • 39 implied HN points • 30 Nov 23

🕹 Technology AI Software Innovation Internet Data

Amazon just launched a text-to-image AI model called Titan. It competes with popular models like Google's Imagen and OpenAI's DALL-E.
Titan claims to be superior in generating images, aiming for better accuracy and inclusivity. It also wants to avoid creating harmful or biased content.
It's still early to judge Titan's performance, but there are already established models in the market that have been tested.

The Most Generated Barn in America

Cybernetic Forests • 79 implied HN points • 08 Jan 23

🕹 Technology AI Ontology Images Data Models

Different names proposed before settling on 'photograph' offer unique perspectives on how people made sense of images.
AI images are not photographs, as they use light differently and inscribe ontologies onto noise using data and categories.
Ontolography, a proposed term for AI-generated images, emphasizes the domain-specific knowledge influencing their production and underlines how they are shaped by the category assignments and labels given to them.

Are AI Agentic Workflows the Future of Automation?

The API Changelog • 10 implied HN points • 30 Jan 25

🕹 Technology AI Automation APIs Workflows Data

AI agentic workflows can adapt and make decisions like humans, allowing them to handle unexpected situations in real-time. This makes them more effective than traditional automation, which often breaks down with changes.
Using APIs is essential for AI agentic workflows because they enable access to live data and help connect different services. This makes workflows smarter and more responsive to current events.
Switching to agentic workflows can reduce the maintenance costs of automation and doesn't require deep technical knowledge, making it easier for more people to implement.

Claude 3 Ignites Cloud Wars

Sector 6 | The Newsletter of AIM • 19 implied HN points • 06 Mar 24

🕹 Technology AI Cloud Software Computing Data

Claude 3 has made competition in the cloud market very intense, especially between Microsoft, Google, and Amazon. Each company is trying to outdo the others by adding new AI features.
OpenAI is under pressure to release GPT-5 as Claude 3 shows strong performance. This situation is causing some confusion for Microsoft Azure.
Anthropic's Claude 3 outperformed OpenAI's GPT-4 in several tests and is now available for businesses on platforms like Amazon Bedrock and Google Cloud. This gives businesses more options for AI tools.

What is SaaS debt?

Sarah's Newsletter • 179 implied HN points • 01 Mar 22

🕹 Technology Software Business Automation Data

SaaS debt occurs when maintaining SaaS tools involves more manual work than automated work, leading to inefficiencies and chaos.
Business teams can benefit from understanding concepts like templating, testing, and versioning to build scalable operational processes and avoid accumulating SaaS debt.
Implementing modular systems, testing processes, and versioning workflows can save time in the long run and prevent errors in operational tasks.

Unfortunately, OpenAI and Google have moats

Democratizing Automation • 174 implied HN points • 17 May 23

🕹 Technology AI Data Open Source Innovation Research

Companies like OpenAI and Google have competitive advantages known as 'moats' through data and user habits.
Creating and fine-tuning chatbots based on large language models require extensive data and resources, posing challenges for open-source development.
Consumer behavior and association biases often prevent users from switching to alternative platforms, reinforcing the dominance of tech giants like Google.

'Human intelligence'

imperfect offerings • 13 HN points • 10 Apr 24

🚌 Education AI Data Work Learning Assessment

The concept of 'artificial intelligence' has historically been used to define and value 'intelligence', leading to discriminatory practices in education and beyond.
The term 'human intelligence' has been co-opted by the AI industry to alleviate concerns about job displacement, but in reality, it devalues certain types of work and people, especially those involving care and emotional labor.
The comparison between artificial and human intelligence creates a double bind for students and workers, expecting them to conform to data-driven systems while also being 'more human', which can lead to confusion and anxiety.

Llama 2 follow-up: too much RLHF, GPU sizing, technical details

Democratizing Automation • 146 implied HN points • 21 Jul 23

🕹 Technology AI Data Programming Hardware Research

The Llama 2 model may be exhibiting trigger-happy behaviors due to excessive use of RLHF during training.
There are challenges with GPU sizing for different model variants, with considerations for inference and fine-tuning.
Meta's evaluation of the chat models reveals potential issues with model refusal rates and ensemble techniques.

What is a data product?

davidj.substack • 71 implied HN points • 15 Mar 24

🕹 Technology Data Products Interfaces Applications

A data product can take various forms and be consumed in different ways, always requiring an interface for consumption.
From raw data like CSV files to refined database tables, streams, JSON files, and ORM abstracted layers, all can be considered data products.
BI tools, AI automation, and semantic layers play crucial roles in creating consumable data products for various industries, making data more refined and accessible.

Digital & Analog Worlds. A Perfect Storm.

The Digital Anthropologist • 39 implied HN points • 27 Oct 23

🕹 Technology Culture Systems Data Digitalization Evolution

A fundamental shift is happening between the digital and analog worlds, leading to a bumpy yet inevitable collision of systems.
Throughout history, new technologies disrupt old systems, sparking a storm of change that humanity must weather and adapt to.
The clash between digital and analog gods is a reflection of the ongoing evolution of human societies, shaped by culture, technology, and the need for adaptation.

Strategies for Replication in Distributed Databases [System Design Sundays]

Technology Made Simple • 59 implied HN points • 16 Jan 23

🕹 Technology Data Databases AI Machine Learning Systems Design

Replication in distributed databases involves keeping copies of data on multiple machines spread across a network.
Benefits of replication in distributed systems include improved accessibility to data and fault tolerance.
Handling changes to replicated data involves choosing between active and passive replication methods, each with its own trade-offs.

6 things to know about IBM's WatsonX

The PhilaVerse • 123 implied HN points • 09 May 23

🕹 Technology AI Data Platforms Development Governance

IBM introduced WatsonX.ai for AI development and deployment.
WatsonX.data offers a specialized data store for regulated data and AI workloads.
WatsonX.governance provides tools for trustworthy AI processes.

Five Links - June 2023

Five Links (and three graphs) by Auren Hoffman • 146 implied HN points • 09 Jun 23

🕹 Technology AI Podcasts Books Twitter Data

A monthly curated list of interesting links by Auren Hoffman.
Includes articles on AI bias, memorization, and moral injuries.
Contains bonus recommendations for podcasts, movies, and books.

The Authority of the Algorithms

The Digital Anthropologist • 19 implied HN points • 12 Feb 24

🕹 Technology Algorithms Artificial Intelligence Ethics Data Society

Algorithms are deeply integrated into our daily lives, impacting everything from music to job applications, showing both benefits and risks.
Algorithms, designed by humans, are gaining authority in society, prompting questions about ethical guidelines and accountability for their creators.
Concerns about algorithms creating a bland, uniform world are present, but societal values and human creativity may prevent dystopian outcomes.

OpenAI Just Killed Thousands of Startups

Sector 6 | The Newsletter of AIM • 39 implied HN points • 29 Aug 23

🕹 Technology AI Startups Enterprise Data Business Model

OpenAI has created a new version of ChatGPT that only certain businesses can use, which means many startups that relied on this technology are now struggling.
Startups that sold products based on OpenAI's original technology are in danger as they no longer have a competitive edge.
These companies need to find new ways to stand out or they risk failing in the market.

The (Not So Subtle) Art of Not Giving A Fuck About Data

Three Data Point Thursday • 39 implied HN points • 21 Sep 23

🕹 Technology Data Data Quality Data Teams

Don't focus too much on best practices and what other companies are doing in data.
Deal with challenges and adversity in data, but focus on doing the right thing.
Prioritize data quality when your company truly becomes data-driven.

Blending AI and Human Creativity: Generative AI and Content Strategy

The Data Score • 39 implied HN points • 28 May 23

🕹 Technology AI Data Content Strategy Machine Learning

A great content strategy in the alternative data ecosystem should focus on providing validation and memorability of the data story for the audience.
When utilizing generative AI in content creation, it is essential to recognize the valuable use cases and limitations associated with this technology.
Human-in-the-loop collaboration, where AI is fine-tuned and guided by human expertise, can lead to the creation of more impactful and meaningful content.

How to choose your LLM architecture - Yes, you should have one

Three Data Point Thursday • 39 implied HN points • 06 Jul 23

🕹 Technology Data Artificial Intelligence Technology Trends

Alternative data can be used for various business problems, not just for big investments.
Choosing the right LLM architecture is crucial when building with generative AI tools such as OpenAI's models.
AI has both frightening potential, like bioterrorism, and exciting opportunities, like eradicating global poverty.

Must Learn AI Security Part 16: Impersonation Attacks Against AI

Rod’s Blog • 39 implied HN points • 25 Sep 23

🕹 Technology Security AI Cyber Attacks Data Machine Learning

Impersonation attacks against AI involve deceiving the system by pretending to be legitimate users to gain unauthorized access, control, or privileges. Robust security measures like encryption, authentication, and intrusion detection are crucial to protect AI systems from such attacks.
Types of impersonation attacks include spoofing, adversarial attacks, Sybil attacks, replay attacks, man-in-the-middle attacks, and social engineering attacks. Each type targets different aspects of the system.
To mitigate impersonation attacks against AI, organizations should implement strong security measures like authentication, encryption, access control, regular updates, and user education. Monitoring user behavior, system logs, network traffic, input and output data, and access control are essential for detecting and responding to such attacks.

Pixels are free now

Sunday Letters • 39 implied HN points • 24 Sep 23

🕹 Technology AI Digital Media Software Data Economics

The internet has made it much cheaper to share and create digital content, like images and music. This means more people can make and distribute their work easily.
AI is reducing the time and effort needed for tasks like data analysis or creative work. What used to take weeks can now be done in hours, making things more efficient.
As technology continues to evolve, we will likely rely on simple conversations with AI to create documents or applications. If it can't talk to other tools, it may soon seem outdated or 'broken'.

Localisation of Gen AI in EU

Sector 6 | The Newsletter of AIM • 39 implied HN points • 01 Sep 23

🕹 Technology AI Data Privacy Regulation Innovation

The EU has strict data protection laws that make it hard for AI tools like ChatGPT to work there. Companies have to follow these rules carefully.
European lawmakers are banning certain AI technologies, like biometric surveillance and predictive policing. This is changing how AI innovations happen in Europe.
A French company called Mistral AI recently raised a lot of money, even though they haven't launched a product yet. Their team has a lot of experience in developing advanced AI models.

Slack's Email Classification Service

Arpit’s Newsletter • 39 implied HN points • 08 Mar 23

🕹 Technology Software Data Implementation Systems Infrastructure

Slack has a feature to classify emails as internal or external during workspace invitations.
Slack uses heuristics like domain matching to classify emails, but may face challenges in diverse email domains.
Implementing a classification service involves maintaining a table with counts and eventual consistency for accurate classification.
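The domain-matching heuristic and the table of counts mentioned above can be sketched roughly as follows. This is an illustrative sketch only, not Slack's actual implementation: the class name `DomainCounts`, the `min_share` threshold, and the sample addresses are all assumptions made for the example.

```python
from collections import Counter


def domain_of(email):
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[-1].lower()


class DomainCounts:
    """Hypothetical per-workspace table of member email domains and counts."""

    def __init__(self):
        self.counts = Counter()

    def add_member(self, email):
        # In a real system this update could lag behind member changes
        # (eventual consistency); here it is applied immediately.
        self.counts[domain_of(email)] += 1

    def classify(self, email, min_share=0.5):
        """Label an invitee 'internal' if their domain covers at least
        min_share of existing members, otherwise 'external'."""
        total = sum(self.counts.values())
        if total == 0:
            return "external"
        share = self.counts[domain_of(email)] / total
        return "internal" if share >= min_share else "external"


workspace = DomainCounts()
for member in ["ann@acme.example", "bob@acme.example", "cy@mail.example"]:
    workspace.add_member(member)

print(workspace.classify("dee@acme.example"))
print(workspace.classify("eve@mail.example"))
```

A share-based threshold is one simple way to handle workspaces with several email domains; the right cutoff would depend on the data.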

The vector database hype explained - the story of Victor, Hector, and Lecter

Three Data Point Thursday • 39 implied HN points • 04 May 23

🕹 Technology Data Machine Learning Text generation

Vector databases allow for organizing and comparing data efficiently.
Using embedding shirts can help find similar items and recommend things.
Vector databases are key in leveraging tools like ChatGPT for general tasks due to their efficiency in organizing and retrieving information.
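The idea of comparing embeddings to find similar items can be sketched in a few lines. This is a minimal illustration assuming tiny hand-made three-dimensional vectors and a brute-force scan; the `CATALOG` items and the `nearest` helper are hypothetical, and a real vector database would use learned embeddings and an approximate index instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, items, k=2):
    """Return the names of the k items whose vectors are most similar to the query."""
    ranked = sorted(
        items.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up embeddings: the first axis loosely means "red top", the last "blue bottom".
CATALOG = {
    "red t-shirt": [0.9, 0.1, 0.0],
    "crimson polo": [0.8, 0.2, 0.1],
    "blue jeans": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.0, 0.0], CATALOG, k=2))
```

The same ranking step is what a recommendation lookup would perform, only over many more items and dimensions.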

🥟 Chao-Down #48 Stanford pulls down their ChatGPT clone Alpaca, Github Copilot X launches with new AI features

Chaos Theory • 39 implied HN points • 23 Mar 23

🕹 Technology AI Startups Research Tech Giants Data

AI technology like ChatGPT may lead to tech monopolies because of the high costs and resources it requires.
GitHub Copilot X introduces new AI features for developers.
Mozilla launches a startup focusing on 'trustworthy' AI.

Data Leaders: Its time for more ambition

Datent • 39 implied HN points • 04 May 23

🕹 Technology Data Leaders

Data leaders should take on all legacy issues and drive enterprise transformations.
CDOs should lead efforts to migrate Excel work to cloud-based environments, following the precedent of Jeff Bezos' 'API mandate' at Amazon.
Data transformation programs should be broken down into bold phases to convince boards of the vision and drive successful change.

4/20/2023: Credit Attribution and Revenue Sharing for AI Models

The Great Reset Diary 2022- • 39 implied HN points • 21 Apr 23

🕹 Technology AI Data Models Content

Content creators should be paid for their contributions to AI models.
There is a need for a broader revenue sharing mechanism for AI models.
Credit attribution and revenue sharing are crucial in the new AI era to ensure fairness to content creators.

Generating KQL from Microsoft Sentinel Incidents with ChatGPT

Rod’s Blog • 39 implied HN points • 24 Mar 23

🕹 Technology AI Cloud Security Artificial Intelligence Data

ChatGPT can be used to generate KQL queries from Microsoft Sentinel incidents.
Incorporating AI tools like ChatGPT for incident investigation and response can enhance security operations.
Consider building custom logic using tools like the Microsoft Sentinel Incident trigger and the OpenAI GPT-3 Logic App connector.

A Benchmark for Verifying Chain-Of-Thought

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 07 Feb 24

🕹 Technology AI Data Research Machine Learning Verification

A new dataset called REVEAL helps check if reasoning used in answers is correct or logical. It assesses whether each part of the reasoning leads to the final answer.
REVEAL focuses on verifying claims based on provided evidence. It does not check how the evidence was found, but how well the reasoning uses it.
Creating detailed datasets like REVEAL is complex and time-consuming. It requires skilled annotators to carefully evaluate the logic and relevance in each reasoning step.

The Impact of AI on the Environment: A Critical Analysis

Rod’s Blog • 19 implied HN points • 05 Feb 24

🔬 Science Environment AI Data Research Education

AI has both direct and indirect impacts on the environment. It can lead to high energy consumption and carbon emissions due to the computational complexity and rapid innovation cycle of AI systems.
The way AI is used can either help or harm the environment. It can optimize energy efficiency and support sustainable development, but it can also increase resource demand, pollution, and disrupt ecosystems.
To lessen the negative environmental effects of AI, collaborative efforts are essential. This includes implementing ethical guidelines, promoting green AI research, educating about AI's environmental impact, and incentivizing energy-efficient AI solutions.