The hottest Data Substack posts right now

And their main takeaways

Corrective RAG (CRAG)

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 05 Feb 24

🕹 Technology AI Data Machine Learning Information Retrieval Software Development

Corrective Retrieval Augmented Generation (CRAG) helps improve how data is used in language models by correcting errors from retrieved information.
It uses a special tool called a retrieval evaluator to check the quality of the data and decide if it's correct, incorrect, or unclear.
CRAG is designed to work well with different systems, making it easier to apply in various situations while enhancing document use.
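
The three-way check described above can be sketched roughly as follows. This is a minimal illustration, not the CRAG authors' implementation: the word-overlap score, the `upper` and `lower` thresholds, and every name below are hypothetical stand-ins for a learned retrieval evaluator.

```python
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    AMBIGUOUS = "ambiguous"


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query words found in the document.

    Hypothetical stand-in for a trained retrieval evaluator.
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def evaluate(query: str, documents: list[str],
             upper: float = 0.7, lower: float = 0.3) -> Verdict:
    """Classify the retrieved set by its best score (thresholds are assumed)."""
    best = max((overlap_score(query, d) for d in documents), default=0.0)
    if best >= upper:
        return Verdict.CORRECT
    if best <= lower:
        return Verdict.INCORRECT
    return Verdict.AMBIGUOUS
```

A caller could then branch on the verdict, for example using the documents as-is when they are judged correct and falling back to another source when they are not.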

The Good, the Bad, & The Ugly

Pinecone Weekly Brief • 19 implied HN points • 04 Feb 24

💰 Finance Economy Jobs Data

The post discusses recent jobs data.
It includes insights from Chase Taylor.
Access to the full post requires a subscription or a free trial.

Data Materialization is a Convergence Problem

Data People Etc. • 159 implied HN points • 10 Apr 23

🕹 Technology Data Workflow Operations

Data materialization is not just a workflow orchestration problem but also a convergence problem.
In a convergence-based approach to data materialization, a materialization controller could continuously compare the state of the warehouse with the desired state of models to automate the materialization process.
Challenges in implementing a materialization controller include explainability, managing over-eagerness, and dealing with drift in the system.
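
The convergence idea above resembles a reconciliation loop: compare desired and actual state, then act only on the difference. The sketch below is an assumption-laden illustration, not the post's design; the version-string state model and the `plan` and `reconcile` names are hypothetical.

```python
def plan(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return models whose materialized version differs from the desired one."""
    return sorted(name for name, version in desired.items()
                  if actual.get(name) != version)


def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Materialize drifted models until the warehouse matches the desired state."""
    applied = []
    while True:
        drifted = plan(desired, actual)
        if not drifted:
            return applied
        for name in drifted:
            # Stand-in for actually running the model against the warehouse.
            actual[name] = desired[name]
            applied.append(name)
```

Returning the list of applied models gives a simple record of why each materialization ran, which touches on the explainability concern mentioned above.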

Can GTM out-hire its data problem?

Human Capitalist • 19 implied HN points • 21 Feb 24

💼 Business Marketing Data Employment Sales Technology

Companies are changing how they think about growth. They want to be efficient and use data smarter, rather than just trying to grow for the sake of it.
There’s a big push to hire more data roles in go-to-market (GTM) teams. This is seen as important for improving things like sales and marketing efficiency.
Positions like RevOps and Chief AI Officers are becoming popular. Companies want these roles to help them run better and innovate with technology.

Fear of AI is Profitable

Cybernetic Forests • 39 implied HN points • 02 Apr 23

🕹 Technology AI Data Ethics Art Neural Networks

Fear of AI can be profitable through marketing strategies that capitalize on existential threats from AI.
There is skepticism about the narratives surrounding powerful AI systems being motivated by fear of sentient AI surpassing humans.
Prioritizing speculative future AI risks can distract from addressing the immediate impacts of AI technology on society and real-world problems.


A few more thoughts on the future of documents

Sunday Letters • 39 implied HN points • 13 Aug 23

🕹 Technology AI Documents Data Software Innovation

Documents are changing from fixed structures to more flexible, interactive ideas. They should represent complex topics in a way that you can explore various aspects of them easily.
AI can help us create better models for understanding and interacting with information. It's like upgrading from simple numbers to more advanced ways of thinking.
In the future, documents will need to allow for meaningful interactions, not just static content. It'll feel outdated if you can't engage with documents in a dynamic way.

Fighting Ghosts With Gremlins: A Primer on Credit Card Exemptions

Money in Transit • 19 implied HN points • 16 Jan 24

💼 Business Finance Technology Data Regulation Fraud

Merchants bear the costs of credit card fraud, not cardholders.
EU regulators focus on reducing online fraud while minimizing buyer's remorse.
Merchants can request exemptions for simpler payment flows by assuming responsibility for fraud costs.

Feed AIs Enterprise Data For The Win

Perfecting Equilibrium • 19 implied HN points • 14 Jan 24

🕹 Technology AI Analytics Data Enterprise Large Language Models

Good data in leads to good analytics out.
Feeding Large Language Models clean, curated data results in valuable analytics.
Large Language Models trained on corporate data provide better insights compared to those trained on public data.

Clouded Judgement 1.3.25 - Domain Specific Models

Clouded Judgement • 7 implied HN points • 03 Jan 25

🕹 Technology AI Software Data Industry Enterprise

In 2025, we will see a lot of special AI models that focus on specific areas of knowledge, like health or engineering. These models will learn from specialized and private data to perform better than general AI models.
These domain-specific models will help industries that need deep understanding and accuracy, solving complex problems that generalized AI can struggle with. This means they can deliver the right answers when it matters most.
As businesses create their own tailored AI models, the enterprise AI market will grow significantly. This will change how companies operate and improve efficiency in many fields.

4 Engineering Slides CEOs Love (That You Can Have For Free)

Dev Interrupted • 84 implied HN points • 14 Sep 23

🕹 Technology Engineering Data Metrics Communication Presentation

Communication with your CEO is crucial for showcasing engineering progress.
CEOs value metrics that demonstrate efficiency and impact of engineering teams.
Using specific slides can help simplify and improve communication with CEOs.

Microsoft to integrate Open AI products [Finance Fridays]

Technology Made Simple • 39 implied HN points • 21 Jan 23

🕹 Technology AI Data Business Machine Learning Tech Giants

Microsoft integrating OpenAI products won't instantly level the playing field against Google and Meta; Microsoft was already a strong player in machine learning before this integration.
Microsoft's business data from MS Office is a key advantage, but handling business data can be tricky; understanding business rules can make you valuable in AI development.
Integrating OpenAI products may make MS Office stickier for existing clients but may not attract new customers; in the long run, consulting-based revenues might increase.

I have access to Claude-3 Opus, a (seemingly) considerably more advanced model than GPT-4, ask it anything

Philosophy bear • 28 implied HN points • 05 Mar 24

🕹 Technology AI Models Artificial Intelligence Data Machine Learning

Claude-3 Opus is a highly advanced model compared to GPT-4, especially in reasoning capabilities, scoring impressively on GPQA and other tests.
The model's knowledge base is top-notch, performing as well as or better than a graduate student with Google access in specific sciences.
Questions posed to Claude-3 Opus should be challenging, aiming for queries that most people would answer correctly but the model might get wrong, to reveal its strengths and weaknesses.

Cloud, Data, AI, GenAI

Laszlo’s Newsletter • 37 implied HN points • 03 Jan 24

🕹 Technology Cloud Data AI GenAI

Cloud computing provides flexibility in resources and enables experimentation without high upfront costs.
Establishing a strong data stack is crucial before implementing AI/GenAI to ensure data quality and reliable insights.
Traditional AI involves well-defined tools for extracting business-relevant information from data, while generative AI techniques such as prompt engineering and fine-tuning require sophisticated infrastructure and specific business goals.

Unstructured Data Unravelled

Three Data Point Thursday • 19 implied HN points • 14 Dec 23

🕹 Technology Data AI Tools Processes

Unstructured data is better understood when seen as 'complex' data.
Structured data is in the format tools can process; unstructured data needs transformation.
Focus on what you want to do with data and the cost of transforming it to the right format.

Summaries without originals

Internal exile • 29 implied HN points • 16 Feb 24

🕹 Technology AI Data Ethics Digital Interaction

Concern is rising that tech companies developing AI models may eventually run out of human-generated data to train the models, leading to a potential collapse of the models themselves.
Using Large Language Models (LLMs) to generate text may interfere with intentional human communication and risks creating a future where discourse is processed only by machines, wasting everyone's time.
AI technologies like LLMs can be used to manipulate power dynamics, disempower individuals, and dehumanize interactions, ultimately reshaping social relations and relegating human voices to the background.

Remix is better than GraphQL

Andrew's Substack • 13 HN points • 30 Jun 24

🕹 Technology Framework API Data Comparison Development

Remix and GraphQL serve different purposes: Remix is for full-stack app development, while GraphQL is for building APIs.
Both Remix and GraphQL offer benefits like type safety and efficient data fetching.
Remix loaders provide specific data-loading endpoints, offering straightforward authorization and reducing opportunities for bad inputs.

The Shift to Account-Driven GTM

MKT1 Newsletter • 4 implied HN points • 12 Feb 25

💼 Business Marketing Sales Strategy Startups Data

Companies need to switch to an account-driven approach for marketing and sales. This means focusing on specific accounts instead of just waiting for leads to come in.
New tools now let marketers understand their entire audience better. They can gather more data on accounts, allowing for more tailored outreach and personalized content.
This shift requires teamwork across departments like marketing, sales, and customer success. Everyone has to work together to effectively target and engage with chosen accounts.

June 17, 2024

The Grasp • 3 HN points • 17 Jun 24

🕹 Technology Robots AI Data Models Startups

Stanford's new research simplifies training humanoid robots using human body and hand poses, revolutionizing data collection for robot learning.
The open-source Vision-Language-Action model, OpenVLA, showcases improved robotic control and performance, highlighting the benefits of collaborative industry contributions.
Harvard and DeepMind's study on virtual rodent brain activity provides insights into brain-controlled motion, with potential implications for brain-machine interfaces and robotics.

LangChain & LLM Based Autonomous Agents

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 26 Apr 23

🕹 Technology AI Software Automation Data Applications

Large Language Models (LLMs) can be programmed with reusable prompts. This helps in integrating them into bigger applications easily.
Creating chains of interactions allows LLMs to work together in a structured way for more complex tasks.
Agents can operate independently, using tools to find answers without being stuck to a fixed plan, making them more flexible.

Fun and Hackable Tensors in Rust, From Scratch

Get Code • 70 implied HN points • 01 May 23

🕹 Technology Software Data

Deep dive into tensor operations using Rust's Tensorken library.
Matrix multiplication can be built from basic operations: broadcasting, elementwise multiplication, and summation.
Improvement possibilities in Tensorken include error handling, slicing API enhancements, and efficiency optimizations.
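
To make the matrix multiplication point concrete, here is a rough sketch in plain Python rather than Rust. It does not use Tensorken's API; the nested-list representation and the `matmul` name are assumptions chosen only to show the broadcast, multiply, and sum steps.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply an (m, k) matrix by a (k, n) matrix via broadcast-multiply-sum."""
    m, k, n = len(a), len(b), len(b[0])
    # Broadcast: conceptually view a as (m, k, 1) and b as (1, k, n), so the
    # elementwise product has shape (m, k, n).
    product = [[[a[i][p] * b[p][j] for j in range(n)]
                for p in range(k)]
               for i in range(m)]
    # Sum over the shared axis k to collapse (m, k, n) down to (m, n).
    return [[sum(product[i][p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

A tensor library would do the same thing without materializing Python lists, typically by reshaping views and reducing along one axis.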

Proposal for improving the global online discourse through personalised comment ordering on all websites

Engineering Ideas • 19 implied HN points • 07 Dec 23

🕹 Technology Social media AI Data Blockchain Privacy

Social media promotes tribalism and polarization, making it hard to find rational critique in comments.
A proposed solution involves personalized comment ordering based on user reactions and models.
Compensating users for reading and voting on comments with a token system could help combat spam and manipulation.

Is Big Tech using Data Laundering to cheat artists?[Storytime Saturdays]

Technology Made Simple • 39 implied HN points • 21 Nov 22

🕹 Technology Art AI Data Copyright Ethics

Data Laundering involves converting stolen data to make it seem legitimate for different uses.
Big Tech companies use non-profits to create datasets/models for research, then monetize them into APIs without compensating artists.
There is a double standard between how Tech companies treat music and visual art, with considerations about replicating music, copyright standards, and the ethical aspects of compensation.

Clouded Judgement 1.24.25 - The Year of Enterprise AI

Clouded Judgement • 4 implied HN points • 24 Jan 25

🕹 Technology AI Software Data Innovation Business

AI in businesses faces a big challenge called the 'last mile' problem, which means it struggles to give accurate answers for specific business needs. This is especially important when customers are involved.
To make AI better for businesses, combining general AI models with specific company data helps create more reliable results. This approach can improve things like compliance checks and sales forecasts.
The speed of improvement in AI technology is impressive, and future models might overcome current limitations. This could allow businesses to answer a wider range of questions more accurately.

Are Only 20% of Devs Happy? | Stack Overflow’s Erin Yepis

Dev Interrupted • 9 implied HN points • 19 Nov 24

🕹 Technology Software Engineering Data Productivity

Only about 20% of developers say they are happy in their jobs. This suggests many people in the field are feeling dissatisfied.
Factors like low pay, workplace culture, and issues with technical debt are major reasons behind this unhappiness. It's important to look at these issues to help improve developer satisfaction.
A new project called Flock aims to address problems with the popular Flutter toolkit. The creators want to make a community-driven platform that fixes bugs and speeds up development.

Must Learn AI Security Epilogue: Securing AI is a Three-Pronged Approach

Rod’s Blog • 19 implied HN points • 25 Oct 23

🕹 Technology AI Security Code Data Access

Securing AI involves three main aspects: secure code, secure data, and secure access. It is crucial to ensure that AI systems are free of errors, vulnerabilities, and malicious components.
Developers and users should follow practices like code review, testing, data encryption, and authentication to mitigate threats such as code injections, data poisoning, unauthorized access, and denial of service.
The shared responsibility model defines security tasks handled by AI providers and users. It is important to understand the responsibility distribution between the provider and the user based on the type of AI deployment, such as SaaS, PaaS, or IaaS.

2022 Trends in Data and AI

Gradient Flow • 99 implied HN points • 06 Jan 22

🕹 Technology Data AI Machine Learning Podcasts

Graph Intelligence is a rising technology category for analyzing data relationships, using techniques like graph visualization and machine learning models.
Early adopters of Graph Intelligence might gain a competitive advantage in analyzing data more efficiently and effectively.
Podcasts like Data Exchange discuss topics like data and machine learning platforms at Shopify, AI engineering, and the importance of a modern metadata platform.

Welcome to the Era of AI Factories

Sector 6 | The Newsletter of AIM • 19 implied HN points • 19 Oct 23

🕹 Technology AI Manufacturing Data Electric Vehicles Partnerships

AI factories are big data centers that use powerful computers to turn data into useful insights. They are changing how manufacturing works around the world.
Foxconn is teaming up with NVIDIA to create these AI factories, which will also support new technologies like electric and self-driving cars.
This partnership is a step towards making processes faster and smarter, showing how AI can improve modern manufacturing.

Truth in Inconvenience

Breaking Smart • 90 implied HN points • 25 Feb 23

🕹 Technology AI Data Engineering Knowledge Reality

Real-world friction connects big zeitgeist things and teaches about truth in inconvenience.
Meccano vs Lego: Meccano models offer higher realism, messiness and inconveniences, while Legos offer convenience and smoothness.
AI entering the world may encounter a real, high-interest world like a Meccano world, where knowledge shock requires adjusting ambitions to balance design knowledge and friction knowledge.

Welcome Bikky to the Equal Ventures Portfolio

Equal Ventures • 39 implied HN points • 12 Sep 22

💼 Business Startups Technology Data Investment Restaurants

Equal Ventures is partnering with Bikky, a Customer Data Platform for the restaurant industry.
The digitization of the restaurant industry has created a need for solutions like Bikky that unify customer data across channels.
Bikky's founder, Abhinav Kapur, identified the need for a vertically focused solution through his personal experience in the restaurant industry.

What is Bayesian Statistics? The Beginner Math Guide (Part Four Final)

The Software & Data Spectrum • 19 implied HN points • 23 Apr 23

🔬 Science Statistics Bayesian Data

Posterior probability reflects our belief in a hypothesis after analyzing data.
The Bayes factor compares how well two hypotheses explain the observed data, expressed as the ratio of their likelihoods.
Prior odds influence how data convinces us about different hypotheses.
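
The relationship between these three ideas is that posterior odds equal prior odds multiplied by the Bayes factor. The sketch below works through that arithmetic; the function names and the example numbers are illustrative assumptions, not taken from the post.

```python
from fractions import Fraction


def bayes_factor(likelihood_h1: Fraction, likelihood_h2: Fraction) -> Fraction:
    """Ratio of how probable the data is under H1 versus H2."""
    return likelihood_h1 / likelihood_h2


def posterior_odds(prior_odds: Fraction, factor: Fraction) -> Fraction:
    """Update the odds of H1 against H2 after seeing the data."""
    return prior_odds * factor


def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds in favor of H1 into a probability for H1."""
    return odds / (1 + odds)
```

For example, if the data is four times as likely under H1 as under H2 but the prior odds were 1 to 4 against H1, the two effects cancel and the posterior odds are even.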

How to become a data business

Three Data Point Thursday • 19 implied HN points • 20 Jul 23

🕹 Technology AI Data Business Products Innovation

The key to becoming a data business is focusing on smart products over improving business decisions.
When integrating AI into products, focus on solving bigger problems for customers instead of just improving efficiency.
Artificial Intelligence can accumulate knowledge without the burden that humans face, leading to unpredictable situations.

Ground-truth-in-the-loop

Yuxi’s Substack • 19 implied HN points • 18 Jul 23

🕹 Technology AI Machine Learning Data Systems Models

Ground-truth-in-the-loop is crucial for designing and evaluating systems, especially in AI and machine learning.
For AI systems, having trustworthy training data, evaluation feedback, and a reliable world model is essential.
Researchers should inform non-experts about limitations and potential issues when building systems without ground-truth.

Everything you need to know about geometric deep learning

Three Data Point Thursday • 19 implied HN points • 22 Jun 23

🕹 Technology Data Machine Learning Deep Learning Algorithms Alternative Data

You should be using alternative data.
Avoid using geometric deep learning unless you're a data entrepreneur.
If you're already building something, flatten your data instead of using GDL.

Making sense of Mineral.ai

Agribusiness Matters • 19 implied HN points • 23 Mar 23

💼 Business Tech Agriculture Data Innovation Podcasts

Conway's Law states that an organization's system design mirrors its communication structure.
Market conditions in agriculture shape organizations and products specific to that domain.
Mineral.ai evolved from an agriculture-tech project at Google X to a company addressing computational agriculture at scale.

Part 2: What goes wrong when business, finance, data, and technology worlds collide?

The Data Score • 19 implied HN points • 16 Aug 23

💼 Business Finance Data Technology Communication Decision-making

Silos and problems in the business, finance, data, and technology worlds are mostly self-contained and are becoming more complex over time.
Challenges arise when experts talk past each other, fall into the 'smartest person in the room' syndrome, and fear failure in collaborative projects.
Successful collaboration requires effective communication, empathy, and psychological safety to navigate jargon, unstated motivations, and the pressure of high stakes.

AI for The Masses: How Small Businesses Can Get in on the AI Revolution

aidaily • 19 implied HN points • 10 Jul 23

🕹 Technology AI Startups Ethics Tools Data

Small businesses can now access AI technology through partnerships with big tech companies like Google, Amazon, and Microsoft.
The sustainable growth of AI technology requires careful management to ensure societal benefits and ethical use.
AI is a powerful tool with potential for both good and misuse, emphasizing the importance of using it responsibly.

How I Teach Algebraic Data Types

Stefan’s Substack • 19 implied HN points • 23 Mar 23

🕹 Technology Programming Types Data

Start teaching algebraic data types by explaining enums in languages like C or Java and then showing how to write an enum in Haskell.
Introduce the concept of constructors in algebraic data types using a day-of-week datatype as a simple starting point.
Explain sum types and product types as the basic building blocks to create more complex algebraic data types by combining both concepts.
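
The post teaches these ideas in Haskell; as a rough analogue, the same building blocks can be sketched in Python. The `Shape` example and all names below are hypothetical illustrations, not the author's teaching material.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Day(Enum):
    """A sum type whose constructors carry no data, like a C or Java enum."""
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


@dataclass(frozen=True)
class Circle:
    """A product type with one field."""
    radius: float


@dataclass(frozen=True)
class Rectangle:
    """A product type with two fields."""
    width: float
    height: float


# A sum of products: a Shape is either a Circle or a Rectangle.
Shape = Union[Circle, Rectangle]


def area(shape: Shape) -> float:
    """Handle each alternative of the sum type in turn."""
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    if isinstance(shape, Rectangle):
        return shape.width * shape.height
    raise TypeError(f"not a Shape: {shape!r}")
```

Unlike Haskell, Python does not check at compile time that every alternative is handled, which is why the final `raise` is there.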

Stay ahead of the curve with AI Promptly.

aidaily • 19 implied HN points • 20 Apr 23

🕹 Technology AI Software Data Tech Tools Machine Learning

The newsletter has been rebranded as AI Promptly, with major plans for first-access resources.
Google is updating its search engine to compete with AI-powered rivals.
AI is revolutionizing various industries like healthcare and law.

(My) shallow thoughts about deep learning, II

Silicon Reckoner • 19 implied HN points • 01 Jul 23

🕹 Technology AI Data Mathematics Ethics Collaboration

Insights about the tech industry and its focus on profit over societal benefit.
The importance of open-source collaboration and the implications of centralization in AI research.
Challenges in building interdisciplinary communities and the evolving nature of mathematics towards computing.
