The hottest Databases Substack posts right now

And their main takeaways

Do you even need Kafka?

Data Streaming Journey • 79 implied HN points • 28 Oct 24

Kafka and similar tools are still relevant and necessary for effective data streaming today. They help handle large amounts of data quickly and reliably.
Modern alternatives to Kafka, like Materialize and Debezium, simplify the process of working with operational data and make it easier to integrate with other tools.
Even if you only want to move data from a database to a data warehouse, using a streaming platform can benefit larger enterprises by making data integration more efficient.

Postgres in a box

benn.substack • 920 implied HN points • 06 Dec 24

🕹 Technology Databases Cloud Computing Software AI Data Management

Software has changed from being sold in boxes in stores to being bought as subscriptions online. This makes it easier and cheaper for businesses to manage.
The new trend is separating storage from computing in databases. This lets companies save money by only paying for the data they actually use and the calculations they perform.
There's a push towards making data from different sources easily accessible, so you can use various tools without being trapped in one system. This could streamline how businesses work with their data.

PostgresConf Seattle: Price increases 3/4/24

PostgresWorld and Postgres Conference • 19 implied HN points • 23 Oct 24

🕹 Technology Software Conferences Databases Networking Community

Register for PostgresConf Seattle before March 4, 2024, to save money. The early bird price is $599.
After March 4, the price of registration will rise significantly, starting at $995.
The conference includes over 50 sessions and various community events that provide great networking opportunities.

Diving Deep into LinkedIn's Data Infrastructure: My 6-Hour Learning & Key Takeaways

VuTrinh. • 299 implied HN points • 03 Aug 24

🕹 Technology Data Engineering Software Architecture Databases Distributed Systems Cloud Computing

LinkedIn's data infrastructure is organized into three main tiers: data, service, and display. This setup helps the system to scale easily without moving data around.
Voldemort is LinkedIn's key-value store that efficiently handles high-traffic queries and allows easy scaling by adding new nodes without downtime.
Databus is a change data capture system that keeps LinkedIn's databases synchronized across applications, allowing for quick updates and consistent data flow.

Missing From the Databases: Glutamine

Harnessing the Power of Nutrients • 1757 implied HN points • 27 Dec 23

🏥 Health & Wellness Nutrition Databases Biochemistry Diet Supplementation

Glutamine is crucial for specific health conditions like illness, injury, surgery, and certain dietary needs.
Nutritional databases lack accurate information on glutamine content because of flawed measurement methods, impacting the understanding of amino acid composition in foods.
Animal proteins like leg meat of chicken and pork are high in glutamine, while dairy proteins are intermediate, and plant proteins have varying levels, highlighting the importance of diversifying protein sources for glutamine intake.

($) 2025: Year of (Software) Picks and Shovels

Interconnected • 92 implied HN points • 22 Dec 24

🕹 Technology AI Software Infrastructure Investing Databases

In 2025, software tools like API platforms, databases, and GPU clouds will be key for AI applications. They are becoming just as important as hardware for building AI solutions.
The focus on AI is shifting from just hardware to include software infrastructure that supports the creation of smarter, more useful AI agents.
Investors should pay attention to emerging software tools and platforms as they will drive the next wave of innovation in AI. Recognizing which ones will succeed is crucial.

sqlmesh init -t dlt --dlt-pipeline bluesky duckdb

davidj.substack • 71 implied HN points • 05 Dec 24

🕹 Technology Software Data Engineering APIs Databases Automation

Using dlt to work with Bluesky API allows for easy data extraction. It saves time by handling metadata and schema changes automatically.
dlt simplifies dealing with nested data by creating separate tables. This makes it easier to manage complex data structures.
sqlmesh can quickly generate SQL models based on dlt pipelines. This feature streamlines the workflow and reduces manual setup time.

Intro to SQL Indexes

Data Engineering Central • 589 implied HN points • 17 Jan 24

🕹 Technology Databases Indexes Performance Analytics

Indexes are crucial for improving performance in SQL operations and data access.
Clustered and non-clustered indexes are the two main types to understand in SQL indexing.
Understanding use cases and query access patterns is key to designing effective indexes for data warehouses.
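
The effect of an index on a query's access path can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the post's own example: the `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

# Hypothetical table, column, and index names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def query_plan(connection, sql):
    """Return the human-readable detail text of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


# Without an index, the filter has to scan the whole table.
plan_before = query_plan(conn, QUERY)

# Add a secondary index on the filtered column and ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY)

matching_rows = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
```

The query text never changes; only the plan does, which is why knowing the access pattern (here, filtering on `customer_id`) comes before choosing which columns to index.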

sqlmesh model kinds - 1

davidj.substack • 59 implied HN points • 06 Dec 24

🕹 Technology Data Modeling Software Development APIs Machine Learning Databases

There are different types of models in sqlmesh, such as full, view, and embedded models, each having unique functions and uses. It's important to choose the right model type based on how fresh the data needs to be and how often it is refreshed.
SCD Type 2 models are useful for managing records that change over time, as they track the history of changes. This can make analyzing data trends much easier and faster.
External models in sqlmesh allow you to reference database objects not managed by your project. This can simplify data modeling and documentation, as they automatically gather useful metadata.
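
The history tracking behind an SCD Type 2 model can be sketched in plain Python. This is an assumption-laden illustration of the general technique, not sqlmesh's implementation or API: the `CustomerVersion` record, its `valid_from` / `valid_to` fields, and both helper functions are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """One row of history; valid_to is None for the current version."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None


def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, changed_on: date) -> None:
    """Close the current version if the value changed, then append a new one."""
    current = next(
        (v for v in history
         if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # nothing changed, so history stays as-is
        current.valid_to = changed_on
    history.append(CustomerVersion(customer_id, new_city, changed_on))


def city_as_of(history: List[CustomerVersion], customer_id: int,
               on: date) -> Optional[str]:
    """Look up the value that was valid on a given day."""
    for v in history:
        if v.customer_id != customer_id:
            continue
        if v.valid_from <= on and (v.valid_to is None or on < v.valid_to):
            return v.city
    return None


history: List[CustomerVersion] = []
apply_change(history, 1, "Berlin", date(2024, 1, 1))
apply_change(history, 1, "Lisbon", date(2024, 6, 1))
apply_change(history, 1, "Lisbon", date(2024, 7, 1))  # no-op: value unchanged
```

Because old versions are closed rather than overwritten, a point-in-time question such as `city_as_of(history, 1, date(2024, 3, 1))` can still be answered after the record has changed.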

Scaling Zerodha's Reporting System through 7 million PostgreSQL tables

Engineering At Scale • 15 implied HN points • 09 Jan 25

🕹 Technology Software Databases Architecture Innovation User Experience

Zerodha created an innovative system with 7 million PostgreSQL tables to handle user reporting requests efficiently. This solution tackled issues with slow queries and poor user experiences during busy periods.
They switched from a synchronous to an asynchronous model, allowing users to submit requests and check back later for results. This change improved the overall user experience significantly.
The new architecture involved using a temporary database to handle queries and storing results in many tables. While it works well for now, they might need to consider other solutions if user growth continues rapidly.

The Sequence Engineering #488: Txtai, Maybe the Simplest Way to do Embeddings

TheSequence • 63 implied HN points • 12 Feb 25

🕹 Technology AI Software Open Source Databases Development

Embeddings are important for generative AI applications because they help with understanding and processing data. A good embedding framework should be simple and easy for developers to use.
Txtai is an open-source database that combines different tools to make working with embeddings easier. It allows for semantic search and supports creating various AI applications.
This framework can help build advanced systems like autonomous agents and search tools, making it a versatile choice for developers creating LLM apps.

July 2024 update

Opral (lix & inlang) • 19 implied HN points • 06 Aug 24

🕹 Technology Software Development Web Databases Innovation

The team is moving quickly with rewriting inlang and lix using SQLite instead of git. This change is expected to speed things up a lot.
The new version is due for release at the end of August, so the wait is short.
Lix aims to become a social network where people can share various kinds of their work, like music, video, or design projects.

Accelerate by years part IV - The prototype

Opral (lix & inlang) • 19 implied HN points • 23 Jul 24

🕹 Technology Software Development Databases APIs Programming Tech Innovation

Using SQLite can really speed up the development of both inlang and lix. It saves the time that would otherwise go into building complex systems.
Lix 1.0 is coming soon, with simple plugins that can manage changes easily. This makes it easy for apps to work with changes directly.
The next steps involve building a user interface for merging data and creating a plugin for inlang. This should help make the system more efficient.

Accelerate by years part III - Lix on SQLite

Opral (lix & inlang) • 19 implied HN points • 23 Jul 24

🕹 Technology Software Development Databases Programming Innovation Tech Tools

Building lix without relying on Git can simplify the process. This means avoiding the complications that come with Git's file-based storage model.
Using SQLite for storing data will solve many problems like concurrency and data integrity. It makes it easier to manage application data compared to handling everything through Git.
The main requirements for lix 1.0 will be a merging function and a plugin for inlang. This will open up opportunities for third-party developers to create new lix applications.

Database Wars, Insane SQLite Facts, Static Search Trees Optimized, & More

HackerPulse Dispatch • 8 implied HN points • 07 Jan 25

🕹 Technology Databases AI Engineering Software Performance

Static search trees are great for quick data searching. They are built for data that doesn't change much, making them much faster than regular search methods.
AI can't build strong engineering teams on its own. Engineers need to take action and push for programs that help train and mentor new hires.
SQLite is a super popular database used by millions, but it's managed by just a small team. Its simplicity and reliability make it a favorite for many applications.

Engineers want the ergonomics of SQL query languages. So why do NoSQL databases exist?

🔮 Crafting Tech Teams • 59 implied HN points • 18 Apr 24

🕹 Technology Databases SQL Cloud

Engineers desire the user-friendly nature of SQL for query languages.
NoSQL databases exist due to the different needs and structures of certain applications.
SQL is expanding to various technologies like Kafka, Clickhouse, Elasticsearch, and more.

What's a vector database?

Technically • 34 implied HN points • 21 Oct 24

🕹 Technology AI Data science Machine Learning Software Development Databases

A vector database is a specialized store for data used in AI. It stores the numbers (vectors) that represent different types of information, like text or images.
To make AI models smarter, they need to use unique data from companies. This helps tailor responses and improve accuracy.
There are ways to enhance AI models with unique data, like fine-tuning them or using a method called Retrieval Augmented Generation (RAG) to include important information in prompts.
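
The core lookup a vector database performs, and the retrieval step of RAG, can be sketched as a brute-force similarity search. This is a toy under stated assumptions: `TinyVectorStore` is a hypothetical class, the three-dimensional vectors are hand-made rather than produced by an embedding model, and real systems use approximate indexes instead of scoring every item.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and scans them all."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        self._items.append((text, vector))

    def search(self, query_vector: List[float], k: int = 1) -> List[str]:
        # Score every stored vector against the query and keep the k best.
        scored = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in scored[:k]]


store = TinyVectorStore()
# Hand-made vectors standing in for real embeddings.
store.add("refund policy", [0.9, 0.1, 0.0])
store.add("shipping times", [0.1, 0.9, 0.0])
store.add("office dog photos", [0.0, 0.1, 0.9])

# A query vector that points mostly in the "refund" direction.
top = store.search([0.8, 0.2, 0.0], k=2)
print(top)
```

In a RAG setup, the texts returned by `search` would be pasted into the prompt so the model can answer from company-specific material.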

How Vector Databases enable Generative AI [Math Mondays]

Technology Made Simple • 199 implied HN points • 06 Jun 23

🕹 Technology Programming Databases AI Software Engineering Math

Vector databases store data as high-dimensional vectors to enable advanced AI like Gen AI.
Vectors are crucial for AI applications like language processing, computer vision, and recommendation systems.
Vector databases offer flexibility in handling complex datasets, allowing AI models to interact more effectively.

Console #163 -- Top Open Source projects of the week 🔥

Console • 472 implied HN points • 25 Jun 23

🕹 Technology Open Source Databases Statistics Rust

EdgeDB is a new type of database combining features of relational databases, graph databases, and ORMs.
Lyon focuses on 2D graphics rendering on the GPU in Rust using path tessellation.
Simple Statistics provides statistical methods in readable JavaScript for various platforms.

What I Learned This Week #1

Eventually Consistent • 39 implied HN points • 06 May 24

🕹 Technology Databases Programming Podcasts Social media

ScyllaDB introduces a shard per core design, maximizing parallelism by assigning a separate shard to each core.
FoundationDB bridges SQL and NoSQL, offering ACID transactions with schema flexibility and performance.
Compilers like Clang and language servers like Clangd have separate purposes; language servers follow the Language Server Protocol for portability.
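
The shard-per-core idea mentioned for ScyllaDB can be sketched as deterministic key routing: each key maps to exactly one shard, so a shard's data is only ever touched by its own core. This is a simplified illustration, not ScyllaDB's actual routing algorithm; `NUM_CORES`, `shard_for`, `put`, and `get` are hypothetical names, and plain dicts stand in for per-core storage.

```python
import hashlib
from typing import Dict, List, Optional

NUM_CORES = 4  # assumed core count for the sketch

# One independent store per core; no store is shared between shards.
shards: List[Dict[str, str]] = [dict() for _ in range(NUM_CORES)]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash, so routing never changes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(key: str, value: str) -> None:
    shards[shard_for(key, NUM_CORES)][key] = value


def get(key: str) -> Optional[str]:
    return shards[shard_for(key, NUM_CORES)].get(key)


for i in range(1000):
    put(f"user:{i}", f"profile-{i}")

print([len(s) for s in shards])  # how many keys landed on each shard
```

Since every key has a single owner, work on different shards can proceed in parallel without locks between cores, which is the parallelism the takeaway refers to.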

Creating Your First Change Request with Sort

Database Engineering by Sort • 7 implied HN points • 18 Dec 24

🕹 Technology Software Databases Workflows Engineering Data Management

Sort helps you manage database changes easily and safely, like how GitHub handles changes. You can propose changes without altering the data right away.
Creating a Change Request is simple. Just suggest what you want to change and set it up for review by others in your organization.
Once a Change Request is approved, it can be applied without hassle. If anything goes wrong during the process, Sort can automatically roll back the changes.

System Design - Scale 0 to Millions of Users

Better Engineers • 7 HN points • 31 Jul 24

🕹 Technology Systems Design Databases Scaling

Scaling systems to handle millions of users involves understanding how to make systems work better under pressure. This can be done by adding more resources or managing them effectively.
Vertical scaling means adding more power (like RAM or CPU) to existing servers, while horizontal scaling means adding more servers to share the load. Horizontal scaling is often better for high traffic situations.
Using a master-slave database setup helps balance loads and keeps data safe. If one database fails, another can take over, ensuring the system runs smoothly and reliably.

HCF EP 004: Indexing and Querying

Hasen Judi • 35 implied HN points • 13 Dec 24

🕹 Technology Software Programming Databases Web Development APIs

You can create a simple forum with posts that track who made them and when. Each post can include basic content, like a Tweet.
Using indexes helps you quickly find posts by user or hashtags. This makes searching through posts much faster and easier.
Automated testing is a great way to ensure everything works as expected without needing to manually check each part of your code.

Top 5 Vector Databases You Need To Know

The Beep • 39 implied HN points • 18 Feb 24

🕹 Technology Databases Software AI Open Source Cloud Services

Vector databases help improve how machines understand and respond to queries by providing more context. This makes it easier to get accurate answers to questions.
There are different kinds of vector databases, like self-hosted and managed. Self-hosted requires more work to maintain, while managed ones are easier and quicker to set up.
Choosing the right vector database depends on your needs like price, scalability, and the specific features you require for your application. It's important to test them to see which one fits best.

Why Discord ditched Cassandra [System Design Sundays]

Technology Made Simple • 79 implied HN points • 03 Apr 23

🕹 Technology System Design Databases Data Management Programming Tech Education

Discord faced performance issues with Cassandra, requiring increasing maintenance effort and leading to unpredictable latency.
Hot partitions were a problem in Cassandra, causing hotspotting and impacting the database's performance during concurrent reads.
Garbage collection in Cassandra posed challenges, leading Discord to switch to ScyllaDB which does not have a garbage collector.

Why do databases store data in B+ trees?

Arpit’s Newsletter • 78 implied HN points • 29 Mar 23

🕹 Technology Databases Data Storage

SQL databases use B+ trees for efficient data operations like insert, update, find, and delete.
Storing data in sequential files can lead to inefficiencies for database operations.
B+ trees enable efficient CRUD operations and range queries with time complexity O(log n).
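
Why keeping keys in sorted order helps both point lookups and range queries can be sketched with binary search over a sorted list. This is a stand-in for the sorted leaf level of a B+ tree, not a B+ tree itself: `SortedIndex` is a hypothetical class, and list insertion here shifts elements in O(n), a cost that a real B+ tree avoids by splitting nodes.

```python
import bisect
from typing import Any, List, Optional, Tuple


class SortedIndex:
    """Keys kept in sorted order, loosely mimicking B+ tree leaves."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._values: List[Any] = []

    def insert(self, key: int, value: Any) -> None:
        pos = bisect.bisect_left(self._keys, key)  # O(log n) search
        if pos < len(self._keys) and self._keys[pos] == key:
            self._values[pos] = value  # update an existing key
        else:
            # O(n) shift in a Python list; a B+ tree splits nodes instead.
            self._keys.insert(pos, key)
            self._values.insert(pos, value)

    def find(self, key: int) -> Optional[Any]:
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._values[pos]
        return None

    def range(self, low: int, high: int) -> List[Tuple[int, Any]]:
        """Locate the first key >= low, then read forward until high."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[start:end], self._values[start:end]))


index = SortedIndex()
for key in [50, 10, 40, 20, 30]:
    index.insert(key, f"row-{key}")

in_range = index.range(15, 40)
print(in_range)
```

The range query pays O(log n) to find its starting point and then reads neighbours sequentially, so its total cost also grows with the number of rows returned.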

(3/3) "Relational Database Management: A Status Report" [Chris Date, 1983]

Minimal Modeling • 202 implied HN points • 12 May 23

🕹 Technology Databases Optimization Data Models Research Programming Languages

Relational database systems still struggle with supporting primary keys fully
Foreign key support is crucial for maintaining database integrity
Automatic physical design aids are important for optimizing database performance

The Stupid Programmer Manifesto

Hasen Judi • 206 HN points • 15 Jun 23

🕹 Technology Programming Software Development Web Development Cloud Services Databases

The author embraces being a 'stupid' programmer and finds simplicity in development.
They use basic tools and methods due to feeling inadequate for modern web development.
Simplicity and straightforward approaches are preferred over complex frameworks and technologies.

Redis Changes Every few Years - Are you up to date?

🔮 Crafting Tech Teams • 59 implied HN points • 01 Jun 23

🕹 Technology Programming Databases Tech Trends

Redis has evolved beyond just a cache and can be used for various purposes like PubSub notifiers, search DB, and event storage.
Postgres, known as an SQL DB, can also be utilized as an event store, message queue, outbox, or document db, showcasing the versatility of technologies.
It's essential to stay up to date with how technologies like Redis are changing over the years to make the most of their capabilities.

Links and composite keys

Minimal Modeling • 101 implied HN points • 09 Aug 23

🕹 Technology Databases Modeling Data Integrity

Consider using unique constraints for composite keys to ensure data integrity
Splitting tables can be a useful exercise for a cleaner data model
Primary keys serve as both a uniqueness constraint and an identity marker in a table

Strategies for Replication in Distributed Databases [System Design Sundays]

Technology Made Simple • 59 implied HN points • 16 Jan 23

🕹 Technology Data Databases AI Machine Learning Systems Design

Replication in distributed databases involves keeping copies of data on multiple machines spread across a network.
Benefits of replication in distributed systems include improved accessibility to data and fault tolerance.
Handling changes to replicated data involves choosing between active and passive replication methods, each with its own trade-offs.

Understanding The Role of Vector DB in AI Application

The Beep • 19 implied HN points • 04 Feb 24

🕹 Technology AI Databases Software Applications Data science

Vector databases are designed to handle complex and unstructured data, making them great for AI applications like semantic search and face recognition. They convert information into high-dimensional vectors that are easy to work with.
Unlike traditional databases, vector databases can manage different types of data such as text, images, and audio, which makes them very versatile. They're like a Swiss Army knife for managing data.
Vector databases play a crucial role in enhancing AI capabilities, providing better access and analysis of data, which leads to smarter applications, including smart assistants and more.

I made 1+1=0 in DuckDB

VuTrinh. • 19 implied HN points • 03 Feb 24

🕹 Technology Data Engineering Databases Analytics Software Development Programming

DuckDB is easy to use because it works like SQLite, running directly inside applications without needing a separate server. This makes it simpler to manage.
It processes data in batches through vectorization, which means it can handle multiple records at once, making operations faster than traditional row-by-row processing.
DuckDB supports ACID transactions, ensuring that data remains safe and reliable, which is important in data analytics and shared environments.

(1/3) “Relational Database Management: A Status Report” [Chris Date, 1983]

Minimal Modeling • 101 implied HN points • 10 May 23

🕹 Technology Databases Programming Data Management Database Systems

The video discusses the historical background of relational databases, starting in 1983.
Key points include the slow process of database system installation and the importance of primary keys in database design.
It also covers relational operations like join and divide, emphasizing their significance in practical database management.

PostGres is enough, but is it fast to code?

CodeFaster • 36 implied HN points • 06 Feb 24

🕹 Technology Databases Coding APIs Data Operations

PostgreSQL can handle all your data.
PostgreSQL can handle all your data operations.
Using PostgreSQL for both data and operations may have benefits for fast coders.

Productivity Explosion

Dana Blankenhorn: Facing the Future • 39 implied HN points • 17 Aug 23

🕹 Technology Productivity AI Databases Machine Learning Adaptation

Processes must be changed for serious business productivity gains.
Garbage In, Garbage Out - databases need to accurately describe reality for useful outputs.
Adaptation to change leads to productivity gains and is a key factor in success.

An Incomplete Look at Vulnerability Databases & Scoring Methodologies

Resilient Cyber • 59 implied HN points • 22 Nov 22

🕹 Technology Cybersecurity Software Databases Vulnerabilities Scoring

Vulnerability databases like CVE and NVD help identify and score software weaknesses. This scoring helps companies prioritize what to fix first to keep users safe.
The Common Vulnerability Scoring System (CVSS) rates how severe a vulnerability is. This helps organizations understand the impact and urgency of addressing the risk.
New systems like the Open-Source Vulnerabilities (OSV) database and Global Security Database (GSD) aim to improve how vulnerabilities are recorded and shared, making it easier for developers to manage risk.
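
How a CVSS score turns into a fix order can be sketched with the qualitative severity bands from CVSS v3.x (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). The findings below are invented placeholders, not real vulnerability records, and sorting by base score alone is a simplifying assumption.

```python
from typing import List, Tuple


def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


# Placeholder findings (hypothetical IDs and scores) for illustration only.
findings: List[Tuple[str, float]] = [
    ("FINDING-A", 5.3),
    ("FINDING-B", 9.8),
    ("FINDING-C", 3.1),
    ("FINDING-D", 7.5),
]

# Fix the highest-scoring issues first.
prioritized = sorted(findings, key=lambda f: f[1], reverse=True)

for finding_id, score in prioritized:
    print(finding_id, score, cvss_v3_severity(score))
```

In practice teams usually weigh more than the base score, such as whether the affected component is reachable or already being exploited.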

Fine-tune an open source LLM from Postgres data in 5 minutes

Database Engineering by Sort • 15 implied HN points • 27 Mar 24

🕹 Technology AI Databases Programming Open Source Machine Learning

Fine-tuning an open source language model is now super easy and can be done in just five minutes. This makes it accessible for more people to customize LLMs for their needs.
You can use data from a Postgres database to create a product catalog that the fine-tuned LLM can answer questions about. This can help with tasks like customer support and product information.
With tools like Together.ai, you can quickly set up fine-tuning and chat with your customized LLM. It's great for building chatbots and enhancing user interactions.

Working at Google: Maps

Life Since the Baby Boom • 63 HN points • 13 Feb 23

🕹 Technology Software Management Databases User Experience Product Development

The author worked at Google Maps and shares experiences from their time there.
Google Maps has a feature called My Maps for creating customized maps.
There were challenges and changes in management that impacted the team's work on Google Maps.
