The hottest Databases Substack posts right now

And their main takeaways

Running Thrudb On Amazon EC2

Thái | Hacker | Kỹ sư tin tặc • 0 implied HN points • 20 Mar 08

🕹 Technology Databases

When running Thrudb on Amazon EC2, check the provided AMIs for the latest version of Thrudb.
Follow the Amazon EC2 documentation to start the instance and then start Thrudb on it using specific commands.
Updating Thrift and Thrudb to the latest versions is recommended due to active development.

PostgreSQL Sort estimation instability

Conserving CPU's cycles ... • 0 implied HN points • 26 Jun 24

🕹 Technology Databases

Incremental sort was added in PostgreSQL 13 (released in 2020) to enhance sorting strategies and improve efficiency in handling large datasets and analytical queries.
Estimation instability in PostgreSQL's sort operations can lead to unexpected query plans and performance differences, emphasizing the importance of careful estimation.
A weak spot in PostgreSQL's optimizer code shows how the choice of expression evaluation can affect query performance, highlighting a need for optimization improvements.

MSSQL query plan optimisation advantages

Conserving CPU's cycles ... • 0 implied HN points • 21 May 24

🕹 Technology Databases

In MSSQL to PostgreSQL migrations, challenges like query slowdowns may arise, with some queries taking significantly longer to execute in PostgreSQL compared to MSSQL.
Join algorithm selection and parallelism are two key advantages contributing to MSSQL's impressive query execution speed.
Multi-clause selectivity estimation in MSSQL allows for more precise cardinality estimation in complex join queries, giving it an edge over PostgreSQL in certain scenarios.
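One way to see why estimating several clauses together matters is to compare it with estimating each clause separately and multiplying the results. The sketch below is a toy illustration on made-up data, not MSSQL's or PostgreSQL's actual estimator; the table contents and variable names are assumptions chosen so the two columns are correlated.

```python
# Toy table where city and country are correlated (every Paris row is in FR).
rows = (
    [{"city": "Paris", "country": "FR"}] * 50
    + [{"city": "Lyon", "country": "FR"}] * 25
    + [{"city": "Rome", "country": "IT"}] * 25
)
n = len(rows)

# Per-clause selectivities, each estimated on its own.
sel_city = sum(r["city"] == "Paris" for r in rows) / n
sel_country = sum(r["country"] == "FR" for r in rows) / n

# Independence assumption: multiply the per-clause selectivities.
independent_estimate = sel_city * sel_country * n

# What a combined (multi-clause) view of the data would report.
actual = sum(r["city"] == "Paris" and r["country"] == "FR" for r in rows)

print(independent_estimate, actual)
```

With correlated columns the multiplied estimate falls below the true row count, and an error like this can compound across the joins of a complex query.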

PostgreSQL Asymmetric Join technique as a Further Evolution of Partitionwise Join

Conserving CPU's cycles ... • 0 implied HN points • 05 May 24

🕹 Technology Databases

The Asymmetric Join (AJ) technique in PostgreSQL allows for more efficient parallel append operations by joining each partition individually with a non-partitioned relation and then merging the results.
One advantage of the Asymmetric Join technique is the independent choice of join strategy for each partition, leading to improved table scan filtering and reduced hash table sizes.
Considerations for implementing the Asymmetric Join include growing search space for plans, restrictions on the inner and outer relations, and the necessity of checking partitioning schemes for different plain and partitioned relation combinations.
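The idea behind the takeaways above can be sketched outside the database: joining each partition separately with the plain relation and appending the results gives the same rows as joining the whole partitioned table at once. This is a minimal illustration in Python, not PostgreSQL planner code; the `bounds` field, the sample rows, and the function names are hypothetical, and filtering the plain relation by partition bounds is an assumed reading of how the hash tables get smaller.

```python
def hash_join(outer_rows, inner_rows, key):
    """Plain hash join: build a table on inner_rows, probe with outer_rows."""
    table = {}
    for row in inner_rows:
        table.setdefault(row[key], []).append(row)
    return [{**o, **i} for o in outer_rows for i in table.get(o[key], [])]

def asymmetric_join(partitions, inner_rows, key):
    """Join each partition with the plain relation, then append the results."""
    result = []
    for part in partitions:
        lo, hi = part["bounds"]
        # Only inner rows that can match this partition go into its hash table.
        relevant = [r for r in inner_rows if lo <= r[key] < hi]
        result.extend(hash_join(part["rows"], relevant, key))
    return result

partitions = [
    {"bounds": (0, 100), "rows": [{"id": 5, "amount": 10}, {"id": 42, "amount": 7}]},
    {"bounds": (100, 200), "rows": [{"id": 150, "amount": 3}]},
]
customers = [
    {"id": 5, "name": "ann"},
    {"id": 150, "name": "bob"},
    {"id": 999, "name": "zed"},
]

all_rows = [row for part in partitions for row in part["rows"]]
plain = hash_join(all_rows, customers, "id")
per_partition = asymmetric_join(partitions, customers, "id")
```

Because each partition is joined on its own, nothing stops a planner from choosing a different join method per partition, which is the flexibility the post describes.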

Implementing Attribute-Level Encryption with DynamoDB

realkinetic • 0 implied HN points • 01 May 24

🕹 Technology Databases

When working with sensitive data, having a strong security story and implementing attribute-level encryption is crucial.
For extremely sensitive data, transparent encryption may not be sufficient, and application-level encryption adds an extra layer of security.
Implementing attribute-level encryption for Amazon DynamoDB with KMS in Python can be achieved through a pattern using Lambda as the runtime, with the architecture built and managed using AWS CDK.

Vector Database Checklist Point

The Beep • 0 implied HN points • 01 Mar 24

🕹 Technology Databases

Always start with a clear goal when building a VectorDB. This helps in setting the right direction and making evaluation easier.
Data quality is crucial for VectorDB to work well. Clean and well-prepared data leads to better search results.
Choosing the right VectorDB is important. Picking the wrong one can lead to issues with how effectively it retrieves information.

Building Question Similarity Search using Vector DB

The Beep • 0 implied HN points • 11 Feb 24

🕹 Technology Databases

Creating a question similarity system can help avoid duplicate posts on forums like Stack Overflow. This makes it easier for users to find existing answers and helps contributors manage their workload better.
The system uses Vector databases and text embeddings to show related questions as users type their title. This means users get instant suggestions, which improves their experience when asking for help.
To build this system, you need to follow a few steps including getting data, creating a database, transforming questions into embeddings, and finding similar questions. It's a straightforward process if you break it down.
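The steps above can be sketched end to end with toy stand-ins. This is only an illustration under stated assumptions: a bag-of-words count vector replaces a real text-embedding model, an in-memory list replaces the vector database, and the names `embed`, `cosine`, and `find_similar` are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_similar(query, index, top_k=2):
    """Return up to top_k stored questions ranked by similarity to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), title) for title, vec in index), reverse=True)
    return [title for score, title in scored[:top_k] if score > 0]

# Step 1: get data. Step 2: "create a database" (here, a list of vectors).
questions = [
    "How do I reverse a list in Python?",
    "What is a segmentation fault in C?",
    "How to reverse a string in Python",
]
index = [(q, embed(q)) for q in questions]

# Steps 3 and 4: embed the title being typed and look up its neighbours.
suggestions = find_similar("reverse list python", index)
print(suggestions)
```

A real system would swap `embed` for a learned embedding model and the list scan for an approximate nearest-neighbour query in the vector database.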

On Databases and Memory

Thoughts from the trenches in FAANG + Indie • 0 implied HN points • 06 Jan 24

🕹 Technology Databases

Migrating from one database system to another, like from PostgreSQL to MongoDB, might not solve performance issues and could be costly and slow. It's often better to analyze if the migration will really help before proceeding.
Understanding how databases work is crucial. Different databases use memory and disk in similar ways, so just switching systems might not lead to significant improvements.
There are effective ways to boost database performance without major migrations. Improving cache, using faster disks, and optimizing indexing strategies can help both PostgreSQL and MongoDB perform better.

Internal Storage Design of Modern Key-value Database Engines [Part 1]

Practical Data Engineering Substack • 0 implied HN points • 05 Aug 23

🕹 Technology Databases

Key-value stores use a simple model where each piece of data has a unique key and its associated value. This makes them great for fast lookups, especially when you only need to search by key.
The log-structured design speeds up writes by appending records sequentially instead of updating data in place, deferring reorganization until changes can be batched together. This lets the system absorb many writes quickly.
Many modern key-value stores are inspired by early successes like Amazon's Dynamo and Google's Bigtable. These systems have shaped how newer ones are built to be efficient and scalable.
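A minimal sketch of the log-structured idea, assuming a single append-only file plus an in-memory index from key to byte offset. The class name and record format are hypothetical, keys and values are assumed to contain no tabs or newlines, and real engines add compaction, checksums, and crash recovery that are left out here.

```python
import os
import tempfile

class AppendOnlyStore:
    """Toy log-structured store: writes append to a file, reads seek by offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(path, "ab").close()  # make sure the log file exists

    def put(self, key, value):
        record = f"{key}\t{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # records always go at the end
            f.write(record)
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split("\t", 1)[1]

with tempfile.TemporaryDirectory() as tmp:
    store = AppendOnlyStore(os.path.join(tmp, "data.log"))
    store.put("user:1", "alice")
    store.put("user:2", "bob")
    store.put("user:1", "carol")  # an update is just another appended record
    latest = store.get("user:1")
    missing = store.get("user:3")
    with open(store.path, "rb") as f:
        record_count = len(f.readlines())
```

The overwritten value stays in the log until some later cleanup step removes it, which is the trade the design makes for fast sequential writes.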

Range Partitioning: Zero to One

aspiring.dev • 0 implied HN points • 17 Mar 24

🕹 Technology Databases

Range partitioning splits data into key ranges to improve performance and scalability. This method helps databases manage heavy loads by distributing data efficiently.
Unlike hash partitioning, range partitioning allows for easier scaling. You can adjust the number of ranges as needed without the hassle of rewriting data.
While range partitioning is powerful, it can be tricky to implement and may struggle with very sequential workloads. Planning is necessary to avoid creating performance hotspots.
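As a rough illustration of the points above, a range router can be modelled as a sorted list of split points searched with binary search. This is a sketch under assumptions, not the post's implementation: `RangeRouter` and its methods are hypothetical names, and a real system would also move data and update metadata when a range splits.

```python
import bisect

class RangeRouter:
    """Routes keys to ranges defined by a sorted list of split points."""

    def __init__(self, split_points):
        self.splits = sorted(split_points)

    def range_for(self, key):
        # Range i holds keys with splits[i-1] <= key < splits[i].
        return bisect.bisect_right(self.splits, key)

    def split(self, new_point):
        # Adding a boundary divides one range; the others are unaffected.
        bisect.insort(self.splits, new_point)

router = RangeRouter(["g", "p"])
before = [router.range_for(k) for k in ("apple", "grape", "zebra")]

router.split("t")  # the last range was getting hot, so divide it
after = [router.range_for(k) for k in ("apple", "grape", "quince", "zebra")]
print(before, after)
```

Keys in the untouched ranges keep their placement after the split, whereas changing the bucket count in a modulo-based hash scheme would remap most keys. A workload that only ever writes keys at the high end would still pile into the last range, which is the sequential-hotspot risk mentioned above.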

HN blogs - 15/10/24

HackerNews blogs newsletter • 0 implied HN points • 15 Oct 24

🕹 Technology Databases

Trust takes time to build and can be easily lost. It’s important to focus on long-term relationships.
Switching password managers can be tricky, so it's better to take your time during the process.
The CAP theorem helps understand how to balance consistency, availability, and partition tolerance in distributed databases.

Dataflow 101: Exploring Essential Modes for Efficient Applications

DataSketch’s Substack • 0 implied HN points • 13 Feb 24

🕹 Technology Databases

Databases are key for storing and managing data, supporting both everyday transactions and complex analysis. Using them effectively helps data engineers connect different platforms and applications.
Different data transfer methods, like REST and RPC, help systems communicate efficiently, just like a well-organized library or a quick phone call. Choosing the right method depends on the speed and precision needed for the task.
Message-passing systems allow for flexible and real-time data processing, making them great for applications like IoT or e-commerce. They help ensure communications between services happen smoothly and reliably.

Announcing the Sort API for Automating Postgres and Snowflake Workflows

Database Engineering by Sort • 0 implied HN points • 07 Nov 24

🕹 Technology Databases

The Sort API helps automate and manage workflows in Postgres and Snowflake, making it easier for teams to work with their databases.
With Change Requests, users can track, review, and execute changes to their data, which enhances collaboration and transparency.
The API offers powerful querying capabilities, allowing users to define and run their own queries for better data retrieval in their workflows.

Embedding-Based Tool Selection for AI Agents

Bit Byte Bit • 0 implied HN points • 21 Dec 25

🕹 Technology Databases

Embed tool descriptions and use semantic search to pick the top few relevant tools per query so you dramatically cut token usage and improve the model's tool‑selection accuracy.
Choose an embedding provider based on your needs — calling OpenAI is simple and cheap for small volumes, while running a local model gives privacy and low latency but adds operational overhead — and hide that choice behind a provider abstraction so you can swap easily.
Pure similarity can miss multi‑step dependencies, so expand selections by category, tune your similarity threshold, and keep a cold‑start fallback for queries that match nothing. Together these deliver big wins in cost and latency.
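The selection flow described above can be sketched as follows. Everything here is an assumption for illustration: the tool registry, the category labels, the threshold value, and the bag-of-words `embed` function, which stands in for a real embedding provider.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding provider: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical registry: tool name -> (category, description).
TOOLS = {
    "create_invoice": ("billing", "create a new invoice for a customer"),
    "send_invoice": ("billing", "email an invoice to the customer"),
    "get_weather": ("weather", "look up the weather forecast for a city"),
    "search_docs": ("docs", "search the documentation for a topic"),
}
FALLBACK = ["search_docs"]  # cold-start default when nothing clears the threshold

def select_tools(query, top_k=1, threshold=0.2):
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(desc)), name) for name, (_, desc) in TOOLS.items()),
        reverse=True,
    )
    picked = [name for score, name in scored[:top_k] if score >= threshold]
    if not picked:
        return list(FALLBACK)
    # Category expansion: include sibling tools a multi-step task may need.
    categories = {TOOLS[name][0] for name in picked}
    return [name for name, (cat, _) in TOOLS.items() if cat in categories]

print(select_tools("create invoice"))
print(select_tools("play some jazz"))
```

Only the selected tool definitions would then be sent to the model, which is where the token savings come from.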

State Must Live With the Agent

domsteil • 0 implied HN points • 12 Jan 26

🕹 Technology Databases

Commerce built around remote services breaks when autonomous agents execute and retry at scale, so state must live where decisions are made to avoid duplication, corruption, and ambiguous outcomes.
Safe autonomous commerce requires embedding execution and local persistence inside agents, with deterministic state transitions, idempotent commands, and event-sourced histories so actions are replayable and resilient offline.
This is a fundamental architectural shift, not an optional optimization: commerce should behave like a local database (iCommerce), with network sync and settlement in secondary roles, to enable reliable agent-driven economies.
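A minimal sketch of what idempotent commands and an event-sourced history can look like for agent-local state. The `Cart` class, its fields, and the command IDs are hypothetical, and a real agent would persist the event log rather than keep it in memory.

```python
from dataclasses import dataclass, field

@dataclass
class Cart:
    """Agent-local state that is derived entirely from its event log."""
    events: list = field(default_factory=list)
    seen: set = field(default_factory=set)
    items: dict = field(default_factory=dict)

    def handle(self, command_id, sku, qty):
        """Apply a command once; a retry with the same ID is a no-op."""
        if command_id in self.seen:
            return False
        event = {"id": command_id, "sku": sku, "qty": qty}
        self.events.append(event)
        self._apply(event)
        return True

    def _apply(self, event):
        # Deterministic transition: same events in, same state out.
        self.seen.add(event["id"])
        self.items[event["sku"]] = self.items.get(event["sku"], 0) + event["qty"]

    @classmethod
    def replay(cls, events):
        """Rebuild state from a stored history, e.g. after a restart."""
        cart = cls()
        for event in events:
            cart.events.append(event)
            cart._apply(event)
        return cart

cart = Cart()
first = cart.handle("cmd-1", "widget", 2)
retry = cart.handle("cmd-1", "widget", 2)  # duplicate delivery of the same command
cart.handle("cmd-2", "gadget", 1)
rebuilt = Cart.replay(cart.events)
```

The retried command leaves the state unchanged, and replaying the log reproduces the same cart, which is what makes retries at scale safe.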

Building Health Apps with Wearable Devices

The Healthtech Initiative • 0 implied HN points • 28 Jan 26

🕹 Technology Databases

You can build a personal health vault web app without heavy coding by using Cursor's agent mode to scaffold the UI and logic while Terra API handles wearable integrations. Supabase stores the synced wearable data and medical files so the app can show charts and documents.
The implementation steps are straightforward: get your Terra API key and Dev ID, add environment variables, create endpoints like /api/terra/connect and /api/terra/connections, and configure Supabase as a destination. Then add Terra's MCP (AI interface) so the app can run LLM-powered queries against the health data.
Combining multi-year wearable data with medical documents and an LLM prompt engine lets you build timelines, strain/readiness scores, and warm-styled graphs to compare biomarkers like HRV, RHR, and VO2 Max around surgical or recovery events. This setup makes it easy to visualize recovery phases and surface correlations between wearable signals and medical records.

Migrating a data warehouse

Expand Mapping with Mike Morrow • 0 implied HN points • 27 Feb 26

🕹 Technology Databases

A warehouse migration is a multi-step project where tasks range from easy to very hard. Some small changes like updating BI connections are quick, but others need significant effort.
Medium-effort work like schema mapping, one-time backfills, and reconfiguring pipelines is necessary and requires careful data validation. These steps are manageable but time-consuming.
The hardest parts are deciding what data to keep, rewriting transformations, running both warehouses in parallel, and recreating access controls. Those areas carry the most risk and will dominate the timeline.