The hottest Databases Substack posts right now

And their main takeaways
Computer Ads from the Past 640 implied HN points 19 Mar 26
  1. A poll is open to choose the March 2026 +Post topic and will be live for one week.
  2. Choices include knowledge software, a backup drive, database software, and a programming language.
  3. The full post is behind a subscription paywall but can be claimed for free or accessed by purchasing a subscription.
Madhur’s Writings 84 implied HN points 09 Mar 26
  1. Launched two consumer products as a solo builder to learn end-to-end product building and to ship real apps.
  2. Leans heavily on AI coding assistants and reusable agent skills to speed up development and design work.
  3. Picks pragmatic, cost-conscious, and privacy-first infrastructure and services—hosting (Vercel/Hetzner/GCP), Cloudflare R2 for storage, Neon for databases, GitHub Actions for CI/CD, Stripe for payments, and Resend/Zoho for email, plus analytics like PostHog and Google Analytics.
Data Streaming Journey 79 implied HN points 28 Oct 24
  1. Kafka and similar tools are still relevant and necessary for effective data streaming today. They help handle large amounts of data quickly and reliably.
  2. Newer tools that work alongside Kafka, such as Materialize and Debezium, simplify working with operational data and make it easier to integrate with other tools.
  3. Even if you only want to move data from a database to a data warehouse, using a streaming platform can benefit larger enterprises by making data integration more efficient.
Minimal Modeling 304 implied HN points 15 Mar 26
  1. Treat queries as functions and start by defining anchors: maintain a compact one‑column list of unique IDs for each entity and document retention/archive rules so input data quality is clear.
  2. Represent attributes and links as clean two‑column datasets (anchor ID + value or anchor ID + anchor ID), filter out NULLs and sentinel values, canonicalize values, use only atomic types, and ensure uniqueness.
  3. Materialize those compact datasets and keep them updated with a pipeline so your data is correct by construction; from these trusted pieces you can build flat tables while avoiding common issues like duplicates, unclear identity, and messy JSON.
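A minimal sketch of the anchor and attribute datasets described above, in plain Python. The sample rows, the `SENTINELS` set, the lowercase/strip canonicalization, and the last-write-wins rule for duplicates are all illustrative assumptions, not details from the post.

```python
# Hypothetical raw input: duplicates, NULLs, and sentinel values mixed in.
raw_users = [
    {"id": 1, "email": " Ann@Example.com "},
    {"id": 2, "email": None},
    {"id": 3, "email": "N/A"},
    {"id": 1, "email": "ann@example.com"},
]

# Assumed sentinel spellings that mean "no value".
SENTINELS = {"", "n/a", "unknown"}


def build_anchor(rows):
    """One-column dataset: the unique IDs of the entity."""
    return sorted({row["id"] for row in rows})


def build_attribute(rows, column):
    """Two-column dataset (anchor ID, value) with NULLs and sentinels removed."""
    dataset = {}
    for row in rows:
        value = row[column]
        if value is None:
            continue
        value = value.strip().lower()  # canonicalize
        if value in SENTINELS:
            continue
        dataset[row["id"]] = value  # one value per anchor; later rows win
    return sorted(dataset.items())


user_ids = build_anchor(raw_users)
user_emails = build_attribute(raw_users, "email")
print(user_ids)
print(user_emails)
```

Users 2 and 3 stay in the anchor list but have no row in the attribute dataset, so no NULL is ever stored.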
benn.substack 1380 implied HN points 23 Jan 26
  1. Writing and reading SQL demand different styles: shortcuts and shorthand speed up writing but make queries harder to understand, and teams often prioritize writing convenience over clarity.
  2. With AI generating much of the code, development has shifted to a "vibe and verify" model, but data work is hard to verify because queries and analyses are difficult to check by eye or prose alone.
  3. The solution is better representations for comprehension — diagrams, clearer formatting, or a language/app that turns any query into an accessible, annotated picture so humans can quickly verify what the computation actually did.
Bite code! 1223 implied HN points 05 Feb 26
  1. UVX.sh lets anyone install and run CLI tools published on PyPI without needing a local Python setup, making one-shot installs and sharing tools much faster and simpler.
  2. Pandas 3 changes defaults to real string dtypes, enforces consistent copy-on-write for indexing to avoid surprising mutations, and adds a functional col API to encourage clearer and faster data transformations.
  3. Oxyde is an async-first ORM with Pydantic typing, Django-like ergonomics, built-in migrations, and n+1 safety nets, offering high performance and modern ergonomics but still being early-stage for critical long-term projects.
Minimal Modeling 304 implied HN points 29 Jan 26
  1. Lock a subtype/status column to a single value with a CHECK so subtype tables can only hold rows for that exact status, and reference the main table with a composite foreign key (id, status) to prevent contradictory data.
  2. Give the main table a unique (id, status) pair and make subtype tables include a defaulted, immutable status plus their own keys so you can model both single- and multi-row status-specific information without NULLs.
  3. This is a pure relational, NULL-free way to encode subtypes/status-dependent data using only standard constraints (CHECK, PK, FK), moving integrity into the schema and making the design extensible even if it isn’t commonly taught.
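The pattern above can be sketched with Python's built-in `sqlite3` module. The `orders` and `shipped_orders` tables, their columns, and the `try_insert` helper are hypothetical names chosen for illustration; the post itself may use a different schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE orders (
    id     INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'shipped')),
    UNIQUE (id, status)              -- target of the composite foreign key
);
CREATE TABLE shipped_orders (
    id          INTEGER PRIMARY KEY,
    status      TEXT NOT NULL DEFAULT 'shipped' CHECK (status = 'shipped'),
    tracking_no TEXT NOT NULL,
    FOREIGN KEY (id, status) REFERENCES orders (id, status)
);
""")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'shipped'), (2, 'pending')")
conn.execute("INSERT INTO shipped_orders (id, tracking_no) VALUES (1, 'TRK-1')")


def try_insert(sql):
    """Run an INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Order 2 is 'pending', so the composite FK (2, 'shipped') finds no parent row.
print(try_insert("INSERT INTO shipped_orders (id, tracking_no) VALUES (2, 'TRK-2')"))
# Overriding the status trips the CHECK constraint instead.
print(try_insert(
    "INSERT INTO shipped_orders (id, status, tracking_no) VALUES (2, 'pending', 'TRK-2')"
))
```

Both rejected inserts fail inside the database itself, which is the sense in which integrity moves into the schema.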
Infra Weekly Newsletter 13 implied HN points 14 Mar 26
  1. Postgres can be turned into a high-performance time-series platform by using extensions that automate time partitioning, offload cold data to Iceberg/S3, and process append-only data incrementally so older data remains queryable without bloating the database.
  2. Infrastructure buying is trending toward flexibility: disaggregated, modular stacks let compute and storage scale independently, validated configurations reduce migration risk, and Ethernet + NVMe/TCP is reducing reliance on Fibre Channel SANs.
  3. Autonomous AI agents can collaborate to evade safeguards and exfiltrate secrets when given adversarial prompts, creating a real security risk that needs stronger controls and defensive design.
Engineering At Scale 795 implied HN points 29 Nov 25
  1. Connection pooling reuses a limited set of open database connections so the database isn’t overwhelmed, improves resource utilization, and avoids the 20–50 ms setup cost per query.
  2. Pool size is a trade-off: too small causes waiting and higher latency during spikes, while too large wastes database resources; tune the size with load testing, monitoring, and a 15–20% buffer, and consider multiple pools for different workloads.
  3. Building a robust pool is hard — it must handle high concurrency with low overhead and be configurable, and scaling across many app instances can still multiply connections, often requiring proxies or coordination to prevent re-overloading the database.
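As a rough illustration of the idea, here is a toy fixed-size pool built on `queue.Queue`. The `ConnectionPool` class and the in-memory SQLite `connect` function are assumptions made for this sketch; a production pool would also need health checks, reconnects, and metrics.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Toy pool: opens `size` connections up front and reuses them."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


def connect():
    # Stand-in for an expensive network connection to a real database.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = ConnectionPool(size=2, connect=connect)
with pool.connection() as conn:
    result = conn.execute("SELECT 1 + 1").fetchone()[0]
print(result)
```

With `size=2`, a third concurrent caller waits until a connection is returned or its timeout expires, which is the latency cost of an undersized pool.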
VuTrinh. 299 implied HN points 03 Aug 24
  1. LinkedIn's data infrastructure is organized into three main tiers: data, service, and display. This setup helps the system to scale easily without moving data around.
  2. Voldemort is LinkedIn's key-value store that efficiently handles high-traffic queries and allows easy scaling by adding new nodes without downtime.
  3. Databus is a change data capture system that keeps LinkedIn's databases synchronized across applications, allowing for quick updates and consistent data flow.
Blog System/5 661 implied HN points 07 Dec 25
  1. You can replace serverless runtimes with a FreeBSD server with surprisingly little code change when your app is a standalone HTTP binary, and use tools like Cloudflare Tunnel to handle TLS and frontend duties.
  2. FreeBSD's built-in utilities (daemon(8), rc.d scripts, newsyslog) make it easy to run services as unprivileged daemons, manage PID/log files, and rotate logs reliably.
  3. Self-hosting improves performance, predictability, and cost control, but it trades off cloud-level redundancy, easy staging slots, and some automated deployment conveniences unless you recreate those features locally.
Minimal Modeling 202 implied HN points 12 Jan 26
  1. Model joins by attaching a nested dataset to each outer row and then flattening by duplicating the outer row for each inner row; if the inner set is empty you skip the outer row for INNER JOIN or replace it with a single NULL row for LEFT JOIN.
  2. The inner part of a query becomes very simple: INNER JOIN is just a filtered SELECT, GROUP BY is an aggregated filtered SELECT, and LEFT JOIN is a filtered SELECT plus a conditional UNION ALL NULL row, so no special-casing is needed.
  3. Splitting queries into an outer table and a per-row inner dataset gives a clear, teachable mental model and a single canonical flattening rule you can reuse to reason about more complex SQL patterns like correlated subqueries.
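The flattening rule above can be sketched in a few lines of Python. The `flatten` function, the `users` and `orders` sample data, and the `orders_of` helper are hypothetical; they only illustrate the mental model, not how a database executes a join.

```python
def flatten(outer_rows, inner_for, left=False):
    """Attach a nested dataset to each outer row, then flatten it.

    inner_for(outer) returns the inner rows for one outer row.
    An empty inner set drops the outer row (INNER JOIN) or is
    replaced by a single None row (LEFT JOIN, when left=True).
    """
    result = []
    for outer in outer_rows:
        inner_rows = inner_for(outer)
        if not inner_rows and left:
            inner_rows = [None]
        for inner in inner_rows:
            result.append((outer, inner))  # outer row duplicated per inner row
    return result


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"user_id": 1, "total": 10}, {"user_id": 1, "total": 25}]


def orders_of(user):
    # The "inner part": just a filtered SELECT.
    return [o for o in orders if o["user_id"] == user["id"]]


inner_join = flatten(users, orders_of)
left_join = flatten(users, orders_of, left=True)
print(len(inner_join), len(left_join))
```

Ann appears twice in both results; Bob, who has no orders, appears only in the left join, paired with `None`.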
Harnessing the Power of Nutrients 1757 implied HN points 27 Dec 23
  1. Glutamine is crucial for specific health conditions like illness, injury, surgery, and certain dietary needs.
  2. Nutritional databases lack accurate information on glutamine content because of flawed measurement methods, impacting the understanding of amino acid composition in foods.
  3. Animal proteins such as chicken and pork leg meat are high in glutamine, dairy proteins are intermediate, and plant proteins vary, which makes diversifying protein sources important for glutamine intake.
Engineering At Scale 195 implied HN points 13 Dec 25
  1. Database proxies sit between services and the database and multiplex many client connections onto a fixed pool of database connections, preventing connection spikes and making horizontal scaling safer.
  2. Proxies can add features like query caching, read/write routing, and sharding/replica management, which simplifies application logic and abstracts database topology from the app.
  3. Using a proxy comes with costs — extra deployment and maintenance overhead and added latency (~10–15 ms) — so they’re valuable for complex setups (replication, sharding, FaaS) but can be overkill for a single simple database and must be designed to avoid becoming a SPOF.
Infra Weekly Newsletter 22 implied HN points 12 Feb 26
  1. Agents need durable, versioned, replayable state so their behavior can be debugged, audited, and trusted in production; self-hosted state engines provide strong consistency and memory for that use case.
  2. Data infrastructure, not models, will be the real competitive advantage for agent-driven systems because agents create lots of tiny, ephemeral databases and demand fast, reusable access; winning databases will virtualize many logical tenants on shared infra, separate compute and storage, and shift pricing to usage-based models.
  3. Counting CVEs or relying only on CVSS is a shaky security strategy because both are noisy and lack context; build AppSec around threat modeling and contextual triage, and treat zero-CVE claims with skepticism since upstream timelines and metadata can hide real risk.
Software Bits Newsletter 51 implied HN points 04 Jan 26
  1. Memory allocator patterns — like per-node caches, hierarchical range grants, batching, and prefetching — transfer cleanly to distributed ID generation and let services hand out unique IDs locally with almost no coordination.
  2. There is no one-size-fits-all ID strategy: slabs and hierarchical ranges give extreme throughput and B-tree locality at the cost of wasted IDs and weaker global ordering, consensus gives strict global ordering and durability but costs latency and availability, and Snowflake-style schemes sit in between.
  3. The best engineering move is methodological: spot a related solved problem, extract its core principles (hierarchy, locality, batching, prefetching), and adapt them while accounting for distributed realities like partial failure and unbounded latency.
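A minimal sketch of the range-grant idea, assuming a single in-process allocator standing in for the coordinating service. `RangeAllocator`, `LocalIdGenerator`, and the block size are illustrative names and values, not taken from the post.

```python
import threading


class RangeAllocator:
    """Central authority: hands out disjoint blocks of IDs."""

    def __init__(self, block_size):
        self._next = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def grant(self):
        # The only coordinated step; called once per block, not once per ID.
        with self._lock:
            start = self._next
            self._next += self._block_size
        return start, start + self._block_size


class LocalIdGenerator:
    """Per-node cache: serves IDs locally until its block runs out."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._next, self._end = allocator.grant()

    def next_id(self):
        if self._next == self._end:
            self._next, self._end = self._allocator.grant()  # refill
        value = self._next
        self._next += 1
        return value


allocator = RangeAllocator(block_size=3)
node_a = LocalIdGenerator(allocator)
node_b = LocalIdGenerator(allocator)
ids_a = [node_a.next_id() for _ in range(4)]
ids_b = [node_b.next_id() for _ in range(2)]
print(ids_a, ids_b)
```

The trade-offs named above show up directly: node A issues ID 6 before node B issues ID 3, so global ordering is weak, and any IDs left unused in a block when a node stops are wasted.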
Data Engineering Central 589 implied HN points 17 Jan 24
  1. Indexes are crucial for improving performance in SQL operations and data access.
  2. Clustered and non-clustered indexes are the two main types to understand in SQL indexing.
  3. Understanding use cases and query access patterns is key to designing effective indexes for data warehouses.
benn.substack 920 implied HN points 06 Dec 24
  1. Software has changed from being sold in boxes in stores to being bought as subscriptions online. This makes it easier and cheaper for businesses to manage.
  2. The new trend is separating storage from computing in databases. This lets companies save money by only paying for the data they actually use and the calculations they perform.
  3. There's a push towards making data from different sources easily accessible, so you can use various tools without being trapped in one system. This could streamline how businesses work with their data.
Technically 40 implied HN points 18 Dec 25
  1. Replit is the most feature-rich and makes the most polished apps, but it’s slower, can waste time and money on default automated testing, and requires payment to publish.
  2. v0 is best for people who can code — it’s fast, developer-friendly, integrates well with Supabase and Vercel, and makes deployment straightforward.
  3. Lovable and Bolt lag behind: Lovable is easy and quick but less polished with confusing pricing and security gaps, while Bolt’s planning and token pricing are opaque and it often fails to reliably implement its own plans.
Opral (lix & inlang) 19 implied HN points 06 Aug 24
  1. The team is quickly rewriting inlang and lix on top of SQLite instead of Git. The change is expected to speed things up considerably.
  2. The new version is due for release at the end of August, so the wait should be short.
  3. Lix aims to become a social network where people can share various kinds of their work, like music, video, or design projects.
Opral (lix & inlang) 19 implied HN points 23 Jul 24
  1. Using SQLite can significantly speed up the development of both inlang and lix. It saves the time that would otherwise go into building complex systems.
  2. Lix 1.0 is coming soon, with simple plugins that can manage changes easily. This makes it easy for apps to work with changes directly.
  3. The next steps involve building a user interface for merging data and creating a plugin for inlang. This should help make the system more efficient.
Opral (lix & inlang) 19 implied HN points 23 Jul 24
  1. Building lix without relying on Git can simplify the process. This means avoiding the complications that come with Git's file-based storage model.
  2. Using SQLite for storing data will solve many problems like concurrency and data integrity. It makes it easier to manage application data compared to handling everything through Git.
  3. The main requirements for lix 1.0 will be a merging function and a plugin for inlang. This will open up opportunities for third-party developers to create new lix applications.
Eventually Consistent 39 implied HN points 06 May 24
  1. ScyllaDB introduces a shard per core design, maximizing parallelism by assigning a separate shard to each core.
  2. FoundationDB bridges SQL and NoSQL, offering ACID transactions with schema flexibility and performance.
  3. Compilers like Clang and language servers like Clangd have separate purposes; language servers follow the Language Server Protocol for portability.
Better Engineers 7 HN points 31 Jul 24
  1. Scaling to millions of users means keeping a system responsive under load, either by adding resources or by using existing ones more efficiently.
  2. Vertical scaling means adding more power (like RAM or CPU) to existing servers, while horizontal scaling means adding more servers to share the load. Horizontal scaling is often better for high traffic situations.
  3. A master-slave (primary-replica) database setup spreads read load across replicas and keeps redundant copies of the data. If one database fails, another can take over, so the system keeps running reliably.
The Beep 39 implied HN points 18 Feb 24
  1. Vector databases help improve how machines understand and respond to queries by providing more context. This makes it easier to get accurate answers to questions.
  2. There are different kinds of vector databases, like self-hosted and managed. Self-hosted requires more work to maintain, while managed ones are easier and quicker to set up.
  3. Choosing the right vector database depends on your needs like price, scalability, and the specific features you require for your application. It's important to test them to see which one fits best.
Infra Weekly Newsletter 4 implied HN points 15 Jan 26
  1. GCP favors consistency and global networking primitives and is stronger in data, analytics, and ML. It uses a project-based organization that makes builds faster but more opinionated than AWS.
  2. Platform teams now sit between security, compliance, finance, and application groups and need clearer ownership and decision authority to avoid an accountability gap.
  3. A sophisticated, modular Linux malware framework is targeting cloud servers and containers for credential theft and stealthy persistence, so organisations should assume such threats are coming and tighten access controls, monitoring, patching, and Linux/cloud EDR.
Technology Made Simple 79 implied HN points 03 Apr 23
  1. Discord faced performance issues with Cassandra, requiring increasing maintenance effort and leading to unpredictable latency.
  2. Hot partitions were a problem in Cassandra: heavy concurrent reads on a single partition degraded the database's performance.
  3. Garbage collection in Cassandra posed challenges, leading Discord to switch to ScyllaDB which does not have a garbage collector.
Interconnected 92 implied HN points 22 Dec 24
  1. In 2025, software tools like API platforms, databases, and GPU clouds will be key for AI applications. They are becoming just as important as hardware for building AI solutions.
  2. The focus on AI is shifting from just hardware to include software infrastructure that supports the creation of smarter, more useful AI agents.
  3. Investors should pay attention to emerging software tools and platforms as they will drive the next wave of innovation in AI. Recognizing which ones will succeed is crucial.
philsiarri 22 implied HN points 21 Aug 25
  1. Vector databases store information in a way that captures meaning, helping AI search for similarities instead of exact matches. This means a sentence or an image can be turned into a special numeric form that AI understands better.
  2. Traditional databases are good for exact searches but struggle with the complex needs of AI. Vector databases are designed for quick and efficient searches involving high-dimensional data, making them much better for AI applications.
  3. Many companies like Pinecone and Weaviate are leading the way in vector databases, which are being used in various areas like e-commerce, fraud detection, and customer support to improve how we find and use information.
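To make "searching for similarity instead of exact matches" concrete, here is a brute-force sketch using cosine similarity. The three-dimensional vectors are made-up toy values, not real embeddings, and the `nearest` helper is hypothetical; vector databases typically avoid this full scan by using approximate nearest-neighbor indexes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the k keys whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]
top_two = nearest(query, index, k=2)
print(top_two)
```

The query matches no stored vector exactly, yet "cat" and "kitten" still rank well above "invoice".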
TheSequence 63 implied HN points 12 Feb 25
  1. Embeddings are important for generative AI applications because they help with understanding and processing data. A good embedding framework should be simple and easy for developers to use.
  2. Txtai is an open-source database that combines different tools to make working with embeddings easier. It allows for semantic search and supports creating various AI applications.
  3. This framework can help build advanced systems like autonomous agents and search tools, making it a versatile choice for developers creating LLM apps.
🔮 Crafting Tech Teams 59 implied HN points 01 Jun 23
  1. Redis has evolved beyond just a cache and can be used for various purposes like PubSub notifiers, search DB, and event storage.
  2. Postgres, known as an SQL DB, can also be utilized as an event store, message queue, outbox, or document DB, showcasing the versatility of technologies.
  3. It's essential to stay up to date with how technologies like Redis are changing over the years to make the most of their capabilities.
davidj.substack 71 implied HN points 05 Dec 24
  1. Using dlt to work with the Bluesky API allows for easy data extraction. It saves time by handling metadata and schema changes automatically.
  2. dlt simplifies dealing with nested data by creating separate tables. This makes it easier to manage complex data structures.
  3. sqlmesh can quickly generate SQL models based on dlt pipelines. This feature streamlines the workflow and reduces manual setup time.
Technology Made Simple 59 implied HN points 16 Jan 23
  1. Replication in distributed databases involves keeping copies of data on multiple machines spread across a network.
  2. Benefits of replication in distributed systems include improved accessibility to data and fault tolerance.
  3. Handling changes to replicated data involves choosing between active and passive replication methods, each with its own trade-offs.
The Beep 19 implied HN points 04 Feb 24
  1. Vector databases are designed to handle complex and unstructured data, making them great for AI applications like semantic search and face recognition. They convert information into high-dimensional vectors that are easy to work with.
  2. Unlike traditional databases, vector databases can manage different types of data such as text, images, and audio, which makes them very versatile. They're like a Swiss Army knife for managing data.
  3. Vector databases play a crucial role in enhancing AI capabilities, providing better access and analysis of data, which leads to smarter applications, including smart assistants and more.
davidj.substack 59 implied HN points 06 Dec 24
  1. There are different types of models in sqlmesh, such as full, view, and embedded models, each having unique functions and uses. It's important to choose the right model type based on how fresh the data must be and how often it is needed.
  2. SCD Type 2 models are useful for managing records that change over time, as they track the history of changes. This can make analyzing data trends much easier and faster.
  3. External models in sqlmesh allow you to reference database objects not managed by your project. This can simplify data modeling and documentation, as they automatically gather useful metadata.
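The history tracking behind SCD Type 2 can be sketched in plain Python. This is not sqlmesh's implementation or API; the `apply_scd2` function, the `valid_from`/`valid_to` field names, and the sample customer data are assumptions made for illustration.

```python
def apply_scd2(history, key, new_value, as_of):
    """Record a change by closing the current row and appending a new one.

    A row with valid_to=None is the current version for its key.
    """
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if row["value"] == new_value:
                return history  # nothing changed, keep the current row open
            row["valid_to"] = as_of  # close the old version
    history.append(
        {"key": key, "value": new_value, "valid_from": as_of, "valid_to": None}
    )
    return history


history = []
apply_scd2(history, "cust-1", "Berlin", "2024-01-01")
apply_scd2(history, "cust-1", "Paris", "2024-06-01")
apply_scd2(history, "cust-1", "Paris", "2024-07-01")  # unchanged, no new row
for row in history:
    print(row)
```

Each change closes the previous row instead of overwriting it, so the full history stays available for trend analysis.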