The hottest Systems Design Substack posts right now

And their main takeaways

Elegant Solutions

David Friedman’s Substack • 404 implied HN points • 22 Dec 24

Using both words and numbers when writing a check helps reduce mistakes, making it much harder to misread the amount. It's a clever way to prevent errors and fraud.
The design of everyday items, like rubber spatulas and manhole covers, often has simple solutions to practical problems. These designs make them more useful in various situations.
When faced with a decision or a problem, looking for the simplest and most practical solution is key. Sometimes, the best way to find a solution is to observe how things are naturally done.

All you need to know about the Google File System

VuTrinh. • 119 implied HN points • 11 May 24

🕹 Technology Data Systems Distributed Computing Systems Design Fault Tolerance

Google File System (GFS) is designed to handle huge files and many concurrent clients. It is optimized for appending new data to files rather than overwriting existing data.
The system uses a single master server to manage file information, making it easier to keep track of where everything is stored. Clients communicate directly with chunk servers for faster data access.
GFS prioritizes reliability by storing multiple copies of data on different chunk servers. It constantly checks for errors and can quickly restore lost or corrupted data from healthy replicas.

Accelerate by years part II - Self-contained inlang files

Opral (lix & inlang) • 19 implied HN points • 23 Jul 24

🕹 Technology Software Development Systems Design Collaboration Tools Product Development

Making inlang files self-contained can speed up development. Zipping these files means they won't rely on outside git repositories.
With this change, new features can be built much faster. This includes things like collaboration tools and app features that don't depend on git.
Removing the git dependency opens up growth opportunities. It allows designers and translators to get involved and helps the overall ecosystem grow.

How We Built a Self-Healing System to Survive a Terrifying Concurrency Bug At Netflix

Push to Prod • 5 HN points • 27 Aug 24

🕹 Technology Software Engineering Systems Design Incident Management

At Netflix, there was a serious concurrency bug causing CPU problems, and they needed a quick solution. They couldn't fix it right away and had to come up with a way to keep their systems running through the weekend.
Instead of manually fixing everything, they created a self-healing system. They randomly killed a few server instances every 15 minutes, replacing them with fresh ones, which allowed the team to relax during the crisis.
This situation taught them that sometimes unconventional solutions are necessary. Prioritizing the team's well-being can be just as important as fixing technical issues.
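The recycling idea in the takeaways above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Netflix's actual tooling: `recycle_round`, `terminate`, and `launch` are hypothetical names, and a real version would call a cloud provider's API and run on a timer (every 15 minutes in the post's account).

```python
import random


def pick_instances_to_recycle(instance_ids, count, rng=None):
    """Choose up to `count` distinct instances at random."""
    rng = rng or random.Random()
    count = min(count, len(instance_ids))
    return rng.sample(list(instance_ids), count)


def recycle_round(fleet, count, terminate, launch, rng=None):
    """Kill a few random instances and start fresh replacements.

    `terminate` and `launch` are hypothetical callbacks standing in for
    whatever the real infrastructure provides. Returns the new fleet.
    """
    victims = pick_instances_to_recycle(fleet, count, rng)
    survivors = [i for i in fleet if i not in victims]
    for victim in victims:
        terminate(victim)
    replacements = [launch() for _ in victims]
    return survivors + replacements
```

The fleet size stays constant after each round, so capacity is preserved while any instance that has degraded gets replaced eventually.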

So... what are RPCs [System Design Sundays]

Technology Made Simple • 219 implied HN points • 25 Sep 23

🕹 Technology Systems Design Communication Distributed Computing APIs Networking

Remote Procedure Calls (RPCs) let a program execute a procedure in a different address space without the programmer explicitly coding the details of the remote interaction.
RPCs are prevalent in modern systems design because they enable efficient, scalable, and flexible communication between services.
These properties make RPCs a powerful tool for building distributed computing systems.
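To make the "different address space" idea concrete, here is a minimal in-process sketch. It assumes a made-up JSON message format and hypothetical `Server` and `ClientStub` classes; it is not any particular RPC framework. The transport is just a function call, where a real system would send the bytes over a network.

```python
import json


class Server:
    """Holds registered procedures and answers serialized requests."""

    def __init__(self):
        self._procedures = {}

    def register(self, func):
        self._procedures[func.__name__] = func
        return func

    def handle(self, request_bytes):
        request = json.loads(request_bytes)
        try:
            result = self._procedures[request["method"]](*request["args"])
            response = {"result": result}
        except Exception as exc:  # report any failure back to the caller
            response = {"error": str(exc)}
        return json.dumps(response).encode()


class ClientStub:
    """Makes remote procedures look like local method calls."""

    def __init__(self, transport):
        # `transport` takes request bytes and returns response bytes.
        self._transport = transport

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"method": name, "args": args}).encode()
            response = json.loads(self._transport(request))
            if "error" in response:
                raise RuntimeError(response["error"])
            return response["result"]

        return remote_call
```

With a server that registers an `add(a, b)` function, a caller writes `ClientStub(server.handle).add(2, 3)` and never touches the serialization or dispatch code directly.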

The Elusive Built-in not Bolted-on

Resilient Cyber • 239 implied HN points • 17 Apr 23

🕹 Technology Cybersecurity Software Development Risk management Public Policy Systems Design

Cybersecurity should be included from the start of product design, not added later. This means making security a priority throughout the whole development process.
Products should come secure by default, so users don't have to figure out how to protect themselves. Just like cars come with seatbelts, software needs built-in security features.
There needs to be accountability for software security. Companies should not shift the blame to users but should instead be responsible for ensuring their products are secure and safe to use.

God of War Ragnarok: Five Game Systems In One Bag

The Bottom Feeder • 290 implied HN points • 22 Feb 23

🕹 Technology Gaming Systems Design Indie games

The game design for God of War Ragnarok involved combining multiple game systems, resulting in a complex and overwhelming experience for players.
Despite the extensive features and upgrades in the game, many of these elements were found to be unnecessary and not essential for effective gameplay.
Feedback on game design suggests that prioritizing clear, substantive upgrades and reducing the number of systems could lead to a more enjoyable and balanced gaming experience.

How Amazon makes Machine Learning Trustworthy[System Design Sundays]

Technology Made Simple • 79 implied HN points • 20 Mar 23

🕹 Technology Machine Learning AI Systems Design

Privacy-preserving Machine Learning keeps the input secret by altering certain details.
Federated Learning allows Amazon to update models without sending data to their centers, saving costs and maintaining privacy.
Amazon ensures fairness in ML by balancing biases in datasets through oversampling and data substitutions.

A better way to approach Systems Design Interviews [Storytime Saturdays]

Technology Made Simple • 59 implied HN points • 25 Jun 23

🕹 Technology Systems Design

Systems Design Interviews call for a systematic strategy so that you do not feel overwhelmed. Leetcode interviews test core ideas, while Systems Design interviews have a larger, more ambiguous scope.
When preparing for Systems Design Interviews, focus on balancing depth and breadth. Avoid getting lost in esoteric details and ensure coverage of essential aspects of complex questions.
Use a framework that views the system as a product to identify core components and showcase expertise effectively during Systems Design Interviews.

How Shopify Balances Database Shards Without Downtime

Arpit’s Newsletter • 58 implied HN points • 01 Mar 23

🕹 Technology E-commerce Database Management Systems Design

Shopify uses a distributed architecture with pods to handle a large number of shops sharing the same database.
Shopify balances database shards without downtime by moving shops between pods using a tool called ghostferry.
To ensure no downtime or data loss, Shopify follows three phases when moving a shop from one pod to another: batch copy, prepare for cutover, and cutover and updating the routing.

Performing Multiple LLM Calls & Voting On The Best Result Are Subject To Scaling Laws

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 19 Mar 24

🕹 Technology AI Machine Learning Data science Systems Design Performance optimization

Making more calls to Large Language Models (LLMs) can help with simple questions but may actually make it harder to answer tough ones.
Finding the right number of calls to use is crucial for getting the best results from LLMs in different tasks.
It's important to design AI systems carefully, as just increasing the number of calls doesn't always mean better performance.
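A rough sketch of the call-and-vote setup, plus a small calculation showing why more calls can hurt on hard questions. The assumptions are mine, not the post's: `ask_model` is a hypothetical stand-in for an LLM call, each question has only two possible answers, and calls are independent with the same per-call accuracy `p`.

```python
from collections import Counter
from math import comb


def vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def ask_with_voting(ask_model, question, num_calls):
    """Call the (hypothetical) model several times and majority-vote."""
    answers = [ask_model(question) for _ in range(num_calls)]
    return vote(answers)


def majority_correct_probability(p, n):
    """Chance that a majority of n independent calls is correct.

    Assumes two possible answers and an odd n, so there are no ties.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

Under these assumptions, a question the model gets right 70% of the time is answered correctly about 84% of the time with five votes, while a question it gets right only 30% of the time drops to about 16%. Voting amplifies whichever answer is more likely per call.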

Google's 3 Pillars for Building Resilient and Scalable Systems [Systems Design Sundays]

Technology Made Simple • 79 implied HN points • 12 Dec 22

🕹 Technology Systems Design Automation Scalability Resilience

Scalability is crucial for systems to handle increased loads like more users without losing performance.
Resilient systems can handle various challenges like constant user actions and security threats.
Automation and loose coupling are key pillars for enhancing the scalability and resilience of systems.

Strategies for Replication in Distributed Databases [System Design Sundays]

Technology Made Simple • 59 implied HN points • 16 Jan 23

🕹 Technology Data Databases AI Machine Learning Systems Design

Replication in distributed databases involves keeping copies of data on multiple machines spread across a network.
Benefits of replication in distributed systems include improved accessibility to data and fault tolerance.
Handling changes to replicated data involves choosing between active and passive replication methods, each with its own trade-offs.

How Netflix survived the AWS outage in 2011 [System Design Sundays]

Technology Made Simple • 39 implied HN points • 13 Feb 23

🕹 Technology Systems Design Resilience Cloud Computing Chaos Engineering Data architecture

Netflix utilized Open Connect Appliances to provide better streaming by localizing content on devices of certain ISPs.
The use of Stateless-service architecture allows any server to step in if one fails, ensuring uninterrupted service.
Netflix's redundancy strategy includes storing data in multiple zones, using 'n+1' redundancy, and employing graceful degradation techniques to maintain limited functionality in case of failure.

Fundamental architectural patterns Part 1[Systems Design Sundays]

Technology Made Simple • 59 implied HN points • 17 Jul 22

🕹 Technology Systems Design

Fundamental architectural patterns can help in quickly solving common problems and creating a solid base for project implementation.
Key patterns covered include Layers Pattern, Client-Server Pattern, and Pipe and Filter Pattern, each with specific roles and benefits.
Patterns like Layers focus on separation of concerns, Client-Server centralizes resources for multiple clients, and Pipe and Filter facilitates data processing through a series of components.
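As one illustration of the Pipe and Filter pattern named above, here is a small sketch using Python generators. The filter names are invented for the example; each filter consumes a stream and yields a transformed stream, and `pipeline` only wires them together.

```python
def read_lines(text):
    """Source: emit the text one line at a time."""
    for line in text.splitlines():
        yield line


def strip_blank(lines):
    """Filter: trim whitespace and drop empty lines."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


def to_upper(lines):
    """Filter: upper-case every line."""
    for line in lines:
        yield line.upper()


def pipeline(source, *filters):
    """Pipe: feed the output of each stage into the next."""
    stream = source
    for stage in filters:
        stream = stage(stream)
    return stream
```

Because every filter only depends on the stream it receives, stages can be reordered, removed, or added without touching the others.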

Bloom Filter [Systems Design Sundays]

Technology Made Simple • 59 implied HN points • 10 Jul 22

🕹 Technology Data Structures Algorithms Systems Design Interview Preparation

Bloom Filters are probabilistic data structures used to efficiently test for membership.
A Bloom Filter keeps a bit array of size m and k hash functions; adding a value sets to 1 the bits at the k indices its hashes map to, and a lookup checks whether all of those bits are set.
Bloom Filters are great for reducing unnecessary disk access, but they can return false positives, and because the false-positive rate rises as more values are added, they eventually need to be regenerated.
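The mechanism described above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 with a per-function seed as the k hash functions; production filters typically use faster hashes and a packed bit array.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m            # number of bits
        self.k = k            # number of hash functions
        self.bits = [0] * m

    def _indices(self, value):
        """Yield the k bit positions for a value."""
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for index in self._indices(value):
            self.bits[index] = 1

    def might_contain(self, value):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[index] for index in self._indices(value))
```

A `False` from `might_contain` lets a caller skip the disk lookup entirely, while a `True` still has to be confirmed against the real data.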

Serverless Computing 101[Systems Design Sundays]

Technology Made Simple • 39 implied HN points • 07 Aug 22

🕹 Technology Cloud Computing Serverless Computing Systems Design

Serverless Computing allows developers to build and run code without managing servers, saving costs and increasing flexibility.
In serverless computing, developers pay only for the server resources they actually use, eliminating expenses for idle infrastructure.
Large server providers offer servers as a service, benefiting small organizations while ensuring scalability and cost-effectiveness.

How IFood Microservice handles 30k Requests per Second[Systems Design Sundays]

Technology Made Simple • 39 implied HN points • 23 May 22

🕹 Technology Systems Design Microservices Scaling Collaboration

Develop solutions with future scaling needs in mind to make things easier down the line.
Spend significant time planning how to divide responsibilities within your team to ensure efficiency and effectiveness.
Clearly define your needs and anticipate potential problems to save time and effort in system design.

All about Redis [Systems Design Sunday]

Technology Made Simple • 39 implied HN points • 02 May 22

🕹 Technology Systems Design Job Opportunities Social media

Redis is commonly used in Systems Design and has many functionalities, making it suitable for various user needs.
Redis 7.0 has been released, signaling the importance of understanding Redis in System Design.
By expanding your Redis knowledge, you could increase your job opportunities as recruiters actively seek professionals with such expertise.

Quoras Database Sharding [System Design Sundays]

Technology Made Simple • 39 implied HN points • 25 Apr 22

🕹 Technology Databases Systems Design Social media

Database sharding is crucial for large-scale systems: it splits a database across multiple machines so that a query only has to search the relevant shard instead of scanning tables it does not need.
Sharding based on important characteristics, like user platforms, can improve data analysis and streamline data management for platforms like social media sites.
Utilizing database sharding heavily can lead to more efficient operations and a better user experience, commonly seen in large-scale social media platforms.
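For a feel of how a request finds its shard, here is a minimal sketch that routes by a hash of the key. Hash-based routing is my choice for the illustration, not necessarily what Quora does, and `ShardedStore` is a hypothetical in-memory stand-in for separate database machines.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each dict stands in for one database shard."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # Only the one shard that can hold the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

A lookup touches a single shard, which is where the speedup comes from. The trade-off of this simple modulo scheme is that changing the number of shards remaps most keys.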

The effects of splitting a project into multiple teams[Systems Design Sundays]

Technology Made Simple • 39 implied HN points • 18 Apr 22

🕹 Technology Software Architecture Systems Design

As projects grow, you may need multiple teams to handle different components, changing how you work from being in one team to collaborating across teams.
Conway's Law emphasizes that a system's design structure mirrors the organization's communication structure, highlighting the importance of how teams interact when developing a project.
Learning about the risks in current software architecture design approaches can help in adapting and improving your skills for dealing with larger project scopes.

Systems Design: How to Design Twitter

Technology Made Simple • 39 implied HN points • 21 Mar 22

🕹 Technology Systems Design Twitter

The post talks about designing Twitter and the importance of learning from good examples in any field.
The author acknowledges a scheduling mistake with the post and provides a link for those who missed it.
The text also mentions being more careful in the future to avoid similar mistakes.

Problem 47:File Syncing Code [Google]

Technology Made Simple • 19 implied HN points • 13 Jul 22

🕹 Technology Networks Systems Design

Coding interviews may have unexpected questions, like system design scenarios, which are valuable to practice.
Implementing a file syncing algorithm for low-bandwidth networks, especially when the files are mostly the same, is an interesting problem.
Sharing content and requesting feedback can help reach a wider audience and improve the quality of the publication.

An introduction to CDNs[Systems Design Sundays]

Technology Made Simple • 19 implied HN points • 20 Jun 22

🕹 Technology Systems Design

CDNs use a distributed system of servers to improve user experience by connecting them to the closest server.
CDNs offer benefits like speed, cost-effectiveness, scalability, uptime, and improved security for applications.
Drawbacks of CDNs include potential high costs, third-party data usage concerns, and dependence on the quality of server placement.

Adaptive Defense: Redefining Security for a Dynamic World

Phoenix Substack • 0 implied HN points • 20 Feb 25

🕹 Technology Cybersecurity Innovation Risk management Tech Policy Systems Design

Static security is outdated. We need systems that can adapt quickly to changing threats.
Trust in security should be flexible. Instead of seeing things as secure or vulnerable, we should continuously assess and improve our defenses.
Effective security must understand each situation. It's about using real-time information to respond appropriately, not applying the same rules everywhere.

Prefer Composition over Inheritance

Better Engineers • 0 implied HN points • 23 Mar 23

🕹 Technology Software Development Programming Languages Systems Design Code Optimization

Composition is often better than inheritance because it allows you to create new classes by combining existing ones. This helps avoid complex class hierarchies.
Using interfaces can help you achieve different behaviors without relying on a single inheritance path. This keeps your code flexible and clear.
Delegation lets you pass tasks to other objects, which helps separate functionality and maintain cleaner, more understandable code.
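A small sketch of the three ideas above (composition, interfaces, delegation), using invented class names. `OrderService` does not inherit from any notifier; it holds one and delegates to it.

```python
from typing import Protocol


class Notifier(Protocol):
    """Interface: anything with a matching send() method qualifies."""

    def send(self, message: str) -> str: ...


class EmailNotifier:
    def send(self, message: str) -> str:
        return f"email: {message}"


class SmsNotifier:
    def send(self, message: str) -> str:
        return f"sms: {message}"


class OrderService:
    """Composed with a notifier instead of inheriting from one."""

    def __init__(self, notifier: Notifier):
        self._notifier = notifier

    def place_order(self, item: str) -> str:
        # Delegation: the notifier decides how the message is delivered.
        return self._notifier.send(f"order placed for {item}")
```

Swapping `EmailNotifier()` for `SmsNotifier()` changes the behavior without adding a subclass of `OrderService`, which is the flexibility the takeaways describe.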

Composition is a function of communication

Tech and Thoughts • 0 implied HN points • 24 Oct 23

🕹 Technology Software Development Communication Microservices Programming Languages Systems Design

Communication is key for building software. Systems work best when they have clear and simple ways for different parts to talk to each other.
Just like on the internet, software should focus on how parts interact, not just what those parts do. This makes it easier to adapt and grow.
When designing software, spend time planning how components will communicate. Get this right early on to avoid problems later.