The hottest System Design Substack posts right now

And their main takeaways
Engineering At Scale 255 implied HN points 20 Jan 25
  1. Instagram's video upload system needs to handle millions of uploads daily while keeping the process fast and efficient. It converts videos into different formats for users with varying internet speeds.
  2. The system can be designed in several ways, from simple methods to more complex asynchronous solutions. Improving reliability and speed is key to making the service work better.
  3. Using segmented video uploads allows faster processing. By uploading smaller parts of the video, the service can work on them at the same time, reducing wait times for users.
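
As a rough illustration of the segmented-upload idea, the sketch below splits a payload into fixed-size segments and processes them concurrently. The names `split_into_segments`, `process_segment`, and `process_upload` are hypothetical, and `process_segment` is only a stand-in for real transcoding work; nothing here is taken from Instagram's actual system.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(data: bytes, segment_size: int) -> list[bytes]:
    """Cut the upload into consecutive segments of at most segment_size bytes."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def process_segment(segment: bytes) -> int:
    """Hypothetical stand-in for transcoding: just report the segment length."""
    return len(segment)


def process_upload(data: bytes, segment_size: int = 4) -> list[int]:
    """Process all segments in parallel; results come back in segment order."""
    segments = split_into_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process_segment, segments))
```

Because `pool.map` preserves input order, the processed segments can be reassembled in sequence even though they finish at different times.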
VuTrinh. 279 implied HN points 17 Aug 24
  1. Facebook's real-time data processing system needs to handle huge amounts of data quickly, with only a few seconds of wait time. This helps in keeping things running smoothly for users.
  2. Their system uses a message bus called Scribe to connect different parts, making it easier to manage data flow and recover from errors. This setup improves how they deal with issues when they arise.
  3. Different tools like Puma and Stylus allow developers to build applications in different ways, depending on their needs. This means teams can quickly create and improve their applications over time.
System Design Classroom 499 implied HN points 19 Jul 24
  1. Loose coupling is important in software. It means different parts of a program should depend on each other as little as possible, making it easier to change and fix things.
  2. The Law of Demeter suggests that objects should only talk to their direct friends and not reach out too far. This helps to keep dependencies low and makes code more manageable.
  3. Using strategies like the Single Responsibility Principle, interfaces, and dependency injection can improve your code's structure. This makes modules clear, easy to test, and maintain.
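
A minimal sketch of interfaces plus dependency injection, assuming a made-up order service. `Notifier`, `InMemoryNotifier`, and `OrderService` are hypothetical names chosen for illustration.

```python
from typing import Protocol


class Notifier(Protocol):
    """The only thing OrderService knows about its collaborator."""

    def send(self, message: str) -> None: ...


class InMemoryNotifier:
    """Test double that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class OrderService:
    """Receives its notifier from the caller instead of constructing one."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def place_order(self, item: str) -> None:
        self._notifier.send(f"order placed: {item}")
```

Since the service depends only on the `send` method, an email or SMS implementation could be swapped in without touching `OrderService`.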
System Design Classroom 679 implied HN points 02 Jul 24
  1. Queues help different parts of a system work independently. This means you can change one part without affecting the others, making updates easier.
  2. They improve a system's ability to handle more users at once. You can add more servers to take in requests without needing to instantly boost how fast they are processed.
  3. Queues also keep things running smoothly during busy times. They act like a waiting area, holding tasks so no work gets lost even if things get too hectic.
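
The decoupling described above can be sketched with Python's standard `queue` module. The function `run_pipeline` and the doubling "work" are hypothetical placeholders; a production system would more likely use an external message broker.

```python
import queue
import threading


def run_pipeline(tasks: list[int], worker_count: int = 2) -> list[int]:
    """Producers enqueue tasks; workers drain the queue at their own pace."""
    work = queue.Queue()
    results: list[int] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                work.task_done()
                break
            with lock:
                results.append(item * 2)  # placeholder for real processing
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for task in tasks:
        work.put(task)  # the producer never waits on a specific worker
    for _ in threads:
        work.put(None)
    work.join()
    for t in threads:
        t.join()
    return sorted(results)
```

The producer side only knows about the queue, so the number of workers can change without any change to the code that submits tasks.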
The ZenMode 42 implied HN points 24 Jan 25
  1. Feature flags allow you to turn app features on or off without changing the code. This is like having a light switch for each feature, making it easy to manage them.
  2. Different types of feature flags help with various tasks, like rolling out incomplete features or testing new ideas with users. This way, you can learn what works best before a full launch.
  3. Building a feature flag system requires a control service, a way to store the flags, and an interface to access them in your app. This helps keep everything organized and responsive.
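
A small sketch of the storage and lookup pieces, assuming flags are kept in memory and rolled out by percentage. `FlagStore`, `set_rollout`, and `is_enabled` are hypothetical names; a real control service would persist flags and push updates to clients.

```python
import hashlib


class FlagStore:
    """In-memory flag store mapping a flag name to a rollout percentage."""

    def __init__(self) -> None:
        self._rollout: dict[str, int] = {}

    def set_rollout(self, name: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[name] = percent

    def is_enabled(self, name: str, user_id: str) -> bool:
        """Unknown flags are off; known flags are on for a stable slice of users."""
        percent = self._rollout.get(name, 0)
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
        return bucket < percent
```

Hashing the flag name together with the user ID keeps each user's answer stable across requests while letting different flags select different user slices.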
System Design Classroom 419 implied HN points 04 May 24
  1. The Observer Pattern creates a one-to-many relationship. This means when one object's state changes, all of the connected objects are notified.
  2. Components can be loosely coupled, allowing them to work together without needing to know much about each other. This makes it easy to add or change observers.
  3. Because observers can be added or removed without modifying the main subject, the system stays flexible. This helps avoid complications in your design.
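
The one-to-many relationship can be sketched in a few lines. `Subject`, `attach`, `detach`, and `notify` are conventional names for the pattern, used here as an illustration rather than as the post's own code.

```python
from typing import Callable

Observer = Callable[[str], None]


class Subject:
    """Keeps a list of observers and tells each one about every event."""

    def __init__(self) -> None:
        self._observers: list[Observer] = []

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        self._observers.remove(observer)

    def notify(self, event: str) -> None:
        # Iterate over a copy so observers may detach themselves while notified.
        for observer in list(self._observers):
            observer(event)
```

The subject never needs to know what an observer does with the event, which is what keeps the two sides loosely coupled.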
System Design Classroom 299 implied HN points 16 May 24
  1. Getting timeouts right is important. If you wait too long, your system slows down, but if you time out too fast, you might miss a successful call.
  2. Circuit breakers help manage failures. They quickly stop requests to a failing service, allowing your system to recover faster.
  3. Bulkheads keep parts of your system separate. If one part fails, the others keep working, preventing a complete shutdown of the system.
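
A simplified circuit breaker might look like the sketch below. `CircuitBreaker`, `CircuitOpenError`, and the parameters `failure_threshold` and `reset_after` are hypothetical; production libraries typically add explicit half-open state handling, metrics, and thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        # While open and still cooling down, fail fast instead of calling out.
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

After the cool-down period, the next call is let through as a trial: a success closes the circuit, while a failure re-opens it immediately.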
burkhardstubert 39 implied HN points 19 Aug 24
  1. CrowdStrike made a big mistake by rolling out an untested update to all users at once, causing millions of computers to crash. They need to treat configuration updates like real code and test them properly.
  2. Delta Air Lines faced huge losses because it didn’t have backup systems in place when the CrowdStrike update went wrong. Having spare systems or a better contingency plan could have minimized disruptions.
  3. Microsoft should improve its recovery methods after crashes, possibly by adopting an automatic system recovery strategy. Learning from other platforms could help avoid these issues in the future.
Engineering At Scale 45 implied HN points 22 Dec 24
  1. Video streaming is a big part of internet traffic today, making up over 82% of it. Understanding how video streaming works is important, especially for tech job interviews.
  2. Key concepts in video streaming include frames, pixels, bitrate, and resolution. These terms help define video quality and how videos are stored and transmitted.
  3. Video encoding and transcoding are crucial for making video files smaller and compatible with different devices. This process ensures smooth playback without losing too much quality.
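
To make the relationship between resolution, bitrate, and storage concrete, here is a small arithmetic sketch. The function names are hypothetical, and the 3 bytes per pixel figure assumes uncompressed 8-bit RGB.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame (assumes 8-bit RGB by default)."""
    return width * height * bytes_per_pixel


def estimated_size_megabytes(bitrate_kbps: float, duration_seconds: float) -> float:
    """Encoded size = bitrate x duration, converted from bits to megabytes."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000


# One raw 1080p frame: 1920 * 1080 * 3 = 6,220,800 bytes.
# One minute encoded at 5000 kbps: 5,000,000 * 60 / 8 = 37.5 MB.
```

At 30 frames per second, one second of raw 1080p video is about 187 MB, while an encoded stream at 5000 kbps uses about 0.6 MB for that second, which shows why encoding matters for storage and transmission.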
Tech Ramblings 39 implied HN points 11 Aug 24
  1. Designing software is like laying the foundation of a house. A good structure makes it easier to build and change things later.
  2. Planning your work is crucial. Just like you wouldn't install plumbing before your walls are up, you shouldn't write code before having a solid plan.
  3. Create a clear process to develop your software. Start with architecture, build the basics, and then refine. This helps you deliver updates quickly and efficiently.
Hung's Notes 59 implied HN points 18 Jul 24
  1. Fine-Grained Authorization (FGA) is a better way to manage user permissions in a system. It allows specific users to perform certain actions on specific resources, making access control simpler and more organized.
  2. Relationship-Based Access Control (ReBAC) focuses on the connections between users and resources instead of just roles. It builds a graph to show these relationships, but it can be complicated and difficult to maintain.
  3. Attribute-Based Access Control (ABAC) uses attributes of users and resources to determine access, making it flexible and easier to implement. It allows for clear policy definitions without needing to change how users interact with the system.
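
An attribute-based check can be sketched as a function over user and resource attributes. The attribute names (`department`, `clearance`, `sensitivity`) and the policy itself are invented for illustration and are not taken from the post.

```python
def can_access(user: dict, resource: dict, action: str) -> bool:
    """Hypothetical ABAC policy evaluated from attributes, not roles."""
    same_department = user["department"] == resource["department"]
    if action == "read":
        return same_department
    if action == "edit":
        # Editing additionally requires enough clearance for the resource.
        return same_department and user["clearance"] >= resource["sensitivity"]
    return False  # deny any action the policy does not mention
```

Changing who may edit only requires changing attributes or this policy function, not how users or resources are modeled elsewhere.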
Technology Made Simple 379 implied HN points 12 Feb 24
  1. Space-Based Architecture (SBA) distributes processing and storage across multiple servers, enhancing scalability and performance by leveraging in-memory data grids.
  2. The components of SBA include Processing Units (PU) for executing business logic, Virtualized Middleware for managing shared infrastructure, and data pumps for data marshaling.
  3. SBA offers benefits such as scalability, fault tolerance, and low-latency data access, but comes with challenges like complexity in design, debugging, and data security.
Technology Made Simple 179 implied HN points 11 Mar 24
  1. Goodhart's Law warns that when a measure becomes a target, it can lose its effectiveness as a measure.
  2. The law often unfolds due to complications in real-world systems, human adaptability, and evolutionary pressures.
  3. To address Goodhart's Law, consider using multiple metrics, tying metrics to ultimate goals, and being prepared to adapt metrics as needed.
SwirlAI Newsletter 432 implied HN points 28 Jun 23
  1. The newsletter provides a Table of Contents with more than 90 topics, making it easier to find the content of interest.
  2. Topics covered include Data Engineering fundamentals, Spark architecture, Kafka use cases, MLOps deployment processes, System Design examples, and more.
  3. If interested, it's recommended to support the author's work by subscribing and sharing the content.
CodeFaster 36 implied HN points 19 Nov 24
  1. When coding for the future, it's important not to create more work for yourself later. Focus on avoiding technical debt instead of trying to predict every future need.
  2. Don't go overboard with coding. Keep your code simple and flexible, ensuring it can adapt to changes without adding extra complexity.
  3. Instead of trying to build reusable programs from the start, solve the immediate problem first. You can refactor and create reusable parts later if needed.
Technology Made Simple 119 implied HN points 18 Mar 24
  1. When designing a live streaming platform like Twitch, key steps include ingestion, transcoding, packaging, CDN utilization, and database management.
  2. Challenges like low latency, scalability, and reliability must be addressed for the success of a live streaming platform.
  3. To enhance a streaming service further, consider advanced technologies like adaptive bitrate algorithms, advanced caching, and community features.
Technology Made Simple 119 implied HN points 10 Mar 24
  1. Writing allows you to store knowledge for future reference, spot cognitive blindspots, and engage with topics more deeply for better understanding.
  2. Challenges in self-learning writing include the lack of contextual understanding, of a defined learning path, and of a peer network for feedback.
  3. Addressing challenges in self-learning involves finding strategies to gain clarity, identifying knowledge gaps, and seeking feedback from peers.
Software Ninja Handbook 3 HN points 12 Sep 24
  1. Monolithic applications have a single codebase, which makes them easier to manage for smaller projects, but harder to debug as they grow. Everything is tightly connected, so a problem in one part can affect the whole system.
  2. Microservices break down applications into smaller, independent services that can be developed and deployed separately. This allows teams to work faster and use different technologies for different parts of the application.
  3. Choosing between a monolith and microservices depends on factors like project size and team structure. Monoliths are good for small projects while microservices are better for larger, complex systems that need flexibility and scalability.
VuTrinh. 119 implied HN points 27 Jan 24
  1. Rust uses ownership to manage memory, meaning each value has a single owner. When that owner goes out of scope, the memory gets freed automatically.
  2. Python manages memory mainly through reference counting, which tracks how many references point to an object, backed by a garbage collector that handles reference cycles. Once there are no references left, it cleans up the unused memory.
  3. Rust's approach gives developers more control but requires them to understand ownership rules, while Python's method is easier for beginners but can slow down performance.
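
The Python side of this comparison can be observed directly. The sketch below uses a weak reference as a probe that does not keep the object alive; `Buffer` and `demo` are hypothetical names, and the immediate cleanup shown is CPython behavior rather than a guarantee of the language.

```python
import weakref


class Buffer:
    """Placeholder object whose lifetime we want to observe."""


def demo() -> tuple[bool, bool]:
    buf = Buffer()
    probe = weakref.ref(buf)  # a weak reference does not count as an owner
    alias = buf               # second strong reference to the same object
    del buf
    alive_with_alias = probe() is not None  # still referenced through alias
    del alias
    alive_after = probe() is not None       # last strong reference is gone
    return alive_with_alias, alive_after
```

In Rust the equivalent cleanup point is fixed at compile time by the owner's scope, whereas here it depends on when the last reference happens to disappear at runtime.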
Technology Made Simple 139 implied HN points 04 Dec 23
  1. Single Tenant Architecture provides each customer their own independent database and software instance, offering security and customization like living in a detached house.
  2. Multi-Tenant Architecture is akin to an apartment building where multiple tenants share common infrastructure, allowing for economies of scale but potentially limiting customization.
  3. Single Tenant architecture is known for high user engagement, control, and stability, while Multi-Tenant architecture favors compliance, security, and quick onboarding for better scalability.
Technology Made Simple 159 implied HN points 07 May 23
  1. Amazon Prime Video saw a 90% cost reduction by moving away from Microservices to a monolith architecture. This change improved scalability and reduced infrastructure costs significantly.
  2. The challenges Amazon faced with their initial microservices implementation included hitting scaling limits and high overall costs of the system. Moving to a monolith architecture helped address these issues and allowed for better scaling.
  3. While the debate between Microservices and Monoliths continues, the decision should depend on factors like team size, emphasis on scale, and complexity. Microservices offer scalability but require careful planning, while monoliths are easier to design and manage.
The Tech Buffet 139 implied HN points 10 Oct 23
  1. RAG systems can produce impressive results but require careful tuning to be reliable in real-world applications. Just copying and pasting code won't necessarily work for complex use cases.
  2. Understanding the RAG framework is important, as it involves various components like data loaders, splitters, and embedding models. Each part plays a crucial role in generating accurate answers.
  3. Using frameworks like LangChain can simplify the process of prototyping RAG systems, but they still need thoughtful configuration to function effectively in production.
Technology Made Simple 119 implied HN points 17 Apr 23
  1. Location matters: Place software close to clients for faster response times using CDNs, edge computing, or geo-replication.
  2. Cache wisely: Optimize speed by using in-memory caching, database caching, or web caching to avoid repeated actions.
  3. Async is key: Improve efficiency with asynchronous processing through message queues, event-driven architectures, or microservices.
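
As one example of in-memory caching, Python's standard `functools.lru_cache` can wrap an expensive function. `slow_lookup` and `call_count` are hypothetical names, and uppercasing a string stands in for a slow database or network call.

```python
from functools import lru_cache

call_count = {"slow_lookup": 0}  # tracks how often the real work runs


@lru_cache(maxsize=128)
def slow_lookup(key: str) -> str:
    """Stand-in for an expensive call; runs once per distinct key while cached."""
    call_count["slow_lookup"] += 1
    return key.upper()
```

Repeated calls with the same key are served from memory, and `slow_lookup.cache_info()` reports hits and misses for tuning `maxsize`.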
system bashing 117 implied HN points 18 Jul 23
  1. In a tech company, engineering involves balancing cloud costs against user experience, keeping spending down without degrading what users see.
  2. Reducing costs significantly is crucial for a company's profitability regardless of other measures like discounts or marketing strategies.
  3. Engineering decisions impact user experience constraints and cloud costs, requiring a balance between the two for system efficiency.
Technology Made Simple 79 implied HN points 15 May 23
  1. Shipping software quickly has many benefits, including improved efficiency and cost savings.
  2. A slow release process can lead to increased expenses as more resources are required for tasks like testing, outage management, and manual deploys.
  3. Investing in designing systems that support fast shipping is valuable as it helps build a culture of efficiency and productivity.
Technology Made Simple 79 implied HN points 03 Apr 23
  1. Discord faced performance issues with Cassandra, requiring increasing maintenance effort and leading to unpredictable latency.
  2. Hot partitions were a problem in Cassandra, degrading the database's performance during concurrent reads.
  3. Garbage collection in Cassandra posed challenges, leading Discord to switch to ScyllaDB, which does not have a garbage collector.
Technology Made Simple 99 implied HN points 29 Jan 23
  1. Design complex systems by layering multiple smaller solutions for better results instead of focusing on individually engineered tasks.
  2. Building a search engine like Google involves accommodating various types of search results like images, text, gifs, and videos while ensuring search quality.
  3. Handling the massive scale of data in Google's search engine system involves using semi-supervised labeling techniques to manage unlabeled data efficiently.
The ML Engineer Insights 7 HN points 03 Jul 24
  1. Machine learning interviews often cover four main rounds: breadth, depth, system design, and coding.
  2. Preparing for machine learning interviews requires a balance of understanding fundamental topics and practicing with sample questions.
  3. Machine learning system design interviews focus on problem definition, evaluation metrics, feature and data handling, model development, and deployment strategies.
Technology Made Simple 79 implied HN points 06 Mar 23
  1. Complex architectures can significantly impact developer productivity, software quality, and turnover, with potential for 50% drops in productivity and significant increases in defect density and staff turnover.
  2. Architectural complexity can lead to increased defect density in codebases, higher time consumption, and a higher probability of developers leaving the firm.
  3. Complexity can breed more complexity, creating a cycle that hampers future system developments.
Technology Made Simple 59 implied HN points 04 Sep 23
  1. A robust system design should be secure, reliable, scalable, and independent, allowing for iterative changes without disruption.
  2. Document everything to help visualize deployments, collaborate effectively, and guide future design decisions.
  3. Simplify system design, use fully managed services, decouple architecture, and strive for a stateless architecture to improve reliability and scalability.
The ZenMode 42 implied HN points 16 Mar 24
  1. Sharding is a technique to horizontally partition a data store into smaller fragments across multiple servers, aiding in scalability and reliability.
  2. Before sharding a database, consider options like vertical partitioning, database optimization, replication, and caching to improve performance without the added complexity of sharding.
  3. Different sharding strategies like Hash Sharding, Range Sharding, and Directory-Based Sharding have unique considerations and advantages based on factors like data distribution, queries, and maintenance.
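
Two of the strategies named above can be sketched as routing functions. `hash_shard` and `range_shard` are hypothetical names, and the split points in the range example are arbitrary.

```python
import bisect
import hashlib


def hash_shard(key: str, shard_count: int) -> int:
    """Hash sharding: spread keys evenly using a stable hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def range_shard(key: int, split_points: list[int]) -> int:
    """Range sharding: keys below split_points[0] go to shard 0, and so on.

    split_points must be sorted; N split points define N + 1 shards.
    """
    return bisect.bisect_right(split_points, key)
```

Hash sharding distributes load evenly but scatters range queries across shards, while range sharding keeps neighboring keys together at the risk of uneven load. Note that the simple modulo scheme remaps most keys when `shard_count` changes.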
Technology Made Simple 79 implied HN points 14 Nov 22
  1. Combining common ideas can lead to great results. Using serverless architecture on Google Cloud together with the Fastly CDN was key to Khan Academy handling increased traffic.
  2. CDNs are important for scalability. They consist of servers distributed worldwide, enabling faster user interactions by caching content and optimizing server resources.
  3. Serverless architecture provides scalability and performance. By hosting applications on servers managed by a cloud provider, as Khan Academy did, the system handled increased traffic efficiently without manual intervention.
Technology Made Simple 39 implied HN points 06 Feb 23
  1. A Database Management System (DBMS) is a tool to manage data, providing an abstraction to store and retrieve data without directly interacting with databases.
  2. DBMS operates using a Query Language, offering guarantees for performance, but the specifics can vary between different systems.
  3. Guarantees provided by a DBMS include serving different data storage purposes, using a client/server model, and incorporating components like transaction managers, lock managers, and storage engines.
Sunday Letters 119 implied HN points 01 May 22
  1. New programming languages or techniques won't solve old problems. Teams need to focus on fixing their systems before expecting a new language to make things better.
  2. People often believe learning a new language will improve their skills, but it doesn't address deeper issues like organization or trust, just as learning a different language won’t make someone a great writer.
  3. Fixing systemic team issues takes hard work and discipline. Sometimes, sticking with familiar tools can be more effective than constantly chasing new options.
System Design Classroom 2 HN points 10 Jul 24
  1. To handle system failures, you can use different strategies like 'Fail Fast' which stops operations quickly to save resources. But this can affect user experience because they won't get a chance to recover from the error.
  2. Another approach is 'Fail Silent', where instead of showing an error, the system quietly returns a default value. It helps keep things running smoothly, but users might miss important information if data is missing.
  3. Lastly, there's 'Custom Fallback', which uses saved local data when a service fails. This keeps the service active, but the information might be outdated, which can confuse users.
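
The three strategies can be contrasted as small wrappers around a failing call. The function names and the dictionary-based cache are hypothetical simplifications.

```python
def fail_fast(fetch):
    """Fail Fast: let the error surface immediately to the caller."""
    return fetch()


def fail_silent(fetch, default):
    """Fail Silent: hide the error and return a default value instead."""
    try:
        return fetch()
    except Exception:
        return default


def custom_fallback(fetch, cache: dict, key: str):
    """Custom Fallback: serve the last saved value when the call fails."""
    try:
        value = fetch()
        cache[key] = value  # remember the latest good response
        return value
    except Exception:
        if key in cache:
            return cache[key]  # possibly stale data
        raise  # nothing saved yet, so there is nothing to fall back to
```

The fallback path returns whatever was saved last, so callers that care about freshness would need to store a timestamp alongside the value.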
ppdispatch 5 implied HN points 31 Dec 24
  1. Over-abstraction in code can make things complicated and hard to manage, so it's important to keep it simple. If you complicate your system, it might end up slowing down and confusing your team.
  2. Fish-shell switched from C++ to Rust to improve safety and performance, showing how changing your tools can lead to better results. Their move has also engaged the community and made contributions easier.
  3. Understanding the differences between PHP's getenv() and $_ENV can prevent unexpected issues when accessing environment variables. It's essential to know how your PHP configuration handles these variables to avoid problems.