Engineering At Scale

Engineering At Scale is a weekly column that explains complex engineering concepts such as databases, system design, and architecture in accessible terms, and offers guidance on engineering careers. It covers topics from API gateways to database sharding, scalability, cloud application development, vector databases, and practical advice for mastering system design interviews.

Databases System Design Software Architecture Engineering Careers Microservices Cloud Computing Distributed Computing API Design Scalability Performance Optimization

The hottest Substack posts of Engineering At Scale

And their main takeaways

Designing Instagram's Video Uploads: Optimizing for Low Latency and scalability

255 implied HN points • 20 Jan 25

Instagram's video upload system needs to handle millions of uploads daily while keeping the process fast and efficient. It converts videos into different formats for users with varying internet speeds.
The system can be designed in several stages, starting from simple methods and moving to more complex asynchronous solutions. Improving reliability and speed is key to making the service work better.
Using segmented video uploads allows faster processing. By uploading smaller parts of the video, the service can work on them at the same time, reducing wait times for users.
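
The segmented approach can be illustrated with a minimal sketch. It is not Instagram's implementation: the segment size, the worker count, and the `transcode_segment` placeholder (which only upper-cases bytes) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(data: bytes, segment_size: int) -> list:
    """Cut the raw upload into fixed-size segments (the last may be shorter)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def transcode_segment(segment: bytes) -> bytes:
    """Hypothetical stand-in for real transcoding work."""
    return segment.upper()


def process_upload(data: bytes, segment_size: int = 4, workers: int = 4) -> bytes:
    """Process all segments concurrently, then reassemble them in order."""
    segments = split_into_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the segments
        # can be joined back without any extra bookkeeping.
        processed = list(pool.map(transcode_segment, segments))
    return b"".join(processed)
```

Because each segment is independent, the service does not have to wait for the whole file before it starts working.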

Scaling Distributed Systems with the Scatter-Gather Pattern

60 implied HN points • 15 Feb 25

🕹 Technology Distributed Systems Microservices Cloud Computing Software Design System Architecture

The Scatter-Gather pattern helps speed up data retrieval by splitting requests to multiple servers at once, rather than one after the other. This makes systems respond faster, especially when lots of data is needed.
Using this pattern can improve system efficiency by preventing wasted time waiting for responses from each service. This means the system can handle more requests at once.
However, implementing Scatter-Gather can be tricky. It requires careful handling of errors and managing different data sources to ensure the information is accurate and reliable.
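
A rough sketch of the pattern, including the error handling mentioned above, might look like the following. The `query_shard` function and its simulated delays are hypothetical; a real system would call actual services.

```python
import asyncio


async def query_shard(name: str, delay: float, fail: bool = False) -> dict:
    """Hypothetical backend call, simulated with a sleep."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return {"source": name, "items": [f"{name}-result"]}


async def scatter_gather(calls, timeout: float):
    """Send every request at once, then split outcomes into results and errors."""
    tasks = [asyncio.wait_for(call, timeout) for call in calls]
    # return_exceptions=True keeps one failure from discarding the rest.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    errors = [o for o in outcomes if isinstance(o, BaseException)]
    return results, errors


async def demo():
    calls = [
        query_shard("a", 0.01),
        query_shard("b", 0.01, fail=True),  # simulated failure
        query_shard("c", 1.0),              # slower than the timeout
    ]
    return await scatter_gather(calls, timeout=0.2)
```

Running `asyncio.run(demo())` returns the one successful response plus two errors, so the caller can decide whether a partial answer is acceptable.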

TAO - Meta's Scalable architecture powering world's largest social graph

120 implied HN points • 09 Nov 24

🕹 Technology Software Architecture Data Systems Engineering

Meta created TAO to handle the huge amount of data and user interactions on its platform. This system helps generate personalized content for over 2 billion users very quickly.
TAO uses a layered architecture that includes caching and data storage to improve performance. This design helps distribute the load and maintain fast responses even when many users are active.
TAO prioritizes high availability over strict data consistency. This means it can sometimes show slightly out-of-date information, but it still works well for users, especially during busy times.

Unraveling the Internals of Video Streaming services

45 implied HN points • 22 Dec 24

🕹 Technology Video Streaming System Design Performance

Video streaming is a big part of internet traffic today, making up over 82% of it. Understanding how video streaming works is important, especially for tech job interviews.
Key concepts in video streaming include frames, pixels, bitrate, and resolution. These terms help define video quality and how videos are stored and transmitted.
Video encoding and transcoding are crucial for making video files smaller and compatible with different devices. This process ensures smooth playback without losing too much quality.
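
The relationship between bitrate, duration, and file size can be worked out with simple arithmetic. The sketch below assumes a constant bitrate and ignores container overhead; the function names are illustrative only.

```python
def video_size_megabytes(bitrate_mbps: float, duration_seconds: float) -> float:
    """Approximate size: megabits per second x seconds, divided by 8 bits per byte."""
    return bitrate_mbps * duration_seconds / 8


def pixels_per_frame(width: int, height: int) -> int:
    """Resolution expressed as the number of pixels in a single frame."""
    return width * height
```

For example, one minute of video at 5 Mbps is about `video_size_megabytes(5, 60)` = 37.5 MB, and a 1920x1080 frame holds 2,073,600 pixels.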

Scaling Zerodha's Reporting System through 7 million PostgreSQL tables

15 implied HN points • 09 Jan 25

🕹 Technology Software Databases Architecture Innovation User Experience

Zerodha created an innovative system with 7 million PostgreSQL tables to handle user reporting requests efficiently. This solution tackled issues with slow queries and poor user experiences during busy periods.
They switched from a synchronous to an asynchronous model, allowing users to submit requests and check back later for results. This change improved the overall user experience significantly.
The new architecture involved using a temporary database to handle queries and storing results in many tables. While it works well for now, they might need to consider other solutions if user growth continues rapidly.
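
The submit-then-check-back flow with one results table per request could be sketched as below. This is not Zerodha's code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the class name, table naming scheme, and placeholder report value are all assumptions.

```python
import sqlite3
import uuid


class ReportService:
    """Toy asynchronous report service with one results table per job."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")  # stand-in for the results database
        self.status = {}                       # job_id -> (state, user_id)

    def submit(self, user_id: str) -> str:
        """Accept the request immediately and hand back a job id."""
        job_id = uuid.uuid4().hex
        self.status[job_id] = ("pending", user_id)
        return job_id

    def run_pending(self) -> None:
        """Worker step: compute each pending report and store it in its own table."""
        for job_id, (state, user_id) in list(self.status.items()):
            if state != "pending":
                continue
            table = f"results_{job_id}"  # job_id is hex only, so safe to interpolate
            self.db.execute(f"CREATE TABLE {table} (user_id TEXT, total REAL)")
            # 42.0 is a placeholder for the real report computation.
            self.db.execute(f"INSERT INTO {table} VALUES (?, ?)", (user_id, 42.0))
            self.status[job_id] = ("done", user_id)

    def fetch(self, job_id: str):
        """Return the rows if the job is finished, otherwise None."""
        state, _ = self.status[job_id]
        if state != "done":
            return None
        return self.db.execute(
            f"SELECT user_id, total FROM results_{job_id}"
        ).fetchall()
```

The user-facing call returns as soon as the job is recorded, and the slow query runs later without holding the request open.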


Demystifying API Gateways: What They Are & Why They Matter

75 implied HN points • 11 Feb 24

🕹 Technology APIs Microservices Architecture Security Performance

An API Gateway acts as an intermediary in a microservices architecture, handling client requests and routing them to the appropriate microservices, which simplifies communication for clients.
API Gateway enhances security by authenticating and authorizing requests, provides rate-limiting to prevent attacks, and improves performance through caching and protocol conversion.
Downsides of API Gateways include increased latency due to an extra hop, potential single point of failure, and added complexity to the system architecture.
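
Rate limiting at a gateway is often explained with a token bucket. The following is a minimal sketch of that general technique, not the API of any particular gateway product; the injectable `clock` parameter is an assumption added to make the behavior easy to test.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Return True if the request may pass, consuming one token."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add tokens for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per client or API key and reject requests when `allow()` returns False.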

System Design Concepts: An in-depth guide to Database sharding

30 implied HN points • 29 Jul 23

🕹 Technology Systems Databases Scaling Performance Architecture

Database sharding splits a large dataset into chunks stored on different machines, increasing storage capacity and distributing queries for better performance.
Sharding allows for high availability by avoiding a single point of failure and higher read/write throughput by distributing query load.
Cost and maintenance overhead are drawbacks of sharding, and it differs from partitioning where data is stored on a single machine.

Will PostgreSQL switch to a Thread-based model ?

15 implied HN points • 24 Jun 23

🕹 Technology Database Architecture Modeling Software Development Performance

PostgreSQL currently uses a process-based model for handling client connections and managing data.
The process-based model offers advantages like fault isolation, security guarantees, and efficient resource management.
Although there are advantages to the process-based model, the community is considering a switch to a thread-based model for PostgreSQL in the future.

System Design Fundamentals: What are Non-Functional Requirements ?

15 implied HN points • 20 Jun 23

🕹 Technology System Design Performance Security Scalability

Non-functional requirements focus on how the system achieves its objectives.
Latency is crucial for performance and user experience; minimize it through techniques like caching.
Scalability is vital for handling growth without performance degradation; achieve it by investing in better hardware or cloud services.

System Design of Uber's CacheFront - Serving more than 40 million low-latency reads per second

4 HN points • 03 Mar 24

🕹 Technology System Design Caching Databases Scalability Integrations

Uber developed CacheFront, an integrated caching solution, to overcome problems like maintenance overhead, reduced developer productivity, and region failovers caused by using Redis for caching.
Docstore's architecture includes a Control Plane, Query Engine, and Storage Engine, each with its own responsibilities, such as query execution, data persistence, and transaction management.
CacheFront's design addressed non-functional requirements like consistency guarantees, cache warming and region failovers, fault tolerance, hot partition issues, and performance and cost improvements.

Towards modern development of cloud applications

3 HN points • 26 Jan 24

🕹 Technology Development Architecture Performance Challenges Methodology

Microservices offer advantages like scalability and fault-tolerance, but come with challenges like increased latency and management overhead.
A proposed solution suggests writing applications as monoliths, leaving deployment decisions to a runtime, and rolling out new versions atomically to address microservices challenges.
By modularizing code into components, abstracting communication details, and managing deployment lifecycles, the solution aims to improve performance and reduce costs.

System Design Fundamentals: What is a Load Balancer ?

2 HN points • 15 Jan 24

🕹 Technology System Design Algorithms Hardware Software

Load Balancers distribute client requests to different servers, improving system reliability and scalability.
Load Balancers handle growing internet usage by evenly distributing workloads, preventing servers from being overwhelmed.
Different types of Load Balancers include Hardware, Software, and Cloud Load Balancers, each with unique benefits for system optimization.
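
One common way to distribute requests evenly is round-robin selection. The sketch below shows that single algorithm under simple assumptions (a fixed server list, no health checks); the server names are placeholders.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a repeating, fixed order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self) -> str:
        """Return the server that should receive the next request."""
        return next(self._cycle)
```

With servers `["s1", "s2", "s3"]`, five consecutive calls to `next_server()` yield s1, s2, s3, s1, s2.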

Vector Databases: Databases for the AI era

3 HN points • 15 Jul 23

🕹 Technology Databases AI Search Applications Conclusion

Vector databases are trending in the tech industry, especially with AI applications and investments from various sources.
Data can be classified into structured, semi-structured, and unstructured categories, each requiring different database solutions.
Vector databases excel in handling unstructured data, like images and videos, providing specialized search capabilities for applications like recommendation systems and fraud detection.
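
The specialized search these databases provide is usually a nearest-neighbor lookup over embedding vectors. A brute-force sketch using cosine similarity is shown below; the tiny two-dimensional vectors are made up for illustration, and real systems use approximate indexes instead of scanning every item.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, index: dict, k: int = 1) -> list:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

Given an index with vectors for "cat", "dog", and "car", a query vector close to "cat" ranks "cat" first and the next most similar item second.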

System Design Concepts: Dive deep into Database Sharding Strategies

2 HN points • 05 Aug 23

🕹 Technology System Design Scalability Performance Data security

Range-Based Sharding divides data by key ranges, much like organizing books on bookshelves, to make searches easier.
Hash-Based Sharding evenly distributes data across different shards using a hash function, but may require data rebalancing when the number of shards changes.
Consistent Hashing minimizes data movement when adding or removing shards, improving scalability, while Geo-Based Sharding stores data close to users for better performance.
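
The difference between plain hash-based sharding and consistent hashing shows up when a shard is added. The sketch below compares the two; the use of MD5 as the hash, 100 virtual nodes per shard, and the shard names are assumptions for illustration.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() varies between runs)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_shard(key: str, shards: list) -> str:
    """Hash-based sharding: hash the key, then take it modulo the shard count."""
    return shards[stable_hash(key) % len(shards)]


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, shards, vnodes: int = 100):
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        """Walk clockwise to the first virtual node past the key's hash."""
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(assign_before, assign_after, keys) -> float:
    """Share of keys whose shard changes between two assignments."""
    moved = sum(1 for key in keys if assign_before(key) != assign_after(key))
    return moved / len(keys)
```

Going from three shards to four, modulo sharding reassigns most keys, whereas the ring only moves keys onto the newly added shard.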

Mastering System Design Interviews: 5 Key Tips for Success

1 HN point • 08 Jul 23

🕹 Technology Interview Preparation Communication Skills

Understand the fundamentals thoroughly before a system design interview.
Read tech blogs and research papers to widen your knowledge and gain technical breadth.
Improve your communication skills to effectively convey your solutions during interviews.

Exploring the Eight Fallacies of Distributed Computing

1 HN point • 01 Jul 23

🕹 Technology Distributed Systems Networks Security Latency

Network reliability is not guaranteed, so build systems with resilience to handle failures.
Latency in data transmission is influenced by factors like distance and database optimization.
Consider security, system topology changes, and interoperability when designing distributed systems.

System Design Fundamentals: Understanding Scalability

0 implied HN points • 10 Jun 23

🕹 Technology System Design Scalability Cloud Computing Infrastructure Software Development

Scalability is crucial for software systems to handle increasing demand and data.
Building scalable systems can involve horizontal scaling (adding more machines) or vertical scaling (adding more resources to the same machine).
Cloud technologies, like auto-scaling and managed databases, offer solutions for building scalable systems.

The Reliability Revolution: Idempotent APIs and the Future of Distributed Computing

0 implied HN points • 04 Jul 23

🕹 Technology APIs Distributed Computing Fault Tolerance System reliability

Building reliable systems in an unreliable world is crucial for the success of products and services.
Failures in distributed systems can lead to challenges like duplicate transactions, but idempotent APIs can help ensure consistency.
Idempotent APIs are key in guaranteeing data integrity, simplifying error handling, and enhancing fault tolerance in distributed systems.
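
A common way to make an API idempotent is to have the client send a unique key with each logical request, so that retries return the stored response instead of repeating the side effect. The sketch below assumes an in-memory store and a hypothetical `PaymentService`; a production system would persist the keys.

```python
class PaymentService:
    """Toy payment API that deduplicates retries by idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency_key -> stored response
        self.charges = []     # side effects actually performed

    def charge(self, idempotency_key: str, account: str, amount: int) -> dict:
        # A retry with a known key returns the original response
        # and performs no second charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges.append((account, amount))
        response = {"status": "charged", "account": account, "amount": amount}
        self._responses[idempotency_key] = response
        return response
```

If a network failure hides the first response and the client retries with the same key, the account is still charged only once.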

Join my chat

0 implied HN points • 10 Feb 23

An announcement about a new subscriber chat space in the Substack app.
To join the chat, download the Substack app, available for iOS and Android.
Start by clicking the link to download the app, then open it to access the chat feature.

APIs explained in layman terms

0 implied HN points • 10 Feb 23

APIs are interfaces that accept inputs and produce outputs.
APIs are the building blocks of websites and allow communication between clients and servers.
Real-world examples of API usage include Google Maps, Twitter, Stripe, and cloud APIs.