Push to Prod

Gritty details about distributed systems, AI/ML infrastructure, and technical leadership from an engineer who has worked at Netflix, Twitter, and Comcast.

The hottest Substack posts of Push to Prod

And their main takeaways
59 implied HN points 13 Aug 24
  1. When a system gets slow, it’s often because of queues. Queues help manage requests but can create delays if not handled properly.
  2. Different types of queues, such as thread pools, connection pools, and TCP queues, can slow down your system. Keeping these optimized can improve performance.
  3. Using thread dumps can help identify problems in your system. They can show which threads are blocked and help you fix the slowdowns.
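To make the thread-dump idea concrete, here is a minimal sketch in Python. The language, the helper `dump_threads`, and the `worker-1` thread are all assumptions chosen for illustration; the original post may well use different tooling. The sketch parks one worker on a lock, then prints that worker's stack so you can see where it is waiting.

```python
import sys
import threading
import time
import traceback


def dump_threads():
    """Return {thread name: formatted stack} for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    dump = {}
    # sys._current_frames() maps thread id -> that thread's current frame.
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"unknown-{ident}")
        dump[name] = "".join(traceback.format_stack(frame))
    return dump


def blocked_worker(lock, started):
    """Hypothetical worker that gets stuck waiting for a lock."""
    started.set()
    with lock:  # waits here while the main thread holds the lock
        pass


lock = threading.Lock()
started = threading.Event()
lock.acquire()  # hold the lock so the worker has to wait

worker = threading.Thread(
    target=blocked_worker, args=(lock, started), name="worker-1"
)
worker.start()
started.wait()
time.sleep(0.1)  # give the worker a moment to reach the lock

dump = dump_threads()
print(dump["worker-1"])  # the stack should include blocked_worker

lock.release()  # unblock the worker so the program can exit
worker.join()
```

In a real service you would take several dumps a few seconds apart; threads that show the same waiting frame in every dump are the likely culprits.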
19 implied HN points 04 Sep 24
  1. It's important to set boundaries and learn to say no to extra work or distractions. This can help you stay focused on your own goals.
  2. Using clear and direct phrases when saying no can make it easier for others to understand your limits. This helps avoid long discussions about why you can't help.
  3. Saying no doesn't make you a bad teammate. It's about prioritizing your tasks so you can be more effective and contribute to your own success.
59 implied HN points 30 Jul 24
  1. Metrics give us a view of our systems, but they won't show the complete picture. It's like looking at a map; it can guide us but doesn't capture all the details.
  2. When we check the data, we might miss important moments because of how we sample information. This can lead to misunderstandings about our system's performance.
  3. Understanding that metrics are imperfect helps us make better decisions. We should use them to create theories, not think they tell us everything.
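The sampling point can be illustrated with a small sketch. All numbers here are made up for the example: a steady 20 ms latency with a three-second spike, observed by a hypothetical collector that samples every 10 seconds.

```python
# Hypothetical per-second latency in milliseconds for one minute:
# steady at 20 ms, with a short spike at seconds 31 to 33.
latency_ms = [20] * 60
latency_ms[31:34] = [900, 950, 900]

# A collector that only looks every 10 seconds (t = 0, 10, 20, ...).
sampled = latency_ms[::10]

true_max = max(latency_ms)        # what actually happened
sampled_max = max(sampled)        # what the sampled metric shows
minute_avg = sum(latency_ms) / len(latency_ms)

print(f"true max:    {true_max} ms")
print(f"sampled max: {sampled_max} ms")
print(f"minute avg:  {minute_avg:.1f} ms")
```

The spike falls entirely between samples, so the sampled series looks perfectly healthy, and the one-minute average blurs it into a modest bump. Either view is a reasonable starting point for a theory, but neither shows what users experienced during those three seconds.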
39 implied HN points 05 Aug 24
  1. When you feel overwhelmed, writing down your questions can help clarify your thoughts. It's a simple way to break down a complex problem.
  2. Answering even one question can give you more confidence and direction, leading to better decision-making. It’s a helpful way to gather information and make progress.
  3. This technique isn't just for work; it can be useful in everyday situations too, like before meetings with accountants or lawyers. Taking the time to write questions helps you feel more prepared.
19 implied HN points 23 Jul 24
  1. Understanding concurrency is a long-term process that requires ongoing learning. It's normal to feel confused, but every experience adds to your knowledge.
  2. It's important to be open about your knowledge gaps. Accepting that you don't know everything helps you grow and learn from others.
  3. Mistakes and misunderstandings are part of the journey. Embracing these moments can lead to valuable insights and a deeper comprehension.
5 HN points 27 Aug 24
  1. At Netflix, a serious concurrency bug was causing CPU problems, and the team needed a quick solution. They couldn't fix it right away and had to come up with a way to keep their systems running through the weekend.
  2. Instead of manually fixing everything, they created a self-healing system. They randomly killed a few server instances every 15 minutes, replacing them with fresh ones, which allowed the team to relax during the crisis.
  3. This situation taught them that sometimes unconventional solutions are necessary. Prioritizing the team's well-being can be just as important as fixing technical issues.
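The "kill a few instances on a schedule" approach can be sketched roughly as follows. This is not the team's actual tooling: `recycle_once`, the `terminate` callback, and the instance IDs are hypothetical names, and the sketch assumes some other mechanism (a scheduler running it every 15 minutes, plus whatever replaces terminated instances) exists around it.

```python
import random


def pick_victims(instance_ids, count, rng=random):
    """Choose up to `count` distinct instances at random."""
    count = min(count, len(instance_ids))
    return rng.sample(list(instance_ids), count)


def recycle_once(instance_ids, count, terminate, rng=random):
    """Terminate a random handful of instances and return their IDs.

    `terminate` is a caller-supplied function (hypothetical here) that
    asks the platform to shut one instance down.
    """
    victims = pick_victims(instance_ids, count, rng)
    for instance_id in victims:
        terminate(instance_id)
    return victims


# Usage with a fake fleet and a fake terminate function.
fleet = ["i-001", "i-002", "i-003", "i-004", "i-005"]
terminated = []
victims = recycle_once(fleet, 2, terminated.append, rng=random.Random(0))
print("terminated:", victims)
```

Passing `terminate` in as a parameter keeps the selection logic testable without touching real infrastructure, and capping `count` at the fleet size avoids an error when the fleet is small.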