Delayed Branch

Adventures In Low Level Programming And Software Development

The hottest Substack posts of Delayed Branch

And their main takeaways
67 HN points 07 Aug 23
  1. The analysis of Sapphire Rapids CPU core-to-core latency is affected by factors like instance type and lack of detailed performance data.
  2. Intel's adoption of EMIB technology for Sapphire Rapids allows for integration of multiple chiplets in the same package, impacting latency and performance.
  3. Understanding the latency costs and implications of EMIB for core communication in Sapphire Rapids can help evaluate its performance impact on different workloads.
78 implied HN points 13 Jun 22
  1. Benchmarks mislead when they report only the average time and omit the variance.
  2. High variance shows up as tail latency.
  3. Consider average-case vs. worst-case tradeoffs.
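To make the point about averages concrete, here is a minimal sketch that summarizes a set of latency samples by mean, median, 99th percentile, and maximum. The names `LatencySummary`, `percentile`, and `summarize` are hypothetical and not taken from the post; the percentile uses the nearest-rank definition, which is one of several common conventions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of latency samples, in whatever unit the caller measured.
struct LatencySummary {
    double mean;
    double p50;
    double p99;
    double max;
};

// Nearest-rank percentile over a sorted, non-empty vector.
// pct is an integer in [1, 100]; rank = ceil(pct * N / 100), 1-based.
inline double percentile(const std::vector<double>& sorted, unsigned pct) {
    assert(!sorted.empty() && pct >= 1 && pct <= 100);
    std::size_t rank = (pct * sorted.size() + 99) / 100;
    return sorted[rank - 1];
}

// Takes the samples by value because it needs a sorted copy.
inline LatencySummary summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return LatencySummary{sum / static_cast<double>(samples.size()),
                          percentile(samples, 50),
                          percentile(samples, 99),
                          samples.back()};
}
```

With 98 samples of 1.0 and two samples of 101.0, the mean is 3.0 while the median is 1.0 and the 99th percentile is 101.0, so the average alone describes neither the typical case nor the slow one.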
47 implied HN points 23 Jun 22
  1. Curated resources for new C/C++ programmers or those refreshing their knowledge.
  2. Videos cover basics such as pointers and C++ fundamentals such as destructors.
  3. Advanced topics include optimization thinking and designing for performance in C++.
31 implied HN points 02 Sep 22
  1. Complexity in software can lead to bugs, difficult changes, and longer onboarding for new developers.
  2. It's better to have local complexity within components than global complexity between components.
  3. Adding complexity to simple components is preferred over adding complexity to complex components.
31 implied HN points 11 Jul 22
  1. System calls are expensive and can harm application performance; avoid them in hot data paths.
  2. Context switches can cause stalls; tune thread pools and consider thread pinning to reduce them.
  3. TLB misses and page faults add latency; consider using hugepages and mlock() to reduce them.
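As an illustration of the mlock() suggestion, the sketch below maps an anonymous buffer up front and tries to lock it in RAM so the hot path does not take page faults on it. It assumes a Linux or similar POSIX system; the class name `PinnedBuffer` is hypothetical. mlock() can fail when the process's locked-memory limit is low, so the sketch falls back to touching each page once, which pre-faults the memory but does not prevent it from being paged out later.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: a buffer that is faulted in (and, if allowed, locked)
// at construction time rather than on first use.
class PinnedBuffer {
public:
    explicit PinnedBuffer(std::size_t bytes) : size_(bytes) {
        assert(bytes > 0);
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        data_ = static_cast<unsigned char*>(p);

        // mlock() faults the pages in and keeps them resident.
        locked_ = (mlock(data_, size_) == 0);
        if (!locked_) {
            // Fallback: touch one byte per page to pre-fault the mapping.
            const std::size_t page =
                static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
            for (std::size_t off = 0; off < size_; off += page) data_[off] = 0;
        }
    }

    ~PinnedBuffer() {
        if (locked_) munlock(data_, size_);
        munmap(data_, size_);
    }

    PinnedBuffer(const PinnedBuffer&) = delete;
    PinnedBuffer& operator=(const PinnedBuffer&) = delete;

    unsigned char* data() { return data_; }
    std::size_t size() const { return size_; }
    bool locked() const { return locked_; }

private:
    unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
    bool locked_ = false;
};
```

Callers can check `locked()` to see whether the lock succeeded. Hugepages are a separate mechanism and are not shown here.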
31 implied HN points 03 Jul 22
  1. To improve software performance, focus on doing less work, doing work faster, and doing work in parallel.
  2. Avoid unnecessary copies in your code by using std::move, std::string_view, and std::span<T>.
  3. Optimize performance by understanding trivially copyable types, applying strength reduction to operations like integer division, and being cautious with std::shared_ptr<T>.
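The takeaways above can be sketched in a few lines. The names `has_prefix`, `NameList`, and `mod_pow2` are hypothetical examples, not code from the post. The first function takes `std::string_view` so no string is copied, the class moves its argument into storage instead of copying it, and the last function shows strength reduction for the case where the divisor is a power of two known only at run time (compilers usually do this themselves when the divisor is a compile-time constant).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Views avoid allocating or copying, whether the caller holds a
// std::string or a string literal.
inline bool has_prefix(std::string_view text, std::string_view prefix) {
    return text.substr(0, prefix.size()) == prefix;
}

// Sink parameter: take by value, then move into storage. A caller that
// passes an rvalue or uses std::move pays for no character copy.
class NameList {
public:
    void add(std::string name) { names_.push_back(std::move(name)); }
    std::size_t size() const { return names_.size(); }
    std::string_view at(std::size_t i) const { return names_[i]; }

private:
    std::vector<std::string> names_;
};

// Strength reduction: for unsigned x and a power-of-two d,
// x % d equals x & (d - 1), which avoids an integer division.
inline std::uint32_t mod_pow2(std::uint32_t x, std::uint32_t d) {
    assert(d != 0 && (d & (d - 1)) == 0);
    return x & (d - 1);
}
```

A caller would write `list.add(std::move(s))` when it no longer needs `s`; after that call `s` is in a valid but unspecified state and should not be read.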
15 implied HN points 26 Jun 22
  1. Reduce tail latency by simplifying software operations or eliminating high-variance operations.
  2. Optimize cache performance by reducing cache misses through field inlining, alignment, padding, clustering, bitpacking, and intrusive data structures.
  3. Improve performance by avoiding dynamic memory allocations and locks, using preallocation, inline storage optimizations, conditional locking, and per-thread data.
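One way to read "inline storage" is a container whose elements live inside the object itself, so adding an element never calls the allocator. The sketch below is a deliberately simplified version; the name `InlineVector` is hypothetical, and it assumes `T` is default-constructible and copy-assignable, which a production container would not require.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-capacity vector with storage embedded in the object.
// push_back never allocates; it reports failure when the capacity is used up.
template <typename T, std::size_t N>
class InlineVector {
public:
    bool push_back(const T& value) {
        if (size_ == N) return false;  // full: caller decides what to do
        items_[size_++] = value;
        return true;
    }

    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return items_[i];
    }

    std::size_t size() const { return size_; }
    static constexpr std::size_t capacity() { return N; }
    void clear() { size_ = 0; }

private:
    std::array<T, N> items_{};
    std::size_t size_ = 0;
};
```

The tradeoff is a hard capacity limit: the cost of every `push_back` is predictable, but the caller has to handle the full case explicitly.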
15 implied HN points 14 Jun 22
  1. Cache coherence protocols like MESI manage consistent views of memory locations across CPU cores.
  2. In NUMA systems, core-to-core communication between different domains can drastically impact performance.
  3. Avoid false sharing by adding padding to prevent multiple cores from contending for access to the same cache line.
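The padding technique can be sketched as follows. The 64-byte line size is an assumption that holds on most current x86-64 parts but not on every CPU, and the names `kCacheLine`, `PaddedCounter`, and `byte_distance` are hypothetical. Aligning each counter to a cache line also pads its size to a full line, so adjacent array elements never share one.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// alignas rounds both the alignment and the size up to one cache line,
// so each counter occupies a line of its own.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");
static_assert(alignof(PaddedCounter) == kCacheLine, "line-aligned");

// Distance in bytes between two counters, for checking the layout.
inline std::ptrdiff_t byte_distance(const PaddedCounter& a,
                                    const PaddedCounter& b) {
    return reinterpret_cast<const char*>(&b) -
           reinterpret_cast<const char*>(&a);
}
```

In use, each thread would increment only its own slot of a `std::array<PaddedCounter, N>`, and a reader would sum the slots. Without the `alignas`, eight such counters would fit in one line and every increment would contend with the other cores.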