Delayed Branch | Revenue & Trends

The hottest Substack posts of Delayed Branch

And their main takeaways

The analysis of Sapphire Rapids core-to-core latency is limited by factors such as the instance type used and the lack of detailed performance data.
Intel's adoption of EMIB (Embedded Multi-die Interconnect Bridge) for Sapphire Rapids allows multiple chiplets to be integrated in the same package, which affects latency and performance.
Understanding the latency costs and implications of EMIB for core communication in Sapphire Rapids can help evaluate its performance impact on different workloads.

Curated resources for new C/C++ programmers or those refreshing their knowledge.
Videos cover C++ fundamentals such as pointers and destructors.
Advanced topics include optimization thinking and designing for performance in C++.

Complexity in software can lead to bugs, difficult changes, and longer onboarding for new developers.
It's better to have local complexity within components than global complexity between components.
Adding complexity to simple components is preferred over adding complexity to complex components.

System calls are expensive and can harm application performance, so avoid them in hot data paths.
Context switches can cause stalls; tune thread pools and consider thread pinning to reduce them.
Handling TLB misses and page faults efficiently can reduce latency; consider using hugepages and mlock().
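
As a rough illustration of the hugepages and mlock() advice above, here is a minimal sketch that assumes Linux. The names LockedBuffer, make_locked_buffer, and release are hypothetical and do not come from the original post. The buffer is mapped once at startup, the kernel is given a transparent-hugepage hint, every page is touched so the hot path never takes a first-touch page fault, and mlock() is attempted to keep the pages resident. Both madvise() and mlock() can fail (for example when RLIMIT_MEMLOCK is low), so the sketch treats them as best effort.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper type: a buffer mapped up front, outside the hot path.
struct LockedBuffer {
    void* data = nullptr;
    std::size_t size = 0;
    bool locked = false;  // true only if mlock() succeeded
};

// Map an anonymous region, pre-fault it, and try to pin it in RAM.
inline LockedBuffer make_locked_buffer(std::size_t bytes) {
    LockedBuffer buf;
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return buf;  // caller checks buf.data
    }
#ifdef MADV_HUGEPAGE
    // Hint only: asks for transparent huge pages; the result is ignored.
    madvise(p, bytes, MADV_HUGEPAGE);
#endif
    // Touch every page now so later accesses do not take first-touch faults.
    std::memset(p, 0, bytes);
    buf.data = p;
    buf.size = bytes;
    // Best effort: may fail if the process's locked-memory limit is too low.
    buf.locked = (mlock(p, bytes) == 0);
    return buf;
}

// Unmap the region; munmap() also drops any lock on those pages.
inline void release(LockedBuffer& buf) {
    if (buf.data != nullptr) {
        munmap(buf.data, buf.size);
    }
    buf = LockedBuffer{};
}
```

A caller would typically create the buffer during initialization, check `data` and `locked`, and only then enter the latency-sensitive loop.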

To improve software performance, focus on doing less work, doing work faster, and doing work in parallel.
Avoid unnecessary copies in your code by using std::move, std::string_view, and std::span<T>.
Optimize performance by understanding trivially copyable types, applying strength reduction to costly operations like integer division, and being cautious with std::shared_ptr<T>.
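
The copy-avoidance points above can be sketched in C++17 roughly as follows. The names count_char, first_field, and Message are hypothetical and chosen only for illustration. Read-only parameters take a std::string_view, so no std::string is built for the call, and a "sink" constructor takes its argument by value and moves it into place. std::span<T> (C++20) plays the same non-owning role for contiguous ranges and is left out only to keep the sketch compilable as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Read-only access through a view: no std::string is constructed or copied.
inline std::size_t count_char(std::string_view text, char c) {
    std::size_t n = 0;
    for (char ch : text) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}

// Returns a view into the caller's buffer instead of a new string.
// The result is only valid while the underlying characters stay alive.
inline std::string_view first_field(std::string_view line) {
    return line.substr(0, line.find(','));
}

// Sink parameter: take by value, then move into the member.
// A caller passing an rvalue pays for moves only, not a copy.
class Message {
public:
    explicit Message(std::string body) : body_(std::move(body)) {}
    const std::string& body() const { return body_; }

private:
    std::string body_;
};
```

For example, `Message m(std::move(payload));` transfers the buffer owned by `payload` rather than duplicating it, and `first_field("id,name")` yields a view of `id` without allocating.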

Reduce tail latency by simplifying software operations or eliminating high-variance operations.
Optimize cache performance by reducing cache misses through field inlining, alignment, padding, clustering, bitpacking, and intrusive data structures.
Improve performance by avoiding dynamic memory allocations and locks, using preallocation, inline storage optimizations, conditional locking, and per-thread data.
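
One way to picture the preallocation and inline-storage advice above is a fixed-capacity queue whose slots live inside the object itself, so push and pop never call the allocator. This is a minimal single-threaded sketch, not the post's implementation; FixedQueue is a hypothetical name, and it assumes T is default-constructible and copyable. The capacity is restricted to a power of two so that index wrap-around uses a bit mask rather than an integer division.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Fixed-capacity FIFO with inline storage: no heap allocation after construction.
// Not thread-safe; intended as per-thread data.
template <typename T, std::size_t N>
class FixedQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Returns false instead of growing when the queue is full.
    bool push(const T& value) {
        if (size_ == N) {
            return false;
        }
        slots_[(head_ + size_) & (N - 1)] = value;  // mask replaces modulo
        ++size_;
        return true;
    }

    // Returns the oldest element, or std::nullopt when empty.
    std::optional<T> pop() {
        if (size_ == 0) {
            return std::nullopt;
        }
        T value = slots_[head_];
        head_ = (head_ + 1) & (N - 1);
        --size_;
        return value;
    }

    std::size_t size() const { return size_; }

private:
    std::array<T, N> slots_{};  // storage is part of the object
    std::size_t head_ = 0;      // index of the oldest element
    std::size_t size_ = 0;      // number of stored elements
};
```

Refusing a push when full, rather than reallocating, keeps the worst-case cost of each operation constant, which is the property that matters for tail latency.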

Cache coherence protocols like MESI manage consistent views of memory locations across CPU cores.
In NUMA systems, core-to-core communication between different domains can drastically impact performance.
Avoid false sharing by adding padding to prevent multiple cores from contending for access to the same cache line.
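
The padding technique above can be sketched as follows, assuming a 64-byte cache line (common on current x86-64 parts, but an assumption rather than something the post states). PlainCounter, PaddedCounter, kCacheLine, and same_cache_line are hypothetical names. Aligning each counter to the line size also rounds its size up to a full line, so adjacent array elements can no longer share one.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// Adjacent PlainCounter elements are packed together, so counters updated by
// different cores can land on the same cache line (false sharing).
struct PlainCounter {
    std::atomic<long> value{0};
};

// alignas pads each counter out to its own cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each padded counter occupies exactly one line");
static_assert(alignof(PaddedCounter) == kCacheLine,
              "each padded counter starts on a line boundary");

// True when both addresses fall within the same kCacheLine-sized block.
inline bool same_cache_line(const void* a, const void* b) {
    auto line = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
    };
    return line(a) == line(b);
}
```

With one PaddedCounter per thread, each core writes to a line that no other core touches, at the cost of 64 bytes per counter instead of 8.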