The hottest Parallel Computing Substack posts right now

And their main takeaways
Fprox’s Substack 269 implied HN points 25 Jan 26
  1. Zvabd adds vector integer absolute-value and absolute-difference instructions plus widened-accumulate variants, targeting DSP use and keeping some ops limited to 8/16-bit to reduce hardware cost.
  2. Zvzip provides vzip, vunzip (even/odd), and vpair instructions to interleave and extract paired elements more directly than emulating with vcompress, and these new ops support optional masking.
  3. Zvdot4a8i defines 4-element 8-bit dot-product vector ops (vector-vector and vector-scalar) that multiply and accumulate 4×8-bit groups into 32-bit results, paving the way for faster matrix-style computations.
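The third takeaway above describes a 4-element 8-bit dot product accumulated into a 32-bit result. A minimal scalar sketch of that arithmetic follows. It is not the extension's specified behavior: the function names are hypothetical, and the signed-by-signed interpretation, the lane order within a packed word, and the 32-bit wraparound are all assumptions made for illustration.

```python
def pack_i8x4(lanes):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 in the low byte; assumed order)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack_i8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def wrap_i32(value):
    """Wrap an integer to the signed 32-bit range (assumed overflow behavior)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def dot4_accumulate(acc, a_word, b_word):
    """Add the dot product of two packed 4x8-bit groups to a 32-bit accumulator."""
    a = unpack_i8x4(a_word)
    b = unpack_i8x4(b_word)
    return wrap_i32(acc + sum(x * y for x, y in zip(a, b)))


def vector_dot4(acc_vec, a_vec, b_vec):
    """Vector-vector form: one dot4 accumulate per 32-bit element."""
    return [dot4_accumulate(c, a, b) for c, a, b in zip(acc_vec, a_vec, b_vec)]


def vector_scalar_dot4(acc_vec, a_vec, b_word):
    """Vector-scalar form: every element is combined with the same packed scalar."""
    return [dot4_accumulate(c, a, b_word) for c, a in zip(acc_vec, a_vec)]
```

Each 32-bit destination element consumes four 8-bit products at once, which is why this shape suits the inner loop of a quantized matrix multiply.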
Confessions of a Code Addict 577 implied HN points 18 Dec 25
  1. Traditional PRNGs are sequential and don’t parallelize well. Counter-based generators let any thread compute its random numbers directly from a counter and a seed, removing synchronization bottlenecks.
  2. Philox-4x32-10 turns a 128-bit counter and a seed-derived key into four 32-bit pseudorandom values through ten rounds, each multiplying counter words, splitting the products into high and low halves, XORing with the key, and permuting the words. This gives strong statistical quality and skip-ahead ability.
  3. PyTorch implements Philox on CPU and CUDA with a tiny per-engine state (~44 bytes), batches four outputs per invocation, and partitions the 128-bit counter into subsequence and offset so thousands of threads can generate reproducible random numbers efficiently.
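The counter-based idea in the takeaways above can be sketched in a few lines of Python. This is an illustrative sketch, not PyTorch's code: the multiplier and key-increment constants are the ones I understand the published Philox-4x32 design to use, and the way `random_block` lays out seed, subsequence, and offset in the key and counter is an assumption made for the example. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Constants as commonly given for Philox-4x32 (treat as an assumption here).
PHILOX_M0 = 0xD2511F53
PHILOX_M1 = 0xCD9E8D57
PHILOX_W0 = 0x9E3779B9
PHILOX_W1 = 0xBB67AE85


def mulhilo32(a, b):
    """Multiply two 32-bit words and split the 64-bit product into (high, low)."""
    product = a * b
    return (product >> 32) & MASK32, product & MASK32


def philox_round(ctr, key):
    """One round: multiply, split, XOR with the key, and permute the four words."""
    hi0, lo0 = mulhilo32(PHILOX_M0, ctr[0])
    hi1, lo1 = mulhilo32(PHILOX_M1, ctr[2])
    return (hi1 ^ ctr[1] ^ key[0], lo1, hi0 ^ ctr[3] ^ key[1], lo0)


def philox4x32_10(counter, key):
    """Ten rounds over a 4x32-bit counter; the key is bumped between rounds."""
    ctr = tuple(counter)
    k = tuple(key)
    for round_index in range(10):
        if round_index > 0:
            k = ((k[0] + PHILOX_W0) & MASK32, (k[1] + PHILOX_W1) & MASK32)
        ctr = philox_round(ctr, k)
    return ctr


def random_block(seed, subsequence, offset):
    """Four 32-bit outputs for one (seed, subsequence, offset) triple.

    Assumed layout: 64-bit offset in the low counter words, 64-bit
    subsequence (for example a thread id) in the high counter words.
    """
    key = (seed & MASK32, (seed >> 32) & MASK32)
    counter = (
        offset & MASK32,
        (offset >> 32) & MASK32,
        subsequence & MASK32,
        (subsequence >> 32) & MASK32,
    )
    return philox4x32_10(counter, key)


def per_thread_values(seed, num_threads, offset=0):
    """Each 'thread' computes its block independently; no shared state is touched."""
    return [random_block(seed, thread_id, offset) for thread_id in range(num_threads)]
```

Because `random_block` is a pure function of its arguments, any thread can jump straight to any position in its stream, and rerunning with the same seed reproduces the same values regardless of scheduling.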
CPU fun 121 implied HN points 22 Feb 24
  1. Floating-point arithmetic is less exact than it looks: with a limited number of mantissa bits, each addition can round, so a long sum accumulates error that depends on the order of the operations.
  2. Complaining that OpenMP reductions give 'the wrong answer' is misguided; the rounding error likely already existed in the serial code, and the changed summation order merely exposes it.
  3. Changing the accumulator type to 'double' reduces the rounding error in sum reductions.
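The effect described in these takeaways can be reproduced without OpenMP. The sketch below is an illustration in Python, not the post's code: it emulates a single-precision accumulator by rounding every partial sum to float32 with the standard `struct` module, and it models a parallel reduction as per-chunk partial sums that are combined at the end. The function names, the input value 0.1, and the chunk count are assumptions chosen for the example.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Serial sum with a float32 accumulator.

    Inputs are expected to be float32-representable already, so rounding
    the double-precision sum of two such values matches a float32 add.
    """
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


def chunked_sum_f32(values, chunks):
    """Model of a parallel reduction: float32 partial sums per chunk, then combined."""
    size = (len(values) + chunks - 1) // chunks
    partials = [sum_f32(values[i:i + size]) for i in range(0, len(values), size)]
    return sum_f32(partials)


def sum_f64(values):
    """Serial sum with a double-precision accumulator."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def compare(n=1_000_000, chunks=8):
    """Return (serial float32, chunked float32, double) sums of n copies of 0.1f."""
    values = [to_f32(0.1)] * n
    return sum_f32(values), chunked_sum_f32(values, chunks), sum_f64(values)
```

Calling `compare()` returns three sums of the same data. The two float32 results differ from each other only because the additions happen in a different order, and the serial one is not the more accurate of the two; the double accumulator lands close to the mathematically expected total.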
Jacob’s Tech Tavern 2 HN points 04 Mar 24
  1. Testing on a real device is crucial for finding the performance problems users actually experience.
  2. Profiling the app with Instruments pinpoints bottlenecks, so code improvements can target them and significantly enhance performance.
  3. Improving processing speed, using parallelism, and moving work earlier in app launch are key strategies for faster Swift apps.