The hottest Parallelism Substack posts right now

Associativity is the key property that lets you split work, combine partial results, and safely parallelize or stream computations without changing the answer.
Softmax has a hidden associative state — tracking a local max and a scaled sum lets you correct and merge chunked results, which is the math behind FlashAttention’s memory- and time-saving trick.
When optimizing a global computation, look for a small combinable state and an associative combine rule; if it exists you can chunk and parallelize, and if it doesn’t (for example, median) you need a different algorithmic approach.
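The chunk-and-merge idea for softmax can be sketched in a few lines of Python. This is a minimal illustration rather than FlashAttention itself; the function names (`chunk_state`, `combine`, `chunked_softmax`) and the chunk size are assumptions made for the example.

```python
import math
from functools import reduce

def chunk_state(xs):
    """Summarize one chunk as (local max, sum of exp(x - local max))."""
    m = max(xs)
    return m, sum(math.exp(x - m) for x in xs)

def combine(a, b):
    """Merge two chunk states by rescaling both sums to the shared max."""
    (m_a, s_a), (m_b, s_b) = a, b
    m = max(m_a, m_b)
    return m, s_a * math.exp(m_a - m) + s_b * math.exp(m_b - m)

def chunked_softmax(xs, chunk_size):
    """Softmax computed from per-chunk states instead of one global pass."""
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    m, s = reduce(combine, map(chunk_state, chunks))
    return [math.exp(x - m) / s for x in xs]

def direct_softmax(xs):
    """Reference implementation over the whole input at once."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Because `combine` is associative (up to floating-point rounding), the chunk states can be merged in any grouping, which is what makes streaming or parallel evaluation possible.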

Understanding Spark architecture is crucial for optimizing performance and identifying bottlenecks.
Distinguish narrow from wide transformations in Spark, and be wary of the expensive shuffles that wide transformations trigger.
Utilize strategies like partitioning, bucketing, and caching to maximize parallelism and performance in Spark applications.
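As a rough illustration of why wide transformations cost more than narrow ones, the sketch below models partitions as plain Python lists. It is not Spark code: `narrow_map`, `shuffle_by_key`, `sum_by_key`, the integer keys, and the two-partition layout are all hypothetical choices made for the example.

```python
from collections import defaultdict

def narrow_map(partitions, fn):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[fn(rec) for rec in part] for part in partitions]

def shuffle_by_key(partitions, num_partitions):
    """Wide: records move between partitions so equal keys end up together.

    Assumes integer keys so that `key % num_partitions` is deterministic.
    Returns the new partitions and the number of records that changed partition.
    """
    out = [[] for _ in range(num_partitions)]
    moved = 0
    for src, part in enumerate(partitions):
        for key, value in part:
            dst = key % num_partitions
            out[dst].append((key, value))
            if dst != src:
                moved += 1
    return out, moved

def sum_by_key(partitions):
    """After the shuffle, each partition can aggregate its own keys locally."""
    result = []
    for part in partitions:
        totals = defaultdict(int)
        for key, value in part:
            totals[key] += value
        result.append(dict(totals))
    return result

partitions = [[(1, 10), (2, 20), (1, 5)], [(2, 1), (3, 7), (1, 2)]]
doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))  # no data movement
shuffled, moved = shuffle_by_key(doubled, 2)                     # data movement
totals = sum_by_key(shuffled)
```

In this toy model the map step never touches another partition, while the per-key sum first has to relocate records, which stands in for the network and disk cost of a real shuffle.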

Choose a file format that fits the workload; for OLAP use cases in Spark, columnar formats such as Parquet or ORC are a good fit.
Understand and use encoding techniques such as run-length encoding and dictionary encoding in Parquet for efficient data storage.
Optimize Spark executor memory allocation and maximize the number of executors for improved application performance.
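The two encodings mentioned above can be sketched in plain Python. This shows only the basic idea; Parquet's actual on-disk encodings are more involved, and the function names and sample column here are assumptions made for the example.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse each run of equal adjacent values into (value, run length)."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [v for v, n in pairs for _ in range(n)]

def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Look each code up in the dictionary to recover the original values."""
    return [dictionary[c] for c in codes]

column = ["US", "US", "US", "DE", "DE", "US"]
rle = run_length_encode(column)
dictionary, codes = dictionary_encode(column)
```

Run-length encoding pays off on sorted or highly repetitive columns, while dictionary encoding helps whenever a column has few distinct values relative to its length.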

GPUs are essential for modern compute, especially with the rise of AI workloads like large language models that heavily rely on tensor operations like matrix addition and multiplication.
When working with GPUs, programmers use CUDA to define functions called kernels that are launched on the GPU. Parallelism is defined and tuned explicitly, unlike typical CPU code, where a loop iterates serially over the data.
Execution on GPUs differs from CPUs due to the minimal cost of GPU hardware threads, efficient thread scheduling at the hardware layer, and the use of warps to execute parallel instructions. GPUs optimize for high throughput with many hardware threads, while CPUs focus on low latency for individual instructions.
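The kernel model can be illustrated with a serial Python emulation of how each logical thread picks its element. This is only a sketch of the indexing scheme: a real GPU runs these threads in parallel in warps, and the names below (`vector_add_kernel`, `launch`, `block_dim`, `grid_dim`) are illustrative stand-ins loosely modeled on CUDA's block and thread indices, not a real API.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by one logical thread: handle a single element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):                        # guard threads past the end
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Serial stand-in for launching grid_dim * block_dim logical threads."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

def vector_add(a, b, block_dim=4):
    """Add two equal-length lists by 'launching' one thread per element."""
    out = [0] * len(a)
    grid_dim = (len(a) + block_dim - 1) // block_dim  # ceiling division
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
    return out
```

The kernel contains no loop over the data; the loop lives in the launch, which is the part the GPU hardware replaces with many threads running at once.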

MSSQL-to-PostgreSQL migrations can bring query slowdowns, with some queries taking significantly longer in PostgreSQL than in MSSQL.
Join algorithm selection and parallelism are two key advantages contributing to MSSQL's impressive query execution speed.
Multi-clause selectivity estimation in MSSQL allows for more precise cardinality estimation in complex join queries, giving it an edge over PostgreSQL in certain scenarios.
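To see why estimating several clauses together can matter, the sketch below compares the true selectivity of a two-clause filter with the estimate obtained by multiplying per-clause selectivities as if the clauses were independent. It does not reproduce either engine's planner; the table, column names, and helper functions are made up for illustration.

```python
def selectivity(rows, predicate):
    """Fraction of rows that satisfy a single predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)

def independent_estimate(rows, predicates):
    """Estimate assuming clauses are independent: multiply selectivities."""
    est = 1.0
    for p in predicates:
        est *= selectivity(rows, p)
    return est

def joint_selectivity(rows, predicates):
    """True fraction of rows that satisfy all clauses at once."""
    return selectivity(rows, lambda r: all(p(r) for p in predicates))

# Hypothetical table in which city fully determines country.
rows = (
    [{"city": "Paris", "country": "FR"}] * 50
    + [{"city": "Berlin", "country": "DE"}] * 50
)

def is_paris(r):
    return r["city"] == "Paris"

def is_france(r):
    return r["country"] == "FR"

estimated = independent_estimate(rows, [is_paris, is_france])
actual = joint_selectivity(rows, [is_paris, is_france])
```

With correlated columns like these, the independence estimate predicts half as many rows as actually match, and an error of that kind can compound across the joins of a complex query.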
