The hottest Data Structures Substack posts right now

And their main takeaways

How Unix Spell Ran in 64kB RAM

Confessions of a Code Addict • 1683 implied HN points • 12 Jan 25

Unix engineers faced a big challenge in fitting a large dictionary into just 64kB of RAM. They came up with clever ways to compress the data and use efficient structures to make everything fit.
A key part of their solution was the Bloom filter, which helped quickly check if words were in the dictionary without needing to look up every single word, saving time.
They also used innovative coding methods to further reduce the size of the data needed for the dictionary, allowing for fast lookups while staying within the strict memory limits of their hardware.

What's In A List—Yes, But What's Really In A List

The Python Coding Stack • by Stephen Gruppetta • 259 implied HN points • 13 Oct 24

🕹 Technology Programming Data Structures Python Education Computing

In Python, lists don't actually hold the items themselves but instead hold references to those items. This means you can change what is in a list without changing the list itself.
If you create a list by multiplying an existing list, all the elements will reference the same object instead of creating separate objects. This can lead to unexpected results, like altering one element affecting all the others.
When dealing with immutable items, such as strings, it doesn't matter if references point to the same object. Since immutable objects cannot be changed, there are no issues with such references.
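
The aliasing behaviour described above can be seen in a few lines. This is a minimal sketch; the variable names are illustrative and not taken from the post.

```python
# Multiplying a list copies references, not the objects they point to.
row = [0, 0]
grid_shared = [row] * 3          # three references to the same inner list
grid_shared[0][0] = 99           # mutating through one reference shows up in all

# A comprehension builds a fresh inner list on every iteration.
grid_separate = [[0, 0] for _ in range(3)]
grid_separate[0][0] = 99         # only the first inner list changes

# Immutable items are safe to share: assigning to one slot rebinds that slot only.
words = ["a"] * 3
words[0] = "b"
```

Here `grid_shared` ends up with 99 in every row, while `grid_separate` changes only its first row.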

Historically, 4NF explanations are needlessly confusing

Minimal Modeling • 608 implied HN points • 05 Dec 24

🕹 Technology Database Design Data Structures Software Development Information Systems

Fourth Normal Form (4NF) is mainly about creating simple two-column tables to link related data, like teachers and their skills. This straightforward design is often overlooked in favor of complex definitions.
Many explanations of 4NF start with confusing three-column tables and then break them down into simpler forms. This approach makes it harder for learners to grasp the concept quickly and effectively.
The term 'multivalued dependency' can be simplified to just mean a list of unique IDs. You don’t really need to focus on this term to design good database tables; it's more of a historical detail.

The Pythonic Emptiness

Confessions of a Code Addict • 529 implied HN points • 09 Nov 24

🕹 Technology Programming Software Development Computer Science Data Structures Code Quality

In Python, you can check if a list is empty by using 'if not mylist' instead of 'if len(mylist) == 0'. This way is faster and is more widely accepted as the Pythonic approach.
Some people find the truthiness method confusing, but it often boils down to bad coding practices, like unclear variable names. Keeping your code clean and well-named can make this style clearer and more readable.
Using 'len()' to check for emptiness isn't wrong, but you should choose based on your situation. The main point is that the Pythonic method isn't ambiguous; it just needs proper context and quality coding.
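
As a rough illustration of the two styles, the sketch below uses hypothetical function names. One practical difference worth noting is that the truthiness check also treats `None` as empty, while `len()` raises a `TypeError` for it.

```python
def describe(items):
    """Truthiness check: true for [], (), "", {} and also for None."""
    if not items:
        return "empty"
    return f"{len(items)} item(s)"


def describe_strict(items):
    """Explicit length check: raises TypeError for None instead of hiding it."""
    if len(items) == 0:
        return "empty"
    return f"{len(items)} item(s)"
```

Which one fits depends on whether a `None` argument should count as "empty" or as a bug.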

When Things Are Slow, Look for Queues

Push to Prod • 59 implied HN points • 13 Aug 24

🕹 Technology Software Development System Architecture Performance optimization Data Structures

When a system gets slow, it’s often because of queues. Queues help manage requests but can create delays if not handled properly.
Different types of queues can slow down your system, like thread pools, connection pools, and TCP queues. Keeping these optimized can improve performance.
Using thread dumps can help identify problems in your system. They can show which threads are blocked and help you fix the slowdowns.

HashMap is a Data Structure every SWE should know.

System Design Classroom • 239 implied HN points • 24 May 24

🕹 Technology Software Data Structures Algorithms Computer Science Development

Hashmaps are useful for storing data by connecting unique keys to their values, making it easy to find and retrieve information quickly.
When two different keys accidentally produce the same hash code, it's called a collision. There are ways to handle this, like chaining and open addressing.
Hashmaps can do lookups, insertions, and deletions really fast, usually in constant time, but they can slow down if too many items cause collisions.
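
Collision handling by chaining could be sketched as follows. The class name and the fixed bucket count are assumptions made for illustration; a production hashmap would also resize as it fills up.

```python
class ChainedHashMap:
    """Toy hashmap: each bucket is a list of (key, value) pairs (chaining)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # Keys whose hashes land on the same index share one bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def __len__(self):
        return self._size
```

With `capacity=1` every key collides, and each lookup degrades to a linear scan of one bucket, which is the slowdown the last takeaway refers to.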

Why Graphs are great for Fraud Detection [Math Mondays]

Technology Made Simple • 639 implied HN points • 01 Jan 24

🕹 Technology Graphs Neural Networks Machine Learning Data Structures

Graphs are efficient at encoding and representing relationships between entities, making them useful for fraud detection tasks.
Graph Neural Networks excel at fraud detection due to their ability to visualize strong correlations among fraudulent activities that share common properties, adapt to new fraud patterns, and offer transparency in AI systems.
Graph Neural Networks require less labeled data and feature engineering compared to other techniques, have better explainability, and work well with semi-supervised learning, making them a powerful tool for fraud detection.

The Sliding Window Technique[Technique Tuesdays]

Technology Made Simple • 279 implied HN points • 28 Feb 24

🕹 Technology Algorithms Problem Solving Software Engineering Artificial Intelligence Data Structures

The sliding window technique is a powerful algorithmic model used for problem-solving in coding interviews and software engineering, offering efficiency and practicality.
Benefits of using the sliding window technique include reducing duplicate work, maintaining consistent linear time complexity, and its utility in AI feature extraction processes.
Spotting the sliding window technique involves identifying keywords like maximum, minimum, longest, or shortest, dealing with continuous elements, and converting brute-force approaches into efficient solutions.
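
A small example of the idea, assuming the classic "largest sum of k consecutive elements" problem (chosen here for illustration, not taken from the post): instead of re-summing each window, the code adds the element entering the window and subtracts the one leaving it.

```python
def max_window_sum(values, k):
    """Return the largest sum of any k consecutive elements in linear time."""
    if k <= 0 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    window = sum(values[:k])           # sum of the first window
    best = window
    for i in range(k, len(values)):
        # Slide right by one: add the new element, drop the oldest one.
        window += values[i] - values[i - k]
        best = max(best, window)
    return best
```

For `[1, 4, 2, 10, 23, 3, 1, 0, 20]` with `k=4` the best window is `4 + 2 + 10 + 23 = 39`.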

JWTs - Use Responsibly

Permit.io’s Substack • 59 implied HN points • 23 May 24

🕹 Technology Web Development Cybersecurity Authentication APIs Data Structures

JWTs are great for authentication but should be used carefully. They are not meant for detailed permission checks and can create security issues if misused.
They are static once issued, meaning any changes to a user's role won't be reflected until the token expires. This can lead to potential security risks.
JWTs are suitable for stateless, distributed systems and coarse-grained authorization, but for fine-grained control, other tools should be used.

LangGraph From LangChain Explained In Simple Terms

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 17 Jun 24

🕹 Technology AI Machine Learning Software Development Data Structures

LangGraph helps create clearer conversations by using graphs to map out how dialog flows between different points, making it easier to manage conversations in AI systems.
Prompt chaining connects smaller tasks in a sequence, allowing AI models to handle complex jobs step by step, but can feel rigid like traditional chatbots.
Autonomous Agents bring a higher level of flexibility in how actions are taken, but they can also lead to concerns about having enough control over their decision-making process.

Controlling Rate-Limited Events - Implementation

Low Latency Trading Insights • 117 implied HN points • 11 Feb 24

🕹 Technology Algorithms Data Structures Coding Performance

The requirements for a rate-limiting algorithm include precise event counting, fast performance especially during market turbulence, and minimal impact on cache memory.
Creating a rate-limiting algorithm using a multimap for counting events has inefficiencies; a better solution involves enhancements for optimal performance.
A bounded approximation approach for rate limiting achieves memory efficiency by assuming a minimum time precision and implementing a clever advance-and-clear mechanism.

How to Learn Data Structures and Algorithms [Storytime Saturdays]

Technology Made Simple • 299 implied HN points • 22 Jan 23

🚌 Education Data Structures Algorithms Learning

Understanding Data Structures and Algorithms is crucial for success in technical fields like software development.
Many resources focus on DSA for coding interviews, but it's important to go beyond that to deepen your knowledge.
Learning DSA effectively doesn't have to involve answering countless questions or watching numerous tutorials; there are better approaches available.

Looking at the Datastructures- Trees [Math Mondays]

Technology Made Simple • 179 implied HN points • 18 Jul 23

🕹 Technology Data Structures Software Engineering Machine Learning AI Recursion

Trees are powerful data structures that are great for efficient organization and retrieval of data in software engineering.
Recursion works well with trees due to their recursive substructure, making implementation of recursive functions easier.
Decision trees in AI excel at discerning complex patterns, providing interpretable results, and are versatile in various domains such as finance, healthcare, and marketing.

DSA For The Rest Of Us - Part 1

Data Engineering Central • 157 implied HN points • 13 Mar 23

🕹 Technology Data Engineering Data Structures Software Development Rust Programming

Understanding Data Structures and Algorithms is important for becoming a better engineer, even if you may not use them daily.
Linked Lists are a linear data structure where elements are not stored contiguously in memory but are linked using pointers.
Creating a simple Linked List in Rust involves defining nodes with values and pointers to other nodes, creating a LinkedList to hold these nodes, and then linking them to form a chain.

A deeper than average look into Stacks [Math Mondays]

Technology Made Simple • 99 implied HN points • 21 Nov 23

🕹 Technology Programming Data Structures Algorithms Computer Science Software Engineering

Stacks are powerful data structures in software engineering and can be modified extensively to suit different use cases.
Implementing Stacks using a Singly Linked List can be beneficial for dynamic resizing, though Arrays are often preferred due to memory considerations.
Exploring variations like Persistent Stacks, Limiting Stack Size, Ensuring Type Safety, Thread Safety, Tracking Min/Max, and Undo Operations can enhance the functionality and efficiency of Stacks in various scenarios.
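
One of the variations listed above, tracking the minimum, might look like the sketch below. The class and method names are hypothetical; the design stores the running minimum next to each value so that every operation stays O(1).

```python
class MinStack:
    """Stack that can report its current minimum in constant time."""

    def __init__(self):
        self._items = []   # each entry is (value, minimum of the stack up to here)

    def push(self, value):
        current_min = value if not self._items else min(value, self._items[-1][1])
        self._items.append((value, current_min))

    def pop(self):
        return self._items.pop()[0]

    def minimum(self):
        return self._items[-1][1]
```

The cost is one extra stored value per element.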

Global Incremental ID

Hung's Notes • 79 implied HN points • 13 Dec 23

🕹 Technology Data Structures Distributed Systems Software Engineering Database Management

Global Incremental IDs are important for preventing ID collisions in distributed systems, especially during tasks like data backup and event ordering.
UUID and Snowflake ID are two common types of global IDs, each with unique advantages and disadvantages. For instance, UUIDs are larger but widely used, while Snowflake IDs are smaller but more complex to generate.
Different systems, like Sonyflake and Tinyid, offer specialized methods for generating IDs, helping to ensure performance and avoiding database bottlenecks.

Problem 85: Count Complete Tree Nodes [Amazon]

Technology Made Simple • 99 implied HN points • 04 May 23

🕹 Technology Coding Data Structures Algorithms Tech industry

The post discusses Problem 85: Count Complete Tree Nodes [Amazon], focusing on recursion, trees, and data structures.
It is about solving a problem related to counting the number of nodes in a complete binary tree efficiently.
The post mentions the importance of community engagement in choosing problems to discuss and the growth of the author's newsletter.

[Solution]Problem 85: Count Complete Tree Nodes [Amazon]

Technology Made Simple • 59 implied HN points • 05 May 23

🕹 Technology Coding Data Structures Algorithm Software Engineering

The post discusses a problem related to counting the number of nodes in a complete binary tree, emphasizing the importance of understanding recursion, trees, and data structures.
It mentions starting with a brute force solution to count nodes but highlights the need for optimization to achieve time complexity better than O(n).
The approach for solving the problem involves using a recursive template to count nodes efficiently by considering the root and the number of nodes in the left and right subtrees.
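
One common way to beat O(n) here is sketched below. It is an assumption about how the optimization might look, not necessarily the post's own code: when the leftmost and rightmost depths of a subtree match, the subtree is perfect and its size is `2^h - 1`, so no further recursion is needed. The `Node` class is hypothetical.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def _depth(node, go_left):
    """Count nodes along the leftmost or rightmost path from node."""
    depth = 0
    while node is not None:
        depth += 1
        node = node.left if go_left else node.right
    return depth


def count_nodes(root):
    """Count nodes of a complete binary tree in O(log^2 n) time."""
    if root is None:
        return 0
    left_depth = _depth(root, go_left=True)
    right_depth = _depth(root, go_left=False)
    if left_depth == right_depth:
        return (1 << left_depth) - 1   # perfect subtree: 2^h - 1 nodes
    # Otherwise fall back to: root + left subtree + right subtree.
    return 1 + count_nodes(root.left) + count_nodes(root.right)
```

The shortcut is valid only because the tree is complete; on an arbitrary binary tree equal outer depths do not imply a perfect subtree.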

Constructing priority heap efficiently.

Software Bits Newsletter • 154 implied HN points • 17 Jun 23

🕹 Technology Coding Algorithms Data Structures

Priority heaps are commonly used in coding interview questions.
Creating a priority heap efficiently can impact time complexity.
Consider using a bulk-operations API to optimize memory usage in algorithms and data structures.
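
To illustrate why construction matters, the sketch below contrasts two ways of building a min-heap with Python's standard `heapq` module. The function names are illustrative: pushing one element at a time costs O(n log n) overall, while `heapq.heapify` builds the heap in bulk in O(n).

```python
import heapq


def build_by_pushing(values):
    """n separate pushes: O(n log n) in total."""
    heap = []
    for value in values:
        heapq.heappush(heap, value)
    return heap


def build_in_bulk(values):
    """Single bulk operation: heapify rearranges a list in place in O(n)."""
    heap = list(values)
    heapq.heapify(heap)
    return heap
```

Both results are valid heaps, though the internal ordering of the two lists may differ.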

Comparing vectors for equality.

Software Bits Newsletter • 154 implied HN points • 19 Feb 23

🕹 Technology Programming Data Structures Performance C++ Software Development

Consider using std::array instead of std::vector when you know the size at compile-time.
Using std::array can provide significant speed improvements over std::vector in certain scenarios.
While std::array is efficient for compile-time initialization and faster than std::vector, it may not be suitable for all cases.

[Solution]Problem 72: Permutation in String [Amazon]

Technology Made Simple • 39 implied HN points • 27 Jan 23

🕹 Technology Algorithms Binary Trees Graphs Coding Data Structures

The problem discussed is about validating a binary search tree, ensuring the left subtree contains smaller values, the right subtree contains greater values, and both are valid binary search trees.
Examples are provided to illustrate the concept, showing a valid and an invalid binary search tree.
Constraints include the number of nodes and the value ranges in the tree.

A Novel Data Structure For Doing Merging Well

Bram’s Thoughts • 19 implied HN points • 23 Dec 23

🕹 Technology Data Structures Version Control

The approach leads to a reliable implementation of cherry picking in merging.
A tree-like data structure organizes lines in files to avoid lexical tiebreaks.
Explicit cherry-picking is supported well, but implicit cherry-picking is not recommended.

Using Sentinel Nodes to improve your Doubly Linked Lists[Technique Tuesdays]

Technology Made Simple • 59 implied HN points • 28 Sep 22

🕹 Technology Programming Data Structures Performance Memory management Coding

Using sentinel nodes in Doubly Linked Lists can improve performance and make code easier to read and implement.
Implementing sentinel nodes removes special cases in DLL implementations, simplifies code, and makes it more provably correct.
Although using sentinel nodes may require some extra memory, the simplification it brings to the code is often worth the tradeoff.
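
A minimal sketch of the idea, assuming a single circular sentinel (one of several possible layouts; the class names are hypothetical): because the sentinel always exists, insertion and removal never need to check for an empty list, a missing head, or a missing tail.

```python
class _Node:
    def __init__(self, value=None):
        self.value = value
        self.prev = self   # a lone node points at itself
        self.next = self


class SentinelList:
    """Doubly linked list with one circular sentinel node."""

    def __init__(self):
        # sentinel.next is the head, sentinel.prev is the tail.
        self._sentinel = _Node()

    def _insert_between(self, value, before, after):
        node = _Node(value)
        node.prev, node.next = before, after
        before.next = node
        after.prev = node
        return node

    def push_front(self, value):
        return self._insert_between(value, self._sentinel, self._sentinel.next)

    def push_back(self, value):
        return self._insert_between(value, self._sentinel.prev, self._sentinel)

    def remove(self, node):
        # No None checks: every real node has real neighbours.
        node.prev.next = node.next
        node.next.prev = node.prev

    def __iter__(self):
        node = self._sentinel.next
        while node is not self._sentinel:
            yield node.value
            node = node.next
```

The extra memory mentioned above is the one sentinel node per list.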

Why and How Does Python Use Bloom Filters in String Processing?

Confessions of a Code Addict • 46 HN points • 14 Sep 23

🕹 Technology Programming Data Structures Implementation Python

Python uses Bloom filters in its string data structure to speed up certain string processing functions like strip and splitlines.
The unique Bloom filter implementation in CPython uses an unsigned long type to represent the bit vector, making storing and querying items more efficient.
CPython determines the position in the bit vector for adding and querying characters by using the lower n-bits of the character, avoiding costly hash computations.
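
The mechanism can be approximated in Python as below. This is a simplified model rather than CPython's actual C code: the 64-bit width and all function names are assumptions for illustration. The mask gives a cheap "definitely not in the set" answer, and only possible matches pay for the exact check.

```python
BLOOM_WIDTH = 64  # assumed bit-vector width; the real value is platform dependent


def make_bloom_mask(chars):
    """Set one bit per character, indexed by the character's low bits."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask


def maybe_contains(mask, ch):
    """False means definitely absent; True means possibly present."""
    return bool(mask & (1 << (ord(ch) & (BLOOM_WIDTH - 1))))


def strip_chars(text, chars):
    """Strip characters from both ends, using the mask as a fast pre-check."""
    mask = make_bloom_mask(chars)
    start, end = 0, len(text)
    while start < end and maybe_contains(mask, text[start]) and text[start] in chars:
        start += 1
    while end > start and maybe_contains(mask, text[end - 1]) and text[end - 1] in chars:
        end -= 1
    return text[start:end]
```

Two characters whose code points differ by a multiple of 64 share a bit, which is why the exact `in` check is still needed after a positive mask test.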

Grokking Hash Array Mapped Tries (HAMTs)

Photon-Lines Substack • 57 HN points • 13 Jul 23

🕹 Technology Data Structures Computer Science Programming

HAMTs combine hash tables and tries for efficient storage of key-value pairs.
HAMTs use hashing and trie-like structures to handle collisions without needing resizing.
HAMTs are commonly used in functional programming for efficient and persistent data structures.

How to Study Data Structures and Algorithms [Technique Tuesday]

Technology Made Simple • 79 implied HN points • 06 Apr 22

🕹 Technology Coding Data Structures Algorithms

Experts often give bad advice for studying Data Structures and Algorithms, like relying solely on Leetcode.
To effectively learn DSA, take time to understand the history and purpose of each data structure beyond just learning the mechanics.
Don't rush through learning Data Structures and Algorithms; taking it slow and grasping the fundamentals thoroughly will lead to better mastery and understanding.

How to Figure out the correct distance finding algorithm [Technique Tuesday]

Technology Made Simple • 79 implied HN points • 30 Mar 22

🕹 Technology Algorithm Graphs Coding Interviews Data Structures

BFS and DFS algorithms are foundational and crucial for various graph traversal problems, forming the basis for more complicated algorithms.
Topological Sort, Dijkstra's Algorithm, and A* are important graph algorithms to master, especially for weighted graphs and AI applications like self-driving cars.
To choose the right graph algorithm, first decide what you need: for a shortest path, use BFS on unweighted graphs and A* on weighted graphs; to visit the complete graph, use DFS.
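
For the unweighted case, a BFS shortest-path search might look like the sketch below. The adjacency-dict representation and the function name are assumptions. BFS explores nodes in order of distance, so the first time the goal is reached the distance is minimal.

```python
from collections import deque


def bfs_distance(graph, start, goal):
    """Return the fewest edges from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

This guarantee holds only when every edge has the same cost; weighted graphs need an algorithm that accounts for edge weights.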

Bloom Filter [Systems Design Sundays]

Technology Made Simple • 59 implied HN points • 10 Jul 22

🕹 Technology Data Structures Algorithms Systems Design Interview Preparation

Bloom Filters are probabilistic data structures used to efficiently test for membership.
Bloom Filters work by having a bit array of size m with k hash functions mapping values to indices, setting the indices to 1 for a given input.
Bloom Filters are great for reducing unnecessary disk access, but they can result in false positives and need regeneration as more values are added.
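
The bit-array-plus-k-hashes description above can be sketched as follows. The use of salted SHA-256 digests as the k hash functions is an assumption chosen for simplicity, and one byte per bit is used for readability rather than compactness.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings with m bits and k hash functions."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = bytearray(m)   # one byte per bit, for clarity

    def _indexes(self, item):
        data = item.encode("utf-8")
        for seed in range(self.k):
            # A different one-byte salt per hash function (so k must be < 256).
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for index in self._indexes(item):
            self.bits[index] = 1

    def might_contain(self, item):
        # False: definitely not added. True: added, or a false positive.
        return all(self.bits[index] for index in self._indexes(item))
```

Added items always test positive; as more bits get set, the chance that an absent item also tests positive grows, which is why a filter eventually needs to be rebuilt with a larger `m`.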

Grokking AVL and RAVL Trees

Photon-Lines Substack • 53 HN points • 16 Jun 23

🕹 Technology Data Structures Algorithms Programming

AVL trees stay balanced by keeping the heights of each node's two sub-trees within one of each other.
Balanced trees provide consistent access times and keep worst-case running times at O(log n).
RAVL trees are a variation of AVL trees that do not require re-balancing after node deletion.

[Solution]Problem 58:Kth Largest Element in a Stream [Amazon]

Technology Made Simple • 39 implied HN points • 30 Sep 22

🕹 Technology Coding Data Structures Object Oriented Programming

The problem focuses on designing a class to find the kth largest element in a stream, emphasizing that it is the kth largest in sorted order, not the kth distinct element.
The implementation includes initializing the class with k and a set of numbers, then appending values to the stream to return the kth largest element.
The constraints for the problem include specific limitations on the range of values and number of calls that can be made.
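
A common way to implement such a class, shown here as a sketch rather than the post's own solution, is to keep a min-heap of the k largest values seen so far. The heap's smallest entry is then the kth largest overall.

```python
import heapq


class KthLargest:
    """Track the kth largest value in a stream with a size-k min-heap."""

    def __init__(self, k, nums):
        self.k = k
        self._heap = []
        for n in nums:
            self.add(n)

    def add(self, value):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, value)
        elif value > self._heap[0]:
            # New value displaces the smallest of the current top k.
            heapq.heapreplace(self._heap, value)
        # Assumes at least k values have been seen by the time the result is used.
        return self._heap[0]
```

Each `add` costs O(log k), and duplicates count separately, matching the "sorted order, not distinct" wording above.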

All about the graphs [Math Monday]

Technology Made Simple • 59 implied HN points • 29 Mar 22

🕹 Technology Data Structures Algorithms Graphs Coding Interviews

Graphs can be seen from various perspectives: charts and plots (stats), maps with complex algorithms (graph theory), and adjacency lists for coding. Understanding these perspectives is crucial for effective use of graphs.
Identifying whether a problem could be a graph problem involves recognizing the entities (nodes), relationships (edges), and weights in the context of a system. This spotting framework helps in solving graph-related problems efficiently.
Practicing graph spotting as a skill involves starting with easy problems to identify graph components quickly. Familiarity with graphs and the ability to spot them easily is crucial for solving graph problems in interviews.

Reduce memory of graph traversal by marking spot as visited instead of using a set[Technique Tuesdays]

Technology Made Simple • 39 implied HN points • 02 Aug 22

🕹 Technology Coding Optimization Algorithms Data Structures Tech Education

In graph traversal, marking spots as visited in the input itself, instead of tracking them in a separate set, reduces memory usage: the extra space for visited tracking drops from O(n) to O(1).
This technique is straightforward to implement, takes no extra space, and can be a significant improvement in graph traversal algorithms.
When implementing this technique, be cautious about the value used to mark visited cells and always confirm with your interviewer about input data type to avoid conflicts.
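
A sketch of the technique on a grid of `"1"` (land) and `"0"` (water) cells, using island counting as the example problem. The marker value `"#"` is an assumption and must not collide with any legal input value, which is the caution raised above.

```python
def count_islands(grid):
    """Count groups of adjacent "1" cells; visited cells are overwritten with "#"."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "1":
                continue
            count += 1
            grid[r][c] = "#"           # mark in place instead of adding to a set
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == "1":
                        grid[ny][nx] = "#"
                        stack.append((ny, nx))
    return count
```

Note that the input is mutated, and the traversal stack still uses memory; the saving is only the separate visited set.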

Power of Redis: A Deep Dive into Caching Systems

The ZenMode • 15 HN points • 10 Feb 24

🕹 Technology Caching Data Structures Performance Limitations Deployment

Caching like Redis stores frequently used data for faster retrieval, improving response times, reducing database load, and leading to cost-effectiveness in running high-traffic applications.
Redis is fast due to in-memory storage, optimized data structures, reduced I/O operations, single-threaded architecture, and event-driven design, but has limitations like limited capacity and issues with data persistence.
Choosing the right caching system, like Redis, requires considering factors like data size, access patterns, consistency requirements, and fault tolerance for high availability and durability.

[Solution]Problem 41: Create your own Datastructure with O(1) Time complexity (Dropbox)

Technology Made Simple • 39 implied HN points • 11 Jun 22

🕹 Technology Data Structures Logic Programming Algorithms Software Engineering

Creating a data structure with O(1) time complexity involves implementing functions like plus, minus, get_max, and get_min efficiently.
Utilizing a Doubly Linked List allows for maintaining a sorted collection of keys, enabling quick access to elements with the lowest and highest values.
Developing algorithms to handle key count increments and decrements while preserving the sorted order of the linked list is crucial for a functional solution.

Using Bits as a way to Encode Preference[Technique Tuesdays]

Technology Made Simple • 39 implied HN points • 01 Jun 22

🕹 Technology Coding Data Structures Algorithms Math

Using bit fields can significantly reduce storage requirements for tracking user preferences. A technique by Vimeo engineers allowed O(1) space compared to the traditional O(k) method, making the solution far more efficient.
Bit fields utilize bitwise operators to represent content filters, enabling quick comparisons in constant time. This approach is memory and time-efficient.
Implementing bit fields for tracking user preferences allows for efficient filtering of content by performing a bitwise AND operation between the user's and video's bit fields. This results in a quick eligibility check for the user.
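
The eligibility check could be sketched as below. The filter names and the "blocked categories" interpretation are hypothetical and not taken from the Vimeo design: each category occupies one bit, so a whole preference set fits in a single integer and one AND compares all categories at once.

```python
from enum import IntFlag


class ContentFlag(IntFlag):
    # Hypothetical categories; each one occupies a single bit.
    VIOLENCE = 1
    LANGUAGE = 2
    NUDITY = 4
    DRUGS = 8


def is_allowed(user_blocked, video_flags):
    """A video is allowed if it has no flag the user has blocked."""
    return (user_blocked & video_flags) == 0
```

For example, a user blocking `VIOLENCE | DRUGS` can watch a video flagged only `LANGUAGE`, but not one flagged `LANGUAGE | DRUGS`.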

Understanding Hashing [Math Mondays]

Technology Made Simple • 39 implied HN points • 17 May 22

🕹 Technology Math Data Structures Hashing Algorithms Programming

Hashing efficiently maps data to integers for quick searches and insertions.
Design choices in hashing involve handling collisions, with options like separate chaining and open addressing (probing).
Rolling hash enables efficient substring searches within larger strings by computing hashes incrementally.
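
A rolling hash for substring search might be sketched like this, in the style of Rabin-Karp. The base and modulus are arbitrary illustrative choices. Each slide removes the leading character's contribution and appends the next character in O(1), and a direct comparison guards against hash collisions.

```python
def find_substring(text, pattern, base=256, mod=1_000_000_007):
    """Return the first index of pattern in text, or -1 if absent."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)       # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a real comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            return start
        if start + m < n:
            # Drop text[start], shift the rest, append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return -1
```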

Memory Management Every Programmer Should Know

Web Dev Explorer • 3 HN points • 29 Apr 24

🕹 Technology Programming Memory management Data Structures Web Development

Data stored on the stack is static, fixed in size, with a fixed lifecycle, and cannot be referenced across different stack frames.
Data stored on the heap is dynamic, not fixed in size, has a flexible lifecycle, and can be referenced across different stack frames.
Various programming languages use different memory management approaches, like manual management in C, garbage collection in Java, ARC in Objective-C and Swift, and ownership mechanism in Rust.

Problem 33: Number of Islands[Spotify]

Technology Made Simple • 39 implied HN points • 14 Apr 22

🕹 Technology Graphs DFS Data Structures Coding Programming

The problem is about finding the number of islands in a 2D binary grid - islands are formed by connecting adjacent lands.
Islands are surrounded by water and connected horizontally or vertically.
The task involves determining the count of separate islands present in the grid.

Problem 61:Increasing Order Search Tree [Apple]

Technology Made Simple • 19 implied HN points • 20 Oct 22

🕹 Technology Coding Data Structures Problem Solving Software Development

Improving problem-solving skills is a goal for many people.
The problem of rearranging a binary search tree into in-order form can be challenging but helps sharpen skills.
The problem involves changing the structure of the tree to have only right children and the leftmost node as the root.

Is your bucket leaky?

Freelance Footprints • 8 HN points • 20 Feb 24

🕹 Technology Algorithms Web Development Data Structures Networking Software Engineering

The leaky bucket algorithm helps manage the rate of requests a web application can handle. It uses the idea of a bucket that can fill up and overflow if too many requests come in at once.
In this algorithm, there are two key settings: the maximum number of requests allowed at a time and the rate at which requests are processed. This controls how quickly requests are dealt with and prevents overload.
The leaky bucket algorithm is widely used in tech, such as by companies like SeatGeek for their waiting room systems, to ensure smooth user experiences without exceeding server limits.
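
The two settings described above, capacity and drain rate, can be sketched as follows. The class is hypothetical, and the caller passes the current time explicitly so that the behaviour is easy to test; a real service would likely read a monotonic clock instead.

```python
class LeakyBucket:
    """Admit a request only if the bucket has room after leaking."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # maximum requests held at a time
        self.leak_rate = leak_rate    # requests drained per second
        self._level = 0.0
        self._last = 0.0

    def allow(self, now):
        # Drain whatever leaked out since the previous call.
        elapsed = now - self._last
        self._level = max(0.0, self._level - elapsed * self.leak_rate)
        self._last = now
        if self._level + 1 <= self.capacity:
            self._level += 1
            return True
        return False                  # bucket would overflow: reject
```

With `capacity=2` and `leak_rate=1.0`, two requests at time 0 are admitted, a third is rejected, and one more fits a second later.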