The hottest data architecture Substack posts right now

And their main takeaways
davidj.substack 59 implied HN points 13 Jan 25
  1. The gold layer in a medallion architecture has drawbacks: aggregating data into fixed tables loses detail and leaves users little flexibility, so important data can be missing and changes are hard to make.
  2. A universal semantic layer is a better fit, letting users ask for the data they need in plain terms without writing complicated queries, which makes data easier and more accessible for everyone.
  3. Replacing the gold layer with a semantic layer improves efficiency and user experience: it avoids the gold layer's rigid structure and adapts to user needs on demand (a minimal sketch of the idea follows this list).
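  To make the contrast concrete, here is a minimal sketch of the semantic-layer idea, not any particular product's API: metrics and dimensions are defined once, and each request is compiled into a query on demand instead of being pre-aggregated into fixed gold tables. All table, column, and function names below are hypothetical.

    # Hypothetical metric and dimension definitions, maintained in one place.
    METRICS = {
        "revenue": "SUM(order_total)",
        "order_count": "COUNT(*)",
    }
    DIMENSIONS = {
        "order_month": "DATE_TRUNC('month', ordered_at)",
        "customer_region": "customer_region",
    }

    def compile_query(metric: str, dimension: str, table: str = "silver.orders") -> str:
        """Compile a named metric/dimension request into SQL against the silver model."""
        return (
            f"SELECT {DIMENSIONS[dimension]} AS {dimension}, "
            f"{METRICS[metric]} AS {metric} "
            f"FROM {table} GROUP BY 1"
        )

    # Consumers ask for 'revenue by customer_region' rather than writing SQL.
    print(compile_query("revenue", "customer_region"))
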
davidj.substack 179 implied HN points 25 Nov 24
  1. Medallion architecture is not just a data modeling technique but a high-level structure for organizing data processes; it helps visualize how data flows through a project.
  2. The architecture has three layers: Bronze cleans and prepares incoming data, Silver builds a structured data model, and Gold makes the data easy to access and use (illustrated in the sketch after this list).
  3. The names Bronze, Silver, and Gold appeal to non-technical audiences but describe the layers poorly; renaming them would better reflect what each layer actually does.
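  A toy pandas sketch of that division of labor, using made-up columns and following the post's framing (Bronze cleans and prepares, Silver models, Gold serves consumers):

    import pandas as pd

    # Raw input as it arrives: duplicates, everything stored as strings.
    raw = pd.DataFrame([
        {"order_id": "1", "ordered_at": "2024-01-03", "amount": "120.5", "status": "complete"},
        {"order_id": "1", "ordered_at": "2024-01-03", "amount": "120.5", "status": "complete"},
        {"order_id": "2", "ordered_at": "2024-01-04", "amount": "80.0", "status": "cancelled"},
    ])

    # Bronze: clean and prepare - drop duplicates, fix types.
    bronze = (
        raw.drop_duplicates("order_id")
           .assign(ordered_at=lambda d: pd.to_datetime(d["ordered_at"]),
                   amount=lambda d: d["amount"].astype(float))
    )

    # Silver: a structured, reusable model (completed orders with a date key).
    silver = bronze[bronze["status"] == "complete"].copy()
    silver["order_date"] = silver["ordered_at"].dt.date

    # Gold: shaped for easy consumption, e.g. revenue per day.
    gold = (
        silver.groupby("order_date", as_index=False)["amount"].sum()
              .rename(columns={"amount": "daily_revenue"})
    )
    print(gold)
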
VuTrinh. 339 implied HN points 25 May 24
  1. Twitter processes roughly 400 billion events per day, combining several technologies, including purpose-built tools, to keep up with that volume in real time.
  2. After running into limits with its old setup, Twitter moved to a new architecture that simplified operations and lets it process data faster and more efficiently.
  3. The new system delivers lower latency and fewer processing errors, giving more accurate results and better use of resources than before.
VuTrinh. 399 implied HN points 20 Apr 24
  1. Lakehouse architecture combines the strengths of data lakes and data warehouses. It aims to solve the problems that arise from keeping these two systems separate.
  2. The approach brings warehouse-style data management, including ACID transactions and efficient querying of large datasets, to data stored in the lake, enabling analytics on raw data without complex data movement.
  3. Built on table formats such as Delta Lake, a lakehouse handles both structured and unstructured data efficiently, making it a promising fit for modern data needs (a minimal Delta Lake sketch follows this list).
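  A minimal sketch of the pattern with Delta Lake (named in the summary), assuming pyspark and delta-spark are installed; the storage path and columns are made up:

    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = (
        SparkSession.builder.appName("lakehouse-sketch")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # Write raw events as a Delta table: plain files on lake storage plus a
    # transaction log that provides ACID guarantees over them.
    events = spark.createDataFrame(
        [("u1", "click", "2024-01-03"), ("u2", "view", "2024-01-03")],
        ["user_id", "event_type", "event_date"],
    )
    events.write.format("delta").mode("overwrite").save("/tmp/lakehouse/events")

    # Query it back like a warehouse table, straight from the lake files.
    table = spark.read.format("delta").load("/tmp/lakehouse/events")
    table.groupBy("event_type").count().show()
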
VuTrinh. 139 implied HN points 15 Jun 24
  1. Apache Druid is built to handle real-time analytics on large datasets, making it faster and more efficient than Hadoop for certain tasks.
  2. Druid splits work across several node types, such as real-time, historical, broker, and coordinator nodes, to ingest data, serve queries, and keep the cluster coordinated.
  3. This architecture allows quick data retrieval while maintaining high availability and performance, making Druid a strong choice for fast, interactive data exploration (see the query sketch after this list).
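  A sketch of how a client interacts with that layout: queries go to the broker (here through the router's SQL endpoint on a local test cluster), which fans the work out to historical and real-time nodes and merges the results. The endpoint address and the tutorial 'wikipedia' datasource are assumptions.

    import requests

    # Assumed: a local Druid router with the tutorial 'wikipedia' datasource loaded.
    DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

    query = """
    SELECT channel, COUNT(*) AS edits
    FROM wikipedia
    GROUP BY channel
    ORDER BY edits DESC
    LIMIT 5
    """

    # The broker/router answers the SQL request; the segment scans happen on
    # historical and real-time nodes behind it.
    resp = requests.post(DRUID_SQL_URL, json={"query": query}, timeout=30)
    resp.raise_for_status()
    for row in resp.json():
        print(row)
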
VuTrinh. 59 implied HN points 14 May 24
  1. Netflix has a strong data engineering stack that supports both batch and streaming data pipelines. It focuses on building flexible and efficient data architectures.
  2. Atlassian has revamped its data platform to include a new deployment capability inspired by technologies like Kubernetes. This helps streamline their data management processes.
  3. Migrating off dbt Cloud offers useful lessons about data development workflows; teams should weigh the alternatives and learn from others' migration journeys.
The Orchestra Data Leadership Newsletter 79 implied HN points 17 Feb 24
  1. The choice between microservices and monolithic architectures in data impacts the tools and solutions you choose.
  2. Microservices allow for distributed infrastructure, specialization, and easier scaling in data architecture.
  3. A microservices approach assumes high interoperability between tools, workable governance, and acceptable data egress and storage costs; those assumptions need checking before committing.
Technology Made Simple 39 implied HN points 13 Feb 23
  1. Netflix uses Open Connect Appliances, caching servers placed inside ISPs' networks, to keep content close to viewers and improve streaming quality.
  2. A stateless service architecture lets any server step in if another fails, keeping the service uninterrupted.
  3. Netflix's redundancy strategy includes storing data across multiple availability zones, 'n+1' redundancy, and graceful degradation to preserve limited functionality during failures (a small illustration of graceful degradation follows this list).
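  A small illustration of graceful degradation (not Netflix's actual code, just the shape of the technique): if the primary dependency fails, serve a generic but still useful response instead of an error.

    import random

    POPULAR_FALLBACK = ["popular-1", "popular-2", "popular-3"]

    def personalized_recommendations(user_id: str) -> list[str]:
        """Primary path: depends on a downstream service that may be unavailable."""
        if random.random() < 0.3:  # simulate a failed dependency
            raise ConnectionError("recommendation service unavailable")
        return [f"title-for-{user_id}-{i}" for i in range(3)]

    def recommendations_with_degradation(user_id: str) -> list[str]:
        """Degrade gracefully: fall back to a popular list rather than failing."""
        try:
            return personalized_recommendations(user_id)
        except ConnectionError:
            return POPULAR_FALLBACK

    print(recommendations_with_degradation("u42"))
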
The Orchestra Data Leadership Newsletter 1 HN point 29 May 24
  1. Understanding the total cost of ownership is crucial when choosing between open-source and managed data architectures.
  2. Leveraging open-source software can offer cost benefits, but it also comes with risks like lack of support and high maintenance requirements.
  3. Managed data architecture tools like Rivery and Orchestra can reduce total cost of ownership, scale more easily, and simplify day-to-day data operations.
Data Products 2 HN points 23 Jun 23
  1. The difference between OLTP and OLAP systems can cause miscommunication among data producers and consumers.
  2. OLTP systems serve end users quickly with narrow, record-level reads and writes for product features, while OLAP systems answer analytical questions by scanning large amounts of data (see the contrast sketched after this list).
  3. Empathy and communication between OLTP and OLAP teams are crucial to building scalable data products.
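  The contrast in access patterns, sketched with an in-memory SQLite table and made-up columns: the OLTP path is a narrow point lookup, the OLAP path scans and aggregates the whole table.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1, 1001)],
    )

    # OLTP-style access: fetch one specific row quickly to serve a product feature.
    one_order = conn.execute("SELECT * FROM orders WHERE id = ?", (42,)).fetchone()

    # OLAP-style access: scan everything to answer an analytical question.
    top_customers = conn.execute(
        "SELECT customer_id, SUM(amount) AS total FROM orders "
        "GROUP BY customer_id ORDER BY total DESC LIMIT 5"
    ).fetchall()

    print(one_order)
    print(top_customers)
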
The Orchestra Data Leadership Newsletter 0 implied HN points 15 Oct 23
  1. Knowing when to shift left on security is crucial to preventing data breaches and maintaining a secure network infrastructure.
  2. Re-evaluating the usefulness and uptake of self-service analytics tools can help in optimizing resources and avoiding unnecessary costs.
  3. Carefully analyzing cloud warehouse costs and data movement can lead to cost savings and efficient data management.
SUP! Hubert’s Substack 0 implied HN points 06 Mar 24
  1. The data mesh concept assigns ownership of data to the domain that produced it, which simplifies data sharing among domains.
  2. In a centralized data mesh, infrastructure and self-service tooling are centralized, which suits teams early in their data mesh journey.
  3. A peer-to-peer data mesh gives domains complete autonomy, but without a central catalog, finding data products can be challenging.