The hottest Data Management Substack posts right now

And their main takeaways
Hasen Judi 35 implied HN points 17 Jan 25
  1. The project aims to develop a conversation view that displays threaded replies in a linear format, improving user experience compared to platforms like Twitter or Reddit.
  2. A data model is proposed to track parent-child relationships between posts and replies, allowing for efficient retrieval of both ancestors and descendants of a post.
  3. The author emphasizes using the same 'Post' type across different system layers, arguing that this reduces code complexity and increases productivity compared to using separate representations for each layer.
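A minimal sketch of the parent-child model described in this post, assuming each reply stores a single `parent_id`. The field names and the `ancestors` and `descendants` helpers are hypothetical and are not taken from the author's code; only the `Post` type name comes from the summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    id: int
    parent_id: Optional[int]  # None for a top-level post (assumed convention)
    text: str


def ancestors(posts: Dict[int, Post], post_id: int) -> List[Post]:
    """Walk parent links upward; the nearest parent comes first."""
    chain = []
    current = posts[post_id].parent_id
    while current is not None:
        parent = posts[current]
        chain.append(parent)
        current = parent.parent_id
    return chain


def descendants(posts: Dict[int, Post], post_id: int) -> List[Post]:
    """Collect every reply below a post, depth-first, as one linear list."""
    children: Dict[int, List[Post]] = {}
    for p in posts.values():
        if p.parent_id is not None:
            children.setdefault(p.parent_id, []).append(p)
    result = []
    stack = list(reversed(children.get(post_id, [])))
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(reversed(children.get(node.id, [])))
    return result


# Hypothetical sample thread
posts = {p.id: p for p in [
    Post(1, None, "root"),
    Post(2, 1, "first reply"),
    Post(3, 2, "nested reply"),
    Post(4, 1, "second reply"),
]}
```

With this sample thread, `descendants(posts, 1)` yields posts 2, 3, 4 in that order, which is the kind of linear reading order a conversation view could render.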
Database Engineering by Sort 7 implied HN points 18 Dec 24
  1. Sort helps you manage database changes easily and safely, like how GitHub handles changes. You can propose changes without altering the data right away.
  2. Creating a Change Request is simple. Just suggest what you want to change and set it up for review by others in your organization.
  3. Once a Change Request is approved, it can be applied without hassle. If anything goes wrong during the process, Sort can automatically roll back the changes.
Rod’s Blog 138 implied HN points 03 Aug 23
  1. Customers can use a quick KQL query to track changes in Log Analytics workspace data retention values for Microsoft Sentinel.
  2. The provided KQL query can be utilized in various ways such as in a Workbook, a Hunting query, or as an Analytics Rule for notifications.
  3. For ongoing access to the latest version of the query and further discussion, references to the author's resources and accounts are provided.
Software Design: Tidy First? 134 HN points 04 Aug 23
  1. The goal is to achieve eventual business consistency by closely matching what's in the system with real-world events.
  2. Different data storage methods like storing dated data or double-dated data come with trade-offs in complexity and accuracy.
  3. Bi-temporal systems use two dates to track when data changes occurred in reality and when they were recorded in the system for better business operations.
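The two-date idea can be sketched as follows, assuming each fact carries an `effective` date (when it happened in reality) and a `recorded` date (when the system learned of it). The `Fact` type, the `value_as_of` helper, and the sample figures are hypothetical illustrations, not code from the post.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    value: int
    effective: date  # when the change happened in the real world
    recorded: date   # when the system learned about it


def value_as_of(facts: List[Fact], effective_on: date, known_on: date) -> Optional[int]:
    """Value in effect on `effective_on`, using only facts recorded by `known_on`."""
    visible = [
        f for f in facts
        if f.recorded <= known_on and f.effective <= effective_on
    ]
    if not visible:
        return None
    # Latest effective date wins; ties go to the most recently recorded fact.
    best = max(visible, key=lambda f: (f.effective, f.recorded))
    return best.value


# Hypothetical data: a change effective 1 Feb that was only entered on 15 Feb.
facts = [
    Fact(value=100, effective=date(2023, 1, 1), recorded=date(2023, 1, 1)),
    Fact(value=120, effective=date(2023, 2, 1), recorded=date(2023, 2, 15)),
]
```

Asking about 10 Feb with what was known on 10 Feb returns 100, while asking the same question with what was known on 20 Feb returns 120. That difference is what a single-dated store cannot express.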
Sarah's Newsletter 239 implied HN points 29 Nov 22
  1. Having an excessive number of dashboards can lead to inefficiency and confusion within an organization. It's important to prioritize strategic organization over creating new dashboards indiscriminately.
  2. Developing an automated dashboard deprecation strategy can help save time and maintain a clean BI instance. By automating the process, organizations can efficiently manage and delete unused visuals.
  3. Implementing a proactive maintenance plan, such as using a data catalog or automated tools, can help keep BI instances organized and optimal for data insights. Regular cleaning and organization are key to ensuring the effectiveness of analytics strategies.
Datent 58 implied HN points 09 Feb 24
  1. Transitioning from a BI role to a data product team requires defining a Value Gateway to ensure projects deliver tangible benefits.
  2. To manage the progress and accountability of data work, reporting on value at key points is crucial, showcasing the value realized and areas needing support.
  3. Establishing a process around failing fast and doubling down on successful projects, supported by agile project management, is essential for efficient data product management.
Engineering Enablement 14 implied HN points 05 Nov 24
  1. Platform teams handle a broader range of responsibilities compared to Developer Experience teams. This means they are involved in more of the underlying tech operations.
  2. Local development, source code management, and incident management are key tasks for both types of teams. These areas help developers write and deploy their code more smoothly.
  3. The name of the team can reflect its focus. Some teams prioritize overall developer support while others are more infrastructure-focused, suggesting that their approach can change based on company needs.
Rod’s Blog 59 implied HN points 01 Feb 24
  1. To get the most out of Microsoft Sentinel, organizations should carefully plan and prepare their deployment by assessing security needs and goals.
  2. Choosing the right subscription and pricing model is crucial for optimizing the benefits of Microsoft Sentinel, based on data requirements, user protection, and features needed.
  3. Effective management of Microsoft Sentinel involves monitoring data ingestion, leveraging AI and ML capabilities, automating workflows, and learning from security incidents and feedback.
Vasu’s Newsletter 13 implied HN points 25 Oct 24
  1. A Virtual Private Cloud (VPC) helps businesses create a separate and secure online environment to manage their resources. This means they can control who has access to what information.
  2. With a VPC, administrators can set rules to protect incoming and outgoing internet traffic. It's like having a security system for their online resources.
  3. VPCs come with useful features like VPN connections and load balancers, which help improve communication and manage traffic effectively. This can make online services run more smoothly.
burkhardstubert 39 implied HN points 19 Feb 24
  1. Over-the-Air (OTA) updates can be done in full, delta, or partial ways. Full updates ensure everything is consistent, but they are larger files and take longer to download.
  2. Delta updates save time and bandwidth by only updating the changed parts of a file. They are good for devices with slow internet connections but require a read-only setup.
  3. Staged rollouts keep updates safe by first sending them to a small group of devices. This way, if there are issues, they can be fixed before affecting everyone.
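One common way to pick the "small group of devices" for a staged rollout is to hash each device ID into a stable bucket and release to the lowest buckets first. The sketch below assumes that approach; the post does not say how its rollout groups are chosen, and the function names are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(device_id: str, percent: int) -> bool:
    """True if the device falls inside the first `percent` buckets."""
    return rollout_bucket(device_id) < percent
```

Because the bucket depends only on the device ID, a device that received the update at 10% stays included when the rollout widens to 50%.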
Database Engineering by Sort 7 implied HN points 20 Nov 24
  1. Sort is a platform that helps manage and change data easily without much hassle. It makes sure your database is accurate and up to date.
  2. With the new Zapier app, you can connect Sort to many other applications to automate tasks. This saves a lot of time and reduces errors since you don't have to do everything manually.
  3. Setting up automations is simple and requires no coding skills. You can start using it right away to improve your workflows.
VuTrinh. 19 implied HN points 30 Apr 24
  1. Netflix has created a platform called Data Gateway that helps their developers manage data more easily. It simplifies complex database processes so that app developers can focus on coding.
  2. The cloud storage triad is the balance between latency, cost, and durability when storing data. Choosing the right storage solution can save money while ensuring data is always available.
  3. Managing data ingestion effectively is crucial for companies like RevenueCat. They faced challenges moving their data and found ways to optimize the process for better performance.
Rod’s Blog 39 implied HN points 07 Feb 24
  1. Use Microsoft Sentinel to detect and respond to multiple Teams deletion events in your organization.
  2. Collect Teams activity logs in Microsoft Sentinel to monitor data and detect security risks.
  3. Write custom analytics rules in Microsoft Sentinel to generate alerts for suspicious activities, such as multiple Teams deletions by a single user.
Rod’s Blog 79 implied HN points 02 Oct 23
  1. Being notified when data ingestion stops is crucial for security analysts to maintain the integrity of security tools.
  2. A KQL query can be set up as an Analytics Rule to alert if a specific table has not received new data within a set timeframe, allowing for timely action.
  3. Email alerts can be configured instead of generating unnecessary security incidents, ensuring the operations team can address potential issues efficiently.
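The check behind such a rule is simple to state outside KQL. The sketch below is a Python illustration of the logic only, not the author's KQL query; the table names, timestamps, and `stale_tables` helper are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_tables(
    last_seen: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta,
) -> List[str]:
    """Return tables whose newest record is older than `max_silence`."""
    return sorted(
        table for table, newest in last_seen.items()
        if now - newest > max_silence
    )


# Hypothetical latest-record timestamps per table
now = datetime(2023, 10, 2, 12, 0)
last_seen = {
    "TableA": datetime(2023, 10, 2, 11, 50),  # 10 minutes ago
    "TableB": datetime(2023, 10, 2, 6, 0),    # 6 hours ago
}
```

With a one-hour threshold, only `TableB` would be flagged, and that result could feed an email alert instead of a security incident.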
Technology Made Simple 79 implied HN points 03 Apr 23
  1. Discord faced performance issues with Cassandra, requiring increasing maintenance effort and leading to unpredictable latency.
  2. Hot partitions were a problem in Cassandra, causing hotspotting and impacting the database's performance during concurrent reads.
  3. Garbage collection in Cassandra posed challenges, leading Discord to switch to ScyllaDB which does not have a garbage collector.
Product Composition 78 implied HN points 21 Jul 23
  1. Decipad is launching its beta version, focusing on making sense of numbers in a dynamic way.
  2. AI in the industry should prioritize collapsing data to enhance clarity and facilitate action-taking.
  3. The future of jobs is facing a drastic shift, with issues around productivity, social contracts, asymmetrical compensation, and poor job descriptions.
Joe Reis 78 implied HN points 10 Jun 23
  1. Encourage kids and others to interact more in real life, consider alternatives to college, find careers that can't be easily automated, and learn to coexist with AI.
  2. Embrace lifelong learning and be open to change in order to adapt to evolving technologies and industries.
  3. Read up on interesting articles about tech, AI, data, and business topics for insights and inspiration.
ppdispatch 11 implied HN points 11 Feb 25
  1. Frequent interruptions, even from short messages, can significantly hurt developers' productivity. It can take over 20 minutes to refocus after just one distraction.
  2. A small update to the Linux kernel can really boost data center efficiency, potentially cutting power use by 30%. This change helps manage network traffic better without needing much setup.
  3. Many math libraries don't follow floating-point standards, leading to rounding errors. This can cause big problems in areas like gaming and machine learning where precision is key.
The Tech Buffet 39 implied HN points 03 Feb 24
  1. You can build a personal assistant to easily find and understand the latest machine learning research. This assistant will let you ask questions in simple language.
  2. The app uses a system that retrieves and generates information, utilizing a database and machine learning models. It processes data from a site called 'Papers With Code'.
  3. The guide provides step-by-step instructions on how to create, index, and deploy this assistant as a web application, including ready-to-use source code.
The Orchestra Data Leadership Newsletter 39 implied HN points 28 Jan 24
  1. Data orchestration is often confused with workflow orchestration, but it involves more than just triggering and monitoring tasks; it includes reliably and efficiently moving data into production.
  2. Reliably and efficiently releasing data into production is complex and involves elements like data movement, transformation, environment management, role-based access control, and data observability.
  3. Implementing end-to-end and holistic data orchestration offers transformative benefits such as intelligent metadata gathering, data lineage, environment management, data product enablement, and cross-functional collaboration for scalable data operations.
Why Now 5 implied HN points 09 Dec 24
  1. It's important to look for companies that create strong communities or 'religions' around their products. Companies that divide opinion often attract attention and engagement.
  2. Object storage is a powerful way to manage data, allowing for flexible and efficient storage. It uses a flat structure for data organization, making it faster to access compared to traditional file storage.
  3. The separation of storage and compute resources helps businesses scale more effectively. This means you can add storage or processing power independently, making it more efficient for varying demands.
Deploy Securely 39 implied HN points 24 Jan 24
  1. Microsoft 365 Copilot provides detailed data residency and retention controls favored by enterprises in the Microsoft 365 ecosystem.
  2. Be cautious of insider threats with Copilot as it allows access to considerable organizational data, potentially leading to inadvertent policy violations.
  3. Consider the complexities of Copilot's retention policies, especially in relation to existing settings and the use of Bing for web searches.
Rod’s Blog 59 implied HN points 07 Nov 23
  1. For Microsoft Sentinel customers, a 31-day trial period is available by enabling Microsoft Sentinel on a Log Analytics workspace.
  2. To monitor the trial period, look under the 'News & Guides' blade and access the 'Free Trial' tab to see how many days are left.
  3. In the past, the 31-day trial could be enabled an unlimited number of times on new workspaces, but now it's limited to 20 times per Azure subscription.
davidj.substack 71 implied HN points 16 Feb 24
  1. Data teams face challenges when separated from product engineering, leading to loss of metadata and concerns about data quality. Data contracts can help address these issues by defining the nature, completeness, and format of shared data.
  2. Integrating data professionals within product teams can enhance understanding and usage of data, reducing the need for separate contracts. This approach allows for direct-to-consumer, organic data processes.
  3. Centralized data platform teams can establish common standards and infrastructure, enabling embedded data personnel in product teams to work efficiently. This collaborative model streamlines data transformation and enhances data accessibility.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 19 implied HN points 12 Apr 24
  1. An AI productivity suite helps people and businesses work more efficiently by combining tools for tasks like data analysis and automation.
  2. It allows users to automate regular tasks, freeing them to focus on more important work, and offers easy customization through no-code options.
  3. These suites also promote teamwork by improving communication and sharing among team members, leading to better project outcomes.
Data Plumbers 19 implied HN points 08 Apr 24
  1. Data democratization is vital for modern data strategies, making data more accessible and understandable within an organization for informed decision-making and better customer experiences.
  2. Databricks Unity Catalog supports data democratization by providing a centralized governance layer, simplifying access management, enabling unified data management, and fostering data discovery, collaboration, and sharing.
  3. Implementing data democratization requires robust data governance and security measures to mitigate risks of privacy violations and data leaks.
Detection at Scale 199 implied HN points 18 Jul 22
  1. Detection Engineers build systems to validate security controls and detect suspicious behaviors with code to protect organizations.
  2. Security data comes from different layers like infrastructure, hosts, networks, applications, and databases, each providing unique context for monitoring.
  3. When collecting logs for security monitoring, consider tradeoffs like the value of data for detection, latency to get data into SIEM, and cost of obtaining and retaining data.
Building a Recommendation Engine 3 HN points 04 Aug 24
  1. A recommendation engine can work without complex machine learning. Instead, it can be built using straightforward connections between content to suggest things users might like.
  2. Using an API from a platform like Are.na allows easy access to user content and helps find connections between different channels, making recommendations more relevant.
  3. It's important to filter out content that users already know or follow to give them fresh and exciting recommendations. Regular updates to the recommendations can also help keep things interesting.
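A connection-based recommender of this kind can be sketched without any machine learning. The example assumes channels are represented as sets of block IDs and scores unfollowed channels by how many blocks they share with followed ones. It makes no API calls, and the `recommend` function and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(
    channels: Dict[str, Set[str]],
    followed: Set[str],
    top_n: int = 3,
) -> List[str]:
    """Rank unfollowed channels by how many blocks they share with followed ones."""
    seen_blocks: Set[str] = set()
    for name in followed:
        seen_blocks |= channels.get(name, set())

    scores: Counter = Counter()
    for name, blocks in channels.items():
        if name in followed:
            continue  # filter out what the user already follows
        overlap = len(blocks & seen_blocks)
        if overlap:
            scores[name] = overlap

    # Highest overlap first; break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical channels and their block IDs
channels = {
    "typography": {"a", "b", "c"},
    "book-covers": {"b", "c", "d"},
    "posters": {"c", "e"},
    "gardening": {"x"},
}
```

For a user who follows only `typography`, this returns `book-covers` ahead of `posters` and leaves out `gardening`, which shares nothing with the followed channel.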
The Polymerist 116 implied HN points 16 Jan 24
  1. Companies in the chemical industry can benefit from AI tools to improve efficiency and profitability.
  2. AI tools are becoming more accessible for functions like customer relationship management, inventory management, and data organization.
  3. While AI won't replace R&D functions, it can significantly enhance productivity and help companies stay competitive in specialized chemical sectors.
Rod’s Blog 59 implied HN points 05 Sep 23
  1. A Model Stealing attack against AI involves an adversary attempting to steal the machine learning model from a target AI system, potentially leading to security and privacy issues.
  2. Different types of Model Stealing attacks include Query-based attacks, Membership inference attacks, Model inversion attacks, and Trojan attacks.
  3. Model Stealing attacks can result in loss of intellectual property, security and privacy risks, reputation damage, and financial losses for organizations. Mitigation strategies include secure data management, regular system updates, model obfuscation techniques, monitoring for suspicious activity, and implementing multi-factor authentication.
Rod’s Blog 59 implied HN points 13 Jun 23
  1. Check for custom tables starting with 'EASM' to verify the connection between Microsoft Defender External Attack Surface Management (EASM) and Microsoft Sentinel.
  2. In Microsoft Sentinel, tables will show up in the Custom Logs Solutions area.
  3. Connecting EASM to Microsoft Sentinel involves three steps: setting up EASM, configuring permissions, and enabling the connection.
Practical Data Engineering Substack 59 implied HN points 01 Oct 23
  1. You can improve data accuracy by using two pipelines: one for getting recent updates quickly and another for regularly loading the entire dataset. This helps in keeping the data reliable over time.
  2. It's essential to manage pipeline scheduling based on your business's needs, like how often you need updates. You can choose faster updates or less frequent full reloads depending on how critical the data is.
  3. Using tools like Apache Airflow can help organize these pipelines efficiently. You can simplify tasks by dynamically generating them from a list, making it easier to handle many data tables.
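The idea of generating paired pipelines from a list of tables can be sketched in plain Python. This deliberately does not use the Airflow API; the `PipelineTask` type, `build_tasks` function, table names, and cron schedules are all hypothetical stand-ins for whatever the orchestrator provides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineTask:
    task_id: str
    table: str
    mode: str      # "incremental" or "full"
    schedule: str  # cron expression


def build_tasks(
    tables: List[str],
    incremental_cron: str = "*/15 * * * *",  # assumed: every 15 minutes
    full_cron: str = "0 2 * * 0",            # assumed: Sundays at 02:00
) -> List[PipelineTask]:
    """Create one fast incremental task and one full-reload task per table."""
    tasks = []
    for table in tables:
        tasks.append(
            PipelineTask(f"incremental_{table}", table, "incremental", incremental_cron)
        )
        tasks.append(
            PipelineTask(f"full_reload_{table}", table, "full", full_cron)
        )
    return tasks


tasks = build_tasks(["orders", "customers"])
```

Adding a table then means adding one entry to the list, and the two schedules can be tuned separately according to how critical fresh data is.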