The hottest Data Management Substack posts right now

And their main takeaways

HCF EP 007: Prototyping with imported data

Hasen Judi • 35 implied HN points • 17 Jan 25

The project aims to develop a conversation view that displays threaded replies in a linear format, improving user experience compared to platforms like Twitter or Reddit.
A data model is proposed to track parent-child relationships between posts and replies, allowing for efficient retrieval of both ancestors and descendants of a post.
The author emphasizes using the same 'Post' type across different system layers, arguing that this reduces code complexity and increases productivity compared to using separate representations for each layer.
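
The parent-child model in the takeaways above can be pictured with a minimal Python sketch. This is not the author's code: the `Post` fields and the `ancestors` and `descendants` helpers are hypothetical names chosen for illustration, and the only assumption is that each post stores its parent's id.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    id: int
    parent_id: Optional[int]  # None for a top-level post (assumed convention)
    text: str


def ancestors(posts: Dict[int, Post], post_id: int) -> List[int]:
    """Walk parent links upward; returns ids from nearest parent to root."""
    chain = []
    current = posts[post_id].parent_id
    while current is not None:
        chain.append(current)
        current = posts[current].parent_id
    return chain


def descendants(posts: Dict[int, Post], post_id: int) -> List[int]:
    """Collect every reply below a post in depth-first (pre-order) order."""
    children: Dict[int, List[int]] = {}
    for p in posts.values():
        if p.parent_id is not None:
            children.setdefault(p.parent_id, []).append(p.id)
    result = []
    stack = list(reversed(children.get(post_id, [])))
    while stack:
        pid = stack.pop()
        result.append(pid)
        stack.extend(reversed(children.get(pid, [])))
    return result


posts = {p.id: p for p in [
    Post(1, None, "root"),
    Post(2, 1, "first reply"),
    Post(3, 2, "reply to the reply"),
    Post(4, 1, "second reply"),
]}
```

The depth-first order is one plausible way to flatten a threaded conversation into a linear view; the same `Post` type is used for storage and traversal, in the spirit of the third takeaway.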

Creating Your First Change Request with Sort

Database Engineering by Sort • 7 implied HN points • 18 Dec 24

🕹 Technology Software Databases Workflows Engineering Data Management

Sort helps you manage database changes easily and safely, much as GitHub handles code changes. You can propose changes without altering the data right away.
Creating a Change Request is simple. Just suggest what you want to change and set it up for review by others in your organization.
Once a Change Request is approved, it can be applied without hassle. If anything goes wrong during the process, Sort can automatically roll back the changes.

How to Know When Data Retention Values Have Changed for Microsoft Sentinel

Rod’s Blog • 138 implied HN points • 03 Aug 23

🕹 Technology Data Management Cybersecurity

Customers can use a quick KQL query to track changes in Log Analytics workspace data retention values for Microsoft Sentinel.
The provided KQL query can be utilized in various ways such as in a Workbook, a Hunting query, or as an Analytics Rule for notifications.
For ongoing access to the latest version of the query and further discussion, references to the author's resources and accounts are provided.

ChatGPT4 still leads ChatBot/LLM Leaderboard

MLOps Newsletter • 137 implied HN points • 16 Jul 23

🕹 Technology AI Programming Machine Learning Data Management Online Learning

ChatGPT4 is leading the ChatBot/LLM Leaderboard.
The state and evolution of the GPT model series are discussed.
LeanDojo is introduced as an open-source playground for Lean.

Eventual Business Consistency

Software Design: Tidy First? • 134 HN points • 04 Aug 23

🕹 Technology Business Data Management Programming Systems Database

The goal is to achieve eventual business consistency by closely matching what's in the system with real-world events.
Different data storage methods like storing dated data or double-dated data come with trade-offs in complexity and accuracy.
Bi-temporal systems use two dates to track when data changes occurred in reality and when they were recorded in the system for better business operations.
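
The two-date idea can be made concrete with a small Python sketch. It is an illustration only, not code from the post: the `Fact` record, its `effective` and `recorded` fields, and the `as_of` helper are hypothetical names, and the salary figures are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    value: int
    effective: date  # when the change happened in the real world
    recorded: date   # when the system learned about it


def as_of(facts: List[Fact], effective: date, known: date) -> Optional[int]:
    """Value in force on `effective`, using only what was recorded by `known`."""
    visible = [f for f in facts if f.recorded <= known and f.effective <= effective]
    if not visible:
        return None
    # Latest effective date wins; ties go to the most recently recorded fact.
    return max(visible, key=lambda f: (f.effective, f.recorded)).value


salary = [
    Fact(100, effective=date(2023, 1, 1), recorded=date(2023, 1, 1)),
    # A raise effective 1 Feb that was only entered on 15 Mar (back-dated).
    Fact(120, effective=date(2023, 2, 1), recorded=date(2023, 3, 15)),
]
```

Asking for the salary on 15 Feb gives a different answer depending on the `known` date: before 15 Mar the system still believed 100, afterwards it reports 120. A single-dated store cannot answer both questions.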

The Thrill of Deprecating Dashboards

Sarah's Newsletter • 239 implied HN points • 29 Nov 22

🕹 Technology Analytics Automation Data Management BI Tools Data Visualization

Having an excessive number of dashboards can lead to inefficiency and confusion within an organization. It's important to prioritize strategic organization over creating new dashboards indiscriminately.
Developing an automated dashboard deprecation strategy can help save time and maintain a clean BI instance. By automating the process, organizations can efficiently manage and delete unused visuals.
Implementing a proactive maintenance plan, such as using a data catalog or automated tools, can help keep BI instances organized and optimal for data insights. Regular cleaning and organization are key to ensuring the effectiveness of analytics strategies.

3 steps to value focused data product management

Datent • 58 implied HN points • 09 Feb 24

💼 Business Data Management Product Development Project management Innovation

Transitioning from a BI role to a data product team requires defining a Value Gateway to ensure projects deliver tangible benefits.
To manage the progress and accountability of data work, reporting on value at key points is crucial, showcasing the value realized and areas needing support.
Establishing a process around failing fast and doubling down on successful projects, supported by agile project management, is essential for efficient data product management.

Platform vs. DevEx teams: What’s the difference?

Engineering Enablement • 14 implied HN points • 05 Nov 24

🕹 Technology Software Development Engineering Data Management Team Dynamics Productivity Tools

Platform teams handle a broader range of responsibilities compared to Developer Experience teams. This means they are involved in more of the underlying tech operations.
Local development, source code management, and incident management are key tasks for both types of teams. These areas help developers write and deploy their code more smoothly.
The name of the team can reflect its focus. Some teams prioritize overall developer support while others are more infrastructure-focused, suggesting that their approach can change based on company needs.

How to Deploy Microsoft Sentinel Effectively

Rod’s Blog • 59 implied HN points • 01 Feb 24

🕹 Technology Security Cloud Computing Data Management AI Automation

To get the most out of Microsoft Sentinel, organizations should carefully plan and prepare their deployment by assessing security needs and goals.
Choosing the right subscription and pricing model is crucial for optimizing the benefits of Microsoft Sentinel, based on data requirements, user protection, and features needed.
Effective management of Microsoft Sentinel involves monitoring data ingestion, leveraging AI and ML capabilities, automating workflows, and learning from security incidents and feedback.

Virtual Private Cloud (VPC)

Vasu’s Newsletter • 13 implied HN points • 25 Oct 24

🕹 Technology Cloud Computing Networking Security Infrastructure Data Management

A Virtual Private Cloud (VPC) helps businesses create a separate and secure online environment to manage their resources. This means they can control who has access to what information.
With a VPC, administrators can set rules that control incoming and outgoing traffic. It's like having a security system for their online resources.
VPCs come with useful features like VPN connections and load balancers, which help improve communication and manage traffic effectively. This can make online services run more smoothly.

Data Governance in AI

Rod’s Blog • 39 implied HN points • 05 Mar 24

🕹 Technology AI Data Governance Artificial Intelligence Data Management

Data governance in AI ensures that the data AI systems rely on is managed securely.
Without strong data governance, organizations risk using inaccurate or biased data in their AI systems, leading to flawed outcomes and potential harm.
Data governance in AI is crucial to ensure data accuracy, reliability, and freedom from biases or errors.

Confession: You’re Already in My AI-Powered CRM

Alex Furmansky - Magnetic Growth • 98 implied HN points • 02 May 23

🕹 Technology AI CRM App Development Data Management Automation

The author uses an AI-powered CRM to manage important contacts.
They found a solution with OneCircle AI to update the CRM effortlessly.
OneCircle AI has the potential to be helpful for businesses as well.

Erika Update #12: Traction is the key to curiosity-driven research

Erika’s Newsletter • 98 implied HN points • 22 Aug 23

🔬 Science Research Management Tools Experimentation Data Management

Traction in research requires a question or problem you're interested in and tools that provide new information about it.
Traction is about balancing where you want to go with how fast you're progressing.
When you have traction in a project, each new experiment sparks multiple next steps to move forward.

In two days: Incremental Documentation for Your Database, Wednesday, Feb 7, 2024 19:00 CET

Minimal Modeling • 101 implied HN points • 05 Feb 24

🕹 Technology Data Management Documentation Modeling Collaboration

An event on Wednesday, February 7, 2024, 19:00 CET covers Incremental Documentation for Databases.
The Minimal Modeling approach focuses on a lightweight tabular format for the data catalog.
Benefits include reduced onboarding time, better communication, and cost savings.

The Server Side of Over-the-Air Updates

burkhardstubert • 39 implied HN points • 19 Feb 24

🕹 Technology Software Networking Embedded Systems Updates Data Management

Over-the-Air (OTA) updates can be done in full, delta, or partial ways. Full updates ensure everything is consistent, but they are larger files and take longer to download.
Delta updates save time and bandwidth by only updating the changed parts of a file. They are good for devices with slow internet connections but require a read-only setup.
Staged rollouts keep updates safe by first sending them to a small group of devices. This way, if there are issues, they can be fixed before affecting everyone.
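
A staged rollout needs some rule for deciding which devices receive an update first. The sketch below shows one common approach, deterministic hash bucketing, in Python. It is an assumption for illustration, not the mechanism described in the post; `in_rollout`, `device_id`, and `update_id` are hypothetical names.

```python
import hashlib


def in_rollout(device_id: str, update_id: str, percent: int) -> bool:
    """True if this device falls inside the first `percent` of 100 buckets."""
    # Hashing update and device together gives each update its own grouping,
    # so the same devices are not always the early testers.
    digest = hashlib.sha256(f"{update_id}:{device_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


devices = [f"device-{n}" for n in range(1000)]
stage_10 = {d for d in devices if in_rollout(d, "fw-2.1", 10)}
stage_50 = {d for d in devices if in_rollout(d, "fw-2.1", 50)}
```

Because a device's bucket is fixed for a given update, raising the percentage only adds devices: everything in the 10% stage is also in the 50% stage, so a rollout can widen without any device flipping back.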

Announcing the Zapier App for Sort: Automate Your Data Workflows!

Database Engineering by Sort • 7 implied HN points • 20 Nov 24

🕹 Technology Automation Software Data Management Integration Web Apps

Sort is a platform that helps manage and change data easily without much hassle. It makes sure your database is accurate and up to date.
With the new Zapier app, you can connect Sort to many other applications to automate tasks. This saves a lot of time and reduces errors since you don't have to do everything manually.
Setting up automations is simple and requires no coding skills. You can start using it right away to improve your workflows.

GroupBy #33: Data Gateway - A Platform for Growing and Protecting the Data Tier at Netflix, The Cloud Storage Triad: Latency, Cost, Durability

VuTrinh. • 19 implied HN points • 30 Apr 24

🕹 Technology Data Engineering Cloud Computing Software Development Infrastructure Data Management

Netflix has created a platform called Data Gateway that helps their developers manage data more easily. It simplifies complex database processes so that app developers can focus on coding.
The cloud storage triad talks about balancing latency, cost, and durability when storing data. Choosing the right storage solution can save money while ensuring data is always available.
Managing data ingestion effectively is crucial for companies like RevenueCat. They faced challenges moving their data and found ways to optimize the process for better performance.

Microsoft Sentinel SOC 101: How to Detect and Mitigate Multiple Microsoft Teams Deleted by a Single User with Microsoft Sentinel

Rod’s Blog • 39 implied HN points • 07 Feb 24

🕹 Technology Security Software Data Management Incident Response Cloud Computing

Use Microsoft Sentinel to detect and respond to multiple Teams deletion events in your organization.
Collect Teams activity logs in Microsoft Sentinel to monitor data and detect security risks.
Write custom analytics rules in Microsoft Sentinel to generate alerts for suspicious activities, such as multiple Teams deletions by a single user.

How to be Notified When Microsoft Sentinel Data Stops Populating

Rod’s Blog • 79 implied HN points • 02 Oct 23

🕹 Technology Data Management Security Monitoring Analytics

Being notified when data ingestion stops is crucial for security analysts to maintain the integrity of security tools.
A KQL query can be set up as an Analytics Rule to alert if a specific table has not received new data within a set timeframe, allowing for timely action.
Email alerts can be configured instead of generating unnecessary security incidents, ensuring the operations team can address potential issues efficiently.
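
The check behind such a rule is simple: compare each table's most recent record time with a threshold. The Python sketch below shows only that logic. It is not the author's KQL query and does not call any Microsoft Sentinel API; `stale_tables` is a hypothetical helper, and the table names and timestamps are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_tables(last_seen: Dict[str, datetime], now: datetime,
                 max_age: timedelta) -> List[str]:
    """Names of tables whose newest record is older than `max_age`."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


now = datetime(2023, 10, 2, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "SigninLogs": now - timedelta(minutes=30),  # still receiving data
    "Syslog": now - timedelta(hours=3),         # silent for three hours
}
alerts = stale_tables(last_seen, now, max_age=timedelta(hours=1))
```

In practice the `last_seen` values would come from a query over the workspace, and a non-empty result would trigger the email notification rather than a security incident.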

Why Discord ditched Cassandra [System Design Sundays]

Technology Made Simple • 79 implied HN points • 03 Apr 23

🕹 Technology System Design Databases Data Management Programming Tech Education

Discord faced performance issues with Cassandra, requiring increasing maintenance effort and leading to unpredictable latency.
Hot partitions were a problem in Cassandra, causing hotspotting and impacting the database's performance during concurrent reads.
Garbage collection in Cassandra posed challenges, leading Discord to switch to ScyllaDB which does not have a garbage collector.

July 2023

Product Composition • 78 implied HN points • 21 Jul 23

🕹 Technology AI Startup Data Management Design

Decipad is launching its beta version, focusing on making sense of numbers in a dynamic way.
AI in the industry should prioritize collapsing data to enhance clarity and facilitate action-taking.
The future of jobs is facing a drastic shift, with issues around productivity, social contracts, asymmetrical compensation, and poor job descriptions.

Joe's Nerdy Rants #4

Joe Reis • 78 implied HN points • 10 Jun 23

🕹 Technology AI Data Management Data Quality

Encourage kids and others to interact more in real life, consider alternatives to college, find careers that can't be easily automated, and learn to coexist with AI.
Embrace lifelong learning and be open to change in order to adapt to evolving technologies and industries.
Read up on interesting articles about tech, AI, data, and business topics for insights and inspiration.

💸 The Hidden Cost of Context Switching

ppdispatch • 11 implied HN points • 11 Feb 25

🕹 Technology Software Development AI Applications Productivity Tools Data Management Legal issues

Frequent interruptions, even short messages, can significantly hurt developers' productivity. It can take over 20 minutes to refocus after a single distraction.
A small update to the Linux kernel can really boost data center efficiency, potentially cutting power use by 30%. This change helps manage network traffic better without needing much setup.
Many math libraries don't follow floating-point standards, leading to rounding errors. This can cause big problems in areas like gaming and machine learning where precision is key.

The Tech Buffet #19: How To Build and Deploy an LLM-Powered App To Chat with PapersWithCode

The Tech Buffet • 39 implied HN points • 03 Feb 24

🕹 Technology Machine Learning Software Development Web applications Cloud Computing Data Management

You can build a personal assistant to easily find and understand the latest machine learning research. This assistant will let you ask questions in simple language.
The app uses a system that retrieves and generates information, utilizing a database and machine learning models. It processes data from a site called 'Papers With Code'.
The guide provides step-by-step instructions on how to create, index, and deploy this assistant as a web application, including ready-to-use source code.

A Song of Junk and Value

davidj.substack • 95 implied HN points • 03 Jan 24

🕹 Technology Data Analytics Data Management Data Visualization Standardization Artificial Intelligence

Data dashboards can become like old, unused bookmarks, cluttering up space.
Having standard data models and a semantic layer could lead to a more efficient data analysis experience.
It's important to focus on creating value in data analysis by asking complex questions and optimizing processes.

What is Data Orchestration and why is it misunderstood?

The Orchestra Data Leadership Newsletter • 39 implied HN points • 28 Jan 24

🕹 Technology Data Orchestration Workflow Orchestration Data Management Data Governance

Data orchestration is often confused with workflow orchestration, but it involves more than just triggering and monitoring tasks; it includes reliably and efficiently moving data into production.
Reliably and efficiently releasing data into production is complex and involves elements like data movement, transformation, environment management, role-based access control, and data observability.
Implementing end-to-end and holistic data orchestration offers transformative benefits such as intelligent metadata gathering, data lineage, environment management, data product enablement, and cross-functional collaboration for scalable data operations.

turbopuffer

Why Now • 5 implied HN points • 09 Dec 24

🕹 Technology Software Data Management Infrastructure Cloud Computing Developer Tools

It's important to look for companies that create strong communities or 'religions' around their products. Companies that divide opinion often attract attention and engagement.
Object storage is a powerful way to manage data, allowing for flexible and efficient storage. It uses a flat structure for data organization, making it faster to access compared to traditional file storage.
The separation of storage and compute resources helps businesses scale more effectively. This means you can add storage or processing power independently, making it more efficient for varying demands.

Get a Microsoft 365 Copilot security teardown without hours of documentation review

Deploy Securely • 39 implied HN points • 24 Jan 24

🕹 Technology Security Privacy Data Governance AI Governance Data Management

Microsoft 365 Copilot provides detailed data residency and retention controls favored by enterprises in the Microsoft 365 ecosystem.
Be cautious of insider threats with Copilot as it allows access to considerable organizational data, potentially leading to inadvertent policy violations.
Consider the complexities of Copilot's retention policies, especially in relation to existing settings and the use of Bing for web searches.

How to Monitor the Microsoft Sentinel Trial Period

Rod’s Blog • 59 implied HN points • 07 Nov 23

🕹 Technology Software Microsoft Cloud Computing Data Management

For Microsoft Sentinel customers, a 31-day trial period is available by enabling Microsoft Sentinel on a Log Analytics workspace.
To monitor the trial period, look under the 'News & Guides' blade and access the 'Free Trial' tab to see how many days are left.
In the past, the 31-day trial could be enabled an unlimited number of times on new workspaces, but now it's limited to 20 times per Azure subscription.

We don’t need data contracts

davidj.substack • 71 implied HN points • 16 Feb 24

🕹 Technology Data Management Data Contracts Data Transformation

Data teams face challenges when separated from product engineering, leading to loss of metadata and concerns about data quality. Data contracts can help address these issues by defining the nature, completeness, and format of shared data.
Integrating data professionals within product teams can enhance understanding and usage of data, reducing the need for separate contracts. This approach allows for direct-to-consumer, organic data processes.
Centralized data platform teams can establish common standards and infrastructure, enabling embedded data personnel in product teams to work efficiently. This collaborative model streamlines data transformation and enhances data accessibility.

The Case For An AI Productivity Suite

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 12 Apr 24

🕹 Technology AI Tools Automation Productivity Data Management Business Processes

An AI productivity suite helps people and businesses work more efficiently by combining tools for tasks like data analysis and automation.
It allows users to automate regular tasks, freeing them to focus on more important work, and offers easy customization through no-code options.
These suites also promote teamwork by improving communication and sharing among team members, leading to better project outcomes.

Databricks Unity Catalog: Enabling Data Democratization

Data Plumbers • 19 implied HN points • 08 Apr 24

🕹 Technology Data Management Data Governance

Data democratization is vital for modern data strategies, making data more accessible and understandable within an organization for informed decision-making and better customer experiences.
Databricks Unity Catalog supports data democratization by providing a centralized governance layer, simplifying access management, enabling unified data management, and fostering data discovery, collaboration, and sharing.
Implementing data democratization requires robust data governance and security measures to mitigate risks of privacy violations and data leaks.

Think Like a Detection Engineer, Pt. 1: Logging

Detection at Scale • 199 implied HN points • 18 Jul 22

🕹 Technology Security Data Management Monitoring Infrastructure

Detection Engineers build systems to validate security controls and detect suspicious behaviors with code to protect organizations.
Security data comes from different layers like infrastructure, hosts, networks, applications, and databases, each providing unique context for monitoring.
When collecting logs for security monitoring, consider tradeoffs like the value of data for detection, latency to get data into SIEM, and cost of obtaining and retaining data.

Big AI, Little AI

Dana Blankenhorn: Facing the Future • 59 implied HN points • 17 Nov 23

🕹 Technology AI Automation Data Management

Both Big AI and Little AI can be intimidating, with potential privacy concerns.
Autonomous agents in AI are enhancing customer service by solving problems efficiently.
As AI continues to evolve, teaching critical thinking skills will be crucial for individuals to govern AI effectively.

Building a Working Recommendation Engine from Scratch

Building a Recommendation Engine • 3 HN points • 04 Aug 24

🕹 Technology Algorithms Programming APIs Data Management Machine Learning

A recommendation engine can work without complex machine learning. Instead, it can be built using straightforward connections between content to suggest things users might like.
Using an API from a platform like Are.na allows easy access to user content and helps find connections between different channels, making recommendations more relevant.
It's important to filter out content that users already know or follow to give them fresh and exciting recommendations. Regular updates to the recommendations can also help keep things interesting.
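
A connection-based recommender of this kind can be sketched in a few lines of Python. The version below is a guess at the general shape, not the post's implementation, and it does not call the Are.na API: `recommend` is a hypothetical function and the channel data is invented. It assumes each channel is just a set of content ids.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(channels: Dict[str, Set[str]], followed: Set[str],
              top_n: int = 3) -> List[str]:
    """Rank unfollowed channels by how many items they share with followed ones."""
    seen_items: Set[str] = set()
    for name in followed:
        seen_items |= channels.get(name, set())
    scores: Counter = Counter()
    for name, items in channels.items():
        if name in followed:
            continue  # filter out what the user already follows
        overlap = len(items & seen_items)
        if overlap:
            scores[name] = overlap
    # Sort by score descending, then by name, so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked][:top_n]


channels = {
    "typography": {"a", "b", "c"},
    "posters": {"b", "c", "d"},
    "maps": {"c", "e"},
    "recipes": {"x"},
}
picks = recommend(channels, followed={"typography"})
```

Channels with no shared content never appear, and followed channels are skipped, which covers the filtering step in the last takeaway. Rerunning the function as channels change is one simple way to keep recommendations fresh.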

How Artificial Intelligence Will Change the Chemical Industry

The Polymerist • 116 implied HN points • 16 Jan 24

🕹 Technology AI Chemical Industry Software Data Management R&D

Companies in the chemical industry can benefit from AI tools to improve efficiency and profitability.
AI tools are becoming more accessible for functions like customer relationship management, inventory management, and data organization.
While AI won't replace R&D functions, it can significantly enhance productivity and help companies stay competitive in specialized chemical sectors.

#100 - Playing Offense

davidj.substack • 95 implied HN points • 15 Nov 23

💼 Business Data Management Product Development Engineering Efficiency Collaboration

Data quality starts with the Product Requirements Document and Analytics Requirements Document.
For product changes, defining data requirements through a Data Design Document is crucial.
Being part of the product development process improves efficiency, speed, and collaboration in data management.

Must Learn AI Security Part 8: Model Stealing Attacks Against AI

Rod’s Blog • 59 implied HN points • 05 Sep 23

🕹 Technology AI Security Data Management

A Model Stealing attack against AI involves an adversary attempting to steal the machine learning model from a target AI system, potentially leading to security and privacy issues.
Different types of Model Stealing attacks include Query-based attacks, Membership inference attacks, Model inversion attacks, and Trojan attacks.
Model Stealing attacks can result in loss of intellectual property, security and privacy risks, reputation damage, and financial losses for organizations. Mitigation strategies include secure data management, regular system updates, model obfuscation techniques, monitoring for suspicious activity, and implementing multi-factor authentication.

A Quick Way to Verify the Connection Between Microsoft Defender External Attack Surface and Microsoft Sentinel

Rod’s Blog • 59 implied HN points • 13 Jun 23

🕹 Technology Cybersecurity Software Data Management

Check for custom tables starting with 'EASM' to verify the connection between Microsoft Defender External Attack Surface and Microsoft Sentinel.
In Microsoft Sentinel, tables will show up in the Custom Logs Solutions area.
Connecting EASM to Microsoft Sentinel involves three steps: setting up EASM, configuring permissions, and enabling the connection.

How to build a dual Incremental + snapshot data ingestion pipeline

Practical Data Engineering Substack • 59 implied HN points • 01 Oct 23

🕹 Technology Data Engineering Data Pipelines Real-Time Processing Data Management

You can improve data accuracy by using two pipelines: one for getting recent updates quickly and another for regularly loading the entire dataset. This helps in keeping the data reliable over time.
It's essential to manage pipeline scheduling based on your business's needs, like how often you need updates. You can choose faster updates or less frequent full reloads depending on how critical the data is.
Using tools like Apache Airflow can help organize these pipelines efficiently. You can simplify tasks by dynamically generating them from a list, making it easier to handle many data tables.
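
The difference between the two pipelines, and the idea of generating tasks from a list of tables, can be shown in plain Python. This is a sketch under stated assumptions, not the post's code, and it deliberately avoids the Airflow API: `incremental_load`, `snapshot_load`, `build_tasks`, the `updated_at` watermark column, and the schedule labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def incremental_load(target: Dict[int, Row], source: List[Row], since: int) -> None:
    """Upsert only rows updated after the `since` watermark."""
    for row in source:
        if row["updated_at"] > since:
            target[row["id"]] = row


def snapshot_load(target: Dict[int, Row], source: List[Row]) -> None:
    """Replace the whole table, which also drops rows deleted upstream."""
    target.clear()
    for row in source:
        target[row["id"]] = row


def build_tasks(tables: List[str]) -> List[Tuple[str, str]]:
    """One frequent incremental task and one less frequent snapshot per table."""
    tasks = []
    for table in tables:
        tasks.append((f"incremental_{table}", "hourly"))
        tasks.append((f"snapshot_{table}", "daily"))
    return tasks


warehouse = {1: {"id": 1, "updated_at": 5}, 2: {"id": 2, "updated_at": 5}}
upstream = [{"id": 1, "updated_at": 9}]  # row 2 was deleted at the source

incremental_load(warehouse, upstream, since=5)
after_incremental = sorted(warehouse)  # [1, 2]: the deleted row lingers

snapshot_load(warehouse, upstream)
after_snapshot = sorted(warehouse)     # [1]: the full reload removes it
```

The incremental pass picks up the change to row 1 quickly but cannot see the deletion of row 2; the periodic snapshot corrects that drift. In an orchestrator, each tuple from `build_tasks` would become a scheduled task.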