The hottest DevOps Substack posts right now

And their main takeaways

How to OpenClaw your Raspberry Pi

Gradient Ascendant • 16 implied HN points • 23 Feb 26

🕹 Technology DevOps

OpenClaw runs an always-on AI agent with installable "skills" that you can talk to over Slack or Telegram; running it on a Raspberry Pi gives you a cheap, portable agent that can write and deploy software for you.
Getting a Raspberry Pi 5 running headlessly is fiddly: you must create a user with an encrypted password on the SD card, enable SSH, and plug the Pi into Ethernet to set the Wi‑Fi country before wireless will work.
These agents can act autonomously and use real credentials to install, commit, and deploy code, so you need separate accounts, limited permissions, and careful attention to security and prompt‑injection risks.
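
The first two headless-setup steps can be scripted. This is a minimal Python sketch. It assumes Raspberry Pi OS's boot-partition convention: an empty `ssh` file enables SSH, and `userconf.txt` holds `username:hash` for the first user. It also assumes you produce the hash yourself, for example with `openssl passwd -6`. The function name and mount path are illustrative. Setting the Wi-Fi country over Ethernet, as the post describes, is not covered.

```python
from pathlib import Path


def prepare_headless_boot(boot_dir: str, username: str, password_hash: str) -> None:
    """Write the files Raspberry Pi OS reads from the boot partition on first boot.

    `password_hash` must already be a crypt-style hash (assumption: generated
    outside this script, e.g. with `openssl passwd -6`), never a plaintext password.
    """
    boot = Path(boot_dir)
    if not boot.is_dir():
        raise FileNotFoundError(f"boot partition not mounted at {boot}")
    if not password_hash.startswith("$"):
        raise ValueError("expected a crypt-style hash, not a plaintext password")
    # An empty file named `ssh` tells the OS to enable the SSH server.
    (boot / "ssh").touch()
    # A single `username:encrypted-password` line creates the initial user.
    (boot / "userconf.txt").write_text(f"{username}:{password_hash}\n")
```

On a desktop that mounts the card's boot partition (often labelled `bootfs` on recent images), you would call it with that mount point, your chosen username, and the hash string.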

Inventing the Ralph Wiggum Loop | Creator Geoffrey Huntley

Dev Interrupted • 70 implied HN points • 13 Jan 26

The "Ralph" pattern runs a simple loop that feeds a model's own outputs back into it until it produces a correct result, making persistent retries more important than a single perfect model.
Gas Town is an orchestration approach that treats work as tiny, handoffable tasks executed by many ephemeral agents, creating an assembly line where coordination is the main bottleneck.
AI tools that scrape documentation can wipe out the traffic-driven revenue that funds some open source projects, leading to layoffs and a sustainability crisis, so supporting the open source you depend on is increasingly important.
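
A minimal sketch of the retry loop behind the "Ralph" pattern. It assumes a `generate` callable that wraps some model call and a `check` callable (tests, a linter, a build) that returns `None` on success or an error message otherwise. Both are hypothetical stand-ins. The sketch illustrates the retry-with-feedback idea and is not Huntley's exact setup.

```python
from typing import Callable, Optional


def ralph_loop(
    prompt: str,
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_iterations: int = 20,
) -> Optional[str]:
    """Keep calling the model until `check` accepts an output or the budget runs out."""
    context = prompt
    for _ in range(max_iterations):
        output = generate(context)
        error = check(output)
        if error is None:
            return output
        # Feed the model's own output and the failure back into the next attempt.
        context = f"{prompt}\n\nPrevious attempt:\n{output}\n\nIt failed with:\n{error}"
    return None  # gave up: persistence is bounded by the iteration budget
```

The iteration cap is a design choice: it is the pattern's persistence made explicit, and it keeps a stuck loop from burning unlimited tokens.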

What developers can do to combat the growing threat of rogue AI agents

Dev Interrupted • 32 implied HN points • 05 Feb 26

AI agents can go rogue by repeatedly or unpredictably calling APIs, chaining actions, or accessing data outside their intent, so permissive or poorly scoped endpoints become big operational risks.
Treat agents as first-class API consumers: use clear, spec-driven contracts, structured schemas, and least-privilege identities with short-lived tokens so agent behavior is predictable and easy to revoke.
Practical guardrails like rate limits, schema validation, anomaly detection, and strong observability are essential to spot and contain misbehavior, and keep deterministic systems separate from agentic workflows to reduce risk.
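
Rate limiting per agent identity is one of the guardrails listed above. This is a minimal token-bucket sketch of it. Each identity gets its own bucket, so one runaway agent cannot starve the others. The class names and the injectable clock are hypothetical. In production this limit would more likely live in an API gateway than in application code.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, then `refill_per_sec` requests per second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AgentRateLimiter:
    """One bucket per agent identity (e.g. the subject of its short-lived token)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, agent_id: str) -> bool:
        if agent_id not in self._buckets:
            self._buckets[agent_id] = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
        return self._buckets[agent_id].allow()
```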

Issue #115

Infra Weekly Newsletter • 22 implied HN points • 12 Feb 26

Agents need durable, versioned, replayable state so their behavior can be debugged, audited, and trusted in production; self-hosted state engines provide strong consistency and memory for that use case.
Data infrastructure, not models, will be the real competitive advantage for agent-driven systems because agents create lots of tiny, ephemeral databases and demand fast, reusable access; winning databases will virtualize many logical tenants on shared infra, separate compute and storage, and shift pricing to usage-based models.
Counting CVEs or relying only on CVSS is a shaky security strategy because both are noisy and lack context; build AppSec around threat modeling and contextual triage, and treat zero-CVE claims with skepticism since upstream timelines and metadata can hide real risk.
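
The durable, versioned, replayable state from the first takeaway can be illustrated with a tiny in-memory event log. Every change is appended with a version number, and state at any version is rebuilt by replaying events up to that point. This is what makes agent behavior auditable after the fact. The class and method names are hypothetical. A real state engine adds persistence and the consistency guarantees this sketch omits.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    version: int
    key: str
    value: Any


class AgentStateLog:
    """Append-only log: nothing is overwritten, so every past state is recoverable."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, key: str, value: Any) -> int:
        version = len(self._events) + 1
        self._events.append(Event(version, key, value))
        return version

    def replay(self, up_to_version: Optional[int] = None) -> Dict[str, Any]:
        """Rebuild the state as it was at `up_to_version` (latest if None)."""
        state: Dict[str, Any] = {}
        for event in self._events:
            if up_to_version is not None and event.version > up_to_version:
                break
            state[event.key] = event.value
        return state

    def history(self, key: str) -> List[Event]:
        """Audit trail for one key."""
        return [e for e in self._events if e.key == key]
```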

The Case for Ephemeral Resolvers: Securing the Global Namespace

Phoenix Substack • 56 implied HN points • 09 Jan 26

Make DNS resolvers ephemeral so attackers have at most a short window to exploit them; rotating instances every ~15 minutes evicts compromises before they can be weaponized.
Leverage PowerDNS’s modular stack—dnsdist as a stable front, database-backed authoritative servers, and shared-memory for recursive state—to rotate backend workers quickly without cache cold-starts.
At scale this model adds minimal overhead (under 2% CPU) and changes security from reactive patching to proactive eviction, greatly raising the cost and shortening the lifespan of zero-day attacks.
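
The eviction decision itself is simple. This sketch computes, for one scheduling tick, which resolver workers have outlived the ~15-minute window and how many fresh ones to start. It assumes an orchestrator that starts replacements before draining old workers behind the stable dnsdist front end. `Backend`, `plan_rotation`, and `min_pool` are hypothetical names, not PowerDNS APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_AGE_SECONDS = 15 * 60  # the ~15-minute rotation window from the post


@dataclass
class Backend:
    name: str
    started_at: float  # start time in seconds


def plan_rotation(
    backends: List[Backend],
    now: float,
    min_pool: int = 2,
    max_age: float = MAX_AGE_SECONDS,
) -> Tuple[List[Backend], int]:
    """Return (workers to drain, fresh workers to start) for one rotation tick."""
    # Anything older than the window is evicted, oldest first.
    expired = sorted(
        (b for b in backends if now - b.started_at >= max_age),
        key=lambda b: b.started_at,
    )
    # Replace each evicted worker one-for-one and top the pool up to min_pool.
    to_start = len(expired) + max(0, min_pool - len(backends))
    return expired, to_start
```

Because eviction is purely age-based, a compromised worker is removed on schedule whether or not anyone has detected the compromise. That is the "proactive eviction" the post contrasts with reactive patching.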

What Meta’s diffs per developer metric revealed about engineering at scale

Dev Interrupted • 42 implied HN points • 15 Jan 26

Single-number productivity metrics (like diffs per developer) can stop reflecting real work when codebases, teams, and constraints grow, because a small change today can be a much heavier unit than it was before.
When a metric becomes a target, people naturally optimize the metric instead of value, favoring safe, visible motion over hard, high-leverage work.
Leaders should treat simple metrics as clues not verdicts: investigate flow, risk, and impact, and change what you measure and reward so teams focus on real product and business outcomes.

Analyst reactions: How AI is reshaping engineering organizations

Engineering Enablement • 11 implied HN points • 18 Feb 26

Hiring is shifting toward AI‑fluent roles like “AI Engineer,” and companies are putting much more emphasis on code quality because AI makes writing code easier but often produces sloppy output that reviewers must catch.
Early, fragmented AI experiments are being centralized into platform-level models (AI Centers of Excellence or hub-and-spoke), so platform teams now own governance, orchestration, and making AI a standard developer tool.
A new operational layer—LLMOps—is emerging to run models, ship integrations, and create reusable prompts, while human challenges like security training, unclear ROI, and uncontrolled developer experimentation remain the biggest risks.

Burning the Haystack

Phoenix Substack • 28 implied HN points • 26 Jan 26

Orchestration is the real security — treating the AI stack as a single system with explicit startup ordering and topology awareness prevents fragile, exposed deployments. Tools that give Kubernetes a brain (like Grove) let you define architectural intent so the system behaves safely by design.
Continuous rotation and ephemerality stop attackers from persisting — automatically refreshing containers, nodes, and resources prevents intruders from gaining a foothold. Baking moving-target defenses into the pod lifecycle makes security preemptive instead of reactive.
DevOps-driven orchestration beats static security teams — teams that control the orchestrator can kill and respawn infrastructure faster than traditional patch-and-report workflows, rendering many vulnerabilities irrelevant. Security becomes an operational side effect when rotation and orchestration are part of normal scaling and deployment.

Skills: The Missing Piece in AI Security Tooling

Boring AppSec • 23 implied HN points • 23 Jan 26

Generic threat modeling tools miss risks unique to multi‑agent AI systems, so one‑size‑fits‑all methods like STRIDE are insufficient.
Skills are modular, LLM‑native knowledge packages that let agents detect agentic patterns and find context‑specific threats (like cascade failures and goal hijacking) that generic rules miss.
Skills are portable and quick to create and share, so teams can build reusable, relevant expertise that yields better findings than lots of generic noise.

Setting up NextJS with GitHub Actions and feature flags using Flagsmith

The Open Source Expert • 59 implied HN points • 05 Jul 24

Using NextJS helps streamline your project with standardized setups, making it easier to onboard and rapidly develop features.
Automating tasks with GitHub Actions can save time and reduce errors, giving you quick feedback on your code changes.
Feature flags from Flagsmith allow you to control which features are visible without needing to redeploy your app, making it easier to manage updates and A/B tests.
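
Flagsmith evaluates flags for you. To show what percentage-based A/B rollouts generally rely on, here is a generic sketch of deterministic bucketing. Hashing the flag name together with the user ID gives each user a stable bucket, so they see the same variant on every request without any stored state. This is not Flagsmith's SDK or necessarily its algorithm, and `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percentage: float) -> bool:
    """Deterministically place a user inside or outside a percentage rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    # Map the hash to a bucket in [0.00, 99.99].
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return bucket < percentage
```

Including the flag name in the hash means a user who lands in the first 10% of one experiment is not automatically in the first 10% of every other one.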

Issue #117

Infra Weekly Newsletter • 4 implied HN points • 03 Mar 26

OS‑level and toolchain dependencies are often left unmanaged, so CI becomes the only place the full environment reliably exists and developers end up in a commit→push→wait debugging loop.
Tooling sits on a spectrum: asdf/mise pin runtime CLIs, Devbox gives a consistent per‑project shell, and Nix provides declarative, reproducible builds — treating the environment as a first‑class artifact makes local‑first, reproducible pipelines practical.
YAML+embedded shell turns pipelines into untestable code, so keep build/test logic in locally runnable artifacts (Nix/Devbox) and reserve YAML for orchestration, permissions, and deployment policy.
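
To make "environment as a first-class artifact" concrete, this sketch shows a check that could run identically on a laptop and in CI. It reads the `.tool-versions` pin file used by asdf and mise (one `tool version` pair per line) and reports drift. How the `installed` versions are discovered is assumed and left out. The function names are hypothetical.

```python
from typing import Dict, List


def parse_tool_versions(text: str) -> Dict[str, str]:
    """Parse a `.tool-versions` file into {tool: preferred version}."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        tool, *versions = line.split()
        if versions:
            pins[tool] = versions[0]  # the first listed version is the preferred one
    return pins


def drift(pins: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """List tools whose installed version differs from the pin."""
    problems = []
    for tool, wanted in sorted(pins.items()):
        have = installed.get(tool)
        if have != wanted:
            problems.append(f"{tool}: pinned {wanted}, found {have or 'nothing'}")
    return problems
```

Keeping this logic in a plain script, rather than inline in pipeline YAML, is the split the post recommends. The YAML only decides when the script runs.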

OpenAI Codex 101: The Complete Guide to AI Coding Agents with 33 Ready-to-Use Prompts

The Product Channel By Sid Saladi • 6 implied HN points • 25 Feb 26

Codex is an autonomous coding agent that can write, test, debug, refactor, and open pull requests, letting you delegate mechanical development work and speed up delivery.
Effective use requires project tooling like AGENTS.md, reusable Skills, automations, and multi-agent worktrees across web, CLI, app, or IDE surfaces to keep work consistent and isolated.
Choose tools by workflow: use Codex for fast, parallel delegation, scheduled automations, and GitHub-native reviews; use a reasoning-first agent for deep debugging, privacy, or huge context; or combine both for best results.

Day in the Life: Building a Prototype with My AI Agent

Boring AppSec • 7 implied HN points • 13 Feb 26

Defense in depth and human-in-the-loop gates really matter. Layered controls—allowlists, sandboxed subagents, firewalls, Tailscale, and ephemeral VMs—stopped an agent from autonomously exposing services and required manual approval where needed.
Tool policy enforcement beats plain filesystem isolation. A sandbox that restricts actions like exec/gateway/message is safer than a VM-only approach, and the ideal is VM-aware sandboxes that enforce tool policies inside ephemeral VMs.
The main unsandboxed agent, secrets, and prompt injection are the biggest risks. Use least privilege, just-in-time secrets injection, exposure audit logs, and require explicit user approval for network exposure to mitigate them.
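
Just-in-time secrets injection can be sketched in a few lines. The secret is passed only into the environment of the one child process that needs it. The agent's own `os.environ` never holds it, and it disappears when the child exits. Fetching the secret from a vault or secrets manager is assumed and not shown. `run_with_secret` and `DEPLOY_TOKEN` are hypothetical names. This narrows exposure but does not stop a malicious child process from leaking what it was given.

```python
import os
import subprocess
from typing import Mapping, Sequence


def run_with_secret(cmd: Sequence[str], secrets: Mapping[str, str]) -> subprocess.CompletedProcess:
    """Run one command with secrets injected only into that child's environment."""
    env = dict(os.environ)  # copy, so the parent environment is never modified
    env.update(secrets)
    return subprocess.run(list(cmd), env=env, capture_output=True, text=True, check=True)
```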

SAI Notes #04: CI/CD for Machine Learning.

SwirlAI Newsletter • 511 implied HN points • 28 May 23

In Machine Learning projects, CI/CD processes need to treat the ML training pipeline separately from regular software pipelines.
Efficient MLOps implementation requires an organizational structure where ML product development flows within a single end-to-end ML team.
ML systems in mature MLOps setups involve ML teams building and delivering pipelines that expose predictions to end users through backend and frontend services.
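
One way the training pipeline differs from a regular software pipeline is that passing tests is not enough. A new model should only be promoted if it beats the one in production. This is a hypothetical promotion gate. The metric names, `min_gain`, and the upper bounds are placeholders, not values from the post.

```python
from typing import Dict, Optional


def should_promote(candidate: Dict[str, float], production: Dict[str, float],
                   primary: str = "auc", min_gain: float = 0.005,
                   max_values: Optional[Dict[str, float]] = None) -> bool:
    """Gate a newly trained model before it replaces the production model.

    The candidate must beat production on the primary metric by `min_gain`
    and stay under every upper bound in `max_values` (e.g. p95 latency).
    """
    if candidate[primary] < production[primary] + min_gain:
        return False
    for metric, limit in (max_values or {}).items():
        if candidate[metric] > limit:
            return False
    return True
```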

How Capital One supports 14,000 technologists with one pipeline | Ameesh Paleja

Dev Interrupted • 28 implied HN points • 06 Jan 26

Standardizing build and deployment pipelines and automating SRE tasks removes repetitive work so large engineering teams can move like startups and focus on high‑value problems.
AI in 2026 shifts from demos to real procurement: organizations will budget heavily for AI and should prioritize applying models to new workflows while enforcing strong security and governance.
Pausing deploys (like Friday freezes) often increases risk by accumulating untested changes; regular, practiced deployments build resilience and reduce surprise failures.

Issue #116

Infra Weekly Newsletter • 4 implied HN points • 26 Feb 26

OpenClaw is a must-see demo that hints at a revolutionary capability, but it also raises serious security and safety concerns that need urgent attention.
Building services "Made in EU" is harder than it sounds because app distribution and common logins still tie you to US platforms, but many affordable EU hosting, auth, and mail providers, plus de-Googled options like Sailfish OS, help keep data in Europe and support technical sovereignty.
NixOS offers strong reproducibility, atomic updates, and rollbacks for infrastructure, but creating Kubernetes clusters inside VMs with imperative tools like kubeadm can undercut that declarative approach; managing clusters with Nix is educational, but the tooling choices matter for true reproducibility.

Last Call for Quality

QUALITY BOSS • 39 implied HN points • 03 Jul 24

Testing software too late can lead to more expensive and difficult fixes. It's better to catch bugs earlier in the development process.
Many teams rely too much on manual testing, which can slow things down. A mix of automated and manual testing can improve quality and efficiency.
Ignoring non-functional requirements like security and performance can make software unsatisfactory, even if it meets basic needs. It's important to include these factors in testing plans.

The Tech Buffet #20: How To deploy a Cloud Function That Summarizes Youtube Videos

The Tech Buffet • 139 implied HN points • 11 Mar 24

Cloud Functions are a serverless way to run your code on Google Cloud without managing servers. You pay only for what you use, making it cost-effective.
You can build a Cloud Function to summarize YouTube videos by extracting their transcripts and using AI to create concise summaries. This is done using Python libraries like youtube-transcript-api and langchain.
Testing your Cloud Function locally is a great way to ensure it works before deploying it. You can use tools like Postman to check the API responses easily.
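
Before any transcript fetching or LLM calls happen, a function like this typically needs two small helpers. One pulls the video ID out of the incoming URL. The other splits a long transcript into pieces that each fit into one model call, so the pieces can be summarized and their summaries combined. This sketch covers only those helpers. The youtube-transcript-api and langchain calls are omitted, so follow the post and those libraries' docs for them. The helper names and the 1,500-word chunk size are assumptions.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL shapes."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/shorts/"):
            return parsed.path.split("/")[2] or None
    return None


def chunk_transcript(text: str, max_words: int = 1500) -> List[str]:
    """Split a transcript into word-bounded chunks small enough for one LLM call each."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

Pure helpers like these are also the easiest part to test locally before deploying, in the spirit of the last takeaway.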

Backstage’s journey from spreadsheets to global IDP standard | Spotify’s Tyson Singer

Dev Interrupted • 14 implied HN points • 20 Jan 26

Backstage evolved from spreadsheets into a company-wide developer portal (Portal) that uses golden paths and an AI Knowledge Assistant to scale support and cut internal tickets nearly in half.
New agentic AI tools like Cowork, Gas Town, and Loom are moving AI from giving advice to doing work autonomously, which creates a need for complex orchestration and tiny task decomposition.
The engineer role is shifting from solo coder to conductor of digital workers, so raw output metrics (like diffs per developer) can mislead and teams should focus on judgment, system design, and sustainable processes.

NIST's "Strategies for Integration of Software Supply Chain Security in DevSecOps CI/CD Pipelines"

Resilient Cyber • 159 implied HN points • 13 Feb 24

Software supply chain attacks are on the rise, so companies need to protect their processes from potential risks. Understanding these threats is key for organizations that rely on software.
NIST provides guidelines to help organizations improve software security in DevSecOps environments. Following this advice helps companies reduce the risk that their software development processes are compromised.
Implementing zero-trust principles and automating security checks during software development can greatly reduce the risk of attacks. This means controlling access and regularly checking for vulnerabilities throughout the development cycle.
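
One of the simplest automated checks in this spirit is refusing to use a downloaded dependency or build artifact unless its digest matches a value pinned in version control. This is a minimal sketch of that single check. NIST's guidance covers much more, including provenance, signing, and access control. `verify_sha256` is a hypothetical name.

```python
import hashlib


def verify_sha256(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> None:
    """Raise (and so fail the pipeline step) if the artifact's digest does not match."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large artifacts don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    actual = h.hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"{path}: expected sha256 {expected_hex}, got {actual}")
```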

My Claude Code workflow

Aliveness Studies • 13 implied HN points • 12 Jan 26

Pay for the Max plan and run multiple model instances so you have enough usage and can parallelize feature work and background tasks.
Use git worktrees (and a helper like worktrunk) plus plan-mode workflows to manage branches, run hooks, spin up per-branch dev servers, and have the model draft and implement features with tests and linting.
Automate end-to-end: let the model ‘do it for me’ to run CLI tools, deploy, update DNS, run headless integration tests, and use browser or interview tools to gather info and fix problems without manual steps.
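
The worktree-per-branch setup can be approximated without extra tooling. This sketch builds a `git worktree add -b <branch> <path> <base>` command for a sibling directory. It also derives a stable dev-server port per branch so parallel servers are unlikely to clash. It is not how worktrunk works. The directory layout, port range, and function names are assumptions. Hash-derived ports can still collide, so a real helper should check that the port is free.

```python
import hashlib
import re
from pathlib import Path
from typing import List


def worktree_command(repo: str, branch: str, base: str = "main") -> List[str]:
    """Build the git command that creates a new branch in its own sibling worktree."""
    root = Path(repo).resolve()
    safe = re.sub(r"[^A-Za-z0-9._-]", "-", branch)  # e.g. feat/login -> feat-login
    path = root.parent / f"{root.name}.{safe}"
    return ["git", "-C", repo, "worktree", "add", "-b", branch, str(path), base]


def dev_server_port(branch: str, first: int = 3001, span: int = 1000) -> int:
    """Give each branch a stable port in [first, first + span)."""
    digest = hashlib.sha256(branch.encode()).digest()
    return first + int.from_bytes(digest[:4], "big") % span
```

Running the returned list with `subprocess.run(cmd, check=True)` creates the worktree. Each model instance can then work in its own directory without touching the others' checkouts.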

The Tech Buffet #21: Deploy A Production-Ready Streamlit App with Cloud Run and Cloud Build

The Tech Buffet • 99 implied HN points • 22 Mar 24

Cloud Run lets you deploy containerized applications without worrying about server management. You only pay when your code is actively running, making it a cost-effective option.
Using Pulumi as an Infrastructure as Code tool simplifies the process of setting up and managing cloud resources. It allows you to deploy applications by writing code instead of manually configuring settings.
Automating deployment with Cloud Build rebuilds and redeploys your app whenever you push code changes, which saves the time and effort of deploying manually each time.

The Sequence AI of the Week #698: How E2B Powers Safe AI Sandboxes

TheSequence • 126 implied HN points • 06 Aug 25

E2B is an open-source platform that runs AI-generated code safely in small, isolated environments called microVMs. This makes it easier for developers to test and use AI without worrying about security risks.
The platform builds on technologies like Kubernetes and Terraform to scale and manage AI workloads, so it can quickly adjust to handle more work as needed.
E2B also provides tools that simplify the developer workflow, letting developers focus on building AI applications rather than on setup and management.

VTEX scales to 150 million metrics using Amazon Managed Service for Prometheus

VTEX’s Tech Blog • 99 implied HN points • 10 Mar 24

VTEX successfully scaled its monitoring system to handle 150 million metrics using Amazon's Managed Service for Prometheus. This helped them keep track of their numerous services efficiently.
By adopting the managed service, VTEX cut its observability costs by about 41%, showing that the right infrastructure choices can save significant money.
The new architecture allows VTEX to respond to problems faster and reduces the chances of system failures. It increased the reliability of their metrics, making everyday operations smoother.

Software Supply Chain Security in DevSecOps & CI/CD

Resilient Cyber • 259 implied HN points • 27 Sep 23

Software supply chain attacks are increasing, making it essential for organizations to protect their software development processes. Companies are looking for ways to secure their software from these attacks.
NIST has issued guidance to help organizations improve software supply chain security, especially in DevSecOps and CI/CD environments. Following NIST's recommendations can help mitigate risks and ensure safer software delivery.
The complexity of modern software environments makes security challenging. It's important for organizations to implement strict security measures throughout the development lifecycle to prevent attacks and ensure the integrity of their software.

☁️ OpenClaw Set Up on AWS for Free (The Honest Guide — From Someone Who Hit Every Wall)

The Product Channel By Sid Saladi • 3 implied HN points • 24 Feb 26

You can run OpenClaw on AWS free tier by launching an EC2 Ubuntu instance, creating a key pair, opening SSH to your IP, and using ~30 GB storage, but you still pay for any LLM API usage.
The t3.micro free tier (1 GB RAM) often crashes during OpenClaw’s onboarding, so upgrading to t3.small (2 GB) is the practical fix to avoid JavaScript heap out of memory errors.
If you change the instance type, stop the instance first, apply the new type, and start it again; note that your public IP will change. Pick a nearby region and restrict SSH to your IP for security.
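
The stop, retype, and restart sequence can also be scripted with boto3. This is a minimal sketch that takes an EC2 client created with `boto3.client("ec2", region_name=...)`. It uses the client's `stop_instances`, `modify_instance_attribute`, `start_instances`, and `describe_instances` calls and its `instance_stopped` and `instance_running` waiters. The function name and the `t3.small` default are illustrative, following the post's recommendation.

```python
def resize_instance(ec2, instance_id: str, new_type: str = "t3.small") -> str:
    """Stop, retype, and restart an EC2 instance; return its new public IP ('' if none)."""
    ec2.stop_instances(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    # Without an Elastic IP, a stop/start cycle assigns a new public IP.
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    return desc["Reservations"][0]["Instances"][0].get("PublicIpAddress", "")
```

Passing the client in, rather than creating it inside, keeps the function testable with a stub and lets you choose the region explicitly.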

Learning from the Best

Permit.io’s Substack • 79 implied HN points • 14 Mar 24

Learning from bigger companies can help solve problems effectively. They often share insights that can be adapted to smaller projects.
Not reinventing the wheel is smart. Using existing solutions like policy engines can save time and effort while ensuring reliability.
Engaging with the community and resources available online can provide valuable knowledge and support for developers looking to improve their work.

DevEx: Better than an ExDev (And your Ex)

Permit.io’s Substack • 19 implied HN points • 04 Jul 24

Developer experience (DevEx) is really important because it helps developers focus on building great apps while also handling security tasks more smoothly.
It's crucial to make security features easy to use so that everyone involved, from developers to non-technical users, can manage permissions and access without problems.
A successful approach to DevEx considers the whole development process, ensuring security practices are integrated naturally into workflows from start to finish.

Defending CI/CD Environments - The NSA/CISA Way

Resilient Cyber • 299 implied HN points • 29 Jun 23

CI/CD environments are crucial for the development and delivery of software, but they can also be targeted by hackers. It's important to secure these systems to prevent attacks.
The NSA and CISA have released guidelines that offer best practices for protecting CI/CD pipelines. Using existing frameworks and tools can help improve security effectively.
Transitioning to a Zero Trust model is recommended to enhance security in software development. This approach minimizes risks by ensuring that all access is restricted and monitored.

Everything New Has Bugs

Fish Food for Thought • 23 implied HN points • 03 Dec 25

When you speed up releases or adopt new systems, bugs and incidents will usually rise at first — it’s a natural tradeoff between velocity and stability.
Give teams slack and real ownership so they can fix problems, learn, and improve quality instead of just reacting to fires.
Invest in supporting systems and feedback loops like CI/CD, observability, error budgets, and postmortems so you can absorb turbulence and restore quality faster.
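
Error budgets are simple arithmetic, which is part of why they work well as a feedback loop between velocity and stability. The budget is the unavailability an SLO tolerates over a window. This sketch computes it in minutes and reports how much is left. The 30-day window is a common convention, assumed here rather than taken from the post.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability the SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the budget left; a negative value means the SLO is already missed."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget
```

For a 99.9% SLO over 30 days, the budget is about 43.2 minutes. An incident-heavy rollout that burns 21.6 of them leaves half the budget, which is a concrete signal for when to slow releases and invest in quality.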

The Measurement Problem in Software Engineering

Maestro's Musings • 17 implied HN points • 15 Dec 25

Counting artifacts like lines of code, story points, or PR counts has repeatedly failed; these proxies miss real value, are easy to game, and can harm organizations.
AI both breaks traditional metrics—making code volume meaningless and often increasing churn and bugs—and widens perception gaps where developers feel faster than measured results show.
A promising path is semantic, context-aware measurement that uses AI to understand what changes actually do and synthesize those findings into simple narratives for leaders, aiming for "good enough" insight that’s harder to game.

🧠 Knowledge Series #21: What is serverless?

Department of Product • 98 implied HN points • 23 Jan 24

Serverless does not mean no servers; it means not managing them.
Web servers host websites and deliver web pages to users over the internet.
Serverless technology shifts the responsibility for managing servers from you to the cloud provider.

Workflows as Abstractions to Capabilities

The API Changelog • 4 implied HN points • 30 Jan 26

Baking API integrations into code creates maintenance hell because the more services you add, the higher the chance a change will break something and make troubleshooting hard.
Map integrations to business capabilities (like “sale close”) instead of raw API operations so it’s easier to diagnose failures, reduce complexity, and swap vendors without breaking business flows.
Implement those capabilities as visual workflows with low-code/no-code tools so teams can see, manage, assign, and lifecycle-manage integrations, making fixes and outsourcing simpler.
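
The post recommends visual low-code workflows. This code analogue only illustrates the underlying mapping. A business capability such as "sale close" runs as a list of named steps, so a failure is reported against a business step rather than a raw API call. Swapping a vendor means replacing one step. `Step`, `run_capability`, and `CapabilityFailed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    run: Callable[[Dict], Dict]  # takes and returns the shared business context


class CapabilityFailed(Exception):
    def __init__(self, capability: str, step: str, cause: Exception) -> None:
        super().__init__(f"{capability} failed at step '{step}': {cause}")
        self.step = step


def run_capability(name: str, steps: List[Step], ctx: Dict) -> Dict:
    """Run a capability step by step; errors name the business step that broke."""
    for step in steps:
        try:
            ctx = step.run(ctx)
        except Exception as exc:
            raise CapabilityFailed(name, step.name, exc) from exc
    return ctx
```

Each `Step.run` would wrap one vendor adapter (payments, invoicing, CRM). Callers depend only on the capability name, not on any vendor's API.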

Understanding Konfig's Opinionation

realkinetic • 19 implied HN points • 11 Jun 24

Konfig is an opinionated platform that reduces the investment and total cost of ownership needed for an enterprise cloud platform and speeds up the delivery of new software products.
Konfig promotes a structured platform focused on service-oriented architecture and domain-driven design, encouraging decoupled services and durable teams.
The platform enforces group-based access management, uses GitOps for infrastructure management, leverages managed services and serverless offerings, and provides an escape hatch for flexibility outside of its opinions.

My Love, Hate Relationship.

Data Engineering Central • 137 implied HN points • 24 Jul 23

Data Engineers may have a love-hate relationship with AWS Lambdas: they are versatile but have occasional limitations.
AWS Lambdas are under-utilized in Data Engineering but offer benefits like low cost, ease of use, and encouraging better practices.
AWS Lambdas are handy for processing small datasets, running data quality checks, and executing quick logic while reducing architecture complexity and cost.
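
A small data quality check fits naturally in a Lambda handler, which for Python takes `(event, context)`. This sketch assumes the caller sends `{"rows": [...], "required": [...]}`. In practice the rows might come from an S3 object or a queue message instead. The `statusCode`/`body` response shape matters only if the function sits behind an HTTP front end such as API Gateway.

```python
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Flag rows in a small batch that are missing required columns."""
    rows: List[Dict[str, Any]] = event.get("rows", [])
    required: List[str] = event.get("required", [])
    failures = []
    for i, row in enumerate(rows):
        # Treat absent keys, None, and empty strings as missing values.
        missing = [col for col in required if row.get(col) in (None, "")]
        if missing:
            failures.append({"row": i, "missing": missing})
    return {
        "statusCode": 200 if not failures else 422,
        "body": json.dumps({"checked": len(rows), "failures": failures}),
    }
```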

Tech Debt vs Quality

Fish Food for Thought • 14 implied HN points • 10 Dec 25

Tech debt and bugs are different: bugs are immediate errors to fix, while tech debt is the future cost of taking shortcuts and can be intentional or accidental, so decide and plan when to incur it.
Make debt visible and economic: track where it slows work, measure the "interest" it charges in developer time or incidents, and prioritize paying down high-interest items rather than treating all debt equally.
Leadership and culture matter: embed maintenance into planning, keep slack for cleanup, use retrospectives and metrics to shorten recovery time, and design continuous improvement cycles so velocity and quality compound over time.
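
"Prioritize high-interest debt" can be made economic with one number: the payback period. This is the one-off fix cost divided by the time the debt costs every week. This sketch assumes both are estimated in developer hours, with incidents converted to hours. `DebtItem` and `prioritize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DebtItem:
    name: str
    fix_hours: float                # one-off cost to pay the debt down
    interest_hours_per_week: float  # time lost every week while it stays


def prioritize(items: List[DebtItem]) -> List[DebtItem]:
    """Order debt by payback period: weeks until the fix pays for itself."""
    def payback_weeks(item: DebtItem) -> float:
        if item.interest_hours_per_week <= 0:
            return float("inf")  # no measurable interest: lowest priority
        return item.fix_hours / item.interest_hours_per_week
    return sorted(items, key=payback_weeks)
```

A flaky CI job that costs 8 hours a week and takes 16 hours to fix pays back in two weeks. A 200-hour ORM migration saving 4 hours a week takes 50.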

Issue #113

Infra Weekly Newsletter • 13 implied HN points • 09 Dec 25

Ingress NGINX is being retired in favor of the Gateway API, so teams should plan their migration and follow the documented steps to move to the Gateway API.
Infrastructure-as-Code best practices emphasize modular design, testing, and isolating dependencies; they also recommend safe update patterns like blue‑green deployments, cross-team collaboration, and secure, scalable provisioning.
Linux 6.18 is the new LTS kernel and distributions like Alpine 3.23 are adopting it quickly, so operators should plan OS/kernel upgrades and test their stacks against this LTS.

The one where we vibe code holiday cards | Season 5 Finale

Dev Interrupted • 9 implied HN points • 23 Dec 25

MCP agents need strong safeguards: treat actions on a spectrum of reversibility and consequence, and require a human in the loop for irreversible or high‑risk operations.
Engineers are still responsible for delivering proven code, not just generating it — every line of AI‑produced code must be verified and tested before shipping.
Rigid engineering dogmas like mandatory review for every PR and slavish sprint rituals slow teams down. Teams should let senior engineers self‑merge low‑risk changes and audit whether safeguards prevent bugs or just block work.
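
The reversibility-and-consequence spectrum from the first takeaway can be encoded as a small policy gate in front of an agent's tools. Irreversible or high-consequence actions wait for a human; everything else proceeds. The tool names, their classifications, and the `approve` callback are hypothetical. Unknown tools fail closed.

```python
from enum import IntEnum
from typing import Callable, Dict, Tuple


class Consequence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical tool catalogue: (reversible?, consequence if it goes wrong).
ACTIONS: Dict[str, Tuple[bool, Consequence]] = {
    "read_file": (True, Consequence.LOW),
    "open_pull_request": (True, Consequence.MEDIUM),
    "restart_staging": (True, Consequence.HIGH),
    "drop_table": (False, Consequence.HIGH),
}


def needs_human(reversible: bool, consequence: Consequence) -> bool:
    """Irreversible or high-consequence actions require a human in the loop."""
    return not reversible or consequence >= Consequence.HIGH


def gate(action: str, approve: Callable[[str], bool]) -> bool:
    """Return True if the agent may proceed with `action`."""
    # Unknown actions fail closed: treat them as irreversible and high-risk.
    reversible, consequence = ACTIONS.get(action, (False, Consequence.HIGH))
    if needs_human(reversible, consequence):
        return approve(action)
    return True
```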

From Kubernetes to AI maximalism | Stacklok's Craig McLuckie

Dev Interrupted • 14 implied HN points • 25 Nov 25

Treat AI like engineering — insist on reproducibility, audit trails, and measurable quality so models aren’t just probabilistic parrots.
Use AI to amplify good habits, not hide gaps — have models critique your solutions Socratically and keep humans in charge of architecture to avoid accelerating technical debt.
Replace the "glue person" with composable AI workflows and agent-assisted cleanup, and measure adoption and impact so you can reclaim focus and reduce coordination toil.

🗓 Your invitation to DevOps discussions with Tobias Mende and Bryan Finster

🔮 Crafting Tech Teams • 79 implied HN points • 28 Nov 23

This post invites you to DevOps discussions with Tobias Mende and Bryan Finster.
Crafting Tech Teams is a reader-supported publication with scheduled streams on Wednesdays and Thursdays at 4pm CET.