Container Platform: Preventing VKS Disk Pressure with an Orchestrated Image Prune Runbook

The Cloud Architect

Disk pressure on Kubernetes worker nodes triggers pod evictions and instability. The Broadcom KB article linked below identifies a common cause: exited containers and unused images that accumulate without cleanup. It suggests automating image pruning with a CronJob or DaemonSet.

Source KB: https://knowledge.broadcom.com/external/article/391806/vks-worker-nodes-showing-kubelethasdisk.html

The narrow use case

When DiskPressure is detected on any VKS worker node, run a controlled prune action across the cluster (with safety checks).

Orchestrator action: VKS DiskPressure remediation runner

Goal: turn a reactive kubectl firefight into a repeatable, audited runbook that ops can safely execute.

Workflow steps (VMware Aria Orchestrator)

  • Create a workflow: 'VKS - DiskPressure Remediation (Image Prune)'
  • Inputs: kubeconfigSecret (secure), namespace (string, optional), nodeSelector (string, optional)
  • Step 1: Query node conditions (kubectl describe node, or the Kubernetes API) and identify nodes where DiskPressure=True.
  • Step 2: If no nodes are impacted, exit PASS with 'No remediation required'.
  • Step 3: Apply a predefined DaemonSet/CronJob manifest that prunes unused images on each node, with a bounded runtime.
  • Step 4: Re-check DiskPressure state and report nodes that remain constrained.
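The detection and reporting logic in steps 1, 2, and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual implementation: it assumes the node list has already been fetched as JSON (for example with kubectl get nodes -o json) and handed to the script. The function names, the SAMPLE_NODES data, and the "REVIEW" result label are hypothetical; only the "PASS" result and its message come from step 2.

```python
import json

# Hypothetical sample in the shape returned by `kubectl get nodes -o json`,
# trimmed to the fields this sketch reads.
SAMPLE_NODES = json.loads("""
{
  "items": [
    {"metadata": {"name": "worker-1"},
     "status": {"conditions": [
       {"type": "DiskPressure", "status": "False"},
       {"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "worker-2"},
     "status": {"conditions": [
       {"type": "DiskPressure", "status": "True"},
       {"type": "Ready", "status": "True"}]}}
  ]
}
""")


def nodes_with_disk_pressure(node_list):
    """Step 1: return the names of nodes whose DiskPressure condition is "True"."""
    affected = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        for condition in conditions:
            if condition.get("type") == "DiskPressure" and condition.get("status") == "True":
                affected.append(node["metadata"]["name"])
                break  # one match per node is enough
    return affected


def build_report(affected_before, affected_after):
    """Steps 2 and 4: summarize the run for the workflow output."""
    if not affected_before:
        # Step 2: nothing to do, so no manifest is applied.
        return {"result": "PASS", "message": "No remediation required",
                "still_constrained": []}
    still_constrained = sorted(affected_after)
    if still_constrained:
        # "REVIEW" is a placeholder label for nodes that need manual follow-up.
        return {"result": "REVIEW",
                "message": "DiskPressure persists after prune",
                "still_constrained": still_constrained}
    return {"result": "PASS", "message": "DiskPressure cleared",
            "still_constrained": []}


if __name__ == "__main__":
    before = nodes_with_disk_pressure(SAMPLE_NODES)
    print(before)
    # Pretend the prune ran and the re-check found no affected nodes.
    print(build_report(before, []))
```

Comparing the condition status against the string "True" is deliberate: Kubernetes reports condition status as a string ("True", "False", or "Unknown"), not as a boolean.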

Action steps

  1. Store the prune manifest in Git and have Orchestrator apply it from a known, versioned source.
  2. Make the workflow require approval when production namespaces are targeted.
  3. Schedule a weekly run in low-traffic windows if your environment frequently accumulates unused images.
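The approval rule in action step 2 can be expressed as a small gate the workflow evaluates before it applies the manifest. The sketch below is one possible interpretation, under two assumptions that are not in the KB: production namespaces are known in advance as an explicit set (PRODUCTION_NAMESPACES is a hypothetical example), and an empty namespace input is treated as a cluster-wide run that always needs approval.

```python
# Hypothetical list of production namespaces; in practice this would come
# from a configuration element or another versioned source.
PRODUCTION_NAMESPACES = {"prod", "payments-prod"}


def requires_approval(namespace, production_namespaces=PRODUCTION_NAMESPACES):
    """Return True when the run must pause for a human approval step."""
    target = (namespace or "").strip()
    if not target:
        # The namespace input is optional. With no value the run is not scoped
        # to one namespace, so this sketch takes the cautious path.
        return True
    return target in production_namespaces


if __name__ == "__main__":
    for candidate in ("dev", "prod", ""):
        print(repr(candidate), requires_approval(candidate))
```

Treating an unscoped run as production errs on the side of asking for approval too often, which is usually the cheaper mistake for a remediation that touches every node.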
