Container Platform: Preventing VKS Disk Pressure with an Orchestrated Image Prune Runbook



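When a Kubernetes worker node runs low on disk space, the kubelet sets the node's DiskPressure condition to True and starts evicting pods, which destabilizes the workloads running there. The Broadcom KB article below identifies a common cause: exited containers and unused images that accumulate without cleanup. It suggests automating image pruning with a CronJob or DaemonSet.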
Source KB: https://knowledge.broadcom.com/external/article/391806/vks-worker-nodes-showing-kubelethasdisk.html
The narrow use case
When DiskPressure is detected on any VKS worker node, run a controlled prune across the cluster, gated by checks of node conditions before and after the prune.
Orchestrator action: VKS DiskPressure remediation runner
Goal: turn a reactive kubectl firefight into a repeatable, audited runbook that ops can safely execute.
Workflow steps (VMware Aria Orchestrator)
- Create a workflow: 'VKS - DiskPressure Remediation (Image Prune)'
- Inputs: kubeconfigSecret (SecureString), namespace (string, optional), nodeSelector (string, optional)
- Step 1: Query node conditions (kubectl get nodes -o json, or the Kubernetes API) and identify nodes where DiskPressure=True.
- Step 2: If no nodes are impacted, exit PASS with 'No remediation required'.
- Step 3: Apply a predefined DaemonSet manifest (one pod per node, optionally limited by nodeSelector) that prunes unused images on each targeted node within a bounded runtime.
- Step 4: Re-check DiskPressure state and report nodes that remain constrained.
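The detection and reporting logic in Steps 1, 2 and 4 can be sketched in Python as follows. This is a minimal sketch, assuming the workflow writes the kubeconfigSecret input to a temporary file and has kubectl on its PATH; the function names (`get_nodes`, `nodes_under_disk_pressure`, `report`) and the PASS/WARN wording are hypothetical, not part of Aria Orchestrator or the KB.

```python
import json
import subprocess


def get_nodes(kubeconfig_path):
    """Step 1: fetch the node list as JSON via kubectl (assumes kubectl is on PATH)."""
    out = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig_path, "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def nodes_under_disk_pressure(node_list):
    """Return the sorted names of nodes whose DiskPressure condition is "True"."""
    impacted = []
    for node in node_list.get("items", []):
        for cond in node.get("status", {}).get("conditions", []):
            # Condition status is a string: "True", "False" or "Unknown".
            if cond.get("type") == "DiskPressure" and cond.get("status") == "True":
                impacted.append(node["metadata"]["name"])
    return sorted(impacted)


def report(before, after):
    """Steps 2 and 4: summarize the run for the workflow log (hypothetical wording)."""
    if not before:
        return "PASS: No remediation required"
    if not after:
        return f"PASS: DiskPressure cleared on {', '.join(before)}"
    return f"WARN: DiskPressure remains on {', '.join(after)}"
```

The workflow would call `nodes_under_disk_pressure(get_nodes(path))` once before Step 3 and again after it. The kubelet keeps a pressure condition set for `--eviction-pressure-transition-period` (5 minutes by default) after disk usage recovers, so the Step 4 re-check should wait at least that long, or it may report nodes that have in fact recovered.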
Action steps
- Store the prune manifest in Git and have Orchestrator apply it from a known, versioned source.
- Make the workflow require approval when production namespaces are targeted.
- Schedule a weekly run in low-traffic windows if your environment frequently accumulates unused images.
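The prune manifest from Step 3, along with the nodeSelector input and the production approval gate, could be generated and committed to Git along these lines. Everything here is an assumption to adapt: `registry.example.com/tools/crictl:pinned` is a placeholder for an image you build containing `crictl` and `timeout`; the containerd socket path and the `prod-`/`production` namespace prefixes are guesses; and the helper names are hypothetical. The prune runs in an init container so each node prunes once, and a pause container then keeps the pod Ready, which the workflow can read as "this node is done".

```python
import json

# Hypothetical values: replace with what your environment actually uses.
PRUNE_IMAGE = "registry.example.com/tools/crictl:pinned"  # assumed image with crictl + timeout
CONTAINERD_SOCKET = "/run/containerd/containerd.sock"     # assumed CRI socket path on nodes
PRODUCTION_PREFIXES = ("prod-", "production")             # assumed prod namespace naming


def parse_node_selector(raw):
    """Turn the workflow's nodeSelector string, e.g. "pool=workers,zone=a", into a dict."""
    selector = {}
    for pair in filter(None, (p.strip() for p in (raw or "").split(","))):
        key, _, value = pair.partition("=")
        selector[key.strip()] = value.strip()
    return selector


def requires_approval(namespace):
    """Approval gate: production-looking namespaces need a human sign-off first."""
    return namespace.startswith(PRODUCTION_PREFIXES)


def build_prune_daemonset(namespace, node_selector=None, timeout_seconds=300):
    """Return a DaemonSet manifest (as a dict) that prunes unused images once per node."""
    labels = {"app": "vks-image-prune"}
    # Bounded runtime via `timeout`; the fallback echo stops a failed or timed-out
    # prune from crash-looping the pod (Step 4 still reports that node).
    prune_cmd = (
        f"timeout {timeout_seconds} crictl --runtime-endpoint unix://{CONTAINERD_SOCKET} "
        "rmi --prune || echo 'image prune failed or timed out'"
    )
    pod_spec = {
        "initContainers": [{
            "name": "prune",
            "image": PRUNE_IMAGE,
            "command": ["sh", "-c", prune_cmd],
            "volumeMounts": [{"name": "cri-socket", "mountPath": CONTAINERD_SOCKET}],
        }],
        # Keeps the pod Running after the init container exits, so Ready == done.
        "containers": [{"name": "pause", "image": "registry.k8s.io/pause:3.9"}],
        "volumes": [{
            "name": "cri-socket",
            "hostPath": {"path": CONTAINERD_SOCKET, "type": "Socket"},
        }],
    }
    if node_selector:
        pod_spec["nodeSelector"] = node_selector
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "vks-image-prune", "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    # Emit JSON (kubectl apply accepts it) for review and commit to Git.
    print(json.dumps(build_prune_daemonset("vks-maintenance"), indent=2))
```

After applying the committed file, the workflow can wait with `kubectl rollout status daemonset/vks-image-prune -n <namespace> --timeout=10m` and delete the DaemonSet before the Step 4 re-check. Clusters that enforce Pod Security Admission may reject the hostPath mount unless the target namespace permits it, and a node already under DiskPressure needs enough free space to pull the prune image, so a small or pre-pulled image is safer.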



