vSAN in VCF 9: Preventing ESXi Reboot Hangs with an Orchestrator Precheck Gate



This post covers a narrow operational risk in vSAN-backed VCF environments: an ESXi host reboot can hang when vSAN CMMDS shutdown is delayed under specific network conditions.
Source KB: https://knowledge.broadcom.com/external/article/405500/esxi-shutdown-hung-at-rebootrunhandlersv.html
What the KB is telling you
If you reboot hosts during maintenance, put a deterministic gate in front of the reboot. The gate should block the reboot whenever the cluster is not in a safe state, for example during network instability or while storage transitions such as resync are still in progress.
Orchestrator action: vSAN reboot precheck gate
Goal: block host reboot workflows unless vSAN conditions are healthy.
Workflow steps (VMware Aria Orchestrator)
- Create a workflow: 'VCF9 - vSAN Reboot Precheck Gate'
- Inputs: vcCluster (VC:ClusterComputeResource), maxActiveResync (number, default 0)
- Step 1: Query vSAN/vCenter health for the cluster. If critical issues exist, fail the workflow.
- Step 2: Query resync/rebuild activity. If active resync components > maxActiveResync, fail the workflow.
- Step 3: If checks pass, return PASS and allow downstream 'Enter Maintenance Mode' / 'Reboot Host' workflows to proceed.
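The decision logic behind Steps 1 to 3 can be sketched as a small pure function. This is a minimal illustration in Python rather than Orchestrator's own scripting language, and it assumes the health and resync queries have already run and their results are passed in. `GateInput`, `GateResult`, `evaluate_gate`, and the field names are hypothetical; they are not part of any vSAN, vCenter, or Orchestrator API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GateInput:
    """Hypothetical container for what Steps 1 and 2 would collect."""
    critical_health_issues: List[str] = field(default_factory=list)
    active_resync_components: int = 0


@dataclass
class GateResult:
    passed: bool
    reasons: List[str]


def evaluate_gate(data: GateInput, max_active_resync: int = 0) -> GateResult:
    """Return PASS only if there are no critical issues and resync is within the limit."""
    reasons: List[str] = []

    # Step 1: any critical health issue blocks the reboot.
    if data.critical_health_issues:
        reasons.append(
            "critical health issues: " + ", ".join(data.critical_health_issues)
        )

    # Step 2: resync activity above the threshold blocks the reboot.
    if data.active_resync_components > max_active_resync:
        reasons.append(
            f"active resync components {data.active_resync_components} "
            f"> maxActiveResync {max_active_resync}"
        )

    # Step 3: PASS only when nothing was flagged.
    return GateResult(passed=not reasons, reasons=reasons)
```

The sketch collects every failure reason instead of stopping at the first one, so the operator sees the full picture in a single run. A real workflow could equally fail fast at Step 1.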
Expected outcome
This turns host reboots into a controlled operation: no precheck PASS = no reboot.
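The "no PASS, no reboot" rule can be enforced by routing every reboot through one guard. The sketch below is in Python and uses hypothetical names throughout (`PrecheckFailed`, `reboot_if_gate_passes`, and the `precheck` and `reboot` callables stand in for the precheck workflow and the downstream reboot workflow). It shows the shape of the control flow, not an Orchestrator API.

```python
from typing import Callable


class PrecheckFailed(RuntimeError):
    """Raised when the precheck gate does not return PASS."""


def reboot_if_gate_passes(
    host: str,
    precheck: Callable[[], bool],
    reboot: Callable[[str], None],
) -> None:
    """Invoke `reboot` for `host` only if `precheck` returns True."""
    if not precheck():
        # The reboot callable is never reached on a failed precheck.
        raise PrecheckFailed(
            f"precheck did not return PASS; refusing to reboot {host}"
        )
    reboot(host)
```

Because the guard raises instead of returning a status, a calling workflow cannot silently ignore a failed precheck and continue to the reboot step.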


