Fix MachineConfigPool Updating¶
Introduction¶
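MachineConfigPools (MCPs) apply node-level configuration to groups of nodes. A pool is Updating during any normal rollout, so the state to investigate is a pool that stays Updating for a long time or turns Degraded. That usually means a node failed to drain, reboot, or apply the rendered configuration.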
Symptoms¶
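Typical symptoms include a pool that stays UPDATING=True or reports DEGRADED=True in oc get machineconfigpool, nodes stuck in Ready,SchedulingDisabled, a degraded machine-config cluster operator, or the same error repeating after retries.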
Common Causes¶
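- A node that cannot drain because a PodDisruptionBudget blocks pod eviction.
- A node that fails to reboot or to apply the rendered configuration.
- Manual changes made outside the Machine Config Operator, such as editing rendered MachineConfig objects directly or forcing node changes while the MCP is already degraded.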
Step 1: Check the Current Status¶
oc get machineconfigpool
oc describe mcp worker
oc get nodes -l node-role.kubernetes.io/worker
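The following command is not a status check. It cordons the node and evicts its pods, so run it only as a last resort when the operator's own drain is stuck, and only after reviewing the PodDisruptionBudgets involved:
oc adm drain worker-1 --ignore-daemonsets --delete-emptydir-data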
Example output of oc get machineconfigpool for a healthy pool:
NAME     CONFIG                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-9c7d8f5b2a1c4e6f8d0a   True      False      False      3              3                   3                     0
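The health columns in that output can be checked mechanically. The following is a minimal sketch, not an official tool: `mcp_unhealthy` is a hypothetical helper name, and it assumes the column order shown in the example (NAME, CONFIG, UPDATED, UPDATING, DEGRADED) with a non-empty CONFIG column.

```shell
# Hypothetical helper: reads `oc get machineconfigpool --no-headers` rows on stdin
# and prints every pool that is not fully updated, is still updating, or is degraded.
# Assumes the column order NAME CONFIG UPDATED UPDATING DEGRADED ...
mcp_unhealthy() {
  awk '$3 != "True" || $4 == "True" || $5 == "True" {
         printf "%s updated=%s updating=%s degraded=%s\n", $1, $3, $4, $5
       }'
}
```

A possible invocation is `oc get machineconfigpool --no-headers | mcp_unhealthy`, which prints nothing when every pool is healthy.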
Step 2: Inspect Logs and Events¶
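oc get events -n openshift-machine-config-operator --sort-by=.lastTimestamp
oc get pods -n openshift-machine-config-operator -o wide
oc logs -n openshift-machine-config-operator <machine-config-daemon-pod> -c machine-config-daemon
oc describe node worker-1
Replace <machine-config-daemon-pod> with the machine-config-daemon pod that runs on the affected node; the -o wide output shows which node each pod runs on.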
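The machine-config-daemon log for the affected node is usually the most direct evidence of why that node failed to apply its configuration. The sketch below locates that pod. `mcd_pod_for_node` is a hypothetical helper name, and it assumes the default `openshift-machine-config-operator` namespace and the `k8s-app=machine-config-daemon` pod label.

```shell
# Hypothetical helper: prints the name of the machine-config-daemon pod that
# runs on the node given as the first argument.
# Assumes the default namespace and the k8s-app=machine-config-daemon label.
mcd_pod_for_node() {
  oc get pods -n openshift-machine-config-operator \
    -l k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=$1" \
    -o name
}
```

The result can be passed to `oc logs`, for example `oc logs -n openshift-machine-config-operator "$(mcd_pod_for_node worker-1)" -c machine-config-daemon`.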
Step 3: Verify Configuration¶
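Compare the rendered configuration the pool expects with what each node reports. The pool's spec.configuration.name is the target rendered MachineConfig, and each node records its progress in the machineconfiguration.openshift.io/currentConfig, desiredConfig, and state annotations. Also confirm that the pool's machineConfigSelector and nodeSelector match the MachineConfigs and nodes you expect. In OpenShift, node events and the pool's conditions often show the exact drain, reboot, or render reason.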
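For a MachineConfigPool, the configuration worth verifying is the rendered config each node reports in its `machineconfiguration.openshift.io/currentConfig`, `desiredConfig`, and `state` annotations. The sketch below flags nodes that are out of sync. `mc_out_of_sync` is a hypothetical helper name, and the four-column input format is an assumption of this example.

```shell
# Hypothetical helper: reads "NODE CURRENT DESIRED STATE" rows on stdin and prints
# nodes whose current rendered config differs from the desired one, or whose
# machine-config-daemon state is anything other than Done.
#
# One way to produce the input, assuming your oc version accepts escaped dots
# in custom-columns paths:
#   oc get nodes --no-headers -o custom-columns='N:.metadata.name,C:.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig,D:.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig,S:.metadata.annotations.machineconfiguration\.openshift\.io/state'
mc_out_of_sync() {
  awk '$2 != $3 || $4 != "Done" {
         print $1 ": current=" $2 " desired=" $3 " state=" $4
       }'
}
```

A node that appears in the output is the one to inspect further. An empty result suggests every node has converged on its desired config.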
Step 4: Apply the Fix¶
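Apply the smallest targeted fix for the cause you found: correct or remove the source MachineConfig that fails to render or apply (never the rendered object), resolve the PodDisruptionBudget or pod that blocks the drain, or repair the unhealthy node or operator dependency. Then let the Machine Config Operator retry instead of forcing further node changes.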
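When the blocker is a drain, one targeted check is which PodDisruptionBudgets currently allow no evictions. This is a sketch under stated assumptions: `pdb_blocking` is a hypothetical helper name, and it assumes the six-column layout of `oc get pdb --all-namespaces --no-headers` (namespace, name, min available, max unavailable, allowed disruptions, age).

```shell
# Hypothetical helper: reads `oc get pdb --all-namespaces --no-headers` rows on stdin
# and prints, as namespace/name, each budget that allows zero disruptions.
# Assumes column 5 is ALLOWED DISRUPTIONS.
pdb_blocking() {
  awk '$5 == 0 { print $1 "/" $2 }'
}
```

A possible invocation is `oc get pdb --all-namespaces --no-headers | pdb_blocking`. A budget listed here only blocks the drain if its pods run on the node being drained, so compare the result with the pods on that node before changing anything.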
Step 5: Confirm the Problem Is Resolved¶
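Run the verification commands again. The pool should report UPDATED=True, UPDATING=False, and DEGRADED=False, its ready and updated machine counts should equal MACHINECOUNT, and every node in the pool should be Ready and schedulable. Confirm that the status, recent events, and a user-facing test all agree.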
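Rather than polling by hand, the confirmation can block until the pool reports its Updated condition. A minimal sketch follows: `wait_for_mcp` is a hypothetical wrapper name, and the 30-minute default timeout is an arbitrary assumption to adjust for your pool size.

```shell
# Hypothetical wrapper: waits until the named MachineConfigPool reports the
# condition Updated=True, or exits non-zero when the timeout expires.
# $1 = pool name, $2 = optional timeout (defaults to 30m, an arbitrary choice).
wait_for_mcp() {
  oc wait "machineconfigpool/$1" --for=condition=Updated --timeout="${2:-30m}"
}
```

For example, `wait_for_mcp worker 45m` returns successfully only once the worker pool has finished rolling out.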
Common Mistakes¶
- Forcing node changes while the MCP is already degraded.
- Ignoring PodDisruptionBudgets during drains.
- Editing rendered MachineConfig objects directly.
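To avoid editing a rendered object by accident, it can help to list only the source MachineConfigs. The sketch below relies on generated objects carrying the `rendered-` name prefix, as in the example pool output. `source_machineconfigs` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `oc get machineconfig -o name` output on stdin and
# prints only source MachineConfigs, skipping generated rendered-* objects.
source_machineconfigs() {
  grep -v '/rendered-'
}
```

A possible invocation is `oc get machineconfig -o name | source_machineconfigs`. Changes belong in one of the listed objects, after which the operator regenerates the rendered config.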
Quick Checklist¶
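- Confirm the target cluster and your access (MachineConfigPools are cluster-scoped, so the active project does not affect them).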
- Inspect the exact object named in the error.
- Read recent events.
- Apply one focused fix.
- Verify status after the change.
Summary¶
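Fixing a MachineConfigPool that is stuck Updating or Degraded requires matching the symptom to the OpenShift object that owns it: the pool, the node, or the Machine Config Operator. Use oc get and oc describe output, events, machine-config-daemon logs, and focused verification so the fix is tied to evidence.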