Troubleshoot OpenShift API Server¶
Introduction¶
Control plane troubleshooting should stay evidence-driven. Check ClusterOperators, component pods, recent events, and logs before restarting anything.
Symptoms¶
Typical symptoms include failed pods, route errors, denied requests, ClusterOperators that report Degraded or unavailable, and oc commands that keep failing after retries.
Common Causes¶
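- A degraded operator whose status message was never read before control plane pods were restarted.
- Certificate or etcd quorum problems that were logged as warnings and ignored.
- A stale kubeconfig context, so the commands run against a different cluster or project than the one that is failing.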
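A stale kubeconfig context is cheap to rule out before anything else. The helper below is a minimal sketch: `show_oc_target` is a hypothetical name, and it only wraps read-only `oc` queries to print the user, API endpoint, context, and project that later commands will use.

```shell
# Print the identity, API endpoint, context, and project that oc will use.
# Every call is a read-only query; nothing on the cluster changes.
show_oc_target() {
  printf 'user:    %s\n' "$(oc whoami)"
  printf 'server:  %s\n' "$(oc whoami --show-server)"
  printf 'context: %s\n' "$(oc config current-context)"
  printf 'project: %s\n' "$(oc project -q)"
}
```

Run `show_oc_target` at the start of a session and compare the server URL with the cluster named in the incident.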
Step 1: Check the Current Status¶
oc get clusteroperators
oc get pods -n openshift-etcd
oc logs -n openshift-etcd -l k8s-app=etcd -c etcd --tail=50
oc get events -n openshift-etcd --sort-by=.lastTimestamp
Example output:
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
etcd   4.15.12   True        False         False      8d      Etcd is available.
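On a cluster with dozens of operators, a filter makes the unhealthy ones easier to spot. The sketch below uses a hypothetical function name, `unhealthy_operators`. It assumes the column order in the example output and a non-empty VERSION column, and it reads the table from standard input so it can be tried on saved output as well as on a live cluster.

```shell
# Read `oc get clusteroperators` output on stdin and print the name of every
# operator that is not Available=True, Progressing=False, Degraded=False.
# Assumption: columns are NAME VERSION AVAILABLE PROGRESSING DEGRADED ...
# and VERSION is never blank (a blank VERSION would shift the columns).
unhealthy_operators() {
  awk '
    # Skip the header row if it is present.
    NR == 1 && $1 == "NAME" { next }
    # Print the operator name when any of the three conditions is off.
    NF >= 5 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }
  '
}
```

Typical use is `oc get clusteroperators | unhealthy_operators`. Empty output means every operator matched the healthy pattern.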
Step 2: Inspect Logs and Events¶
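oc get co etcd kube-apiserver console
oc get pods -n openshift-kube-apiserver
oc logs -n openshift-kube-apiserver -l app=openshift-kube-apiserver -c kube-apiserver --tail=50
oc get events -n openshift-kube-apiserver --sort-by=.lastTimestamp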
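Event lists in control plane namespaces are often dominated by Normal entries. One possible way to narrow them is a field selector on the event type, sketched below. The function name `warning_events` is hypothetical, and the default namespace is only an assumption taken from the commands in Step 1.

```shell
# List only Warning events for a namespace, oldest first.
# The namespace argument is optional and defaults to openshift-etcd.
warning_events() {
  ns="${1:-openshift-etcd}"
  oc get events -n "$ns" --field-selector type=Warning --sort-by=.lastTimestamp
}
```

For example, `warning_events openshift-kube-apiserver` shows the warnings recorded for the API server pods.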
Step 3: Verify Configuration¶
Compare the object selectors, service account, image reference, route target, or operator status with the failing symptom. In OpenShift, events often show the exact admission, scheduling, pull, SCC, or route reason.
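When the operator status is the thing being compared, the full condition list usually says more than the one-line table. The sketch below prints each condition of a ClusterOperator as tab-separated type, status, reason, and message. `operator_conditions` is a hypothetical helper name; the jsonpath expression reads the standard `.status.conditions` list.

```shell
# Print every status condition of one ClusterOperator, one per line:
# TYPE <tab> STATUS <tab> REASON <tab> MESSAGE
operator_conditions() {
  oc get clusteroperator "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

Calling `operator_conditions kube-apiserver` before changing anything gives a baseline to compare against after the fix.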
Step 4: Apply the Fix¶
Apply the smallest targeted fix: correct the selector, update the route or service port, link the pull secret, grant the specific RBAC or SCC permission, or repair the unhealthy operator dependency.
Step 5: Confirm the Problem Is Resolved¶
Rerun the commands from Step 1 and confirm that the operator status, the recent events, and a user-facing test all agree that the problem is gone.
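Operators can take a few minutes to settle after a change, so a bounded wait is often more reliable than a single status check. The following is a minimal sketch built on `oc wait`; the function name `wait_operator_healthy` and the 300s default timeout are assumptions, not values from this guide.

```shell
# Block until the named ClusterOperator reports Available=True and
# Degraded=False, or fail when the timeout expires.
# Usage: wait_operator_healthy <operator> [timeout]
wait_operator_healthy() {
  op="$1"
  timeout="${2:-300s}"
  oc wait "clusteroperator/$op" --for=condition=Available=True --timeout="$timeout" &&
    oc wait "clusteroperator/$op" --for=condition=Degraded=False --timeout="$timeout"
}
```

A non-zero exit status from `wait_operator_healthy kube-apiserver` means the operator did not reach the expected conditions in time, and the status message and events should be read again.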
Common Mistakes¶
- Restarting control plane pods without reading the operator message.
- Ignoring certificate or quorum warnings.
- Troubleshooting from a stale kubeconfig context.
Quick Checklist¶
- Confirm the active project.
- Inspect the exact object named in the error.
- Read recent events.
- Apply one focused fix.
- Verify status after the change.
Summary¶
Troubleshooting the OpenShift API server means matching the symptom to the OpenShift object that owns it. Use ClusterOperator status, events, logs, and focused verification so that every fix is tied to evidence.