CloudsArk

OpenShift Troubleshooting Checklist

A practical OpenShift troubleshooting checklist with oc commands, verification steps, common mistakes, and production-focused guidance.

Introduction

Use this checklist when an OpenShift issue is not yet isolated to pods, routes, storage, RBAC, builds, or cluster operators. Start broad, then narrow the failure to the object and namespace that owns it.

Symptoms

Typical symptoms include failed pods, route errors, denied requests, unhealthy operators, or command errors that repeat after retries.

Common Causes

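  • Service or deployment selectors that do not match the pod labels.
  • A wrong image reference, or a pull secret that is not linked to the service account.
  • A route that targets the wrong service or port.
  • Missing RBAC or SCC permissions for the user or service account.
  • A degraded cluster operator or an unhealthy dependency it relies on.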

Step 1: Check the Current Status

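# Confirm which cluster and user you are working as
oc whoami --show-server
oc whoami
# Cluster-wide health: operators and nodes
oc get clusteroperators
oc get nodes
# Recent events (newest last) and pods that are neither running nor completed
oc get events -A --sort-by=.lastTimestamp
oc get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded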

Example output from oc get clusteroperators:

NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.15.12   True        False         False      8d
ingress          4.15.12   True        False         False      8d
monitoring       4.15.12   True        False         False      8d
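Reading the full table by eye gets slow on a large cluster. The bash sketch below lists only the operators that need attention. The function names co_conditions and unhealthy_operators are hypothetical. The sketch assumes oc accepts JSONPath condition filters, which follow the kubectl JSONPath syntax.

```shell
# Sketch (bash): report ClusterOperators that are not fully healthy.
# co_conditions prints "NAME<TAB>AVAILABLE<TAB>PROGRESSING<TAB>DEGRADED" per operator.
co_conditions() {
  oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Available")].status}{"\t"}{.status.conditions[?(@.type=="Progressing")].status}{"\t"}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}'
}

# unhealthy_operators reads that tab-separated input on stdin and prints the name of
# every operator that is not Available=True, Progressing=False, Degraded=False.
# A missing condition leaves an empty field, so that operator is reported too.
unhealthy_operators() {
  awk -F '\t' '$2 != "True" || $3 != "False" || $4 != "False" { print $1 }'
}
```

Usage: co_conditions | unhealthy_operators. Progressing=True is expected while an upgrade is rolling out. Treat a progressing operator as a prompt to run oc describe clusteroperator, not as a failure on its own.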

Step 2: Inspect Logs and Events

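# Operator details and node resource usage
oc describe clusteroperator <operator-name>
oc adm top nodes
# Events, status, and current plus previous container logs for the failing pod
oc get events -n <namespace> --sort-by=.lastTimestamp
oc describe pod <pod-name> -n <namespace>
oc logs <pod-name> -n <namespace>
oc logs <pod-name> -n <namespace> --previous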
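A field selector on status.phase misses some failing pods. A pod whose container is crash-looping or failing readiness typically still has phase Running. The sketch below defines a hypothetical problem_pods function. It assumes the default column layout of oc get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). It flags pods that are not Completed and are either not Running or not fully ready.

```shell
# Sketch (bash + awk): flag pods that need log inspection.
# Expects `oc get pods -A --no-headers` columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
# Prints "namespace/name STATUS READY" for each flagged pod.
problem_pods() {
  awk '{
    if ($4 == "Completed") next              # finished Job pods are expected
    split($3, ready, "/")                    # READY looks like "1/2"
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2 " " $4 " " $3
  }'
}
```

Usage: oc get pods -A --no-headers | problem_pods. Then read the logs of each flagged pod with oc logs <pod-name> -n <namespace> --previous.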

Step 3: Verify Configuration

Compare the object selectors, service account, image reference, route target, or operator status with the failing symptom. In OpenShift, events often show the exact admission, scheduling, pull, SCC, or route reason.
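To see which reasons dominate, a small bash sketch can count Warning events by reason. It assumes oc get events accepts the type=Warning field selector and custom-columns output, as kubectl does. The function names are hypothetical. Reason strings such as FailedScheduling, FailedMount, or BackOff come from Kubernetes itself, not from this checklist.

```shell
# Sketch (bash): count Warning events by reason, most frequent first.
# warning_event_reasons prints one reason per Warning event across all namespaces.
warning_event_reasons() {
  oc get events -A --field-selector type=Warning -o custom-columns=REASON:.reason --no-headers
}

# warning_reasons reads one reason per line and prints "REASON COUNT", highest count first.
warning_reasons() {
  sort | uniq -c | sort -rn | awk '{ print $2 " " $1 }'
}
```

Usage: warning_event_reasons | warning_reasons. The top reasons suggest which object to describe next. For example, FailedScheduling points at node capacity, taints, or affinity rather than at the application itself.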

Step 4: Apply the Fix

Apply the smallest targeted fix: correct the selector, update the route or service port, link the pull secret, grant the specific RBAC or SCC permission, or repair the unhealthy operator dependency.
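For the selector case, you can check a proposed selector against a pod's labels before changing anything. The sketch below defines a hypothetical selector_matches function. It takes two comma-separated key=value strings: the format of the SELECTOR column in oc get svc -o wide and the LABELS column in oc get pods --show-labels. It reports the first selector pair the pod lacks. It handles only equality selectors.

```shell
# Sketch (bash): check that every key=value pair in a selector is present in a label set.
# Usage: selector_matches "<selector k=v,...>" "<pod labels k=v,...>"
selector_matches() {
  local selector="$1" labels="$2" pair
  local -a pairs
  IFS=',' read -ra pairs <<< "$selector"
  for pair in "${pairs[@]}"; do
    # Wrap both sides in commas so "app=web" cannot match "app=web2".
    case ",$labels," in
      *",$pair,"*) ;;
      *) echo "missing label: $pair"; return 1 ;;
    esac
  done
  return 0
}
```

For example, selector_matches "app=web,tier=frontend" "app=web,pod-template-hash=7c9d" prints "missing label: tier=frontend" and returns a non-zero status. That result points at the selector or the pod template, not at the pods themselves.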

Step 5: Confirm the Problem Is Resolved

Run the verification commands again and confirm the status, events, and user-facing test all agree.
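Rollouts and route changes take a moment to settle, so a single rerun can give a false negative. The bash sketch below defines a hypothetical retry_check helper. It reruns any verification command a fixed number of times, with a delay between attempts.

```shell
# Sketch (bash): rerun a verification command until it succeeds or attempts run out.
# Usage: retry_check <attempts> <delay-seconds> <command> [args...]
retry_check() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      echo "check passed on attempt $i"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "check still failing after $attempts attempts" >&2
  return 1
}
```

For the user-facing path, retry_check 10 6 curl -fsS -o /dev/null https://<route-host>/ is one option. For object status, oc's own oc wait --for=condition=Available deployment/<name> -n <namespace> --timeout=120s does the waiting for you.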

Common Mistakes

  • Starting with random pod restarts before checking cluster operators and events.
  • Troubleshooting the wrong project or cluster context.
  • Ignoring whether the problem is application-scoped or cluster-scoped.
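One way to guard against the wrong-context mistake is to check the API server and project before running anything else. The bash sketch below defines a hypothetical require_context function. It relies on oc whoami --show-server and oc project -q. The expected server URL and project name are placeholders you supply.

```shell
# Sketch (bash): stop if oc points at an unexpected cluster or project.
# Usage: require_context <expected-api-server-url> <expected-project>
require_context() {
  local want_server="$1" want_project="$2" server project
  server=$(oc whoami --show-server) || return 1
  project=$(oc project -q) || return 1
  if [ "$server" != "$want_server" ] || [ "$project" != "$want_project" ]; then
    echo "wrong context: $server / $project (expected $want_server / $want_project)" >&2
    return 1
  fi
  echo "context ok: $server / $project"
}
```

A script can begin with require_context https://api.<cluster-domain>:6443 <project> || exit 1. That way nothing runs against the wrong cluster or project.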

Quick Checklist

  • Confirm the API server and current user.
  • Check ClusterOperators and nodes.
  • Check events across the affected namespace or cluster.
  • Inspect the exact failing object.
  • Verify after one focused fix.
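The checklist can also run as one script, so every investigation starts from the same evidence. This is a minimal bash sketch. run_checklist is a hypothetical name. The namespace and the optional kind/name object are arguments you provide. The focused fix and final verification remain manual.

```shell
# Sketch (bash): print the Quick Checklist evidence for one namespace.
# Usage: run_checklist <namespace> [kind/name]
run_checklist() {
  local ns="$1" obj="${2:-}"
  echo "== API server and current user"
  oc whoami --show-server
  oc whoami
  echo "== ClusterOperators and nodes"
  oc get clusteroperators
  oc get nodes
  echo "== Events in namespace $ns"
  oc get events -n "$ns" --sort-by=.lastTimestamp
  if [ -n "$obj" ]; then
    echo "== Failing object $obj"
    oc describe "$obj" -n "$ns"
  fi
  # The focused fix and the final verification stay manual on purpose.
}
```

For example, run_checklist shop deployment/web collects the cluster-level evidence and the description of one deployment in a single pass.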

Summary

Troubleshooting OpenShift comes down to matching the symptom to the OpenShift object that owns it. Use oc get and oc describe output, events, and logs to find that object, then verify each focused fix so it is tied to evidence.