Troubleshoot OpenShift Monitoring¶

Introduction¶

OpenShift monitoring runs platform Prometheus, Alertmanager, and related components. Start with pod health, then Alertmanager alerts, targets, and storage pressure.

Symptoms¶

Typical symptoms include monitoring pods that are not Running or Ready, route errors when opening Prometheus or Alertmanager, denied requests, a degraded monitoring operator, or command errors that repeat after retries.

Common Causes¶

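Monitoring PVCs were deleted to clear space, which discards the stored metrics.
Persistent volume pressure was ignored until the Prometheus or Alertmanager volume filled up.
User workload monitoring was assumed to be enabled; it is off by default and must be turned on in each cluster.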
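To rule out the last cause, it can help to check the cluster monitoring configuration directly. The sketch below assumes the setting lives in the config.yaml key of the cluster-monitoring-config ConfigMap in openshift-monitoring; the helper name uwm_enabled is hypothetical, and it reads the configuration text on standard input so it can be tried without a cluster.

```shell
# Succeed (exit 0) when the config text on stdin contains "enableUserWorkload: true".
# uwm_enabled is a hypothetical helper name used only for this example.
uwm_enabled() {
  grep -Eq '^[[:space:]]*enableUserWorkload:[[:space:]]*true[[:space:]]*$'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc -n openshift-monitoring get configmap cluster-monitoring-config \
#     -o jsonpath='{.data.config\.yaml}' | uwm_enabled \
#     && echo "user workload monitoring enabled" \
#     || echo "user workload monitoring not enabled"
```

If the ConfigMap does not exist, the oc command prints an error and the check reports "not enabled", which matches the default state.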

Step 1: Check the Current Status¶

oc get pods -n openshift-monitoring

oc get routes -n openshift-monitoring

oc adm top pods -n openshift-monitoring

oc logs statefulset/prometheus-k8s -n openshift-monitoring --tail=50

Example output from oc get pods:

NAME                                  READY   STATUS    RESTARTS   AGE
prometheus-k8s-0                      6/6     Running   0          5d
alertmanager-main-0                   6/6     Running   0          5d
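In a namespace with many pods, the unhealthy ones are easy to miss in the table. As a minimal sketch, the filter below reads oc get pods output on standard input and prints only pods that are not fully ready or not Running. The function name unhealthy_pods is hypothetical, and it assumes the default table layout with a header row.

```shell
# Print the names of pods that are not fully ready or not in the Running state.
# Expects the default `oc get pods` table (with header) on stdin.
# unhealthy_pods is a hypothetical helper name used only for this example.
unhealthy_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next          # finished job pods are not a problem
    split($2, ready, "/")                # READY column looks like "5/6"
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get pods -n openshift-monitoring | unhealthy_pods
```

An empty result means every pod in the listing is Running with all containers ready.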

Step 2: Inspect Logs and Events¶

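The pod description and recent events show scheduling, volume, and probe failures. The PVC list shows whether each claim is Bound.

oc describe pod prometheus-k8s-0 -n openshift-monitoring

oc get pvc -n openshift-monitoring

oc get events -n openshift-monitoring --sort-by=.lastTimestamp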
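The PVC list shows capacity but not how full a volume is. One way to check storage pressure is to read df output from inside the Prometheus container, sketched below. This assumes the data volume is mounted at /prometheus and that df is available in the container; the function name volume_pressure and the 80 percent default threshold are illustrative choices.

```shell
# Print "high" when the mount's Use% is at or above the threshold, else "ok".
# Reads `df -P` output on stdin. Arguments: mount point, threshold (default 80).
# volume_pressure is a hypothetical helper name used only for this example.
volume_pressure() {
  mount_point=$1
  threshold=${2:-80}
  awk -v m="$mount_point" -v t="$threshold" '
    $NF == m {
      used = $(NF - 1)
      sub("%", "", used)                 # "85%" becomes "85"
      if (used + 0 >= t + 0) print "high"; else print "ok"
    }'
}

# Usage against a live cluster (assumes the /prometheus mount path):
#   oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
#     df -P /prometheus | volume_pressure /prometheus 80
```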

Step 3: Verify Configuration¶

Compare the object selectors, service account, image reference, route target, or operator status with the failing symptom. In OpenShift, events often show the exact admission, scheduling, pull, SCC, or route reason.
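For the operator status part of this comparison, the monitoring ClusterOperator reports Available, Progressing, and Degraded columns. The sketch below treats True, False, False as healthy. The function name operator_healthy is hypothetical, and it assumes the default oc get clusteroperator table with a single operator row.

```shell
# Succeed (exit 0) when the operator row shows Available=True,
# Progressing=False, and Degraded=False. Reads the default table on stdin.
# operator_healthy is a hypothetical helper name used only for this example.
operator_healthy() {
  awk 'NR > 1 { ok = ($3 == "True" && $4 == "False" && $5 == "False") }
       END { exit (ok ? 0 : 1) }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get clusteroperator monitoring | operator_healthy \
#     && echo "monitoring operator healthy" \
#     || oc describe clusteroperator monitoring
```

When the check fails, the describe output usually carries the condition message that names the failing component.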

Step 4: Apply the Fix¶

Apply the smallest targeted fix: correct the selector, update the route or service port, link the pull secret, grant the specific RBAC or SCC permission, or repair the unhealthy operator dependency.
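When the evidence points to storage pressure, one targeted alternative to deleting PVCs is to shorten Prometheus retention. The sketch below only renders a ConfigMap for review. It assumes retention is set through prometheusK8s.retention in cluster-monitoring-config; the function name render_monitoring_config and the 7d value are illustrative, not recommendations.

```shell
# Print a cluster-monitoring-config ConfigMap that sets Prometheus retention.
# Argument: retention period, for example 7d (illustrative value).
# render_monitoring_config is a hypothetical helper name used only for this example.
render_monitoring_config() {
  retention=$1
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: ${retention}
EOF
}

# Usage: write the manifest to a file, review it, then apply it.
#   render_monitoring_config 7d > cluster-monitoring-config.yaml
#   oc apply -f cluster-monitoring-config.yaml
```

Applying this manifest replaces the whole config.yaml value, so if the ConfigMap already exists, merge the retention line into the existing settings instead of applying the rendered file as is.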

Step 5: Confirm the Problem Is Resolved¶

Run the verification commands again and confirm the status, events, and user-facing test all agree.
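Pods can take a few minutes to settle after a change, so a single re-check may be misleading. As a rough sketch, the helper below reruns any verification command a fixed number of times with a pause between attempts. The name retry_until and the attempt and delay values in the usage lines are illustrative assumptions.

```shell
# Run a command up to N times, sleeping between attempts, until it succeeds.
# Arguments: attempts, delay in seconds, then the command to run.
# retry_until is a hypothetical helper name used only for this example.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage against a live cluster (requires a logged-in oc session):
#   retry_until 10 30 oc wait --for=condition=Ready pod \
#     -l app.kubernetes.io/name=prometheus -n openshift-monitoring --timeout=10s
```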

Common Mistakes¶

Deleting monitoring PVCs to clear space.
Ignoring persistent volume pressure.
Assuming user workload monitoring is enabled by default in every cluster.

Quick Checklist¶

Confirm the active project.
Inspect the exact object named in the error.
Read recent events.
Apply one focused fix.
Verify status after the change.

Summary¶

Troubleshooting OpenShift monitoring requires matching the symptom to the OpenShift object that owns it. Use oc status commands, events, logs, and focused verification so that the fix is tied to evidence.
