Overview
6
Active Incidents
↑ +2 from yesterday
3
Critical Alerts
↑ +1 from yesterday
4
Unhealthy Workloads
↓ -1 from yesterday
4
Remediations Today
3 successful
Incident Trend
Last 24 hours
Severity Distribution
All active incidents
8 total
Critical: 3
High: 2
Medium: 2
Low: 1
Resource Usage
CPU and memory — last 24 hours
Services at Risk
Services with active incidents
| Service | Risk | Incidents |
|---|---|---|
| payment-worker | Critical | 2 |
| auth-service | High | 1 |
| metrics-gateway | Critical | 1 |
| platform-backend | High | 1 |
| cache-proxy | Medium | 1 |
Recent Incidents
| Severity | Incident | Service | Age |
|---|---|---|---|
| Critical | payment-worker CrashLoopBackOff - OOMKilled repeatedly | payment-worker | 14m ago |
| High | auth-service high CPU - sustained 94% for 8 minutes | auth-service | 22m ago |
| Critical | metrics-gateway pod evicted - node memory pressure | metrics-gateway | 45m ago |
| High | demo-api readiness probe failing - 5xx responses | demo-api | 1h ago |
| Critical | platform-backend image pull failure - ErrImagePull | platform-backend | 5m ago |
Recent Remediations
| Action | Service | Initiated by | Duration |
|---|---|---|---|
| Rollout Restart | demo-api | operator | 42s |
| Rollout Restart | metrics-gateway | operator | - |
| Restart Deployment | payment-worker | operator | 3s |