How to build a major incidents dashboard
It’s easy for major incidents to get lost amid noise, competing priorities, and uncertainty. Alerts spike, tickets flood in, and Slack channels light up before anyone can clearly spot that something big is happening.
This is where operational intelligence saves the day. You don’t need more data; you need a single, coherent view that cuts through the noise and turns uncertainty into action.
In this tutorial, we’ll build a dashboard that brings together alert activity, incident data, service impact, and ownership into a single pane of glass.
Instead of reacting to isolated symptoms, you’ll be able to see the scope, direction, and momentum of an unfolding event.
Data sources to use
Depending on your environment, you can mix and match different systems to build this dashboard. In this example, we’ll use:
- ServiceNow to capture major incidents, child tickets, ownership, and impact.
- Azure to surface alert volume and infrastructure signals.
This pairing gives us both sides of the story: the human workflow layer (tickets, assignments, escalation) and the system telemetry layer (alerts, spikes, degradation).
Configuring tiles
We’ll step through the dashboard tile by tile, using different elements of the SquaredUp toolkit depending on the question each tile answers.
Each tile is documented as a self-contained guide. You can follow them independently, adapt the logic to your own data sources, or build the full board sequentially.
Individually, each tile answers a specific operational question. Together, they tell the story of a major incident in motion.
Active major incidents
This tile answers the most direct question: how many major incidents are active right now?
By filtering for priority 1 incidents that are still active, this tile immediately tells you whether you’re in a crisis state. In a calm system, this number is zero. When it isn’t, the rest of the dashboard becomes your war room.
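The filter behind this tile can be sketched in a few lines of Python. This is only an illustration of the logic, not how SquaredUp or ServiceNow implement it; the record shape and the field names `priority` and `active` are assumptions made for the example.

```python
# Hypothetical incident records; field names are assumptions for illustration.
incidents = [
    {"number": "INC001", "priority": 1, "active": True},
    {"number": "INC002", "priority": 3, "active": True},
    {"number": "INC003", "priority": 1, "active": False},
]


def count_active_majors(records):
    """Count records that are both priority 1 and still active."""
    return sum(1 for r in records if r["priority"] == 1 and r["active"])


active_majors = count_active_majors(incidents)
print(active_majors)  # 1
```

A result of zero corresponds to the calm state described above; anything higher is the cue to read the rest of the board.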
See creating an active major incident tile for detailed instructions.
Time since last major incident
Resilience isn’t just about handling incidents well. It’s also about how often they occur.
This tile tracks the time elapsed since the last declared major incident. A growing duration suggests stability. A short interval between majors can signal systemic fragility.
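As a rough sketch of the calculation, the elapsed time is the current time minus the most recent declaration time. The function name and the sample timestamps below are hypothetical, and the example assumes all timestamps are timezone-aware and in UTC.

```python
from datetime import datetime, timezone


def time_since_last_major(opened_times, now):
    """Return the time elapsed since the most recent major, or None if there are none."""
    if not opened_times:
        return None
    return now - max(opened_times)


# Sample data, invented for the example.
now = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)
majors_opened = [
    datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    datetime(2024, 6, 7, 12, 0, tzinfo=timezone.utc),
]

elapsed = time_since_last_major(majors_opened, now)
print(elapsed.days)  # 3
```

Returning `None` when no major has ever been declared is a design choice for this sketch; a tile might instead show a placeholder.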
See creating a time since last major incident tile for detailed instructions.
Major incident status
Not all major incidents are equal. Some are newly declared. Others are stabilizing. Some are resolved but under observation.
This tile derives a health state from the incident lifecycle and maps it to clear operational signals:
- New: 🔴Error
- In Progress: 🟡Warning
- Resolved / Closed: 🟢Success
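The mapping above amounts to a simple lookup. The sketch below shows one way to express it; the lowercase health labels and the `unknown` fallback for unmapped states are assumptions added for the example, not part of the mapping itself.

```python
# Lifecycle state -> health state, following the list above.
STATE_TO_HEALTH = {
    "New": "error",
    "In Progress": "warning",
    "Resolved": "success",
    "Closed": "success",
}


def health_for(state):
    """Map an incident state to a health state; unmapped states become 'unknown' (assumption)."""
    return STATE_TO_HEALTH.get(state, "unknown")


print(health_for("In Progress"))  # warning
```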
See creating a major incident status tile for detailed instructions.
Ticket creation rate
When a major incident unfolds, ticket volume often tells you how fast impact is spreading.
By bucketing ticket creation over time, this chart shows whether the situation is accelerating, plateauing, or stabilizing. A rising slope suggests expanding impact. A flattening curve indicates containment.
Viewed alongside alert volume, it helps correlate technical failure with user-reported disruption.
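To make the bucketing idea concrete, here is a minimal sketch that groups ticket creation times into hourly buckets and then looks at the change between consecutive buckets. The timestamps are invented, and the hourly bucket size is an assumption; a real tile would use whatever interval suits the timeframe.

```python
from collections import Counter
from datetime import datetime


def bucket_by_hour(timestamps):
    """Return (hour, count) pairs sorted by hour."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sorted(counts.items())


# Sample creation times, invented for the example.
created = [
    datetime(2024, 6, 10, 9, 5),
    datetime(2024, 6, 10, 9, 40),
    datetime(2024, 6, 10, 10, 15),
    datetime(2024, 6, 10, 10, 20),
    datetime(2024, 6, 10, 10, 55),
]

series = bucket_by_hour(created)
print([count for _, count in series])  # [2, 3]

# Change between consecutive buckets: positive means the rate is rising,
# values near zero mean it is flattening.
deltas = [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]
print(deltas)  # [1]
```

Note that hours with no tickets do not appear in the output at all, so a chart built this way would need empty buckets filled in to show a true plateau.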
See creating a ticket creation rate tile for detailed instructions.
Major alert volume
Alerts often precede or amplify major incidents. This tile tracks alert counts over the selected timeframe, highlighting spikes that align with service degradation. By focusing on fired alerts and grouping by hour or minute, you gain a clear view of technical pressure building beneath the surface.
It helps answer "Is this incident isolated, or is the system under broader strain?"
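The two steps described here, keeping only fired alerts and grouping them by hour or minute, might look like the sketch below. The `condition` and `fired_at` field names are hypothetical stand-ins rather than the actual Azure alert schema, and the sample alerts are invented.

```python
from collections import Counter
from datetime import datetime


def fired_alert_counts(alerts, granularity="hour"):
    """Count fired alerts per time bucket ('hour' or 'minute')."""
    if granularity not in ("hour", "minute"):
        raise ValueError("granularity must be 'hour' or 'minute'")

    def bucket(ts):
        if granularity == "hour":
            return ts.replace(minute=0, second=0, microsecond=0)
        return ts.replace(second=0, microsecond=0)

    # Only alerts in the fired condition are counted.
    return Counter(bucket(a["fired_at"]) for a in alerts if a["condition"] == "Fired")


# Sample alerts, invented for the example.
alerts = [
    {"condition": "Fired", "fired_at": datetime(2024, 6, 10, 9, 1, 10)},
    {"condition": "Fired", "fired_at": datetime(2024, 6, 10, 9, 1, 45)},
    {"condition": "Resolved", "fired_at": datetime(2024, 6, 10, 9, 2, 0)},
    {"condition": "Fired", "fired_at": datetime(2024, 6, 10, 10, 30, 0)},
]

hourly = fired_alert_counts(alerts, "hour")
print(hourly[datetime(2024, 6, 10, 9, 0)])  # 2
```

Minute-level buckets suit a short, fast-moving window; hourly buckets are easier to read over a day or more.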
See creating a major alert volume tile for detailed instructions.
Affected services
Major incidents rarely impact a single component.
By grouping child incidents by business service or configuration item, this tile reveals the blast radius. Is everything concentrated in one platform? Or is the issue cascading across dependencies?
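A grouping like this can be sketched as a count of child incidents per service, sorted so the most affected service comes first. The `business_service` field name, the `(unassigned)` label for missing values, and the sample records are assumptions for the example.

```python
from collections import Counter


def blast_radius(child_incidents, key="business_service"):
    """Return (service, count) pairs, most affected first."""
    counts = Counter(c.get(key) or "(unassigned)" for c in child_incidents)
    return counts.most_common()


# Sample child incidents, invented for the example.
children = [
    {"number": "INC101", "business_service": "Payments"},
    {"number": "INC102", "business_service": "Payments"},
    {"number": "INC103", "business_service": "Identity"},
    {"number": "INC104", "business_service": None},
]

print(blast_radius(children)[0])  # ('Payments', 2)
```

Passing a different `key` (for example, a configuration item field) regroups the same incidents along another dimension.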
This is where root cause often begins to emerge visually.
See creating an affected services tile for detailed instructions.
Assignment group load
Incidents don’t just impact systems. They impact people.
This tile shows how tickets are distributed across assignment groups during an active major. A tightly contained issue might sit with one team. A spreading incident often spans multiple groups, indicating escalation and cross-functional coordination.
It provides visibility into operational load and highlights where support pressure is concentrated.
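One way to express “where support pressure is concentrated” is each group’s share of the open tickets. The sketch below assumes a hypothetical `assignment_group` field and invented sample tickets; the share calculation is an illustration, not necessarily what the tile displays.

```python
from collections import Counter


def group_load(tickets):
    """Return each assignment group's share of tickets as a fraction of the total."""
    counts = Counter(t["assignment_group"] for t in tickets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


# Sample tickets, invented for the example.
tickets = [
    {"assignment_group": "Network"},
    {"assignment_group": "Network"},
    {"assignment_group": "Network"},
    {"assignment_group": "Database"},
]

load = group_load(tickets)
print(load["Network"])  # 0.75
print(len(load))  # 2 groups involved
```

A single group holding most of the load suggests a contained issue; a growing number of groups with comparable shares suggests the incident is spreading.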
See creating an assignment group load tile for detailed instructions.
Next steps
Well done! You’ve built a working major incidents dashboard. The real value now lies in how you use it.
Watch how alert spikes precede ticket growth. Notice how service concentration shifts as an incident spreads. Track how assignment load redistributes during escalation. Use time-since metrics to measure resilience and recovery trends over weeks and months.
When your team can glance at a dashboard and instantly grasp the scale, scope, and momentum of an incident, you’ve moved beyond fragmented monitoring.
You’ve built operational intelligence.