How to build an on-call alert noise dashboard
For many teams, the on-call experience has quietly degraded. Engineers are woken up for alerts that don’t require action, the same services trigger incidents week after week, and no one can confidently answer a simple question: is our on-call setup actually healthy?
What exists instead is an endless stream of alerts, some post-incident metrics, and a growing sense of fatigue. This is where operational intelligence becomes essential.
In this tutorial, we’ll build a dashboard that brings together alert volume, incident data, and response times into a single pane of glass. Rather than focusing on uptime alone, this dashboard looks at the operational reality of being on call.
The goal isn’t just visibility, but action. At the end of this guide, you'll have built a tool that reduces noise, improves readiness, and protects the people behind the screen.
Data sources to use
Depending on how your teams operate and the technologies they use, you can draw from multiple data sources to build this dashboard.
In this example, we’ll use:
- Azure to capture operational signals.
- Jira Service Management to capture the on-call and incident management view.
This combination gives us both system-level telemetry and the human response layer in one place.
Configuring tiles
We’ll build this dashboard one tile at a time, drawing from different elements of the SquaredUp toolkit depending on each use case.
Each tile configuration is documented as a self-contained tutorial. This means you can work through them sequentially to build the full dashboard, or dip into specific tiles as needed and adapt the patterns to your own data sources and use cases.
Alert time distribution
This tile offers a simple but powerful lens, showing how many alerts fire outside working hours versus during the day.
A heavy skew toward evenings and weekends is often the clearest signal of noisy thresholds or low-value alerts. Reducing unnecessary out-of-hours noise is usually the quickest win for improving on-call quality of life without increasing risk.
See Creating an alert time distribution tile for detailed instructions.
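The classification behind this tile is straightforward. Here's a minimal Python sketch, assuming hypothetical alert records with ISO 8601 timestamps and an assumed 09:00–18:00 working day; in practice the timestamps would come from your Azure alerts data source.

```python
from datetime import datetime

# Hypothetical alert timestamps; real data would come from Azure.
alerts = [
    "2024-05-13T03:12:00",  # Monday, 3 AM
    "2024-05-13T10:45:00",  # Monday, mid-morning
    "2024-05-11T14:00:00",  # Saturday afternoon
]

WORK_START, WORK_END = 9, 18  # assumed working day, 09:00-18:00

def is_out_of_hours(ts: str) -> bool:
    """True if the alert fired on a weekend or outside the working day."""
    dt = datetime.fromisoformat(ts)
    if dt.weekday() >= 5:  # Saturday=5, Sunday=6
        return True
    return not (WORK_START <= dt.hour < WORK_END)

out_of_hours = sum(is_out_of_hours(a) for a in alerts)
print(f"{out_of_hours}/{len(alerts)} alerts fired out of hours")
```

Adjust the working-hours window to match your team's schedule, and consider time zones if your on-call rotation spans regions.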
Alerts per day
Here we establish a baseline. By tracking total alerts per day over time, you can see the true weight of on-call load.
Rather than focusing on individual alerts, this approach highlights trends in alert volume, making it easy to spot sustained increases, sudden spikes, or periods of relative calm.
See Creating an alerts over time tile for detailed instructions.
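The daily baseline amounts to bucketing alert timestamps by calendar date. A minimal sketch, again using hypothetical timestamps in place of real Azure data:

```python
from collections import Counter
from datetime import datetime

# Hypothetical alert timestamps; real data would come from Azure.
alerts = [
    "2024-05-13T03:12:00",
    "2024-05-13T10:45:00",
    "2024-05-14T09:00:00",
]

# Bucket alerts by calendar date to get a per-day count.
per_day = Counter(datetime.fromisoformat(a).date().isoformat() for a in alerts)
for day, count in sorted(per_day.items()):
    print(day, count)
```

Plotting this series over weeks is what surfaces the sustained increases and spikes the tile is designed to catch.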
Alert volume by service
This tile reveals where the noise lives. By grouping alerts by service, it highlights which systems generate the most interruptions.
Instead of spreading effort thinly, you can focus tuning and investigation where it will make the biggest difference.
See Creating an alert volume by service tile for detailed instructions.
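The grouping itself is a simple count per service. A sketch, assuming hypothetical alert records tagged with the service that raised them (the field names are illustrative):

```python
from collections import Counter

# Hypothetical alert records; the "service" tag is an assumption about
# how your alerts are labeled.
alerts = [
    {"service": "payments-api", "name": "CPU high"},
    {"service": "payments-api", "name": "Latency breach"},
    {"service": "search", "name": "Disk space low"},
]

by_service = Counter(a["service"] for a in alerts)
# most_common() puts the noisiest services first, i.e. the best tuning targets.
for service, count in by_service.most_common():
    print(service, count)
```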
Actionable alerts
Not every alert deserves attention. This tile classifies Azure alert events by severity, creating a practical proxy for actionability. The result is a clearer view of how much of your alert stream genuinely requires human intervention.
See Creating an actionable alerts tile for detailed instructions.
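Azure Monitor severities run from Sev0 (critical) down to Sev4 (verbose), so a simple severity cut-off can stand in for actionability. A sketch, where treating Sev0–Sev2 as actionable is an assumption you should tune for your own teams:

```python
# Assumption: Sev0-Sev2 require human intervention; Sev3-Sev4 are
# informational. Adjust this threshold to match your alerting standards.
ACTIONABLE = {"Sev0", "Sev1", "Sev2"}

# Hypothetical alert records with Azure Monitor severities.
alerts = [
    {"name": "DB failover", "severity": "Sev1"},
    {"name": "CPU spike", "severity": "Sev3"},
    {"name": "Cert expiry", "severity": "Sev2"},
    {"name": "Verbose diagnostic", "severity": "Sev4"},
]

actionable = [a for a in alerts if a["severity"] in ACTIONABLE]
pct = 100 * len(actionable) / len(alerts)
print(f"{len(actionable)}/{len(alerts)} alerts actionable ({pct:.0f}%)")
```

If the actionable fraction is low, that's a strong case for retiring or re-routing the informational alerts rather than paging on them.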
MTTA
These tiles track how quickly incidents are acknowledged after they're raised. Viewed alongside alert volume, they show whether noise is slowing your team's initial response.
See Creating MTTA tiles for detailed instructions.
MTTR
MTTR (mean time to recover) tracks how long it takes to resolve an incident once it has been raised. Viewed alongside alert volume, it shows whether noise is extending incident duration.
See Creating an MTTR tile for detailed instructions.
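The MTTA and MTTR tiles above both reduce to the same calculation: averaging the gap between two incident timestamps. A minimal sketch, assuming hypothetical incident records with ISO 8601 timestamps; real data would come from Jira Service Management.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are illustrative.
incidents = [
    {"created": "2024-05-13T02:00:00",
     "acknowledged": "2024-05-13T02:10:00",
     "resolved": "2024-05-13T03:00:00"},
    {"created": "2024-05-14T09:00:00",
     "acknowledged": "2024-05-14T09:05:00",
     "resolved": "2024-05-14T09:45:00"},
]

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60

mtta = mean(minutes_between(i["created"], i["acknowledged"]) for i in incidents)
mttr = mean(minutes_between(i["created"], i["resolved"]) for i in incidents)
print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
```

Means are easy to skew with one long incident, so it's worth tracking medians or percentiles alongside them once you have enough data.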
Next steps
Nice work! You now have a working foundation for understanding your on-call experience. The real value comes from using this dashboard as a feedback loop, not a static report.
Review it regularly. Tune thresholds. Adjust ownership. Watch how changes affect both alert noise and response performance.
When the dashboard helps your team sleep through the night more often and respond with confidence, you’ve moved from simple monitoring to operational intelligence.