Change failure rate
This tile models change failure rate by correlating successful build runs with incident-level bugs. Successful builds represent changes that were eligible to reach production, while incident-priority bugs represent negative, user-impacting outcomes.
By comparing the volume of incidents to the volume of successful changes over the same timeframe, we can surface periods where change is disproportionately causing problems, helping teams identify elevated delivery risk without relying on explicit deployment-to-incident linkage.
We'll utilize two data sources in this tile, then combine the datasets using SQL Analytics:
- Azure DevOps: We'll use the Build Runs data stream to return successful build runs.
- Jira: We'll use the JQL data stream to return incident tickets raised during the same timeframe. If you use a different tool for tracking incidents (e.g. JSM or ServiceNow), you can edit this tile to use a data stream from that data source instead.
Configuring the tile
Configure the following in the tile editor:
- SQL Analytics: Enable the toggle.
- dataset1:
- Data Source: Select Azure DevOps.
- Data Stream: Select Build Runs.
- Objects: Select the projects you want data from.
- Parameters > Result: Select Succeeded to return only successful builds.
- Timeframe: Select the timeframe you want to track. Note that after adding a monitor or configuring a KPI, the Use dashboard timeframe option is disabled.
- Click Add dataset to add another data source.
- dataset2:
- Data Source: Select Jira.
- Data Stream: Select JQL Query.
- Parameters > JQL query: Enter a query to return incidents. For example:
TYPE = 'Bug' and Priority = 'Incident'
- Timeframe: Select the same timeframe you selected for dataset1.
- SQL > Query: Enter a query such as the following to combine your results to calculate the change failure rate:
SELECT builds.successful_builds,
       bugs.incident_bugs,
       ROUND(bugs.incident_bugs * 100.0 / NULLIF(builds.successful_builds, 0), 1) AS change_failure_rate
FROM (SELECT COUNT(*) AS successful_builds FROM dataset1) AS builds,
     (SELECT COUNT(*) AS incident_bugs FROM dataset2) AS bugs;
- SQL > Columns: Click Edit next to the Change Failure Rate column and configure the following:
- Type: Select Percent.
- Decimal Places: Enter 0.
- Click Save.
- Visualization: Select Gauge.
- Mapping > Value: Select Change Failure Rate.
- Range > Min: Select Fixed and enter 0.
- Range > Max: Select Fixed and enter 100.
- Click Save.
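To sanity-check the arithmetic in the SQL query before relying on the gauge, the calculation can be reproduced locally. The following is a minimal sketch that runs the same query with Python's built-in sqlite3 module against in-memory stand-ins for dataset1 and dataset2. The single-column table layout and the change_failure_rate helper are hypothetical, and it assumes the tile's SQL dialect handles ROUND and NULLIF the same way SQLite does.

```python
import sqlite3

# The query from the tile, unchanged apart from line breaks.
QUERY = """
SELECT builds.successful_builds,
       bugs.incident_bugs,
       ROUND(bugs.incident_bugs * 100.0 / NULLIF(builds.successful_builds, 0), 1)
           AS change_failure_rate
FROM (SELECT COUNT(*) AS successful_builds FROM dataset1) AS builds,
     (SELECT COUNT(*) AS incident_bugs FROM dataset2) AS bugs;
"""


def change_failure_rate(build_count, incident_count):
    """Run the tile query against in-memory stand-ins for the two datasets."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical single-column tables: only the row counts matter to the query.
        conn.execute("CREATE TABLE dataset1 (id INTEGER)")
        conn.execute("CREATE TABLE dataset2 (id INTEGER)")
        conn.executemany(
            "INSERT INTO dataset1 VALUES (?)", [(i,) for i in range(build_count)]
        )
        conn.executemany(
            "INSERT INTO dataset2 VALUES (?)", [(i,) for i in range(incident_count)]
        )
        # Returns (successful_builds, incident_bugs, change_failure_rate).
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


print(change_failure_rate(120, 9))  # 9 incidents against 120 successful builds
print(change_failure_rate(0, 3))    # no builds: NULLIF turns the divisor into NULL
```

With 120 successful builds and 9 incidents, the rate comes out at 7.5. When there are no successful builds in the timeframe, NULLIF makes the result NULL rather than raising a division-by-zero error, so an empty value in that case does not mean the rate is 0.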
Adding a monitor
A healthy baseline change failure rate is around 5-10%, so we trigger a warning when the rate begins to exceed expected levels, and an error when it reaches a level that indicates elevated deployment risk.
Configure the monitor
Configure the following in the tile editor:
- Monitoring: Enable the Monitoring toggle.
- Type: Select Threshold.
- Value: Select top.
- Column: Select Change Failure Rate.
- Conditions:
- Error: Enable the toggle, then configure as Greater than, and supply an appropriate value. For our example, we’ll enter 20.
- Warning: Enable the toggle, then configure as Greater than, and supply an appropriate value. For our example, we’ll enter 12.
- Click Save.
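The effect of the two Greater than conditions can be illustrated with a small sketch. It uses the example values of 12 and 20 from the steps above. The monitor_state function and the state names it returns are hypothetical, chosen only to show how the thresholds partition the rate, and are not the product's own API.

```python
WARNING_THRESHOLD = 12  # example value entered for the Warning condition
ERROR_THRESHOLD = 20    # example value entered for the Error condition


def monitor_state(change_failure_rate):
    """Classify a rate using two strict 'Greater than' thresholds (illustrative only)."""
    # Check the error condition first so it takes priority over the warning.
    if change_failure_rate > ERROR_THRESHOLD:
        return "error"
    if change_failure_rate > WARNING_THRESHOLD:
        return "warning"
    return "ok"


for rate in (7.5, 12, 15, 20, 25):
    print(rate, monitor_state(rate))
```

Because both conditions are Greater than, a rate of exactly 12 stays healthy and a rate of exactly 20 is still only a warning. The 5-10% baseline sits comfortably below both thresholds.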
Publishing a KPI
To give even more value to this dashboard (that's right, it gets even better!), we can promote key operational metrics to first-class KPIs.
Once published, these KPIs can be selected as a data stream when configuring a tile, making it easier both to track change failure rate over time and to reuse this data across your other dashboards.
Once you've finished, don't forget to finish configuring your Risk Alerts tile!
Add the KPI type
Configure the following on the Settings > KPI > KPI Types page:
- Click Add KPI type. The Add KPI type window opens.
- Name: Enter Change failure rate.
- Click Save.
Configure the tile KPI
Configure the following in the tile editor for each of the KPI types you configured.
- KPI: Enable the toggle.
- Type: Select Change failure rate.
- Click Save.