False PagerDuty Alerts
Incident Report for Runscope

Root cause of the problem

As part of BlazeMeter/Runscope’s business continuity plans, BlazeMeter/Runscope engineering is preparing redundant staging and disaster recovery (DR) environments in other locations/providers to ensure business and data continuity. As part of this initiative, the team was running simulation tests in the redundant staging environment, using a copy of the test schedules dataset restored from production backups. The simulation did not turn off the integrations associated with those test schedules, so the staging schedule runs sent failure notifications to the alert and notification mechanisms that customers had configured for them.

What we are doing to avoid recurrence in the future:

  1. We immediately stopped simulation tests from staging to prevent any further false alerts from going to customers.
  2. We revisited our simulation and data redundancy plan and started applying additional data validation, cleansing, and sanitization to the staging simulation dataset.
  3. We are working on deploying an outbound gateway to control all egress traffic from staging, so that these alerts and notifications can be controlled.
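
As an illustration of the sanitization in step 2, the following is a minimal sketch, not the actual Runscope implementation. It assumes each schedule restored from backup is a dictionary, and the field names (`integrations`, `notification_emails`, `webhook_urls`) are hypothetical, since the real schedule schema is not described in this report. It clears every notification target before the data is loaded into staging, then verifies that none remain.

```python
from copy import deepcopy

# Hypothetical field names: the real schedule schema is not described
# in this report. Each is assumed to hold a list of notification targets.
NOTIFICATION_FIELDS = ("integrations", "notification_emails", "webhook_urls")


def sanitize_schedule(schedule):
    """Return a copy of one schedule with all notification targets cleared."""
    clean = deepcopy(schedule)
    for field in NOTIFICATION_FIELDS:
        if field in clean:
            clean[field] = []
    return clean


def sanitize_dataset(schedules):
    """Sanitize every schedule restored from a production backup."""
    return [sanitize_schedule(s) for s in schedules]


def assert_no_outbound_targets(schedules):
    """Raise ValueError if any schedule can still notify a customer."""
    for schedule in schedules:
        for field in NOTIFICATION_FIELDS:
            if schedule.get(field):
                raise ValueError(
                    f"schedule {schedule.get('id')!r} still has {field} configured"
                )
```

Running the check as a separate step after sanitization means a staging load can be made to fail outright if a schedule with live notification targets slips through, rather than relying on the cleansing step alone.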
Posted Feb 10, 2020 - 07:54 PST

The incident has been resolved.
Posted Jan 29, 2020 - 19:54 PST
The issue has been identified and a fix is being implemented. We will provide another update by 20:35 PST.
Posted Jan 29, 2020 - 19:39 PST
We are continuing to investigate this issue. We are aware that false alerts are being sent for additional integrations and that links are pointing to www.stage.runscope.com instead of www.runscope.com.
Posted Jan 29, 2020 - 18:55 PST
We have reports of passing tests triggering PagerDuty alerts. We are currently investigating.
Posted Jan 29, 2020 - 18:42 PST
This incident affected: API Monitoring (Runscope).