Context/Background
The XVWeb Service runs on multiple instances in the Azure cloud. Each instance includes a built-in health monitor that reports the performance of that instance.
Problem Summary
Shortly after 4:00pm EST, a portion of these instances began reporting as “unhealthy”, and automated processes began removing them from the rotation. At the same time, a separate process began creating new instances to replace them.
Unfortunately, the automated creation of new instances did not keep pace with the removal of “unhealthy” instances, so fewer instances were available to serve traffic and application performance was degraded during this period.
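The imbalance described above can be illustrated with a minimal sketch. It is not the actual XVWeb or Azure scaling logic: the function name, the fixed per-minute rates, and all instance counts are hypothetical values chosen only to show how in-rotation capacity shrinks when removals outpace replacements.

```python
def simulate_capacity(initial, removals_per_min, additions_per_min, minutes):
    """Return the number of in-rotation instances at the end of each minute.

    All parameters are hypothetical; real health checks and scale-out
    processes do not run at fixed, constant rates.
    """
    capacity = initial
    history = []
    for _ in range(minutes):
        # Unhealthy instances are removed first (never below zero),
        # then newly created instances join the rotation.
        capacity = max(0, capacity - removals_per_min) + additions_per_min
        history.append(capacity)
    return history


# Removal faster than replacement: capacity falls every minute.
print(simulate_capacity(initial=10, removals_per_min=3, additions_per_min=1, minutes=3))
# → [8, 6, 4]

# Replacement keeps pace with removal: capacity holds steady.
print(simulate_capacity(initial=10, removals_per_min=2, additions_per_min=2, minutes=3))
# → [10, 10, 10]
```

In the first case the rotation loses two instances per minute even though replacements are being created, which is the pattern that leads to degraded performance.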
Mitigation
At 4:45pm EST, a configuration change was applied to mitigate the active performance issue and to prevent it from recurring.