Performance Degraded - XVWeb
Incident Report for Apteryx XVWeb
Postmortem

Context/Background

The XVWeb Service runs on multiple instances in the Azure Cloud. Each instance contains a built-in health monitor that reports on that instance's performance.

Problem Summary

Shortly after 4:00pm EST, a portion of these instances started reporting as “unhealthy”, and automated processes began removing them from the rotation. At the same time, a separate process began creating new instances to replace them.

Unfortunately, the automated creation of new instances did not keep pace with the removal of “unhealthy” instances. As a result, fewer instances were available to serve requests, and application performance was degraded during this time period.
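
The effect of removal outpacing replacement can be illustrated with a minimal sketch. Everything in it is hypothetical: the function name, the pool size, and the per-minute rates are illustrative assumptions, not values from the XVWeb deployment, and the model ignores the fact that only unhealthy instances are removed.

```python
def simulate_pool(initial, removed_per_min, created_per_min, minutes):
    """Return the number of in-service instances at the end of each minute.

    Simplified model (an assumption, not the actual XVWeb behaviour):
    every minute a fixed number of instances is removed from rotation
    and a fixed number of replacements comes online.
    """
    in_service = initial
    history = []
    for _ in range(minutes):
        # Removal cannot take the pool below zero; replacements are added after.
        in_service = max(0, in_service - removed_per_min) + created_per_min
        history.append(in_service)
    return history


# Hypothetical rates: 2 instances removed per minute, 1 replacement created.
history = simulate_pool(initial=10, removed_per_min=2, created_per_min=1, minutes=5)
print(history)  # [9, 8, 7, 6, 5]
```

With these assumed rates the pool loses one instance of capacity per minute. When the two rates are equal, the pool size stays constant.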

Mitigation

At 4:45pm EST, a configuration change was applied to mitigate the active performance issue and prevent it from recurring in the future.

Posted Nov 30, 2022 - 17:58 EST

Resolved
XVWeb has returned to normal operation as of 16:57 EST.
Posted Nov 30, 2022 - 17:04 EST
Monitoring
Some customers are reporting slower than normal performance within XVWeb.

We have applied a fix and are continuing to monitor.
Posted Nov 30, 2022 - 16:43 EST
This incident affected: XVWeb (XVWeb).