XVWeb Image Loading Degraded
Incident Report for Apteryx XVWeb
Postmortem

Context/Background

The XVWeb Service runs on multiple server instances in the Azure Cloud. These instances auto-scale on a predictive schedule (to meet the cyclical demand of our customers during business hours), as well as programmatically in response to any atypical increases in usage.

As new instances are created, each loads its required settings from a centralized Microsoft Azure configuration system. This configuration system ensures that any new instance is functionally identical to every other, so that all users can benefit from the extra servers.

Problem Summary

Shortly before 10:00am EST, XVWeb began adding new instances to support an atypical increase in traffic. The creation of additional instances overwhelmed the Microsoft Azure configuration system, and the new instances were unable to start.

Mitigation

At 10:44am EST, XVWeb Engineers manually restarted the instances that had not received configuration values. The new instances then loaded correctly, and performance returned to normal levels by 10:58am EST.

Following this, XVWeb Engineers made changes to prevent the Microsoft Azure configuration system from becoming overwhelmed in this way. These changes were deployed at 1:40pm EST, and monitoring continues to ensure the problem does not recur.
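
This report does not describe the specific changes that were deployed. For illustration only, one common way to keep many simultaneously starting instances from overwhelming a shared configuration service is to retry the configuration load with exponential backoff and random jitter. The sketch below assumes a hypothetical fetch callable that raises an exception when the configuration service is unavailable. The function name, parameters, and default values are assumptions and are not taken from XVWeb or from any Azure SDK.

```python
import random
import time


def load_with_backoff(fetch, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch is a hypothetical callable that returns the configuration or
    raises an exception if the configuration service cannot be reached.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential ceiling: base_delay, 2x, 4x, ... capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: a random wait in [0, cap) spreads out retries from
            # instances that all started at the same moment.
            sleep(cap * rng())
```

With this pattern, an instance that fails its first configuration load waits a short, randomized interval before trying again, so a burst of new instances does not retry in lockstep. The sleep and rng parameters exist only so the behavior can be exercised without real delays.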

Posted Dec 03, 2022 - 13:52 EST

Resolved
We have confirmed that this issue has been resolved as of 11:48am.

A postmortem is forthcoming.
Posted Dec 02, 2022 - 15:05 EST
Monitoring
Performance has returned to normal and we are continuing to monitor for further improvement.
Posted Dec 02, 2022 - 11:00 EST
Investigating
We are currently investigating reports of slowness loading images in XVWeb.
Posted Dec 02, 2022 - 10:44 EST
This incident affected: XVWeb (XVWeb).