On 7/18, one of the nightly internal batch processes within XVWeb did not complete on time because of invalid data on one XVWeb Customer Site. As a result, the Apteryx Token Registration Server (TRS) began responding more slowly than expected. The slowdown was exacerbated by increased demand on the TRS from clients whose Tokens had expired at various points over the weekend.
At 8:00am EDT, we were alerted to the performance degradation and attempted to mitigate it by allocating additional server resources to the TRS. This improved performance but did not resolve the issue.
At 9:06am EDT, we deployed a fix that removed the invalid data and prevents it from recurring.
Performance of the TRS returned to normal by 9:11am EDT.
In addition to the specific fix deployed, we have added alerting on the affected batch processes to confirm that they complete within their expected timeframes.
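A completion-deadline check of this kind can be sketched in Python as follows. This is illustrative only: the function name, its parameters, and the idea of a fixed maximum duration are assumptions for the example, not the actual XVWeb alerting code.

```python
from datetime import datetime, timedelta
from typing import Optional


def batch_overdue(
    started_at: datetime,
    finished_at: Optional[datetime],
    now: datetime,
    max_duration: timedelta,
) -> bool:
    """Return True if a batch job exceeded its allowed run window.

    All names here are hypothetical. A job counts as overdue when it
    finished after its deadline, or when it is still running
    (finished_at is None) and the deadline has already passed.
    """
    deadline = started_at + max_duration
    if finished_at is not None:
        return finished_at > deadline
    return now > deadline
```

A scheduler could call a check like this periodically for each nightly job and raise an alert whenever it returns True, so that a stalled job is noticed before it affects dependent services.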