The incident with StormAPI was first registered at 03:24, and all systems were available again at 10:29 (7h5m downtime). Our sourcing vendor immediately began tracing errors and following restart instructions. Storm began its analysis in the early morning, and together we determined the root cause at approximately 10:20.
The root cause of the problem was a Client Certificate Revocation List that had gone stale. The Client Certificate Revocation List contains the client certificates (used to access StormAPI) that have been revoked. A certificate is revoked when access to a Storm Application is removed.
Because the revocation list had become stale, all requests using client certificates were denied (i.e. all requests to StormAPI). The revocation list is updated manually as part of removing access to an Application. Because no certificates needed revocation during this period, the list was not updated before it expired.
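To illustrate why a stale list denies every request, not just revoked ones: a certificate validator commonly treats a revocation list as usable only until its next-update time and fails closed afterwards. The sketch below assumes that behavior; the function and parameter names are hypothetical and do not describe StormAPI's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def is_request_allowed(cert_serial: int, revoked_serials: set[int],
                       crl_next_update: datetime, now: datetime) -> bool:
    # Hypothetical fail-closed check: once the revocation list has passed
    # its next-update time it is considered stale, so no certificate is trusted.
    if now >= crl_next_update:
        return False
    # While the list is fresh, only explicitly revoked certificates are denied.
    return cert_serial not in revoked_serials


# Example with an arbitrary expiry time: an empty revocation list still expires.
next_update = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_request_allowed(42, set(), next_update, next_update - timedelta(hours=1)))
print(is_request_allowed(42, set(), next_update, next_update + timedelta(minutes=1)))
```

The second call is denied even though certificate 42 was never revoked, which mirrors the situation where no certificates needed revocation and the list was therefore never republished.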
To prevent this problem from recurring, we will automate publishing of the certificate revocation list. We have also been working on a new authentication mechanism based on modern standards rather than client certificates, which we estimate will be available during Q2.
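One way automated publishing could avoid the manual gap is a scheduled check that republishes the list whenever its remaining validity drops below a safety margin, regardless of whether any certificates were revoked. This is a minimal sketch under that assumption; the function name and the two-day default margin are hypothetical choices, not Storm's planned design.

```python
from datetime import datetime, timedelta, timezone


def should_republish(crl_next_update: datetime, now: datetime,
                     margin: timedelta = timedelta(days=2)) -> bool:
    # Hypothetical scheduler check: republish when the time left before the
    # list goes stale is within the safety margin, even if nothing was revoked.
    return crl_next_update - now <= margin


# Example with an arbitrary expiry time.
expiry = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(should_republish(expiry, expiry - timedelta(days=5)))
print(should_republish(expiry, expiry - timedelta(days=1)))
```

Running such a check periodically, with a margin longer than the time needed to detect and fix a failed publication, keeps the list fresh without depending on revocation events.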
This problem is not related to the incident last weekend.
Regards
./Anders Heintz, CTO