Major outage in Storm Frontend API
Incident Report for Storm Commerce
Postmortem

The incident with StormAPI was first registered at 03:24, and all systems were available again at 10:29 (7h5m downtime). Our sourcing vendor started error tracing and working through the restart instructions immediately. Storm started its analysis early in the morning, and together we determined the root cause at approximately 10:20.

The root cause of the problem was a Client Certificate Revocation List that had gone stale. The Client Certificate Revocation List is a list of client certificates (used to access StormAPI) that have been revoked, which happens when access to a Storm Application is removed.

Because the revocation list had become stale, all requests using client certificates were denied (i.e. all requests to StormAPI). Updating the revocation list is a manual procedure performed when access to an Application is removed. Because no certificates had needed revocation, the list was not updated before its expiry time.
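
The failure mode can be illustrated with a minimal Python sketch. It assumes that the component validating client certificates fails closed once the revocation list is past its expiry time. The names RevocationList, next_update and is_request_allowed are hypothetical and are not taken from StormAPI.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RevocationList:
    """Hypothetical model of a revocation list: revoked serials plus an expiry time."""
    next_update: datetime
    revoked_serials: set = field(default_factory=set)


def is_request_allowed(serial: int, crl: RevocationList, now: datetime) -> bool:
    # Assumed fail-closed behaviour: a stale list cannot prove that a
    # certificate is still valid, so every client certificate is rejected,
    # whether or not it was ever revoked.
    if now > crl.next_update:
        return False
    return serial not in crl.revoked_serials
```

Under this assumption, an empty revocation list that has expired blocks every client, which matches the behaviour seen during the incident.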

To prevent this problem from recurring, we will automate publishing of the certificate revocation list. We have also been working on a new authentication mechanism based on modern standards rather than on Client Certificates, which we estimate will be available during Q2.
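
As a rough illustration of what automated publishing could look like, the Python sketch below shows a check that a scheduled job might run. It is not the planned implementation. The function name needs_republish and the seven-day safety margin are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def needs_republish(next_update: datetime, now: datetime,
                    margin: timedelta = timedelta(days=7)) -> bool:
    """Return True when the list expires within `margin` or has already expired."""
    # The check depends only on the expiry time, so the list is republished
    # even when no certificates have been revoked since the last publication.
    return now >= next_update - margin
```

A job running this check daily would republish the list well before expiry, independent of whether any Application access has been removed.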

This problem is not related to the incident last weekend.

Regards

./Anders Heintz, CTO

Posted Feb 22, 2020 - 12:42 CET

Resolved
The incident is resolved. We will be back with a postmortem.
Posted Feb 22, 2020 - 10:32 CET
Update
The team is working with our infrastructure partner to find the root cause according to our standard protocol, ruling out potential causes.
Posted Feb 22, 2020 - 10:02 CET
Investigating
We are experiencing a major outage in the Storm frontend API in the production environment.

We are investigating the issue.
Posted Feb 22, 2020 - 08:15 CET
This incident affected: Storm API.