We're experiencing payment/checkout problems
Incident Report for Storm Commerce
Postmortem

Incident Report - Payment problems

Norce Storm had intermittent problems with payment processing between 11:20 on 2023-10-11 and 12:30 on 2023-10-12. During this time, requests related to payment processing periodically timed out (see below for a summary of periods where payment processing is considered down).

The problem manifested as timeouts or very long response times for payment-related requests to StormApi. It was caused by problems in the underlying OS: the number of available network sockets was well below the configured level, so services were competing for fewer network sockets than normal. It primarily affected the Payment Service, since it is a hub for communicating with all Payment Service Providers and Voucher Providers, which requires a substantial number of network sockets.

The problem was worsened by a significant overnight increase in traffic on a Payment Service Adapter that requires more than double the number of network connections.

The problem was resolved at ca 12:53 on 2023-10-12 by moving the Payment Service to a secondary VM cluster, where the service behaved normally.

Because the problem was intermittent, downtime for Payment Services/StormApi is calculated from the percentage of failed requests: 14.33%, which translates to 2h 9m of downtime in total.
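As a rough illustration of how a failed-request percentage can be converted into equivalent downtime, here is a minimal Python sketch. The report does not state which measurement window the percentage was applied to. The 15-hour window below is an assumption, chosen only because it is consistent with the published figures. It is not taken from the report, and the function names are hypothetical.

```python
from datetime import timedelta


def equivalent_downtime(failed_fraction: float, window: timedelta) -> timedelta:
    """Scale a measurement window by the fraction of requests that failed."""
    if not 0.0 <= failed_fraction <= 1.0:
        raise ValueError("failed_fraction must be between 0 and 1")
    return window * failed_fraction


def format_hm(duration: timedelta) -> str:
    """Format a duration as hours and minutes, rounded to the nearest minute."""
    total_minutes = round(duration.total_seconds() / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


# ASSUMPTION: the measurement window is not given in the report.
# 15 hours is a hypothetical value that happens to match the published result.
ASSUMED_WINDOW = timedelta(hours=15)
FAILED_FRACTION = 0.1433  # 14.33% failed requests, as stated in the report

print(format_hm(equivalent_downtime(FAILED_FRACTION, ASSUMED_WINDOW)))
```

A different window gives a different result, so the actual basis for the calculation should be confirmed in the detailed report.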

For a more detailed report, please contact your Customer Success Manager.

Posted Oct 17, 2023 - 17:35 CEST

Resolved
Most payment-related services have been back to normal since 12:30, and we are now closing this major incident.

An incident report will be published as a post-mortem on this incident here on Statuspage in the coming days.
Posted Oct 12, 2023 - 17:01 CEST
Update
We have performed a number of actions to resolve the issue and mitigate its consequences, and since ca 12:30 the majority of the problems have been resolved. Since ca 13:10, all payment requests have been processed as usual.

We continue to investigate the root cause and will compile an incident report, which will be published as a post-mortem in this thread. We will also adjust the reported uptime/outage of the APIs to reflect actual availability.
Posted Oct 12, 2023 - 13:42 CEST
Update
The problem persists, and we continue to narrow down the root cause. We are performing mitigating actions, but there are still intermittent outages.
Posted Oct 12, 2023 - 09:28 CEST
Update
We are still experiencing slow response times and intermittent outages on payment processing. We continue working on resolving the problem.
Posted Oct 12, 2023 - 03:51 CEST
Monitoring
A fix deployed at 19:08 reduced the problem. We are still investigating the root cause; no changes have been implemented on any related Storm components. The problem is related to the Storm components' integration with PSPs, and we are investigating any changes to those and how to mitigate the problem.
Posted Oct 11, 2023 - 22:32 CEST
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 11, 2023 - 18:39 CEST
Investigating
We are currently working on the issue. There have been, and still are, intermittent problems.
Posted Oct 11, 2023 - 18:39 CEST
Monitoring
Services are back online. There was a partial outage and slow response times for payment-related requests between ca 12:20 and 12:35. We are investigating the root cause.
Posted Oct 11, 2023 - 12:51 CEST
Investigating
We are currently investigating the issue.
Posted Oct 11, 2023 - 12:35 CEST
This incident affected: Storm API.