Square status - Japan

Payments Disruption
Incident Report for Square Japan
Postmortem

Incident Summary

On 2023-02-06, Square experienced a service disruption impacting Square payments. Starting at 19:17 UTC, all transactions to Discover started failing due to an external outage. Starting at 19:54 UTC, the disruption spread beyond Discover transactions.

In this postmortem recap, we’ll communicate the root cause of this disruption, document the steps that we took to diagnose and resolve the disruption, and share our analysis and actions to ensure that we are properly defending our customers from service interruptions like this in the future.

Timeline

2023-02-06 19:17 UTC Beginning of Discover impact: all authorizations and verifications to Discover start timing out.

19:23 Automated alerts notify Engineering, and multiple teams start investigating.

19:42 issquareup.com updated.

19:54 Beginning of wide impact: timeouts cascade and Square's payment processing is degraded globally.

20:41 After exhausting the quick configuration changes that might isolate the card network traffic, engineers start preparing code changes to quickly reject Discover transactions.

21:45 End of wide impact: code changes that quickly decline Discover auths reach production.

22:00 Discover network recovers, but code to quickly decline auths remains in place.

22:37 Quick declines of Discover transactions are turned off. Card network impact largely ends. Some data remains cached, so a few errors continue.

2023-02-07 00:45 Discover impact ends: All caches have been refreshed and auth declines have returned to normal levels.

00:46 issquareup.com updated to resolved.

Analysis

This incident revealed areas of improvement for both our technical infrastructure and our engineering processes, several of which we are actively working on.

The widespread impact was caused by a small portion of authorization traffic timing out and our services handling those timeouts poorly. We discovered that for a single upstream processing partner, Square’s systems mark the connection as unhealthy after any timeout of a financial message like an authorization. This let the timeouts from the Discover issue mark all of our connections as unhealthy, impacting other transactions. From 19:23 to 19:54 we had enough healthy connections to serve traffic, but after enough connections went unhealthy at 19:54 we were unable to serve a significant portion of other traffic. We are actively working on addressing this behavior. We will begin testing these improvements next week. 
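
The failure mode described above can be illustrated with a minimal sketch. This is not Square's code: `Connection`, `ConnectionPool`, `simulate`, and the `unhealthy_on_any_timeout` flag are hypothetical names, and the alternative policy (attributing timeouts to the card network instead of the connection) is only one possible approach, assumed here for contrast.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """One connection to a hypothetical upstream processing partner."""
    name: str
    healthy: bool = True


@dataclass
class ConnectionPool:
    connections: list
    # True reproduces the behavior described in the analysis:
    # any timeout of a financial message marks the connection unhealthy.
    unhealthy_on_any_timeout: bool = True
    # Used only by the alternative policy (an assumption, not the source's fix).
    failing_networks: set = field(default_factory=set)

    def healthy_count(self) -> int:
        return sum(1 for c in self.connections if c.healthy)

    def record_timeout(self, conn: Connection, network: str) -> None:
        if self.unhealthy_on_any_timeout:
            # The timeout is blamed on the connection, even though the
            # real fault lies with one card network behind it.
            conn.healthy = False
        else:
            # The timeout is blamed on the card network; the connection
            # stays available for other traffic.
            self.failing_networks.add(network)


def simulate(unhealthy_on_any_timeout: bool, n_connections: int = 4):
    """Send one timing-out Discover authorization over every connection."""
    pool = ConnectionPool(
        connections=[Connection(f"conn-{i}") for i in range(n_connections)],
        unhealthy_on_any_timeout=unhealthy_on_any_timeout,
    )
    for conn in pool.connections:
        pool.record_timeout(conn, "discover")
    return pool.healthy_count(), pool.failing_networks


if __name__ == "__main__":
    print(simulate(True))   # (0, set())
    print(simulate(False))  # (4, {'discover'})
```

Under the first policy, a small share of timing-out traffic is enough to remove every connection from service. Under the second, the same timeouts leave all connections healthy.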

This outage also illustrated the need to be able to quickly disable traffic that is threatening other Square infrastructure. Had this capability been in place, the impact would not have spread beyond Discover transactions. We will be adding it to multiple layers of the payments stack. Our emergency mitigation will remain available to be reenabled until this change is released.
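
A switch of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the mitigation Square deployed: `NetworkGate`, `block`, `unblock`, `authorize`, the `"declined_fast"` result, and the `"other"` network label are all hypothetical.

```python
class NetworkGate:
    """Fast-declines traffic for blocked card networks before it reaches upstream."""

    def __init__(self):
        self._blocked = set()

    def block(self, network: str) -> None:
        self._blocked.add(network)

    def unblock(self, network: str) -> None:
        # discard() does nothing if the network was never blocked.
        self._blocked.discard(network)

    def authorize(self, network: str, send_upstream):
        if network in self._blocked:
            # Decline immediately so no upstream connection waits on a timeout.
            return "declined_fast"
        return send_upstream(network)


if __name__ == "__main__":
    upstream_calls = []

    def fake_upstream(network):
        # Stand-in for the real upstream call; records which networks got through.
        upstream_calls.append(network)
        return "approved"

    gate = NetworkGate()
    gate.block("discover")
    print(gate.authorize("discover", fake_upstream))  # declined_fast
    print(gate.authorize("other", fake_upstream))     # approved
    print(upstream_calls)                             # ['other']
```

The point of the design is that the blocked network's requests never occupy an upstream connection, so its timeouts cannot affect other traffic. Unblocking restores normal processing without a code change.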

We know any disruption is painful for our customers. We are in the midst of a longer-term effort to identify critical payment flows for our sellers and improve those systems’ resiliency to disruption.

Posted Feb 16, 2023 - 06:46 JST

Resolved
We’re no longer seeing elevated levels of payment declines and card processing activity has returned to normal. Our team will continue to work with our partners on uncovering the underlying cause of the earlier disruption and putting in place safeguards to help prevent issues of this nature from recurring in the future.
We understand how important it is for all of our services to be running for your business. Thank you for your patience with us as we monitored the situation.
Posted Feb 07, 2023 - 09:46 JST
Monitoring
While the level of declines has dropped for most card networks, we are continuing to see reports of declines when using Discover. If you continue to experience declines, please advise your customers to use an alternative card or form of payment.
We appreciate your patience as we continue to work on resolving this disruption. We will continue monitoring the situation and provide updates as we have them.
Posted Feb 07, 2023 - 09:16 JST
Update
We are continuing to investigate an ongoing disruption in Payment Acceptance. We know how detrimental service disruptions can be to the functionality of your business. We appreciate your patience during this time while our engineers work towards a fix. We will continue to post updates here as we receive them.
Posted Feb 07, 2023 - 07:18 JST
Update
We are currently experiencing a disruption resulting in some sellers being unable to accept payments. We understand how important it is never to miss a sale, and our Engineering team is actively working on a fix. Thank you for your patience with us as we work to resolve this issue.
Posted Feb 07, 2023 - 06:25 JST
Update
We’re currently investigating reports of Discover card payments not completing for customers. We will provide further updates as they become available. In the meantime, we recommend trying another card to complete a purchase. Thank you for your patience as our engineers continue to investigate.
Posted Feb 07, 2023 - 05:35 JST
Investigating
We are currently experiencing a disruption that is impacting some Square services. We understand how important it is for your business for all of our services to be up and running, and our Engineering team is actively working on a fix. Thank you for your patience with us as we work to resolve this issue.
Posted Feb 07, 2023 - 05:35 JST