Service Interruption

Major incident · AU data center · SG data center
Affected components: API, Client Portal, Mobile App, PGM, PosWeb, Reports, Payment Processing
2026-02-06 09:45 CET · 57 minutes

Post-mortem

Post-Incident Report

Date of Incident: 6th February 2026
Total Resolution Time: 0 hours 57 minutes
Status: All Services Restored


1. Executive Summary

On February 6, 2026 at 19:45 AEDT, customers in Australia experienced a service disruption caused by an internal service account being automatically locked after a high volume of authentication attempts. Our security controls correctly detected the spike and responded by enforcing lockout protections, but in this case the traffic was legitimate: it was generated by automated recovery activity in our Polish data center following an earlier major outage. Once identified, we restored service by unlocking the account and stabilising authentication behaviour. We have since taken steps to reduce the chance of this scenario recurring while preserving our security-first posture.


2. What Happened

Summary of the event

  • A service account used for internal system-to-system authentication in the Australia environment was locked due to a large number of login attempts in a short period.
  • The authentication spike was not malicious. It was caused by the Polish data center re-establishing connections and retrying authentication as it recovered from a prior systems failure.
  • The lockout interrupted internal communications that depend on that service account, resulting in customer-facing impact in Australia.

Why the security controls triggered

Our internal controls are designed to detect brute-force patterns and automatically lock accounts to protect systems. During the Polish recovery, the retry and reconnect behaviour resembled brute-force activity (high frequency, repeated failures/retries), causing the lockout policy to trigger.
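
For illustration only, the sketch below shows a simple sliding-window lockout rule of the kind described above; the names and thresholds are hypothetical and do not reflect our production configuration. Under such a rule, a burst of legitimate retries from a recovering data center crosses the threshold just as quickly as a genuine brute-force attempt.

    from collections import deque
    import time

    class SlidingWindowLockout:
        """Illustrative only: lock an account once too many failed
        authentication attempts land inside a short time window."""

        def __init__(self, max_failures=10, window_seconds=60):
            self.max_failures = max_failures
            self.window_seconds = window_seconds
            self.failure_times = deque()
            self.locked = False

        def record_failure(self, now=None):
            """Record one failed attempt; return whether the account is now locked."""
            now = time.time() if now is None else now
            self.failure_times.append(now)
            # Discard failures that have aged out of the window.
            while self.failure_times and now - self.failure_times[0] > self.window_seconds:
                self.failure_times.popleft()
            # A reconnect storm from a recovering data center trips this
            # check exactly like a brute-force attempt would.
            if len(self.failure_times) >= self.max_failures:
                self.locked = True
            return self.locked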

Customer impact

  • Loss or degradation of service for Australian customers, affecting several applications and the customer companies that rely on them.
  • Intermittent failures while the account remained locked, until manual remediation was applied.

Resolution

  • The infrastructure team identified the lockout as the immediate cause of the disruption.
  • The service account was unlocked and authentication behaviour was brought back to normal levels.
  • Monitoring was used to confirm internal communications recovered and services stabilised.

3. Corrective Measures & Improvements

We have implemented changes to prevent a legitimate recovery event in one region from causing avoidable disruption in another, while maintaining strong protections against real brute-force attempts.

Authentication and lockout safeguards

  • Improved detection logic and alerting: Updated our authentication anomaly detection to more accurately recognise recovery-related retry patterns (high-volume, short-duration bursts from known internal systems) versus true brute-force behaviour. We also tightened associated alerting so the on-call team is notified earlier and with clearer context when authentication spikes occur.
  • Standardised retry/backoff behaviour: Implemented consistent exponential backoff with jitter for internal authentication retries to prevent synchronized reconnect “storms.” This smooths recovery traffic, reduces peak load on authentication components, and materially lowers the chance of lockout thresholds being hit during legitimate system recovery.
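
A rough sketch of the backoff behaviour described above, for illustration only: the function name, retry counts, and delay values here are assumptions rather than our production settings.

    import random
    import time

    def authenticate_with_backoff(authenticate, max_attempts=6,
                                  base_delay=0.5, max_delay=30.0):
        """Retry a failing authentication call with exponential backoff and full jitter.
        `authenticate` is a placeholder callable that returns True on success."""
        for attempt in range(max_attempts):
            if authenticate():
                return True
            # Double the delay each attempt and cap it, then sleep for a random
            # amount of time within that window ("full jitter") so services
            # recovering together do not retry in lock-step.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
        return False

Spreading retries in this way keeps legitimate recovery traffic well below the burst rates that a lockout threshold is designed to catch.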

4. Moving Forward

We value the trust you place in us to run critical services reliably and securely. This incident was the result of a security control operating as designed, but applying the wrong mitigation to legitimate recovery traffic. Our focus now is on improving the separation between recovery behaviour and attack behaviour, strengthening service account protections, and introducing better cross-region safeguards so that a recovery event in one data center cannot unnecessarily disrupt customers in another.

Updates

February 26, 2026 · 01:37 CET
Resolved

We are pleased to inform you that the technical issues have been resolved. All applications are now fully operational and back online.

Our team is monitoring the system to ensure stability, and everything is performing as expected. We apologize for any inconvenience this may have caused and appreciate your patience.

February 6, 2026 · 10:42 CET
Monitoring

The system recovery is looking positive. We are continuing to monitor closely.

February 6, 2026 · 10:24 CET
Update

We have identified the core issue and all systems are recovering now.

February 6, 2026 · 10:21 CET
Issue

We have identified an outage affecting the PGM and POS systems. We are currently in the investigation phase to determine the scope and cause of the disruption. We will provide an update as soon as more information becomes available. We apologize for any inconvenience.

February 6, 2026 · 09:45 CET
