Intermittent Service Degradation

Minor incident · AU data center · SG data center · PGM
2026-03-05 05:20 CET · 3 hours, 34 minutes

Updates

Post-mortem

Post-Incident Report

Date of Incident: 5th March 2026
Total Resolution Time: 3 hours 34 minutes
Status: All Services Restored


1. Executive Summary

On 5th March at 15:20 AEDT, customers in AUS/SG experienced intermittent degradation of the PGM application.
The incident was caused by more requests reaching a specific backend version of PGM than it could process, which filled its request queue. Service was restored by temporarily rolling back the affected version and redistributing client load to standby application instances, which allowed the system to work through the backlog.
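The failure mode and remediation described above can be illustrated with a small sketch. It models a bounded request queue that stops accepting work once full, and a helper that moves a share of clients from an overloaded instance to standby instances. All names and values here (BoundedQueue, rebalance, the capacity and fraction) are hypothetical and chosen for illustration; they do not describe the actual PGM implementation.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical request queue with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def submit(self, request):
        # Refuse new work once the queue is full instead of growing without bound.
        if len(self.items) >= self.capacity:
            return False
        self.items.append(request)
        return True

    def drain(self, n):
        # Process up to n queued requests and report how many were handled.
        handled = min(n, len(self.items))
        for _ in range(handled):
            self.items.popleft()
        return handled


def rebalance(assignments, overloaded, standbys, fraction):
    """Move a fraction of clients from an overloaded instance to standbys.

    assignments maps client name -> instance name. A new mapping is returned.
    """
    clients = sorted(c for c, inst in assignments.items() if inst == overloaded)
    to_move = clients[: int(len(clients) * fraction)]
    result = dict(assignments)
    for i, client in enumerate(to_move):
        # Spread the moved clients round-robin across the standby instances.
        result[client] = standbys[i % len(standbys)]
    return result
```

With four clients on one instance, `rebalance(assignments, "primary", ["standby-1", "standby-2"], 0.5)` would move two of them, one to each standby, leaving the remaining two in place so the original queue has less incoming work while it drains.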


2. Customer Impact

Who was affected

  • AUS/SG clients on version 4.930.2361.303 of PGM

What customers experienced

  • Slow page loads and loading spinners that did not complete
  • Periodic 503 errors during remediation efforts

3. Timeline of Events

Time         Event
15:20 AEDT   An automated alert informs the team of high latency in some PGM deployments.
15:28        An application restart is triggered to flush the request queue.
15:35        The platforms team determines the root cause to be a specific version of PGM.
15:45        The queue fills again, causing more slowdowns for customers on this instance of PGM.
15:47        An application restart is triggered to flush the request queue.
15:50        The affected version of PGM is rolled back to allow the queue of requests to flush. A small number of customers observe 503 errors during this time.
16:00        Some customers are reallocated to standby instances of the application.
16:15        Services begin to recover and the request queue begins to fall.
16:30        Services stabilise and latency drops to nominal levels.
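The timeline is given in AEDT, while the status page header and update timestamps are in CET. A minimal sketch of the conversion follows, assuming fixed offsets of UTC+11 for AEDT and UTC+1 for CET on the incident date; the helper name aedt_to_cet is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets assumed for 5 March 2026: AEDT is UTC+11, CET is UTC+1.
AEDT = timezone(timedelta(hours=11), "AEDT")
CET = timezone(timedelta(hours=1), "CET")


def aedt_to_cet(hour, minute):
    """Convert a timeline entry on the incident date from AEDT to CET."""
    local = datetime(2026, 3, 5, hour, minute, tzinfo=AEDT)
    return local.astimezone(CET)


start = aedt_to_cet(15, 20)
print(start.strftime("%Y-%m-%d %H:%M %Z"))  # 2026-03-05 05:20 CET
```

The first alert at 15:20 AEDT therefore corresponds to the 05:20 CET start time shown in the incident header.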

4. Moving Forward

We value the trust you place in us to power your business. We are improving our application resilience across both the software and hardware domains. We remain dedicated to providing a stable, high-performance environment for all our global partners.

March 5, 2026 · 09:36 CET
Resolved

This incident is now resolved

March 5, 2026 · 08:53 CET
Monitoring

We’ve applied changes to the PGM application hosting and have already seen a strong improvement in the stability of the site. We are still actively monitoring the situation.

March 5, 2026 · 06:32 CET
Investigating

We are investigating an ongoing issue in which some customers experience intermittent service degradation in PGM. We are implementing remediation steps and will keep you updated as the situation progresses.

March 5, 2026 · 06:11 CET
