System Outage

Major incident
Affected regions: EU data center, NA data center
Affected components: API, Client Portal, Mobile App, PGM, PosWeb, Reports, Payment Processing
2026-02-04 13:50 CET · 6 hours, 8 minutes

Updates

Post-mortem

Post-Incident Report

Date of Incident: February 4, 2026
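Total Resolution Time: approximately 4 hours 30 minutes for 95% of clients (restored by 18:10 CET); 6 hours 30 minutes until all clients were restored (20:15 CET)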
Status: All Services Restored


1. Executive Summary

On February 4, 2026, at 13:45 CET, our infrastructure team identified a service disruption affecting a portion of our server environment. Although we restored the primary connection within 15 minutes, the subsequent rebalancing of system traffic caused roughly two hours of reduced performance and intermittent service for some clients.

Our team worked throughout the afternoon to stabilize the environment and ensure all requests were processed correctly. By 18:10 CET, services had returned to normal performance levels for about 95% of clients; a small number of clients were still experiencing timeouts and were restored by 20:15 CET.


2. Updated Service Timeline

Time (CET) | Service Status | Description
13:45 | Service Interruption | A connectivity issue within the data center affected primary access for the Europe and North America regions.
14:00 | Initial Recovery | Core connectivity was restored. Systems moved into a stabilization phase.
14:00 – 16:00 | Performance Stabilization | Applications were accessible, though users may have experienced frequent intermittent delays or slower response times as traffic queues were cleared.
16:00 – 18:10 | Final Optimization #1 - 95% of companies recovered | Services began returning to normal operation. Only minor, isolated fluctuations were reported as final system checks were completed. By 18:10, service was fully restored for 95% of clients.
18:10 – 20:15 | Final Optimization #2 - last 5% of companies recovered | Focused restoration for the remaining 5% of clients. This involved a controlled migration of specific database resources back to their primary environments.
20:15 | Maintenance Complete | All clients successfully restored to their permanent, optimized environments. Final background checks concluded.
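
The phase durations implied by the timeline can be checked with simple arithmetic. The sketch below is illustrative only: the variable names and the `minutes_between` helper are hypothetical, and the times are copied from the table above.

```python
from datetime import datetime

# Phase boundaries (CET) copied from the timeline above.
boundaries = ["13:45", "14:00", "16:00", "18:10", "20:15"]
labels = [
    "Service Interruption",
    "Performance Stabilization",
    "Final Optimization #1",
    "Final Optimization #2",
]

def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# Duration of each phase, keyed by its label.
durations = {
    label: minutes_between(a, b)
    for label, a, b in zip(labels, boundaries, boundaries[1:])
}

# Time from the first interruption to the end of maintenance.
total_minutes = minutes_between(boundaries[0], boundaries[-1])

for label, minutes in durations.items():
    print(f"{label}: {minutes} min")
print(f"Total: {total_minutes // 60} h {total_minutes % 60} min")
```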

3. What Happened

The incident was triggered by a physical connection failure between our servers and our storage system. This led to several “cascading” effects:

  • Access Blocked: The security layer that manages global traffic went offline, preventing logins.
  • Database Lag: One of our primary database hosts became unresponsive, which caused a backlog of requests.
  • Traffic Congestion: Even after the hardware was fixed, the volume of “waiting” requests caused our routing apps to hit their limits, resulting in temporary error messages (503 Service Unavailable) for some users.
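
The congestion effect described above can be illustrated with a minimal sketch. This is a simplified model, not the actual routing application: the `BoundedRouter` class, its `capacity` parameter, and the 202 "accepted" status are assumptions made for illustration. It shows how a router with a fixed queue limit starts answering 503 once the backlog of waiting requests exceeds what it can hold, and accepts requests again once the backlog drains.

```python
from collections import deque

class BoundedRouter:
    """Toy model of a router with a fixed-size request queue (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def submit(self, request_id):
        """Queue a request; return 202 if accepted, 503 if the queue is full."""
        if len(self.queue) >= self.capacity:
            return 503  # Service Unavailable: backlog has hit the limit
        self.queue.append(request_id)
        return 202

    def drain(self, n):
        """Process up to n queued requests, oldest first; return the count processed."""
        processed = 0
        while self.queue and processed < n:
            self.queue.popleft()
            processed += 1
        return processed

router = BoundedRouter(capacity=3)

# Five requests arrive while the backend is stalled: the first three are
# queued, the remaining two are rejected.
statuses = [router.submit(i) for i in range(5)]

# Once the backend recovers and clears part of the backlog, new requests
# are accepted again.
router.drain(2)
after_drain = router.submit(99)
```

In this model, raising `capacity` or adding more router instances delays the point at which 503 responses begin, which is the reasoning behind the capacity measure listed in section 4.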

4. Corrective Measures & Improvements

We are committed to ensuring this specific failure does not recur. Our engineering team is currently implementing the following upgrades:

  • Infrastructure Audit: We are conducting a top-to-bottom audit of our data center hardware and interconnections to identify and replace any aging components.
  • Expanded Capacity: We are adding more “Routing Application” instances to our environment to better handle sudden traffic spikes and prevent queuing during hardware failures.
  • Automated Redirection: We are developing a new “Emergency Redirection” system. This will allow us to instantly move traffic to maintenance pages or backup servers without causing a backlog on healthy parts of the system.
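
As a rough illustration of the redirection idea, the sketch below routes a request to its preferred backend when that backend is healthy, falls back to a backup when one is available, and otherwise returns a maintenance page immediately instead of letting the request wait. Everything here is hypothetical: the `Redirector` class, the backend names, and the `MAINTENANCE_PAGE` target are assumptions for illustration and do not describe the system under development.

```python
MAINTENANCE_PAGE = "maintenance"  # hypothetical name for the maintenance target

class Redirector:
    """Toy health-aware router (hypothetical design, for illustration only)."""

    def __init__(self, backends):
        # Map each backend name to a health flag; all start healthy.
        self.healthy = {name: True for name in backends}

    def mark_down(self, name):
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def route(self, preferred, backup=None):
        """Pick a target without queuing behind an unhealthy backend."""
        if self.healthy.get(preferred, False):
            return preferred
        if backup is not None and self.healthy.get(backup, False):
            return backup
        return MAINTENANCE_PAGE

redirector = Redirector(["db-primary", "db-backup"])

# Primary fails: traffic moves to the backup.
redirector.mark_down("db-primary")
target = redirector.route("db-primary", backup="db-backup")

# Backup fails too: requests get the maintenance page right away.
redirector.mark_down("db-backup")
fallback = redirector.route("db-primary", backup="db-backup")
```

Because an unhealthy backend is skipped rather than waited on, requests for it do not accumulate in the queues serving healthy parts of the system.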

5. Moving Forward

We value the trust you place in us to power your business. A deep-dive forensic analysis is currently underway to ensure the underlying hardware fault is permanently corrected. We remain dedicated to providing a stable, high-performance environment for all our global partners.

February 5, 2026 · 19:38 CET
Resolved

We are pleased to inform you that the technical issues have been resolved. All applications are now fully operational and back online.

Our team is monitoring the system to ensure stability, and everything is performing as expected. We apologize for any inconvenience this may have caused and appreciate your patience.

February 4, 2026 · 19:56 CET
Resolved

After resolving the core issues for most clients, as indicated in earlier posts, we were still grappling with some major issues affecting a smaller number of our clients. We can now confirm that all issues have been resolved, and all systems have been restored to proper working capacity.

Thank you for your patience through this difficult incident. We will be preparing a Postmortem Analysis for this incident, which will be available to all of our clients upon request.

February 4, 2026 · 19:45 CET
Update

The majority of our systems are back online, but the system is still in recovery mode, so significant performance issues may persist in multiple applications.

February 4, 2026 · 16:39 CET
Update

We are continuing to experience service interruptions in multiple applications and in the wider system. We understand the problem, and a recovery plan is being executed.

February 4, 2026 · 16:09 CET
Update

Some applications are starting to come back online. We are gradually restoring access for all affected customers and will continue to monitor the system until full stability is reached.

February 4, 2026 · 14:24 CET
Investigating

We are currently experiencing system-wide availability issues affecting all applications. Our team is investigating the root cause and working toward a solution as a top priority.

February 4, 2026 · 13:57 CET
