Partial planner outage
Incident Report for Iternio Status
Postmortem

We had a partial outage of the planner function on all environments from 13:10 to 13:25 UTC because a central database was unavailable.

The direct cause of the outage was three Elasticsearch (ES) nodes crashing simultaneously, which made crucial data inaccessible. The crash appears to have been triggered by our offsite backup, which internally takes a snapshot of the ES data. The backup worked fine until we recently upgraded the underlying Linux platform the nodes run on. The same crash has occurred on both an ES 7 and an ES 8 cluster at different times, so it appears to be an unresolved issue in ES.

We are now investigating whether we need to downgrade the Linux platform or switch to a different backup solution.

Posted Sep 09, 2024 - 14:06 UTC

Resolved
This incident has been resolved.
Posted Sep 09, 2024 - 14:02 UTC
Monitoring
The Elasticsearch database instances have restarted and restored their state, and everything is operational again.
Posted Sep 09, 2024 - 14:00 UTC
Identified
A number of database instances crashed simultaneously. They are now back online, and we are starting to investigate why this occurred.
Posted Sep 09, 2024 - 13:29 UTC
Update
We are continuing to investigate this issue.
Posted Sep 09, 2024 - 13:27 UTC
Investigating
We are currently investigating this issue.
Posted Sep 09, 2024 - 13:27 UTC
This incident affected: ABetterRouteplanner (ABRP), Iternio Planning API, OEM1 Planner API, and OEM2 Planner API.