We had a partial outage of the planner function in all environments from 13:10 to 13:25 UTC because a central database was unavailable.
The direct cause of the outage was three Elasticsearch nodes crashing simultaneously, which made crucial data inaccessible. The crash appears to have been triggered by our offsite backup, which internally takes a snapshot of the ES data. The backup had been working fine until we recently upgraded the underlying Linux platform the nodes run on. The same crash has occurred on both an ES 7 and an ES 8 cluster at different times, so it appears to be an unresolved issue in ES.
We are now investigating whether we need to downgrade the Linux platform or switch to a different backup solution.