What happened?
Update (04:32:00 CET): The issue is resolved. All services are up and functioning. Survey and upload data will be replicated and brought up to date.
During an ongoing maintenance event, an issue in the update mechanism caused one of the DB servers to lose connectivity with the internal network.
Existing data is safe. The survey answering system works properly. Replication of survey data and processing of VoC Store uploads and afterloads are currently paused; both will resume once the server is back online. VoC Hub and its applications will be unavailable until the DB server is back online.
Affected period of time: 11.06.2019 02:29:30 CET - 11.06.2019 04:05:00 CET
Total: 1 hour, 35 minutes, 30 seconds
We are currently working with the colocation provider to resolve this issue as soon as possible.
Why did it happen?
The issue occurred during planned database maintenance: a fault in the update mechanism caused one of the DB servers to lose connectivity with our internal network.
Additionally, our backup remote access system for reaching the server in such cases (KVM) was also down because of earlier network maintenance in the data center. This made it harder to restore the DB server's connectivity promptly.
What are we doing to avoid such issues?
Our infrastructure team have made the following adjustments to the internal processes:
1. Maintenance work will be carried out only after confirming that KVM is available.
2. If remote access tools are unavailable, maintenance work may proceed only when an authorized technician is present in or near the data center.
3. We will improve the process of provisioning a backup database server to reduce the time needed to restore it from full backups.