We identified an issue with our message queue server that caused an outage of the following processes during the affected period: import, export, questionnaire expiration/reminding/replication, survey case alerts, SMS, IVR, and email invitations.
Once the issue was identified, it was fixed as soon as possible. Affected tasks were resumed/restored where possible and rerun where necessary.
List of fully restored functionality:
- Import processes
- Export processes, including scheduled exports
- Questionnaire expiration/replication processes
- Case alert processes
List of functionality where full or partial data loss occurred during the affected period of time:
- SMS processes (questions/invitations and replies)
- Questionnaire reminding processes
Affected period of time: 20.11.2019 23:15:15 CET - 21.11.2019 11:13:11 CET
Why did it happen?
This issue occurred after a troubleshooting and maintenance session. A misbehaving process consumed an unexpectedly large amount of resources, which caused the message queue server to switch to read-only mode.
The issue was not detected quickly because our monitoring tools were unable to report a failure of this unprecedented kind.
The system was rolled back to its original state as soon as the issue was discovered.