Overview
We experienced a service outage around midnight between Monday, November 7th, 2016 and Tuesday, November 8th, 2016, which affected the VoC Feedback survey-answering system and import processes. During this incident, customers experienced issues when answering surveys, such as not being able to open the survey page or not being able to answer a question on the survey page.
Import processes were affected as well, but since import processes are time-limited for most customers and do not execute after 20:00 CET, we estimate the impact of the import outage to be very limited.
Existing questionnaire data was not lost and import processes were resumed. We're not seeing any evidence of data loss.
The period of disruption began on November 7th, 2016 at ~21:40 CET and lasted until November 8th, 2016 at ~02:20 CET.
Total measured downtime during the incident: 4 hours 40 minutes.
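As a quick check of the figure above, the downtime can be recomputed from the two approximate timestamps. This is only an illustrative sketch: the variable names are ours, and both times are taken as CET so no time zone conversion is applied.

```python
from datetime import datetime

# Approximate start and end of the disruption as stated in this report (both CET).
start = datetime(2016, 11, 7, 21, 40)
end = datetime(2016, 11, 8, 2, 20)

downtime = end - start
hours, remainder = divmod(int(downtime.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours {minutes} minutes")  # prints "4 hours 40 minutes"
```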
Cause
As a side effect of a higher-than-usual number of customer surveys being answered, we experienced sudden storage exhaustion on one of the virtual servers related to the service. As a result, it was not possible to complete surveys during this time, and import processes were postponed.
Post-Mortem
Our system has monitoring of various modules in place and our team was notified about the problem immediately. The issue was identified and fixed. Existing questionnaire data was not lost and import processes were resumed. We're not seeing any evidence of data loss.
To make sure this whole class of issues can no longer affect our systems, we plan to make the following changes starting November 9th, 2016:
- Raise our internal requirements for the storage space allocation process on production servers.
- Additionally, review our system architecture to make it more tolerant of such faults.
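To illustrate the kind of safeguard a stricter storage requirement implies, here is a minimal sketch of a free-space check. It is not our production monitoring: the function name `storage_alert`, the 20% threshold, and the checked path are all assumptions chosen for the example.

```python
import shutil


def storage_alert(path: str, min_free_fraction: float = 0.2) -> bool:
    """Return True when free space on the volume holding `path`
    falls below `min_free_fraction` of its total capacity.

    The 20% default is a hypothetical threshold, not an actual requirement.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total < min_free_fraction


# Example: check the volume that holds the current directory.
if storage_alert("."):
    print("WARNING: free space is below the configured threshold")
else:
    print("Free space is above the configured threshold")
```

A check like this would typically run on a schedule and raise an alert well before the volume fills up, so that space can be added before the service is affected.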
We take this incident very seriously and we apologize for any inconvenience you experienced.