On Wednesday, July 12, we experienced an outage of our messaging service. Our SMS provider's primary data center lost connectivity after a lightning strike caused severe power issues, leaving the service unavailable for approximately five hours. The outage did not impact Flowroute’s Voice, API, or Manage services.
We are working with our messaging service provider to better understand their recovery process and their plans to minimize future disruption of service caused by impairment of their data centers. In addition, we are independently investigating how we can increase our service failover capability to avoid loss of SMS service in the future.
We sincerely apologize for the interruption of your messaging service. If you have any questions or concerns, please do not hesitate to contact us at firstname.lastname@example.org.
Network Service Notification
This is an informational bulletin only. NO ACTION IS REQUIRED.
Recent Service Impairment Status: Fully Restored
Services Affected: All inbound and outbound messaging
Incident Timeline: July 12, 2017
| Time | Event |
|---|---|
| 9:30 AM PT | Flowroute received initial reports that messages were failing and began troubleshooting |
| 9:45 AM PT | Flowroute confirmed with our main messaging provider that all messaging services were down |
| 9:53 AM PT | Flowroute notified customers who had reported issues and posted an Intercom message |
| 11:11 AM PT | Main messaging provider reported that services were still down but all messages were being queued |
| 11:25 AM PT | Main messaging provider reported that connectivity issues at their data center were causing the outage |
| 2:57 PM PT | Main messaging provider reported some functionality restored, with message delivery still degraded |
| 3:29 PM PT | Confirmed with main messaging provider that all queued messages were cleared |
| 3:30 PM PT - 5:30 PM PT | Retested messaging to verify whether service was fully functional; messaging service was still degraded |
| 5:30 PM PT | Verified that not all outbound messages had been successfully sent and received; initiated procedures to replay all affected outbound messages |
| 6:15 PM PT | Flowroute began replaying queued outbound messages |
| 9:20 PM PT | Outbound message replay completed |
Root Cause: On Wednesday morning, around 9:30 AM PT, we received reports of messaging failures and began to investigate. We quickly confirmed that our main messaging provider's data center had lost connectivity after a lightning strike caused severe power issues, leaving the messaging service unavailable for approximately five hours. The outage did not impact Flowroute’s Voice, API, or Manage services.
Resolution Summary: Once our main messaging provider restored functionality, the inbound and outbound messages queued during the outage were sent. To ensure outbound messages were successfully received, Flowroute resent all outbound messages queued during the outage. We have confirmed that your accounts were not billed for the resent messages.
Corrective and Preventative Measures: We have lowered the alerting threshold on our monitoring of connectivity with our main messaging provider, so that we are alerted sooner when issues occur. We are also investigating how we can increase our service failover capability to avoid any loss of messaging service in the future. In addition, we are working with our messaging service provider to better understand their recovery processes and their plans to implement backup connectivity, in order to minimize future disruption of service caused by impairment of their data centers.
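To illustrate why a lower alerting threshold shortens detection time, here is a minimal Python sketch of a sliding-window failure-rate check. It is not Flowroute's actual monitoring; the class `FailureRateMonitor`, the helper `attempts_until_alert`, the window size, and the threshold values are all hypothetical and chosen only for illustration.

```python
from collections import deque


class FailureRateMonitor:
    """Hypothetical monitor: alerts when the failure rate over the most
    recent `window_size` delivery attempts reaches `threshold`."""

    def __init__(self, window_size: int, threshold: float) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        if not 0.0 < threshold <= 1.0:
            raise ValueError("threshold must be in (0, 1]")
        self.threshold = threshold
        # deque with maxlen drops the oldest result automatically.
        self.results = deque(maxlen=window_size)

    def record(self, success: bool) -> None:
        """Store the outcome of one delivery attempt."""
        self.results.append(success)

    def failure_rate(self) -> float:
        """Fraction of failed attempts in the current window."""
        if not self.results:
            return 0.0
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results)

    def should_alert(self) -> bool:
        """Alert only once the window is full, to avoid noisy early alerts."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.failure_rate() >= self.threshold


def attempts_until_alert(monitor, outcomes):
    """Return the 1-based attempt count at which the monitor first alerts,
    or None if it never alerts for the given outcomes."""
    for count, ok in enumerate(outcomes, start=1):
        monitor.record(ok)
        if monitor.should_alert():
            return count
    return None


if __name__ == "__main__":
    # Simulated traffic (assumed): 10 successful sends, then 10 failures.
    outcomes = [True] * 10 + [False] * 10
    print(attempts_until_alert(FailureRateMonitor(10, 0.5), outcomes))
    print(attempts_until_alert(FailureRateMonitor(10, 0.2), outcomes))
```

With this simulated traffic, the 0.5 threshold alerts on the 15th attempt and the 0.2 threshold on the 12th, three failed messages earlier. The trade-off is that a lower threshold is also more likely to fire on brief, self-resolving failures.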