The Challenge
A healthcare provider operating in nineteen states sought to proactively monitor and resolve errors arising in its Twilio Flex-based programmable call center.
Observability: An Overview
Business applications, whether customer-facing or business-to-business, are only as valuable as they are available and functioning as expected. An application error, processing failure, or unexpected event can cascade through downstream information operations. Customers must either contact technical support to report the error or abandon the application, and both outcomes degrade service. Either way, your business is left responding to errors after they have occurred.
Observability is the next evolution in monitoring, providing insight into business processes to speed innovation and enhance customer experience. The practice is particularly well-suited to modern cloud-native application architectures, especially when a business process is distributed across a number of microservices, or is built from several component SaaS capabilities.
DevOps strengthens the Observability practice by giving software developers and operations engineers a common data model for interpreting system performance and behavior. DevOps also ensures that discovered issues are rapidly remediated by developers with intimate knowledge of the call center platform.
In this case study, we examine how observability practices are applied to ensure fault-free operation of a Twilio Flex programmable call center platform that is integrated with ServiceNow for service request tracking.
Our Approach
The Twilio Flex programmable call center is a highly flexible, cloud-based platform that includes a robust debugger, most frequently used during the development phase of the product lifecycle. The Twilio Debugger catches and reports system errors and events. Terazo built custom functionality on top of this interface: the Debugger's information payload is sent through a webhook to a Twilio Function, which forwards it to Datadog, the cloud-based observability platform Terazo uses.
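The forwarding step can be illustrated with a minimal sketch. It assumes the Debugger webhook's form-encoded fields (`Sid`, `Level`, `Timestamp`, `Payload`, where `Payload` is a JSON string carrying `error_code` and `more_info`) and Datadog's v1 Events API on the US1 site. Twilio Functions run on Node.js, so this Python version only shows the transformation logic; the function names and tag names are hypothetical, and this is not Terazo's actual implementation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Datadog US1 site; other Datadog sites use a different host.
DATADOG_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def parse_debugger_webhook(form_body: str) -> dict:
    """Decode the form-encoded body Twilio POSTs to a Debugger webhook."""
    fields = dict(urllib.parse.parse_qsl(form_body))
    # The Payload field is itself a JSON string (error_code, more_info, ...).
    payload = json.loads(fields.get("Payload") or "{}")
    return {
        "sid": fields.get("Sid"),
        "level": fields.get("Level", "ERROR"),
        "timestamp": fields.get("Timestamp"),
        "error_code": str(payload.get("error_code", "unknown")),
        "more_info": payload.get("more_info", {}),
    }


def to_datadog_event(alert: dict) -> dict:
    """Map a parsed Debugger alert onto a Datadog Events API request body."""
    return {
        "title": f"Twilio Debugger {alert['level']}: error {alert['error_code']}",
        "text": json.dumps(alert["more_info"]),
        "alert_type": "error" if alert["level"] == "ERROR" else "warning",
        # Hypothetical tag names that let monitors filter on the error code.
        "tags": [f"twilio_error_code:{alert['error_code']}", "source:twilio-flex"],
        # Group repeats of the same error code together in Datadog.
        "aggregation_key": alert["error_code"],
    }


def build_datadog_request(event: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST to Datadog; send with urlopen()."""
    return urllib.request.Request(
        DATADOG_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
```

Keeping parsing, mapping, and request construction as separate pure functions makes each step testable without network access; only the final `urllib.request.urlopen(req)` call touches Datadog.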
An example of this error collection is shown here:
Datadog collects the error event, applies any filtering criteria (such as specific error codes to ignore), issues a notification through VictorOps (Splunk's on-call alerting product, which can notify through Slack as well as SMS, phone calls, and email), and creates a ticket in Jira Service Desk for prioritization and handling.
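The filter-then-route decision can be sketched as a small pure function. In this deployment the criteria are applied inside Datadog rather than in application code, so this is only an illustration. The names `route_alert` and `RoutingDecision`, the ignore-list contents, and the rule that only `ERROR`-level events page the on-call engineer while warnings are only ticketed are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    notify_on_call: bool  # page through the on-call tool (VictorOps in this case)
    open_ticket: bool     # create a Jira Service Desk ticket
    reason: str


def route_alert(error_code: str, level: str, ignored_codes: frozenset[str]) -> RoutingDecision:
    """Apply the ignore list first, then decide how to escalate."""
    if error_code in ignored_codes:
        return RoutingDecision(False, False, "ignored by filter")
    # Assumption: only ERROR-level events page someone; warnings are ticketed only.
    page = level == "ERROR"
    return RoutingDecision(page, True, "paged and ticketed" if page else "ticketed only")
```

Returning a decision object instead of calling notification services directly keeps the policy easy to review and test separately from the integrations that act on it.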
All events and their data are stored in Datadog, where Terazo and the healthcare client can triage errors, act on them, and later review the data to create knowledge base articles.
Once a ticket has been created in Jira Service Desk, Terazo applies ITIL-based service management processes to ensure that proper escalation, resolution, and root-cause analysis are performed, limiting recurrence and building confidence in the Twilio Flex programmable call center platform.
Note: Terazo’s Observability monitoring is software-agnostic. Through API integrations and webhooks, Terazo can work with any number of software platforms, network services, and information providers, not only those named in this case study.
Technologies & Services Utilized
DevOps
Managed Services
Twilio Flex Integration
Jira
VictorOps
Results: The Advantages of Observability
Moving beyond traditional up/down availability monitoring, Terazo applies modern Observability and DevOps practices to provide high-fidelity insight into critical business processes. Many cloud-based business processes integrate multiple cloud-native services, which makes monitoring them appropriately complicated. Terazo draws on its software engineering practice to design systems from the client’s perspective, including the capabilities used to monitor those systems, so that the most important aspects are the most closely watched.
Through observability, the collected information answers important questions about the client’s applications and information processing: How often do errors happen? Do they occur only at certain times of day, or only when customers click a specific button or trigger some other front-end event? This information not only helps prioritize and address issues in real time but also yields insight that helps prevent issues from recurring.
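Questions such as how often each error occurs and whether errors cluster at certain times of day reduce to simple aggregations over the stored events. The sketch below assumes each event is a dict with hypothetical `error_code` and ISO 8601 `timestamp` keys whose timestamps carry an offset or a trailing `Z`. The sample events are illustrative, not client data. In practice these queries would run in Datadog itself.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC.

    A trailing 'Z' is rewritten because fromisoformat() only accepts it from
    Python 3.11 on. Timestamps are assumed to carry an offset or 'Z'.
    """
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def counts_by_error_code(events: list[dict]) -> Counter:
    """How often does each error occur?"""
    return Counter(e["error_code"] for e in events)


def counts_by_utc_hour(events: list[dict]) -> Counter:
    """Do errors cluster at certain hours of the day (UTC)?"""
    return Counter(parse_timestamp(e["timestamp"]).hour for e in events)


# Illustrative sample events, not client data.
sample = [
    {"error_code": "11200", "timestamp": "2021-03-01T14:05:09Z"},
    {"error_code": "11200", "timestamp": "2021-03-02T14:40:00Z"},
    {"error_code": "45600", "timestamp": "2021-03-02T09:15:00Z"},
]
print(counts_by_error_code(sample).most_common(1))  # [('11200', 2)]
print(counts_by_utc_hour(sample).most_common(1))    # [(14, 2)]
```

Bucketing by UTC hour avoids ambiguity across the client's time zones. To line spikes up with local business hours, convert each timestamp with `zoneinfo` before counting.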
Key Outcomes
Terazo helped the client provide better customer experiences by responding to errors in a proactive, rather than reactive, manner. By applying modern Observability and DevOps practices, Terazo continually works with its technology partners to enhance and refine platform-based capabilities that enable innovative solutions to complex challenges.