
When we look at business value, we often assume that the product we are building, whether it is the app that will revolutionize the world or a change to the infrastructure behind a complex process, will provide endless benefits to users and consumers. What we do not expect is that such a change can sometimes have catastrophic effects on customers, and we are sometimes short-sighted in estimating those effects.

As an example, consider the recent problems at several major air carriers. Failures in the software used to board passengers and manage reservations have led to delays and cancellations.

A software glitch caused hundreds of cancellations at Heathrow Airport.

“Tens of thousands of passengers at Heathrow airport have had their flights cancelled or heavily delayed due to an IT failure involving check-in systems…”

The air carriers did not account for or anticipate the cascading failures that resulted from a software failure. Thousands of customers were thrown into chaos and ended up sleeping on the floor of airport terminals.

A business can be viewed as a network of relationships, and customers are part of that network. Customers make decisions based on cost, quality, service, and previous experience, so when customers have a bad experience, it reflects on business value. If your customers had an experience like that of the stranded airline passengers, would they come back?

Customer relationships are the focal point of our efforts to provide business value, and a bad experience can have long-lasting effects on your business. By tying business value to observability, we gain a tool that bridges the gap to actionable insights, which in turn help build better customer relationships.

Observability is a measure of how well we can understand a system's internal state from its external outputs. It helps to think about observability in terms of a hypothesis, which takes this form:

If this happens … then this will happen … because …

For instance:

“If I make a change to the flight reservation system without proper testing … then customers will be unable to make a flight reservation … because the system will break.” 
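A hypothesis like this becomes testable once it is tied to a measurable user outcome. The Python sketch below is one hypothetical way to do that: it compares the reservation success rate before and after a change and reports whether the drop exceeds a tolerance. The `Window` class, the `hypothesis_supported` function, the 5% tolerance, and the counts are illustrative assumptions, not part of any real reservation system.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Reservation attempts observed in one time window (hypothetical data)."""
    attempts: int
    successes: int

    @property
    def success_rate(self) -> float:
        # Guard against an empty window instead of dividing by zero.
        return self.successes / self.attempts if self.attempts else 0.0


def hypothesis_supported(before: Window, after: Window, max_drop: float = 0.05) -> bool:
    """True if the success rate fell by more than max_drop (absolute) after the
    change, i.e. the data supports 'if I ship this change, customers will be
    unable to make a reservation'."""
    return (before.success_rate - after.success_rate) > max_drop


before = Window(attempts=10_000, successes=9_900)  # 99% of bookings succeed
after = Window(attempts=10_000, successes=7_200)   # 72% of bookings succeed
print(hypothesis_supported(before, after))  # True
```

The point of the sketch is the shape of the check: the "then" clause of the hypothesis is expressed as a number a customer would feel (completed bookings), not as an internal system metric.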

How do we test our hypothesis? We use observability to provide the data. Observability is commonly described as resting on three pillars: metrics, logs, and traces. Metrics are numeric measurements collected over time from API endpoints, databases, and front-end and back-end processes. Logs record these interactions as individual events and can be used to trigger alerts when a threshold is exceeded. Tracing follows a request as it flows from one system to the next.
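To make the three pillars concrete, here is a minimal, standard-library-only Python sketch of one booking request that emits all three signals. The in-memory `metrics` counter and `spans` list are stand-ins for real backends (in practice a tool such as OpenTelemetry or a dedicated metrics server would fill these roles), and `book_flight`, `span`, and the `"FULL"` flight rule are hypothetical names chosen for illustration.

```python
import json
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("reservations")

metrics = Counter()  # metrics: numeric counters (stand-in for a metrics backend)
spans = []           # traces: finished spans (stand-in for a tracing backend)


@contextmanager
def span(name, trace_id, parent_id=None):
    """Time one step of a request; spans sharing a trace_id form one trace."""
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def book_flight(user_id, flight):
    trace_id = uuid.uuid4().hex
    with span("book_flight", trace_id) as root:
        with span("check_seat_inventory", trace_id, parent_id=root):
            seat_available = flight != "FULL"  # hypothetical business rule
        outcome = "success" if seat_available else "failure"
        metrics[f"reservations.{outcome}"] += 1  # metric
        log.info(json.dumps({                     # structured log event
            "event": "reservation", "outcome": outcome,
            "user_id": user_id, "flight": flight, "trace_id": trace_id,
        }))
        return outcome


book_flight("u-1001", "BA117")
book_flight("u-1002", "FULL")
```

Putting the `trace_id` in the log line is the design choice that matters here: it lets you move from a metric that looks wrong, to the log events behind it, to the exact trace of a failing request.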

Observability is not about dashboards and monitors. Monitors are, in most cases, too late: they report a failure after it has happened and send reliability engineers and DevOps teams down rabbit holes chasing problems they think they are seeing. Human beings are notoriously prone to false pattern matching; we see patterns where none exist. If a dashboard monitor is blinking red, we assume the problem must be there. Observability encourages us to review the data from every angle, not just the one that is blinking red.
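One way to look at the data from every angle is to break the same failures down by each dimension you recorded, rather than trusting whichever panel happens to be red. The Python sketch below does this on a small, made-up set of reservation events; the field names (`airport`, `app_version`, `endpoint`, `ok`) and the data are hypothetical, and ranking by raw failure rate is a deliberately simple heuristic, not a statistical test.

```python
from collections import defaultdict

# Hypothetical reservation events; each carries several dimensions to slice by.
events = [
    {"airport": "LHR", "app_version": "4.2", "endpoint": "/checkin", "ok": True},
    {"airport": "LHR", "app_version": "4.3", "endpoint": "/checkin", "ok": False},
    {"airport": "JFK", "app_version": "4.2", "endpoint": "/checkin", "ok": True},
    {"airport": "JFK", "app_version": "4.3", "endpoint": "/checkin", "ok": False},
    {"airport": "LHR", "app_version": "4.2", "endpoint": "/reserve", "ok": True},
    {"airport": "JFK", "app_version": "4.3", "endpoint": "/reserve", "ok": False},
]
DIMENSIONS = ["airport", "app_version", "endpoint"]


def failure_rate_by(events, dimension):
    """Failure rate for each value of one dimension, e.g. per airport."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for e in events:
        key = e[dimension]
        totals[key] += 1
        if not e["ok"]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


def most_suspicious(events, dimensions):
    """Across all dimensions, return (dimension, value, rate) with the highest failure rate."""
    best = None
    for dim in dimensions:
        for value, rate in failure_rate_by(events, dim).items():
            if best is None or rate > best[2]:
                best = (dim, value, rate)
    return best


print(most_suspicious(events, DIMENSIONS))  # ('app_version', '4.3', 1.0)
```

With this data, a per-airport dashboard would show JFK failing most often (2 of 3 events), but slicing by every dimension shows that all of the failures come from app version 4.3.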

To anticipate a reservation system disaster like the one above, we must consider failure through the lens of observability. IT staff should make hypothesis-based decisions that keep user outcomes at the center. Code should be built and instrumented to provide granular, high-cardinality metadata (fields such as user ID or flight number that can take many distinct values) for metrics, logs, and traces. Yes, you should still monitor system processing to guard against downtime and disaster, but QA should also rigorously test high-value features and flows, guided by how real users actually use the system. This requires designing alerts that are actionable and tied directly to user outcomes.
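An alert tied to a user outcome can be as simple as watching the share of customers who actually complete a booking. The Python sketch below is one hypothetical version: it keeps a sliding window of booking outcomes and fires only when the success rate drops below a target and enough attempts have been seen to make the signal meaningful. The class name, the 98% target, the window size, and the minimum sample count are placeholder assumptions to be tuned for a real system.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    summary: str
    success_rate: float
    failed_bookings: int


class ReservationSuccessAlert:
    """Fire when the share of successful bookings among the last `window`
    attempts drops below `target`. Thresholds are hypothetical placeholders."""

    def __init__(self, target: float = 0.98, window: int = 500, min_samples: int = 100):
        self.target = target
        self.min_samples = min_samples
        self.outcomes = deque(maxlen=window)  # True = booking succeeded

    def record(self, succeeded: bool) -> Optional[Alert]:
        self.outcomes.append(succeeded)
        return self.check()

    def check(self) -> Optional[Alert]:
        n = len(self.outcomes)
        if n < self.min_samples:
            return None  # too little data to act on; avoids noisy pages
        successes = sum(self.outcomes)
        rate = successes / n
        if rate >= self.target:
            return None
        failed = n - successes
        # Phrase the alert in terms of customer impact, not system internals.
        return Alert(
            summary=f"{failed} of the last {n} customers could not complete a booking",
            success_rate=rate,
            failed_bookings=failed,
        )


monitor = ReservationSuccessAlert(target=0.98, window=200, min_samples=100)
for _ in range(150):
    monitor.record(True)
alert = None
for _ in range(10):
    alert = monitor.record(False)
print(alert.summary)  # 10 of the last 160 customers could not complete a booking
```

Because the message describes customers who could not book rather than a CPU or queue metric, whoever receives it knows immediately why it matters. This sketch fires on every failing attempt once the rate is below target; a production alert would normally also deduplicate or rate-limit notifications.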

Finally, we must avoid “boiling the ocean”; that is, do not try to fix everything at once. Use observability to find the key issues, report them to the business, and, most importantly, listen to the business about current problems and possible future issues.

We can add business value through observability by testing, observing, and proposing solutions before problems leave customers sleeping on the floor of an airport terminal.

You can watch a video replay of the webinar below:
