Observability at La Redoute

What is Observability?

Observability can mean different things to different people. For some, it is about logs, metrics and traces (the three pillars of observability, a view defended by the majority of the community); for others, “it’s the old wine of monitoring in a new bottle”. The overall idea, however, is to bring better visibility into systems: being able to infer the current state of a system from its external outputs, with enough context to understand that state.

Figure 1 : The three pillars of Observability

Monitoring and Observability

“Monitoring tells you whether a system is working, observability lets you ask why it isn’t working.”

(Baron Schwartz, October 19, 2017)

Put very simply, Monitoring is the passive collection of metrics, logs, events, etc. about a system, while Observability is the active dissemination of information from the system. Monitoring is most often used for alerting, troubleshooting, capacity planning and other traditional IT Ops functions, and usually does not go very deep. Observability data, on the other hand, is often much more detailed and diverse, goes deeper into the system, and is used for debugging, complex troubleshooting and performance analysis.

Figure 2 : Monitoring and Observability Pyramid

While monitoring an application gives us information about the system and lets us know when a failure occurs, Observability is more a quality of the applications or technologies themselves, one that makes it easy to see exactly what broke and where.

Making La Redoute Observable

As La Redoute moves from a monolithic to a distributed architecture, our ability to scale will increase dramatically. Consequently, the overall complexity of systems and their interactions will also grow. Visibility into the performance and health of our varied service topology will become an important aid for quickly determining the root cause of issues, as well as for increasing La Redoute’s reliability and efficiency.

Figure 3 : Increasing need for observability to support distributed systems

Today, monitoring is something we perform against our applications and systems to determine their state. From simple up/down checks to more proactive performance health checks, we monitor mainly to detect problems and anomalies. Our IT teams have developed several processes capable of monitoring past events or expected failures. We build dashboards and alerts and consume metrics based on previous experience. This process helps us find the root cause of problems and gain insight into capacity requirements and performance trends over time.

Why Observability is not limited to Monitoring

In contrast to monitoring, which is something we (actually) do, observability is more a property of a system. This means that if (older) IT systems and applications, however well optimized, don’t properly externalize their state, then even the best monitoring can fail. It is therefore imperative to use modern tools to better understand the properties of applications and their performance as complex distributed systems take shape across the delivery pipelines and into production. In a DevOps world, it is very important that both applications and systems become observable themselves. So how do we do that?

Many practices contribute to observability, and they can be found in a range of software and tools. The main idea, however, is to externalize the state of key applications through logs, metrics and events.

Logging

To debug or solve problems, we need access to the right information. Server logs contain the information needed to diagnose an issue. Logs can also track the history of changes, which helps with audits and compliance.
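As an illustration only, here is a minimal sketch of structured logging using Python's standard `logging` module, where each log line is a JSON object carrying the context needed for diagnosis. The logger name `checkout` and the fields `order_id` and `reason` are hypothetical and are not taken from La Redoute's systems.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed through `extra=` become attributes on the record.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        return json.dumps(payload, sort_keys=True)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Hypothetical service name and fields, used only for the example.
buffer = io.StringIO()
log = build_logger("checkout", buffer)
log.info(
    "payment refused",
    extra={"context": {"order_id": "A-1001", "reason": "card_expired"}},
)
print(buffer.getvalue().strip())
```

Writing logs as structured records rather than free text tends to make them easier to search and correlate later, which is the property the paragraph above relies on.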

Metrics

Metrics are all about numbers. Each metric looks at specific data over time to help us understand past trends and events, as well as what is happening now. The information provided by metrics can also be used to predict what will happen in the near future.
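To make the idea concrete, the sketch below stores a metric as timestamped samples, summarizes the past with a mean, and makes a deliberately naive prediction by fitting a straight line (ordinary least squares). The metric name `response_time_ms` and the sample values are invented for the example; real forecasting would need a more careful model.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """A named metric: (timestamp, value) samples in arrival order."""

    name: str
    samples: list = field(default_factory=list)

    def record(self, timestamp, value):
        self.samples.append((timestamp, value))

    def mean(self):
        return sum(v for _, v in self.samples) / len(self.samples)

    def forecast(self, timestamp):
        """Extrapolate with a least-squares straight line.

        Needs at least two samples with distinct timestamps.
        """
        n = len(self.samples)
        mean_t = sum(t for t, _ in self.samples) / n
        mean_v = self.mean()
        var_t = sum((t - mean_t) ** 2 for t, _ in self.samples)
        cov = sum((t - mean_t) * (v - mean_v) for t, v in self.samples)
        slope = cov / var_t
        return mean_v + slope * (timestamp - mean_t)


# Invented samples: one response-time reading per minute.
latency = TimeSeries("response_time_ms")
for minute, value in enumerate([100, 110, 120, 130]):
    latency.record(minute, value)

print(latency.mean())       # average over the observed window
print(latency.forecast(5))  # naive linear projection for minute 5
```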

Tracing (Events)

Logs and metrics can give you observability, but usually only for a specific server or one part of a larger system. Tracing follows a series of related events that show a server request from end to end. With tracing, we can reliably get the state of application performance and of the service being delivered by measuring all the work done across many dependencies, which helps us diagnose larger issues.
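The following sketch shows the core mechanism in miniature: a single trace id is shared by every unit of work (a span) done for one request, and each span records its parent and its duration. It uses only the Python standard library, keeps spans in an in-memory list rather than a real tracing backend, and the service names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []  # in-memory stand-in for a tracing backend


@contextmanager
def span(name, trace_id, parent_id=None):
    """Time one unit of work and attach it to a trace."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def handle_request():
    """Simulate one request crossing three hypothetical services."""
    trace_id = uuid.uuid4().hex  # one id for the whole request
    with span("web-frontend", trace_id) as root:
        with span("catalog-service", trace_id, parent_id=root):
            time.sleep(0.01)  # simulated downstream work
        with span("pricing-service", trace_id, parent_id=root):
            time.sleep(0.01)
    return trace_id


trace_id = handle_request()
for s in SPANS:
    print(f'{s["name"]:16} parent={s["parent_id"]} {s["duration_ms"]:.1f} ms')
```

Because every span carries the same trace id, the work done in different services can be stitched back into one end-to-end view of the request, and the parent links show which dependency accounted for the time.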

Why we shift-left Observability

At La Redoute, improving observability means keeping watch over all application components, from mobile and web front-ends to infrastructure. Until now, this has involved gathering and analysing information from many data sources: app logs, time-series data and so on. Now, however, conditions are more complex, and to get the real picture of customer experience we need clearer insights delivered in the context of how mobile and web apps are being used. The best way to guarantee observability is to build it into our code as we write it. By focusing on observability during the development process, DevOps teams gain a better understanding of our software, include the required instrumentation when it ships, and regularly monitor it to ensure it is working properly.
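One possible way to build instrumentation into code as it is written is a small decorator that records call counts, errors and cumulative duration for each function. This is a sketch under stated assumptions, not a description of La Redoute's tooling: the `observed` decorator, the `STATS` registry and the `add_to_basket` function are all hypothetical names.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry; a real setup would export these values.
STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def observed(func):
    """Record call count, error count and cumulative duration for func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = STATS[func.__name__]
        entry["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            entry["errors"] += 1
            raise  # instrumentation must not swallow the failure
        finally:
            entry["total_seconds"] += time.perf_counter() - start

    return wrapper


@observed
def add_to_basket(basket, item):
    """Hypothetical business function instrumented at development time."""
    if not item:
        raise ValueError("item must not be empty")
    basket.append(item)
    return len(basket)


basket = []
add_to_basket(basket, "sofa")
try:
    add_to_basket(basket, "")
except ValueError:
    pass

print(dict(STATS["add_to_basket"]))
```

The design choice here is that the function's behaviour is unchanged: the decorator only observes, so instrumentation can ship with the code from the first commit without altering results or hiding errors.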

In all cases, we cannot forget the human factor. No matter how smart our current monitoring is, it will hardly count if teams don’t use it wisely when designing, developing, testing or delivering their applications. It is therefore important that modern monitoring methods are built into the deployment pipeline with a minimum of complications. People need to be trained to get better at making their systems observable. At first this may mean delivering fast insights to get some quick wins, but it could quickly become a highly effective service, capable of giving insights into monitoring designs and improvement strategies.

Sources

Figure 3 : Serverless Computing Deloitte

The New Stack – Monitoring & Observability

Logz.io What is Observability

Author

Ines Fernandes
