Observability is, at its core, the use of data to analyse what is happening within a system, to understand threats and issues such as load or performance, and to act on that data, whether to fix a failure or to drive continuous service improvement. Monitoring and logging are not only about the external outputs of the system, but about what you do with them. Observability is, in essence, the ability to answer questions about what is going on with the system and how to make it better.
To achieve this, firms will need to standardise their tooling, understand the use cases and the various actors who use telemetry (logs and metrics), and build a single pane of glass to view dashboards and alerts across platforms. Technologies such as eBPF and OpenTelemetry will lower the barrier to entry on instrumentation, and more mature data analytics practices will enable IT and DevOps teams to identify and respond to issues more quickly and effectively.
Many IT and business leaders still don’t realize just how much potential distributed tracing holds, and this represents a huge missed opportunity in the quest to optimize observability. However, the next year is likely to see a significant uptick in adoption. As more organizations migrate their workloads to cloud-native and microservices architectures, distributed tracing will become more prevalent as a means to pinpoint where failures occur and what causes poor performance.
Distributed tracing can open up a whole new world of observability into numerous processes beyond IT monitoring, in areas as diverse as developer experience, business, and FinOps. Distributed tracing relies on instrumenting applications with the mechanics for propagating context as requests execute. You can easily reuse that context propagation mechanism for many other purposes, such as tracking resource attribution or capacity planning information per product line or per customer account.
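To make the resource attribution idea concrete, here is a minimal in-process sketch using Python's standard `contextvars` module. The names `account_var`, `record_cpu`, and `handle_request`, and the CPU figures, are hypothetical and chosen only for illustration. A real distributed tracing setup would also carry this context across process boundaries, typically in request headers, which this sketch does not attempt.

```python
import contextvars
import uuid
from collections import defaultdict

# Hypothetical request context: a trace ID plus a customer account tag.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
account_var = contextvars.ContextVar("account", default="unknown")

# Illustrative in-memory tally of CPU milliseconds per customer account.
cpu_ms_by_account = defaultdict(float)


def record_cpu(cpu_ms):
    """Attribute work to whichever account is set in the current context."""
    cpu_ms_by_account[account_var.get()] += cpu_ms


def query_database():
    # A deeply nested call: the account was never passed in explicitly.
    record_cpu(12.0)


def render_page():
    query_database()
    record_cpu(3.0)


def handle_request(account):
    """Entry point: set the context once so every nested call can read it."""
    def run():
        trace_id_var.set(uuid.uuid4().hex)
        account_var.set(account)
        render_page()

    # Run in a copy of the current context so requests do not leak into each other.
    contextvars.copy_context().run(run)


handle_request("acme")
handle_request("acme")
handle_request("globex")
print(dict(cpu_ms_by_account))
```

The point of the design is that `query_database` never receives the account as an argument: the same mechanism that carries a trace ID also carries the attribution tag.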
Data privacy compliance is another extremely useful application of distributed tracing. With compliance regulations such as GDPR and CCPA, data privacy is a huge priority, and the challenge is exacerbated by the fact that low-level storage is often unaware of user context. By propagating user IDs from the user-facing tiers down to the data storage tiers, distributed tracing can help organizations to better enforce their data privacy policies.
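As a rough sketch of what user-aware storage could look like, the example below forwards a user ID through a middle tier to a storage tier that tags every row with its owner. Everything here is an assumption made for illustration: the `x-user-id` header name, the `StorageTier` class, and the policy of rejecting writes that arrive without user context. It is not how any particular tracing library or database behaves.

```python
from dataclasses import dataclass, field

USER_HEADER = "x-user-id"  # hypothetical header name used to carry user context


@dataclass
class StorageTier:
    """Toy storage tier that records which user each row belongs to."""
    rows: dict = field(default_factory=dict)       # key -> (owner, value)
    audit_log: list = field(default_factory=list)  # (action, user_id, key)

    def write(self, headers, key, value):
        user_id = headers.get(USER_HEADER)
        if user_id is None:
            # Illustrative policy: refuse writes that arrive without user context.
            raise PermissionError("write rejected: no user context propagated")
        self.rows[key] = (user_id, value)
        self.audit_log.append(("write", user_id, key))

    def erase_user(self, user_id):
        """Delete every row tagged with this user, e.g. for an erasure request."""
        doomed = [k for k, (owner, _) in self.rows.items() if owner == user_id]
        for k in doomed:
            del self.rows[k]
        return len(doomed)


def service_tier(storage, incoming_headers, key, value):
    """Middle tier: forwards the user context unchanged to the storage tier."""
    outgoing = {}
    if USER_HEADER in incoming_headers:
        outgoing[USER_HEADER] = incoming_headers[USER_HEADER]
    storage.write(outgoing, key, value)


storage = StorageTier()
service_tier(storage, {USER_HEADER: "u1"}, "order:1", "book")
service_tier(storage, {USER_HEADER: "u1"}, "order:2", "pen")
service_tier(storage, {USER_HEADER: "u2"}, "order:3", "lamp")
erased = storage.erase_user("u1")
```

Because each row carries its owner, a request to erase one user's data becomes a simple filter at the storage tier rather than a search across services.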
Movement beyond the ‘three pillars’ of observability
Discussions about observability often begin and end with what have come to be called the “three pillars of observability.” These are metrics, logs, and traces. Metrics help to detect problems and let DevOps or site reliability engineers understand what has happened. Logs, then, help to diagnose issues, providing the “why” behind the “what.” Finally, traces help engineers to pinpoint and isolate issues by indicating where they happened within distributed requests and elaborate microservice graphs.
These three pillars continue to be critically important. But it’s important not to be confined by the “three pillars” paradigm and to choose the right telemetry data for your needs. In the coming year, I expect we’ll be seeing more organizations embrace additional types of observability signals, including events and continuous profiling.
It is also important to remember that the “three pillars,” like any other telemetry signal, are just raw data. Observability is a data analytics problem, and as such, it is about proactively extracting insights from that raw data, much as a BI analyst would. Don’t wait for the customer to inform you that something is wrong.
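As a small illustration of treating telemetry as an analytics problem, the sketch below derives a tail-latency figure from raw samples and compares it against a target. The sample values, the 500 ms target, and the function names are all made up for the example, and the nearest-rank method is just one simple way to compute a percentile.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_latency(samples_ms, slo_ms, pct=95):
    """Compare the observed percentile against a latency target."""
    observed = percentile(samples_ms, pct)
    return {
        "percentile": pct,
        "observed_ms": observed,
        "slo_ms": slo_ms,
        "breach": observed > slo_ms,
    }


# Hypothetical raw request latencies in milliseconds, including two slow outliers.
latencies_ms = [120, 95, 101, 2300, 110, 98, 105, 99, 2500, 102,
                97, 108, 100, 103, 96, 104, 107, 94, 109, 106]

print(check_latency(latencies_ms, slo_ms=500, pct=50))
print(check_latency(latencies_ms, slo_ms=500, pct=95))
```

In this made-up data the median looks healthy while the 95th percentile breaches the target, which is the kind of signal worth alerting on before a customer reports it.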
More momentum behind eBPF
Extended Berkeley Packet Filter, or eBPF, is a technology that allows programs to run in the operating system’s kernel space without having to change the kernel source code or load additional kernel modules. Currently, observability practice is largely based on manual instrumentation, which requires adding code at relevant points to generate telemetry data. This often presents a significant barrier, and can even prevent some organizations from implementing observability.
While auto-instrumentation agents do exist, they tend to be tailored to specific programming languages and frameworks. However, eBPF allows organizations to embrace no-code instrumentation across their entire software stack, right from the OS kernel level, providing easier observability into their Kubernetes environments and offering benefits around networking and security.
Because eBPF works across different types of traffic, it helps organizations to meet their goal of unified observability. For instance, DevOps engineers might use eBPF to collect full-body traces of requests, such as database queries, HTTP requests, or gRPC streams. They can also use eBPF to collect resource utilization metrics, including CPU usage or bytes sent, allowing the organization to calculate relevant statistics and profile their data to understand the resource consumption of various functions. Additionally, eBPF can handle encrypted traffic.
Netflix recently published a blog post about how the company is using eBPF to capture network insights. According to the company, the use of eBPF has been highly efficient, consuming less than one percent of CPU and memory on any instance.
Unification of siloed tools
As observability matures, organizations will increasingly look to holistic observability platforms, favouring these integrated solutions over the more siloed tools that they have used in the past. Compared to stand-alone observability tools, these more holistic platforms can better position developers, DevOps, and SREs to address querying, visualization, and correlation across all of their different telemetry signal types and sources.
We saw this unification trend in the past year, with major vendors such as Grafana Labs, Datadog, and AppDynamics moving out of their respective specialty domains, whether log analytics, infrastructure monitoring, APM, or others, and expanding into more comprehensive observability offerings. We’ll see this trend accelerate in 2022, adapting to changing observability needs and reshaping the competitive landscape.
Continued adoption of open source tools and standards
The open-source community created Kubernetes (and, essentially, the entire concept of “cloud native”). This same community is now delivering open-source tools and standards to monitor these environments. New open standards like OpenMetrics and OpenTelemetry will mature, becoming de facto industry standards in the process. In fact, OpenMetrics may be adopted this coming year as a formal standard by the IETF, the premier internet standards organization. The rise of open-source tools not only provides companies with additional options for enabling observability, but also helps prevent the vendor lock-in that has historically plagued some corners of the IT industry.