As part of OpenText Advanced AIOps and Observability, we are introducing OpenText™ Application Observability, a new SaaS-based solution that provides actionable insights for cloud-native and traditional applications built with OpenTelemetry (OTel). OpenTelemetry is an observability framework designed to collect and manage telemetry data, such as traces, metrics and logs. It helps cloud operations teams, Site Reliability Engineers (SREs) and central IT teams detect and manage performance issues quickly and efficiently.
To learn more about OpenTelemetry and Observability concepts, refer to the OpenTelemetry website and read our technical topic What is Observability in IT Operations?
Application Observability works standalone or with Operations Bridge SaaS or Infrastructure Observability (formerly OpScope).
The new release offers additional benefits compared to the Technical Preview that we made available last year:
- For the root-cause analysis use case:
  - Better insights into how one transaction triggers other transactions
  - Improved display of what is normal and what is abnormal inside a trace
- For the data exploration use case:
  - Improved data exploration across applications, microservices and transactions
  - New data exploration for APIs, metrics and traces
  - Understanding of what is normal and what is abnormal for errors, latencies and throughputs
  - Display of service log messages
Furthermore, you can now use your own open-source OpenTelemetry Collector and connect it to Application Observability.
In the next sections, we will explain what exactly Application Observability provides and how it helps you find the root cause of a problem and gain visibility into your application performance, so that you can better understand the inner workings of an application.
How Application Observability helps you in the root-cause analysis
Let's assume that a certain transaction fails in your complex cloud-native environment. You do not know why and where the anomaly occurred, and you are not able to resolve the issue.
To address this use case, Application Observability provides you with the service map that shows all the microservices of your application and a service list.
In the service map, services with errors are shown in red, and the size of a service indicates its throughput. The service list allows you to sort the services by the number of errors and by throughput.
Figure 1. Application Details: The Microservices tab with a service list and a service map
Once you select a service with errors, you can switch to the Traces tab and its Trace Groups subsection (a trace group represents a transaction of your application, such as ordering a product). This list shows all trace groups of a service. Again, you can sort the trace groups according to the number of errors, throughput and latency.
Figure 2. Application Details: Trace Groups sorted by the error count
You can then select a trace group with errors and use the Analyze Traces button to drill down into the Traces page to look at all traces over time.
Figure 3. Traces over time
Here you can see how many traces had errors. Clicking a trace that shows an error provides you with the trace details and the sequence of operations (each operation is called a span in OpenTelemetry) within the trace.
Figure 4. Sequence of spans and span details
The sequence diagram shows which span took longer than normal and which had errors (indicated by a red error icon). You can also see which services were involved. When you select a certain span, span details are displayed on the right. Span details show span attributes (such as call status, error codes and other attributes that the developer considered important) and span events. They may also show a link to another trace that triggered this trace.
Inside the trace details, you can also click View Logs to examine the log messages that were logged during the trace.
In the example below, the span details indicate that the operation CatalogServiceGetProduct of the product catalogue service returned an error. The log message tells us that there was a problem in preparing the order because the product could not be found.
Figure 5. Log messages logged during the trace
If Application Observability is used by dedicated monitoring teams (cloud operations, Site Reliability Engineering or central IT teams), then they can pass on the gathered information to developers/DevOps teams to get the issue identified and fixed. If Application Observability is already used by developers/DevOps teams during development and testing, then this is valuable information for the developers themselves.
How Application Observability helps you understand the inner workings of an application
Application Observability also provides useful entry points that help you explore the application and gain visibility into the inner workings of the system, especially with regard to the throughput and latency of services and APIs. If developers understand the usage patterns of their APIs and see when and how often they are called, they can react proactively before performance bottlenecks occur. You can also explore metrics to view key application or business metrics that your application provides.
To address this use case, there are four entry points that you can choose from: Microservices, Traces, Logs and Metrics tabs.
On the Microservices tab, you can use the service map and the service list as a starting point. The biggest circles on the map indicate the services with the highest throughput. You can also sort the service list by highest throughput or latency. This allows you to see which services must handle the highest load or contribute the most to long latencies in your transactions.
Once you have selected a microservice, switch to the Traces - APIs tab to see the usage of the external APIs of the service. The result shows the throughput, latencies and error counts over time, as well as the baselines for all three, allowing you to identify abnormal values for a specific API.
Figure 6. Details on APIs
In addition, on the Traces tab, you can use the following subsections to view the data for the selected service across all APIs/trace groups:
- The Throughput tab gives you the overall throughput of a service.
- The Errors tab shows the error count within traces of a service over time and indicates in which trace groups errors have occurred. From here, you can also jump into a trace for root-cause analysis.
- The Explorer tab allows you to apply filters on traces and to explore traces across all trace groups. This allows you, for example, to look for certain error codes or other attributes and to check when they occurred.
Another entry point is the Logs tab, which allows you to look at all logs of a service.
And finally, you can use the Metrics tab for metric exploration. Keep in mind that an application or a service may provide hundreds of metrics with many different attributes. To quickly find the metric you are interested in, use the Search field (in this release, we support Sum and Gauge OpenTelemetry metrics – more metric types are planned to be added in future releases).
Last but not least, you can use the Attribute Explorer to look for and filter on certain attributes – for example, certain request types. As a result, you can see the metric value over time in the Time Series graph. Hover over the line to see the metric value as a tooltip. Note that you can use the time selector at the top to change the time range.
Figure 7. Metrics Explorer
Prerequisites for using Application Observability
Now that you understand the value that Application Observability can provide to your IT Ops teams, let's talk about what is required to use it.
First of all, you need an application that uses OpenTelemetry. The application may already include OpenTelemetry (as OpenText SMAX does), you may use auto-instrumentation (for details, see Injecting Auto-instrumentation), or you may have instrumented it manually. Second, you need a collector that collects the OpenTelemetry data and forwards it to Application Observability.
During the Technical Preview, we learned that although we provide the OpenText OpenTelemetry Collector out of the box, you may already have a collector in place that you want to keep using. Now this is possible as well: just configure it to forward your OpenTelemetry data to our SaaS-based Application Observability instance.
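As a sketch only, a minimal configuration for an open-source OpenTelemetry Collector that receives OTLP data and forwards traces, metrics and logs to a SaaS backend could look like the following. The endpoint URL and token are placeholders, not real values – refer to the Application Observability documentation for the actual endpoint and authentication details:

```yaml
# Hypothetical OpenTelemetry Collector config; endpoint and token are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp:
    endpoint: https://<your-tenant>.example.com/otlp   # placeholder endpoint
    headers:
      authorization: "Bearer <your-api-token>"         # placeholder token

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
```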
The following diagram shows the sequence of necessary steps for a cloud-native containerized application running on Kubernetes. Note that we can also collect the OpenTelemetry data from traditional applications.
Figure 8. Application Observability setup steps
Interested? Please contact your OpenText representative to get a demonstration or a free trial.
Related items:
We encourage you to try out our new features and enhancements! For further information on our offerings, visit the Operations Bridge and OpScope product pages, explore our documentation resources and check out our Operations Bridge Video Library and blogs.
If you have feedback or suggestions, don’t hesitate to comment on this article below.
Explore the full capabilities of Operations Bridge by taking a look at these pages on our Practitioner Portal: Operations Bridge SaaS, Operations Bridge Manager, SiteScope, Operations Agent, Operations Bridge Analytics, Application Performance Management (APM) and Operations Orchestration (OO).
Events
- On-demand webinar: OpenTelemetry Changes Everything
- On-demand webinar: Operations Bridge 24.2 Release Readiness Webinar
Read all our news at the Operations Bridge blog.
Have technical questions about Operations Bridge? Visit the Operations Bridge User Discussion Forum.
Keep up with the latest Tips & Information about Operations Bridge.
Do you have an idea or Product Enhancement Request about Operations Bridge? Submit it in the Operations Bridge Idea Exchange.