
What’s new in OpenText™ Core Application Observability 25.1


We are excited to announce the new release of OpenText Core Application Observability. This release includes features that make application monitoring more robust and enable root cause analysis of performance issues through integration with synthetic monitoring and OpenText Performance Engineering (LoadRunner Professional). It also brings metrics aggregation, enhanced log management, and improved integration with OpenText Core Infrastructure Observability and OpenText Core Cloud Network Observability.

What’s new in OpenText Core Application Observability:

  • Synthetic Monitoring with Application Observability
  • Integration with LoadRunner Professional for application performance root cause analysis
  • Trace enhancement - aggregate important traces into metrics for longer-term retention
  • OpenTelemetry Collector’s Filelog Receiver - to collect logs from any non-instrumented application
  • Log filters, search and saved views
  • Enhancements - Integration with Hyperscale Observability & Cloud Network Observability
  • SLM over Application Observability metrics

Integration with Synthetic Monitoring for Proactive Issue Resolution

Synthetic monitoring (BPM) collects and analyzes application response time and availability from various locations by simulating user transactions.

Synthetic monitoring provides web-based transaction breakdown reports that measure transaction response times broken down into retry time, DNS resolution, connection time, network time to first buffer, server time to first buffer, download time, and client time. It lets you run scripts against the applications you want to monitor and analyze application response times and availability from various locations.

You can now enhance root cause analysis by using synthetic monitoring with OpenText AI Operations Core Application Observability.

This ensures proactive resolution of application problems by:

  • Correlating Synthetic Transactions to OpenTelemetry Traces: This correlation provides a comprehensive view of system performance, linking user interactions with backend processes. By using probes that operate 24/7, you can monitor your applications around the clock. These probes link directly to the traces behind a problem, enabling fast root cause analysis. This proactive approach allows you to address issues before they escalate, ensuring a seamless user experience.
  • Diving Deeper into System Execution: By using instrumented probes, you can clearly separate synthetic actions from real user actions. This distinction helps you focus on genuine user issues rather than being sidetracked by synthetic test results. By examining detailed execution traces, it becomes easier to identify bottlenecks and anomalies, leading to faster resolution times (a minimal sketch of how a probe can tag its spans follows this list).
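
The following minimal sketch, written with the OpenTelemetry Python SDK, shows one way an instrumented probe could tag its spans so that synthetic runs stay distinguishable from real user traffic. The attribute name synthetic.source and the console exporter are illustrative assumptions, not product conventions.

```python
# Illustrative only: a synthetic probe emits an OpenTelemetry span tagged so it
# can be separated from real-user traffic later. The attribute name
# "synthetic.source" is an assumption, not a documented convention.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("synthetic.probe")

def run_synthetic_check(url: str) -> None:
    # Tag the span so trace filters can isolate synthetic runs.
    with tracer.start_as_current_span(
        "synthetic-login-check",
        attributes={"synthetic.source": "bpm-probe", "http.url": url},
    ):
        pass  # the probe's HTTP call and assertions would go here

run_synthetic_check("https://shop.example.com/login")
```

In a real deployment the console exporter would be replaced by an OTLP exporter pointing at your collector, and the attribute would follow whatever convention your probes use.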

You can use this functionality to monitor synthetic transactions triggered by Business Process Monitor (BPM). Synthetic Monitoring allows you to manage applications, scripts, and transactions executed on BPM probes, without using Application Performance Management (APM).

Synthetic Monitoring offers several out-of-the-box reports that you can use to identify issues with application performance and availability. You can then use the Trace Explorer to do root cause analysis of the issues detected by Synthetic Monitoring. When BPM detects a slow response time for a synthetic transaction, you can use Core Application Observability to find the root cause: the synthetic test runs generate application traces, logs, and metrics, all of which are available in Core Application Observability. The status over time report generated for BPM applications now includes a "View Traces" link that opens the Trace Explorer, where transaction details are available. By analyzing the generated traces and logs, you can pinpoint the exact source of the slowdown, whether it's a specific service, database query, or network latency.

Figure 1: View traces from BPM in the Trace Explorer of AI Operations Management Core Application Observability

Comparing Synthetic and Non-Synthetic Traces

The Trace Explorer clearly distinguishes between synthetic and non-synthetic traces and provides powerful filtering options, making both types of traces easily identifiable for users.

Synthetic traces, which are generated by synthetic monitoring tests, and non-synthetic traces (the OpenTelemetry traces), which originate from real user interactions, are both displayed in the Trace Explorer. This clear visibility allows users to quickly identify the source of each trace, facilitating a more streamlined analysis process.

Operators can apply specific filters to isolate synthetic traces, enabling them to focus on the performance and reliability of their synthetic monitoring setups.

For enhanced visibility and ease of use, the Trace Group view in the Trace Explorer also offers a toggle feature that lets users switch seamlessly between synthetic and non-synthetic traces, and between traces with and without errors. By toggling between these views, users gain a comprehensive view of their system's performance, ensuring that no critical insights are missed.

Figure 2: Synthetic vs. non-synthetic traces

For more information, see Configure Synthetic Monitoring, Monitor synthetic transactions in Application Observability, and View Synthetic Monitoring Flex reports.

Effective performance RCA – Integration with OpenText Performance Engineering (LoadRunner Professional)

OpenText Performance Engineering is a comprehensive platform for load, stress, and performance testing. It helps you identify and resolve performance bottlenecks, ensuring the seamless user experience that is crucial in any fast-paced digital landscape.

You can now integrate the OpenText Performance Engineering tool LoadRunner Professional with Core Application Observability. With this integration, you can capture detailed traces, metrics, and logs from the application under test; in addition, the distributed trace spans generated by LoadRunner Professional during load tests are seamlessly sent to the Core Application Observability service. This gives you insight into the application’s performance and enables better root cause analysis of any performance bottlenecks.

Figure 3: LoadRunner traces in the trace explorer

For more information, see Integrate with OpenText Performance Engineering (LoadRunner Professional).

Transforming Traces – Raw data to metrics

Traces are specific to each application, with no standard set of traces or attributes due to their multidimensional nature. To address this, you are given the flexibility to aggregate data that is meaningful to your specific use cases.

This release introduces trace-to-metric data: errors (error count), latency (trace completion time), and traffic (request count). These metrics are displayed as trends in the Trace Explorer. The trends are derived by aggregating the trace metrics and are available only for the last 12 hours.

If you need historical metric comparison, you can create and save views and enable data aggregation based on filters to see the trace trends. This aggregated data is retained for 12 months. Additionally, because the aggregated data is stored in the ITOM Operations Cloud (OPTIC) data lake, you can create customized dashboards from it, enabling a more comprehensive and actionable view of your operations.
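
To give a feel for what this kind of aggregation produces, here is an illustrative Python sketch (independent of the product internals) that rolls individual trace records into the three trends described above: traffic, errors, and latency per time bucket. The field names and bucket size are assumptions, not the actual trace schema.

```python
# Illustrative only: aggregate per-trace records into traffic, error, and
# latency trends per 5-minute bucket. Field names are assumptions.
from collections import defaultdict
from statistics import mean

traces = [
    {"end_time": 1_700_000_030, "duration_ms": 120, "error": False},
    {"end_time": 1_700_000_045, "duration_ms": 480, "error": True},
    {"end_time": 1_700_000_310, "duration_ms": 95, "error": False},
]

BUCKET_SECONDS = 300  # 5-minute buckets

buckets = defaultdict(lambda: {"requests": 0, "errors": 0, "durations": []})
for t in traces:
    bucket_start = t["end_time"] // BUCKET_SECONDS * BUCKET_SECONDS
    b = buckets[bucket_start]
    b["requests"] += 1                        # traffic
    b["errors"] += int(t["error"])            # errors
    b["durations"].append(t["duration_ms"])   # latency samples

for start, m in sorted(buckets.items()):
    print(start, m["requests"], m["errors"], round(mean(m["durations"]), 1))
```

Keeping only these compact per-bucket numbers, rather than every span, is what makes the longer 12-month retention described above practical.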

Figure 4: Enhanced Trace Explorer with trend charts

Figure 5: Historical Trends view for saved traces

For more information, see the documentation for trace trends, creating trace views, and historical aggregation.

OpenTelemetry Collector’s Filelog Receiver - To collect logs from any non-instrumented application

There are two ways to collect logs from an application: by instrumenting the application, or by configuring the OpenTelemetry Collector’s Filelog receiver to collect logs from specific file paths and transmit them to Core Application Observability.

You can now enable log collection from a non-instrumented application using the OpenTelemetry Collector’s Filelog receiver. The receiver can be deployed in various environments, including bare-metal systems, virtual machines, Docker containers, and Kubernetes clusters. It lets you specify the file paths from which logs should be collected, such as application logs, server logs, and other relevant log files. In Kubernetes, it can be deployed as a sidecar container or a DaemonSet, ensuring comprehensive log collection across your infrastructure.

Once collected, the logs are parsed with regular expressions and converted to the OpenTelemetry format. The formatted logs are then exported to the Core Application Observability service for further analysis and monitoring.
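
As a rough illustration of that parsing step (not the receiver's actual implementation), the following Python sketch uses a regular expression to turn a plain-text log line into OpenTelemetry-style log fields. The line layout, regex, and field names are assumptions for a typical "timestamp level logger - message" format.

```python
# Illustrative only: parse a plain-text log line into OpenTelemetry-style
# log fields. The line format and regex are assumptions.
import re

LINE = "2025-01-15 10:42:03,117 ERROR payment-service - Pending payment threshold exceeded"

PATTERN = re.compile(
    r"(?P<time>\S+ \S+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<body>.*)"
)

# OpenTelemetry severity numbers for common text levels.
SEVERITY_NUMBER = {"DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}

match = PATTERN.match(LINE)
if match:
    record = {
        "timestamp": match.group("time"),
        "severity_text": match.group("level"),
        "severity_number": SEVERITY_NUMBER.get(match.group("level"), 0),
        "body": match.group("body"),
        "attributes": {"logger.name": match.group("logger")},
    }
    print(record)
```

In practice this mapping is done declaratively in the Filelog receiver's configuration rather than in application code.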

For more information, see Configure Filelog Receiver.

Log Explorer enhancements

You can now search logs using filters and an enhanced search function, sort logs by time and severity, and save meaningful filter combinations as views.

Filters

To fetch relevant logs, use the Filter option. Select an attribute and its corresponding value to narrow down the logs. For instance, to view logs for a specific container, choose "Container" as the attribute and enter the container name as the value. To refine the results further, click "Add Condition" and specify additional attributes. Only logs that meet all specified conditions will be displayed.
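
Conceptually, the filter behaves like a logical AND over the chosen conditions. The short Python sketch below illustrates that behavior on hypothetical log records (the record shape is not the product's schema).

```python
# Illustrative only: keep records that satisfy every filter condition.
logs = [
    {"container": "cart-service", "severity": "ERROR", "body": "timeout calling payments"},
    {"container": "cart-service", "severity": "INFO", "body": "cart updated"},
    {"container": "frontend", "severity": "ERROR", "body": "render failed"},
]

conditions = {"container": "cart-service", "severity": "ERROR"}

matching = [r for r in logs if all(r.get(k) == v for k, v in conditions.items())]
print(matching)  # only the cart-service ERROR record remains
```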

Search box

After retrieving the required logs, refine them using the Search box for quick text searches. The search function applies to fields like body, exception.message, exception.stacktrace, and exception.type. Application Observability's advanced search breaks compound words into separate words for indexing. For example, to find logs for getCartAsync, you can search for "Cart" or "Async". Use both Filter and Search to quickly find the logs you need.
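
The compound-word behavior can be approximated with a simple camelCase tokenizer, as in this illustrative Python sketch (not the product's actual indexer):

```python
# Illustrative only: split camelCase identifiers such as getCartAsync into
# tokens so a search for "Cart" or "Async" can match.
import re

def tokens(identifier: str) -> list[str]:
    # Split at transitions from lower-case letters or digits to upper-case letters.
    return re.findall(r"[A-Z]+[a-z0-9]*|[a-z0-9]+", identifier)

print(tokens("getCartAsync"))            # ['get', 'Cart', 'Async']
print("Cart" in tokens("getCartAsync"))  # True
```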

Each log record shows the message, severity, and time. Click a log record for more details, such as application, spanID, or traceID, in the Log detail pane. Sort records by time and severity to prioritize logs.

Save log views

Save frequently used filters and search text as a log view for quick retrieval of log records. You can modify the search criteria in the Filter section or Search box to find relevant logs.

Figure 6: Log explorer

Enhancements - Integration with Hyperscale Observability and Cloud Network Observability

Integrating Application Observability with Infrastructure Observability and Cloud Network Observability aims to provide root cause analysis of application problems associated with infrastructure and network health.

The trace details now capture important system and network metrics that indicate possible infrastructure health issues. If these metrics point to a problem, you can cross-launch to the Hyperscale Observability and Cloud Network Observability dashboards for in-depth analysis.

Figure 7: Trace details display key system metrics

For more information on Infrastructure Observability integration, see Integrate with Infrastructure Observability.

Figure 8: Trace details display key network metrics

For more information on Cloud Network Observability integration, see Integrate with Cloud Network Observability.

Service Level Management

Maintaining the reliability and performance of IT infrastructure and applications is crucial, and this is where Service Level Management (SLM) comes into play. SLM enables organizations to define Service Level Objectives (SLOs), which are essential for tracking and maintaining service reliability.

Service Level Objectives (SLOs) can be utilized by various user personas, including Site Reliability Engineers (SREs), application owners, and operations teams, to set measurable targets ensuring users receive committed service levels, measure the health and performance of critical services, maintain infrastructure stability, manage resources to meet SLO targets, and optimize system performance to improve service reliability.

SLOs contain Service Level Indicators (SLIs), which are quantitative measures of the user experience. They help you determine whether the availability and performance requirements of applications and infrastructure are being met. SLIs represent a proportion of successful outputs and are expressed as a percentage (%).

An error budget is the maximum amount of time a technical system can fail without contractual consequences. You can inspect and act on alerts generated for threshold violations of the SLO error budget.

The various personas that can use the SLOs are:

Site Reliability Engineers (SREs) are primarily responsible for ensuring the reliability and uptime of applications and services. They use Service Level Objectives (SLOs) to ensure that services meet agreed-upon performance and availability targets. Their tasks include monitoring SLOs, responding to incidents, optimizing system performance, and continuously improving service reliability.

Application Owners are primarily responsible for the success of the application or product. They make strategic decisions based on service reliability data and ensure that the application or product meets its service commitments to customers.

Operations Teams are primarily responsible for managing the infrastructure to ensure the smooth running of applications and services. Their tasks include monitoring SLOs to maintain infrastructure stability, managing resources to meet SLO targets, and coordinating incident response and resolution.

Refer to SLM Overview for details on SLM terms and the personas that use SLOs.

An example use case for an SLO is described below.

Suppose you have an application monitored by Core Application Observability and you set an SLI on one of its custom metrics, say payments: the SLI is met when pending payments are below $5000 per day. The error budget (the acceptable slippage of the SLI) is one day of slip per week, that is, on one day per week the pending payments can exceed $5000. This leaves an error budget of 14.29% per week, so the acceptable SLO is about 85.7% per week.

Figure 9: Example of SLO for custom Application metric

You can also set an SLO for application response time. For instance, you might set an SLO where your service responds successfully to at least 98.2% of requests. Here, the SLI is a response time of less than 4 seconds, and the error budget (the acceptable error in the SLI) is 3 hours per week, making the error budget value 1.79%.

Figure 10: Example of SLO for Application response time
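
The error-budget arithmetic behind both examples can be checked with a few lines of Python (the helper function is purely illustrative):

```python
# Illustrative only: error budget as the allowed "bad" share of a week.
def error_budget_percent(allowed_bad: float, total: float) -> float:
    return round(allowed_bad / total * 100, 2)

# Pending-payments SLI: 1 bad day allowed per 7-day week.
weekly_budget = error_budget_percent(1, 7)                # 14.29
print(weekly_budget, round(100 - weekly_budget, 2))       # SLO target ~85.7%

# Response-time SLI: 3 bad hours allowed per 168-hour week.
print(error_budget_percent(3, 168))                       # 1.79 -> SLO target ~98.2%
```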

The SLO and SLI metrics can be defined using Configure SLO.

Figure 11: Configure SLO – General settings

Figure 12: Configure SLO – SLI definition

For example, you can specify a NodeCPUAvailability SLO that uses the following SLI: a 5-minute time slice is good if the CPU usage, measured as the average over all samples during that slice, is below 90%.

Figure 13: Sample SLI criteria

Service Level Objectives (SLOs) are target values that SLIs must meet over a period of time. In our example, you could define that the SLO is breached when less than 95% availability is achieved over a period of one week.
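
Putting the NodeCPUAvailability example together, the following illustrative Python sketch evaluates the SLI per 5-minute slice and checks the weekly SLO target (the sample data and helper names are assumptions):

```python
# Illustrative only: a 5-minute slice is "good" when its average CPU usage is
# below 90%; the weekly SLO is met when at least 95% of slices are good.
def slice_is_good(samples: list[float], threshold: float = 90.0) -> bool:
    return sum(samples) / len(samples) < threshold

def slo_met(slices: list[list[float]], target: float = 0.95) -> bool:
    good = sum(slice_is_good(s) for s in slices)
    return good / len(slices) >= target

# A week has 2016 five-minute slices; here 16 of them breach the CPU threshold.
week = [[42.0, 55.0, 61.0]] * 2000 + [[93.0, 97.0, 95.0]] * 16
print(slo_met(week))  # True: 2000 / 2016 good slices (about 99.2%) >= 95%
```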

To be alerted before the SLO is breached, you can define additional thresholds for an SLO, as shown here:

Figure 14: Specifying additional thresholds

Additionally, in that context, you can look at error budgets (the maximum amount of time that a technical system can fail without contractual consequences).

You can define thresholds for error budget consumption separately, so that you can alert your operators to pay special attention to systems that are about to breach their SLOs.

Figure 15: Defining thresholds for error budget consumption

When defining the SLO, you can look at the preview to check whether your SLO definition is up to date and see the breaches it would have calculated for past data.

Figure 16: SLO definition preview

For more information, see Configure Service Level Objectives.

Once you have defined the SLOs, you can view the SLO status using the SLO Summary dashboard and drill down from there into an SLO dashboard for a specific SLO.

Figure 17: An SLO dashboard sample

For more details, see View Service Level Objectives.

 

More 25.1 release-related details are provided in the OpenText™ AI Operations Management Release Readiness Webinar. The slides and the recording are available on our Community page here.

We encourage you to try out our new features and enhancements! For further information on our offerings, visit the OpenText™ AI Operations Management product page, explore our documentation resources, and check out our video library and blogs.

If you have feedback or suggestions, don’t hesitate to comment on this article below.

Discover the full range of our products by taking a look at these pages on our Practitioner Portal: AI Operations Management - SaaS, Operations Bridge Manager, SiteScope, Operations Agent, Operations Bridge Analytics, Application Performance Management (APM) and Operations Orchestration (OO).


Read all our news in the OpenText™ AI Operations Management blog.

 

Have technical questions about OpenText™ AI Operations Management? Visit our User Discussion Forum and keep up with the latest Tips.

 

If you have an idea or Product Enhancement Request about OpenText™ AI Operations Management, please submit it in the Idea Exchange.
