Delayed Events in OBM 2023.05 Event Browser

Hello experts,

We are running OBM 2023.05 with an HA architecture, and the latest hotfix installed is OBM202305HF20240207T092715Z. Our Operations Team, who monitors the Event Browser 24/7 for Critical and Major events, has reported that some events appear in the console with a delay. The events always display the correct "time created" and "time received" timestamps, but they show up in the console with an age of already 15 or 20 minutes.

We found this very odd and had trouble finding any evidence of this issue. However, we discovered something interesting when looking into the integration with OO. We use Automatic Runbook Executions to integrate with OO (2023.05), which is triggered for many events in the production environment. In the cases where we observed the described delay, the execution was triggered much later than expected. This seems to corroborate the Operations Team's observations, as the OO integration should trigger as soon as the event is created.

For example, if an event has timestamps like this:
8/10/2024 11:13:25 PM
8/10/2024 11:13:26 PM
8/10/2024 11:13:26 PM
8/11/2024 01:03:26 AM
8/11/2024 01:03:26 AM

The first entry displayed in the history tab would be something like this:
System (System) - 8/10/24, 11:29 PM Added Annotation with valueFlow "OBM Notifications" launched in Operations Orchestration. See <https://$oo_server:$port/oo/drilldown-ex.html#$flow_id>

This issue occurs sporadically, but we have no idea how to troubleshoot it further. Could you advise on which logs to check, so we can increase the log file size and space allocation, or suggest any tests we could perform, bearing in mind that this is a production environment?

Thank you in advance.

  • 0

    Could you check in the normal event and anomalous event  details  if there is a difference in history  and  event forwarding. 

  • 0 in reply to 
  • 0   in reply to 

    Hello,

    When dealing with slow event processing, I always enable flowtrace – which you can do in the infrasctucture settings:

    Operations Management - Event Synchronization Settings > Event Flow Logging Mode and set to "file"

    Then you can look to see each pipeline step (column 8) and how the event flows through the pipeline.  This gives you the time, event ID, the pipeline step and the bulk size.  This is also with the understanding that everything until IMDBStore happens in a pipeline, whereas after that steps are in performed in parallel – looks like this:

     

    # grep b9568242-5f8f-71ef-19d4-0a5e4a350000 /opt/HP/BSM/log/opr-backend/opr-flowtrace-backend.log

    2024-08-21 09:33:56,704 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: KPIStatusChangeHandler. Bulk events size: 1

    2024-08-21 09:33:56,704 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: SameEventFilter. Bulk events size: 1

    2024-08-21 09:33:56,704 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: InitEvents. Bulk events size: 1

    2024-08-21 09:33:56,704 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: EventReceiver: EventReceiver: received event with state: OPEN, severity: MINOR, title: FileSystem space utilization for Logical Disk /dev of type devtmpfs - Minor threshold exceeded.

    2024-08-21 09:33:56,704 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: SequenceEvents. Bulk events size: 1

    2024-08-21 09:33:56,704 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: LogReceivedStatistics. Bulk events size: 1

    2024-08-21 09:33:56,705 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: RegisterEvents. Bulk events size: 1

    2024-08-21 09:33:56,705 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: EventUpdateHandler. Bulk events size: 1

    2024-08-21 09:33:56,705 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: ActionResponseHandler. Bulk events size: 1

    2024-08-21 09:33:56,706 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: PipelineEntry. Bulk events size: 1

    2024-08-21 09:33:56,706 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: AutoCorrelation. Bulk events size: 1

    2024-08-21 09:33:56,706 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: CIResolver. Bulk events size: 1

    2024-08-21 09:33:56,707 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: NodeCiGenerator. Bulk events size: 1

    2024-08-21 09:33:56,707 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: CiVariableReplacer. Bulk events size: 1

    2024-08-21 09:33:56,707 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: EtiResolverByHint. Bulk events size: 1

    2024-08-21 09:33:56,707 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: DowntimeProvider. Bulk events size: 1

    2024-08-21 09:33:56,707 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: EtiResolverByRule. Bulk events size: 1

    2024-08-21 09:33:56,708 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: ResolutionCompleted. Bulk events size: 1

    2024-08-21 09:33:56,708 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ExternalEventEnrichment: Started: EPI-Executor-1 script: 4d9120c0-f331-44a3-8252-965f6b96bdcd

    2024-08-21 09:33:56,755 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: EventSuppression. Bulk events size: 1

    2024-08-21 09:33:56,755 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: IMDBStore. Bulk events size: 1

    2024-08-21 09:33:56,755 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: KPIStatusChangeHandler. Bulk events size: 1

    2024-08-21 09:33:56,755 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: EventStreamCorrelation. Bulk events size: 1

    2024-08-21 09:33:56,755 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: SameEventFilter. Bulk events size: 1

    2024-08-21 09:33:56,756 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: PairwiseCorrelation. Bulk events size: 1

    2024-08-21 09:33:56,756 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: PairwiseCorrelation: Processing step: PairwiseCorrelation. Started: Pairwise-Worker-2 Key Pattern: ^Sys_FileSystemUtilizationMonitor:puegcsvm43794.swinfra.net:/dev:<*> ETI: fe9b8df4-dd6c-4543-aea3-d06aed3526b4:6768dbcc-2eee-400e-8497-a1b23ff66fc3

    2024-08-21 09:33:56,774 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: PairwiseCorrelation: Processing step: PairwiseCorrelation. Finished: Pairwise-Worker-2

    2024-08-21 09:33:56,785 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: Deduplication. Bulk events size: 1

    2024-08-21 09:33:56,786 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: EventDeduplicator: Processing step: Deduplication. Started: Dedup-Worker-7 Key: Sys_FileSystemUtilizationMonitor:puegcsvm43794.swinfra.net:/dev:START:DiskUsageLevel:Minor ETI: fe9b8df4-dd6c-4543-aea3-d06aed3526b4:6768dbcc-2eee-400e-8497-a1b23ff66fc3

    2024-08-21 09:33:56,786 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: EventDeduplicator: Processing step: Deduplication. Finished: Dedup-Worker-7

    2024-08-21 09:33:56,787 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: PriorityResolver. Bulk events size: 1

    2024-08-21 09:33:56,805 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: KPIStatusChangeHandler. Bulk events size: 1

    2024-08-21 09:33:56,805 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: SameEventFilter. Bulk events size: 1

    2024-08-21 09:33:56,829 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: UserGroupResolver. Bulk events size: 1

    2024-08-21 09:33:56,829 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: PipelineExit. Bulk events size: 1

    2024-08-21 09:33:56,830 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: ImdbUpdater. Bulk events size: 1

    2024-08-21 09:33:56,830 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: EventStore. Bulk events size: 1

    2024-08-21 09:33:56,846 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: EventUpdater: Event committed. Events bulk size: 1 EventUpdate-Worker-3

    2024-08-21 09:33:56,850 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: UnregisterEvents. Bulk events size: 1

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: HIUpdater. Bulk events size: 1

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: NewEventsHandler. Bulk events size: 1

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: TopoCorrelator: New event. For step: TopoCorrelation

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: PostStoreEpiCaller: Processing step: com.hp.opr.common.pipeline.StepInfo$InternalStepInfo@701646fe

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: NNMiCorrelator: NNMiCorrelator executed.

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: EventAutomation: Automation [New Event]

    2024-08-21 09:33:56,851 INFO  [FlowTrace] b9568242-5f8f-71ef-19d4-0a5e4a350000:opr-backend: ProcessingTask: Step: AcknowledgeEvents. Bulk events size: 1

     

    …in parallel you see when you run /opt/HP/BSM/opr/support/opr-jmsUtil.sh.  Parallel refers to the “delivering” column and the respective queue/topic:

     

    queue                                     |      total |   buffered | delivering |     memory | pages | consumers

    -----------------------------------------------------------------------------------------------------------------

    opr_event_forward_queue                   |    1221233 |      7222  |       1202 |        121 |     1 |         2..

     

    /opt/HP/BSM/log/opr-backend/opr-flowtrace-backend.log is my favourite log file on an OBM system as it gives you facts about specific events which you can look up to find out what the problem is.  Trace GUI gives you a great overview in the events section, but I also like to use this unsupported PowerShell script to look for gaps in events which indicate a slow step:

    param (
        [string]$logDirectoryPath,
        [string]$outputFilePath = $null,  # Default output file path is null
        [switch]$append  # Flag to indicate whether to append to the output file
    )
    
    if (-not (Test-Path -Path $logDirectoryPath -PathType Container)) {
        Write-Host "Error: Log directory not found at '$logDirectoryPath'. Exiting script."
        exit
    }
    
    # Get all log files in the directory matching the pattern opr-flowtrace-backend.log.*
    $logFiles = Get-ChildItem -Path $logDirectoryPath -Filter "opr-flowtrace-backend.log.*" | Sort-Object -Property LastWriteTime
    
    foreach ($logFile in $logFiles) {
        $stepStartTimes = @{}
    
        Get-Content -Path $logFile.FullName | ForEach-Object {
            # Extract timestamp, UUID, and step information from the log line
            if ($_ -match '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*Step: (.+?)\. Bulk events size: (\d+)?') {
                $logTimestamp = [datetime]::ParseExact($matches[1], "yyyy-MM-dd HH:mm:ss,fff", $null)
                $step = $matches[2]
                $bulkSize = $matches[3]
    
                # Extract UUID from the log line
                if ($_ -match '\[FlowTrace\] ([\w-]+):opr-backend') {
                    $uuid = $matches[1]
                }
                else {
                    $uuid = "UUID not found"
                }
    
                # Calculate the time difference between steps
                if ($previousStepTime -ne $null) {
                    $timeDifference = $logTimestamp - $previousStepTime
                    $formattedTimeDifference = "{0:hh\:mm\:ss\.fff}" -f $timeDifference
                    $logTimestampFormatted = $logTimestamp.ToString("HH:mm:ss,fff")
                    if ($timeDifference.TotalSeconds -gt 5) {
                        $output = "TD = $formattedTimeDifference Time: $logTimestampFormatted UUID: $uuid, Step: $step, Bulk size: $bulkSize (Time difference > 5 seconds)"
                    } else {
                        $output = "TD = $formattedTimeDifference Time: $logTimestampFormatted UUID: $uuid, Step: $step, Bulk size: $bulkSize"
                    }
    
                    # Output to console or file based on the provided outputFilePath
                    if ($outputFilePath -eq $null) {
                        Write-Host $output
                    } else {
                        if ($append) {
                            $output | Out-File -FilePath $outputFilePath -Append
                        } else {
                            $output | Out-File -FilePath $outputFilePath
                        }
                    }
                }
    
                $previousStepTime = $logTimestamp
            }
        }
    }

    opr-jmsUtil.sh

    Next, please run opr-jmsUtil.sh.  This command shows most of the queues and topics that OBM uses.  Please can you let me know if the queue called opr_gateway_queue has lots of "buffered events".  Lots of events is more than 5000.

    # /opt/HP/BSM/opr/support/opr-jmsUtil.sh
    Live Server: serverA
    Last Alive: Mon Aug 26 09:24:33 CEST 2024 (9 seconds ago)
    ==============================================================================================================================

    queue | total | buffered | delivering | memory | pages | consumers
    -----------------------------------------------------------------------------------------------------------------
    recipient_notification | 0 | 0 | 0 | 0 | 0 | 1
    queue/alert_engine_alert | 0 | 0 | 0 | 0 | 0 | 1
    opr_event_forward_queue | 0 | 0 | 0 | 0 | 0 | 0
    opr_action_launch_queue | 0 | 0 | 0 | 0 | 0 | 1
    queue/alert_engine_notification | 0 | 0 | 0 | 0 | 0 | 1
    opr_gateway_queue | 11752 | 1232 | 0 | 0 | 0 | 1
    failed_recipient_notification | 0 | 0 | 0 | 0 | 0 | 1

    topic | total | buffered | delivering | memory | pages | dur-subs | nondur-subs
    ------------------------------------------------------------------------------------------------------------------------------
    opr_content_autoupload_topic | 0 | 0 | 0 | 0 | 0 | 0 | 1
    opr_marble_calc_result_topic | 3453 | 0 | 0 | 2887 | 0 | 0 | 2
    opr_config_store_update_event_topic | 15 | 0 | 0 | 0 | 0 | 0 | 1
    opr_ha_backend_sync_topic | 0 | 0 | 0 | 983 | 0 | 0 | 0
    activemq.notifications | 0 | 0 | 0 | 0 | 0 | 0 | 0
    opr_event_flowtrace_topic | 0 | 0 | 0 | 0 | 0 | 0 | 1
    DTNotification | 0 | 0 | 0 | 0 | 0 | 0 | 1
    opr_ha_backend_full_sync_topic | 0 | 0 | 0 | 0 | 0 | 0 | 1
    opr_healthcheck_heartbeat_topic | 9768 | 0 | 0 | 0 | 0 | 1 | 0
    opr_oas_rtsm_view_permission_change_topic | 0 | 0 | 0 | 0 | 0 | 0 | 1
    IMS.customer_1 | 15 | 0 | 0 | 0 | 0 | 0 | 3
    opr_scripting_host_topic | 0 | 0 | 0 | 0 | 0 | 0 | 0
    opr_action_responses | 0 | 0 | 0 | 0 | 0 | 0 | 1
    topic/repositories_change | 0 | 0 | 0 | 0 | 0 | 0 | 6
    opr_store_update_event_topic | 48482 | 0 | 0 | 0 | 0 | 0 | 9
    opr_store_reinit_topic | 0 | 0 | 0 | 0 | 0 | 0 | 2
    opr_global_store_update_event_topic | 13 | 0 | 0 | 0 | 0 | 0 | 4
    opr_reconciliation_change_topic | 0 | 0 | 0 | 0 | 0 | 0 | 1
    opr_event_sync_topic | 0 | 0 | 0 | 0 | 0 | 0 | 1
    Notification | 115 | 0 | 0 | 0 | 0 | 0 | 7
    ESS_Marble_Topic_1 | 5548 | 0 | 0 | 0 | 0 | 1 | 0
    opr_toposync_summary_info_topic | 18 | 0 | 0 | 0 | 0 | 0 | 1
    opr_content_uploaded_topic | 0 | 0 | 0 | 0 | 0 | 0 | 0
    GlobalIdJMSTopic | 0 | 0 | 0 | 0 | 0 | 0 | 2

    topic | subscriber | buffered | delivering | durable
    -------------------------------------------------------------------------------------------------------------
    opr_healthcheck_heartbeat_topic | DURABLE_HEALTH_TOPIC_LISTENER | 0 | 0 | Y
    ESS_Marble_Topic_1 | MarbleEssTopicSub_1 | 0 | 0 | Y

    Columns are defined as:

    • queue/topic/subscriber: name of queue/topic/subscriber. 
    • total: total number of messages sent to a destination since the bus server process has been started.
    • buffered: currently buffered messages.  If there is a positive integer (>1000) here one of the receiver/consumers may have stopped processing messages.
      • Once again, this should help find which log file to look under /opt/HP/BSM/log.
    • delivering: subset of buffered messages, which are currently being processed by consumer.  This means the message is being processed. 
      • If this value is zero and buffered is high, buffering value is zero one of the receiver/consumers have failed.
      • Once again, this should help find which log file to look under /opt/HP/BSM/log.
    • memory: number of bytes currently allocated in memory.
    • pages: number of page files (0 if not paging).
    • consumers: number of connected queue readers.
    • dur-subs / nondur-subs: number of durable / non-durable topic subscriptions.
    • durable: Y indicates a durable subscription.  N indicates a non-durable subscriptions:
      • Durable messages are stored on disk or other persistent storage (/opt/HP/BSM/bus).
      • Non-durable messages are stored only in memory or transient storage. If the messaging system crashes or restarts, these messages are lost.  OBM is designed to work with non-durable messages.

    EventStatisticsUtil

    Finally, please run /opt/HP/BSM/opr/support/EventStatisticsUtil.sh and check https://portal.microfocus.com/s/article/KM000012187?language=en_US and https://portal.microfocus.com/s/article/KM000012198?language=en_US.  This shows you relationships between all the relevant event attributes.  This is key to understanding and resolving pipeline issues.

    I hope this goes someway to helping finding the cause of your slow events.  If you get stuck, then I would recommend opening a service request with Opentext support.

    Thanks,

  • Suggested Answer

    0  

    Hello,

    How did you get on with this?  Did the troubleshooting steps below help?  If not, then please contact Opentext Support.