Knowledge Doc: HA services in STOPPED status after HA switch

This article explains how a time difference on the database servers can lead to negative ping times for the HAC services and to problems with HAC switching and service assignment.

Environment

Operations Bridge Manager (OBM) 2021.05

Situation

OBM cannot correctly move HAC services from the primary DP to the secondary DP.

Each move was performed with the moveServices method of the JMX MBean "service=hac-backup".

What happened?

1. The OS team patched DP2 (no HAC services online) and GW2.

2. The HAC services were moved from DP1 (primary) to DP2.

3. The OS team patched DP1 and GW1.

4. The HAC services were moved from DP2 (primary) to DP1.

5. At this point DP1 took over the HAC resources, but the following services were in STOPPED status:

BIZ_IMPACT

DASHBOARD

OPR

LIV_SERVICE

6. An attempt to move the HAC services back to DP2 showed the same behavior.

7. The services were moved to DP1 again.

Only some of the services were moved, so some services started on DP1 and others on DP2. In this state all services were running and OBM worked.

8. Several further attempts were made to bring all services onto a single DP, but each time the following services were in STOPPED status:

BIZ_IMPACT

DASHBOARD

OPR

LIV_SERVICE

The JMX page showed unusual information, including a negative ping time for those services.
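
A negative ping time is a typical symptom of clock skew. The following minimal sketch shows the arithmetic, assuming the ping age is computed as "observer's current time minus the timestamp stored with the last ping". That formula, the function name, and the 90-second skew are illustrative assumptions; the article does not document how HAC actually computes the value.

```python
from datetime import datetime, timedelta, timezone


def ping_age_seconds(observer_now: datetime, last_ping_stamp: datetime) -> float:
    """Age of the last ping in seconds, as seen by the observer (assumed formula)."""
    return (observer_now - last_ping_stamp).total_seconds()


# Reference "true" time used by the observing host (hypothetical value).
true_time = datetime(2021, 5, 1, 12, 0, 0, tzinfo=timezone.utc)

# The last ping really happened 5 seconds ago.
real_ping_time = true_time - timedelta(seconds=5)

# Case 1: the host that stamps the ping has a correct clock.
healthy_age = ping_age_seconds(true_time, real_ping_time)

# Case 2 (hypothetical): the stamping host's clock runs 90 seconds ahead,
# so the stored timestamp lies in the observer's future.
clock_skew = timedelta(seconds=90)
skewed_age = ping_age_seconds(true_time, real_ping_time + clock_skew)

print(healthy_age)  # 5.0
print(skewed_age)   # -85.0
```

Under this assumption, any skew larger than the real ping age produces a negative value, no matter how often the services are moved.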

Next, the contents of the HA_ tables were deleted using the following procedure, without success:

- Stop everything (run_hpbsm stopall)

- Make sure the database is running

- Clear HA tables:

delete from HA_ACTIVE_SESS;

delete from HA_BACKUP_PROCESSES;

delete from HA_PROC_ALWD_SERVICES;

delete from HA_PROCESSES;

delete from HA_SRV_ALLWD_GRPS;

delete from HA_SERVICES_DEP;

delete from HA_SERVICES;

delete from HA_SERVICE_GRPS;

delete from HA_TASKS;

delete from HA_SERVERS;

- Run the configuration wizard

- Start OBM
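
The table-clearing step of the procedure above can be sketched as a single transaction, so that a failure on any table leaves the others untouched. This is only an illustration: the table names and their order come from the procedure, while `clear_ha_tables`, `count_rows`, and the in-memory SQLite database are hypothetical stand-ins. A real OBM database would need its own driver and connection, with all OBM processes stopped first.

```python
import sqlite3

# Table names in the order listed in the procedure above.
HA_TABLES = [
    "HA_ACTIVE_SESS",
    "HA_BACKUP_PROCESSES",
    "HA_PROC_ALWD_SERVICES",
    "HA_PROCESSES",
    "HA_SRV_ALLWD_GRPS",
    "HA_SERVICES_DEP",
    "HA_SERVICES",
    "HA_SERVICE_GRPS",
    "HA_TASKS",
    "HA_SERVERS",
]


def clear_ha_tables(conn):
    """Delete all rows from the HA_ tables; commit only if every delete succeeds."""
    cur = conn.cursor()
    try:
        for table in HA_TABLES:
            cur.execute(f"DELETE FROM {table}")
        conn.commit()
    except Exception:
        conn.rollback()  # leave the tables as they were
        raise
    finally:
        cur.close()


def count_rows(conn):
    """Return {table: row count} for every HA_ table (helper for the demo)."""
    cur = conn.cursor()
    counts = {}
    for table in HA_TABLES:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


# Demo against a throwaway in-memory database, not a real OBM schema.
demo = sqlite3.connect(":memory:")
for table in HA_TABLES:
    demo.execute(f"CREATE TABLE {table} (id INTEGER)")
    demo.execute(f"INSERT INTO {table} VALUES (1)")
demo.commit()

before = count_rows(demo)
clear_ha_tables(demo)
after = count_rows(demo)
```

Comparing `before` and `after` confirms that every table was emptied before the configuration wizard is run.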

Re-running the configuration wizard to increase the heap size of every process also changed nothing.

Then a rebuild of the views and tables was forced using the following JMX methods:

Mbean: UCMDB:service=DAL services. Method: rebuildModelViews

Mbean: UCMDB:service=DAL services. Method: rebuildModelDBSchemaAndView

Then only DP1 was started. This did not work either: the same services listed above were in STOPPED status.

Finally, the services were moved to DP2, and this time it worked. All processes are now running on DP2.

This time the ping time for those services was cleared (previously it had been negative every time).

Cause

The ntpd service on some DB hosts had an issue, which caused those hosts to get the wrong time.

This time difference was the root cause of the problem.
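
A quick way to spot this kind of problem is to compare the time each host reports against a trusted reference. The sketch below assumes the timestamps have already been collected by some other means; the host names, the sample values, and the one-second tolerance are hypothetical and not taken from OBM documentation.

```python
from datetime import datetime, timedelta, timezone


def find_skewed_hosts(reported_times, reference_time, tolerance=timedelta(seconds=1)):
    """Return {host: offset in seconds} for hosts whose clock deviates beyond the tolerance.

    A positive offset means the host clock is ahead of the reference.
    """
    skewed = {}
    for host, reported in reported_times.items():
        offset = reported - reference_time
        if abs(offset) > tolerance:
            skewed[host] = offset.total_seconds()
    return skewed


# Hypothetical readings taken at the same instant on each host.
reference = datetime(2021, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
readings = {
    "db1": reference,                          # in sync
    "db2": reference + timedelta(minutes=2),   # 2 minutes ahead
    "dp1": reference - timedelta(milliseconds=200),  # within tolerance
}

skewed = find_skewed_hosts(readings, reference)
print(skewed)  # {'db2': 120.0}
```

On hosts running ntpd, `ntpq -p` shows the offset to each configured time source and can be used to confirm a suspect host directly.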

For resolution, read the complete knowledge article
