Hello,
I launched a deployment of HCM 2020.05 on a CentOS 7.7 server that I will call the "master server".
The installation is a non-production environment that will be used as a training lab, and it consists of:
- one master server (a single master)
- one worker server
- one Vertica server
- one PostgreSQL server (external DB)
At the "Deploy HCM" step (the doc refers to this page: https://docs.microfocus.com/doc/Hybrid_Cloud_Management/2020.05/DeploySuite), I got a timeout error for some components, as shown in the capture below.
Indeed, some pods were not running after the deployment:
[root@hcm-master-1 ~]# kubectl get pods -n hcm-n0waq
NAME READY STATUS RESTARTS AGE
broker-0 2/2 Running 0 16h
hcm-accounts-6b4b4998b8-jmqkq 0/2 Init:2/3 0 16h
hcm-ara-85dd9bf988-tpdcq 1/2 ErrImagePull 0 16h
hcm-autopass-5cfcb69875-9dd8z 2/2 Running 1 16h
hcm-cloudsearch-758fb49fdf-kj9g2 2/2 Running 0 16h
hcm-co-optimizer-7567549969-t6f9p 1/2 Running 36 14h
hcm-composer-59d5f5c4dd-sxm7s 0/2 Init:1/2 0 14h
hcm-composer-gateway-57b464f5c8-vtv94 0/2 Init:1/2 0 16h
hcm-content-fnsqk 0/1 Init:2/4 0 16h
hcm-content-tools-gqwxh 0/1 Completed 0 16h
hcm-coso-config-data-xrjk6 0/1 Completed 0 16h
hcm-costpolicy-b6f97c65-vw247 0/2 Init:3/6 0 16h
hcm-csa-75487f85bf-dxq5f 1/2 ErrImagePull 1 16h
hcm-csa-collector-6965565f4d-z8bs5 0/2 Init:1/3 0 16h
hcm-elasticsearch-75f45bc7db-642sz 2/2 Running 0 16h
hcm-idm-config-data-pg5kq 0/1 Completed 0 16h
hcm-image-catalog-7b544bb886-nqcvf 2/2 Running 0 14h
hcm-integration-gateway-c9d5f9c86-5vt65 0/2 Init:1/2 0 16h
hcm-itom-di-dp-master-dpl-6565b44b-5xvcs 2/2 Running 2 16h
hcm-mpp-5467f84487-gr2p8 0/2 Init:2/3 0 16h
hcm-nginx-ingress-controller-94fxz 1/1 Running 0 16h
hcm-oo-585bffff9f-6qp4g 2/2 Running 0 16h
hcm-oodesigner-7cbfd8d9df-9kwql 1/2 ErrImagePull 1 16h
hcm-policy-gateway-6ff9cb8f45-dgdg9 0/2 Init:1/2 0 16h
hcm-scheduler-6cdcb46688-q8fgr 2/2 Running 1 16h
hcm-showback-7d5b76d688-nmmkl 0/2 Init:4/5 0 16h
hcm-showback-gateway-7b44f558c4-qnxzb 0/2 Init:1/2 0 16h
hcm-ucmdb-544cbf8cd8-lkblb 1/2 ErrImagePull 1 16h
hcm-ucmdb-browser-56fbbdb89d-fmvng 0/2 Init:1/2 0 16h
hcm-ucmdb-probe-7795b5bfdd-t6fvb 0/2 Pending 0 16h
itom-di-administration-77c4dfc79-46zzv 2/2 Running 1 16h
itom-di-dp-job-submitter-dpl-54d54f9798-cmq2q 2/2 Running 0 16h
itom-di-dp-worker-dpl-c67997845-2zm9j 2/2 Running 2 16h
itom-di-receiver-dpl-db984c989-wtj9q 2/2 Running 0 16h
itom-di-vertica-ingestion-d547dbf86-xpn7r 1/2 Running 100 16h
itom-di-zk-dpl-0 1/1 Running 0 16h
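For the pods stuck in ErrImagePull or Pending, the Events section at the end of "kubectl describe" shows the underlying cause (the pod name below is taken from the listing above):
# kubectl describe pod hcm-csa-75487f85bf-dxq5f -n hcm-n0waq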
I tried restarting Kubernetes on the master and on the worker, but that did not change the pod statuses. I then rebooted the master server because a process was stuck.
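(For reference, the restart went through the native services that kube-status.sh reports; something like the following on each node, though the exact order may have differed:)
# systemctl restart docker kubelet kube-proxy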
After that, some pods such as csa were running as expected, but others such as mpp still were not:
[root@hcm-master-1 ~]# kubectl get pods -n hcm-n0waq
NAME READY STATUS RESTARTS AGE
broker-0 2/2 Running 0 17h
hcm-accounts-6b4b4998b8-nkfvm 2/2 Running 4 17h
hcm-ara-85dd9bf988-j25f6 2/2 Running 0 17h
hcm-autopass-5cfcb69875-ghcfp 2/2 Running 0 18h
hcm-cloudsearch-758fb49fdf-hjxjh 2/2 Running 0 17h
hcm-co-optimizer-7567549969-4s2k5 1/2 ErrImagePull 0 18h
hcm-composer-59d5f5c4dd-8xdsr 2/2 Running 1 17h
hcm-composer-gateway-57b464f5c8-dtlwl 2/2 Running 20 17h
hcm-content-tools-gqwxh 0/1 Completed 0 4d16h
hcm-costpolicy-b6f97c65-qhmw4 1/2 Running 73 17h
hcm-csa-75487f85bf-2wmlg 2/2 Running 0 18h
hcm-csa-collector-6965565f4d-pnqm8 2/2 Running 5 17h
hcm-elasticsearch-75f45bc7db-nqkfc 2/2 Running 0 17h
hcm-image-catalog-7b544bb886-97mqv 2/2 Running 7 17h
hcm-integration-gateway-c9d5f9c86-j75x6 2/2 Running 2 17h
hcm-itom-di-dp-master-dpl-6565b44b-chnkm 2/2 Running 0 17h
hcm-mpp-5467f84487-gr2p8 1/2 Running 123 4d16h
hcm-nginx-ingress-controller-94fxz 1/1 Running 2 4d17h
hcm-oo-585bffff9f-6qp4g 2/2 Running 5 4d16h
hcm-oodesigner-7cbfd8d9df-s4l2j 2/2 Running 0 17h
hcm-policy-gateway-6ff9cb8f45-kbdpg 0/2 Init:1/2 0 17h
hcm-scheduler-6cdcb46688-9cs6h 2/2 Running 0 17h
hcm-showback-7d5b76d688-nmmkl 2/2 Running 1 4d16h
hcm-showback-gateway-7b44f558c4-z2tfx 2/2 Running 6 17h
hcm-ucmdb-544cbf8cd8-nxhcp 2/2 Running 0 17h
hcm-ucmdb-browser-56fbbdb89d-t2ql4 2/2 Running 3 17h
hcm-ucmdb-probe-7795b5bfdd-vzlnm 0/2 Pending 0 17h
itom-di-administration-77c4dfc79-sfb8n 2/2 Running 2 17h
itom-di-dp-job-submitter-dpl-54d54f9798-bffzs 2/2 Running 0 17h
itom-di-dp-worker-dpl-c67997845-hvcl7 2/2 Running 2 17h
itom-di-receiver-dpl-db984c989-sfpx8 2/2 Running 0 17h
itom-di-vertica-ingestion-d547dbf86-ggwfp 1/2 Running 7 17h
itom-di-zk-dpl-0 1/1 Running 0 17h
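To investigate the repeatedly restarting mpp pod, the log of the previous container instance can be dumped with --previous (here <container> is a placeholder for the actual container name, which "kubectl describe" lists):
# kubectl logs hcm-mpp-5467f84487-gr2p8 -n hcm-n0waq -c <container> --previous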
[root@hcm-master-1 ~]# /opt/kubernetes/bin/kube-status.sh
Server certificate expiration date: Sep 10 13:01:18 2024 GMT, 357 days left
Get Node IP addresses ...
Master servers: hcm-master-1.xxx.fr
Worker servers: hcm-master-1.xxx.fr hcm-worker-1.xxx.fr
Checking status on 10.X.Y.40
--------------------------------------
Local services status:
[DockerVersion] Docker:v19.03.5 ...................................... Running
[DockerStorageFree] docker ........................................... 11.191 GB
[KubeVersion] Client:v1.15.5 Server:v1.15.5 ......................... Running
[NativeService] docker ............................................... Running
[NativeService] kubelet .............................................. Running
[NativeService] kube-proxy ........................................... Running
Cluster services status:
[APIServer] API Server - hcm-master-1.xxx.fr:8443 ........ Running
[MngPortal] URL: hcm-master-1.xxx.fr:5443 ................ Running
[Node]
(Master) hcm-master-1.xxx.fr ................................ Running
(Worker) hcm-master-1.xxx.fr ................................ Running
(Worker) hcm-worker-1.xxx.fr ................................ Running
[Pod]
<hcm-master-1.xxx.fr>
(kube-system) apiserver-hcm-master-1.xxx.fr ................. Running
(kube-system) controller-hcm-master-1.xxx.fr ................ Running
(kube-system) scheduler-hcm-master-1.xxx.fr ................. Running
(kube-system) etcd-hcm-master-1.xxx.fr ...................... Running
<hcm-worker-1.xxx.fr>
[DaemonSet]
(kube-system) coredns ........................................... 1/1
(kube-system) kube-flannel-ds-amd64 ............................. 2/2
(core) kube-registry ............................................ 1/1
(core) fluentd .................................................. 2/2
(core) itom-logrotate ........................................... 2/2
[Deployment]
(kube-system) heapster-apiserver ................................ 1/1
(core) metrics-server ........................................... 1/1
(core) itom-cdf-tiller .......................................... 1/1
(core) itom-logrotate-deployment ................................ 1/1
(core) idm ...................................................... 2/2
(core) mng-portal ............................................... 1/1
(core) cdf-apiserver ............................................ 1/1
(core) suite-installer-frontend ................................. 1/1
(core) nginx-ingress-controller ................................. 2/2
(core) itom-cdf-ingress-frontend ................................ 2/2
[Service]
(default) kubernetes ............................................ Running
(kube-system) heapster .......................................... Running
(core) idm-svc .................................................. Running
(core) kube-dns ................................................. Running
(core) kube-registry ............................................ Running
(core) kubernetes-vault ......................................... Running
(core) mng-portal ............................................... Running
(core) suite-installer-svc ...................................... Running
(core) cdf-svc .................................................. Running
(core) cdf-suitefrontend-svc .................................... Running
(core) nginx-ingress-controller-svc ............................. Running
(core) itom-cdf-ingress-frontend-svc ............................ Running
(core) metrics-server ........................................... Running
[NFS]
<PersistentVolume: hcm-n0waq-db-backup-vol>
hcm-master-1.xxx.fr:/var/vols/itom/db-backup ................ Passed
<PersistentVolume: hcm-n0waq-hcm-vol-claim>
hcm-master-1.xxx.fr:/var/vols/itom/hcm ...................... Passed
<PersistentVolume: itom-logging>
hcm-master-1.xxx.fr:/var/vols/itom/logs ..................... Passed
<PersistentVolume: itom-vol>
10.X.Y.40:/var/vols/itom/core ............................... Passed
[DB] cdfidmdb ........................................................ Passed
Full CDF is Running
My question is: how do I find out which components failed first? I know the command to investigate the error for a pod that is not running, for example:
# kubectl logs cdf-apiserver-7c4fbc8b7c-92fxg -n core -c cdf-apiserver
But I would like to know if there is a log file that describes a chronology of the events, e.g. start deploying the csa pod, then the mpp pod, then the oo pod, etc. Does such a file exist somewhere? I want to identify the first pod that went into error during the deployment, and fix that error first, since there may be dependencies between pod installations.
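I am aware that Kubernetes events can be sorted chronologically, as below, but events expire after a short time, so they do not cover the whole deployment history:
# kubectl get events -n hcm-n0waq --sort-by='.metadata.creationTimestamp'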
Also, I wanted to uninstall HCM in order to install it again from the management portal, but the login returns a "username/password" error, even though the credentials have not changed since the last time I deployed HCM.
Is it possible to reset the management portal credentials so that I can uninstall properly?
Thanks in advance for your help!
Regards,
Jean-Philippe