Backups Over Fiber Fail, But Work Over Network

Hi,

We just upgraded to version 24.4. We have a tape library (2 drives) that is connected over Fibre Channel to a few machines. When we back up over the network, the backup works. When we back up over fiber, the backup never really starts: the library does not even load the tape into the drive. The Cell Server runs on Windows Server. The clients are Windows and RHEL. The problems all started after we upgraded. Open to suggestions. Thanks in advance.

EH

  • 0  

    The question is: what is the real issue causing this problem? Could you share example session reports of both a failing and a working session? Could it be that it always works when bma and uma are running on the same system?

    Although I am an OpenText employee, I am speaking for myself and not for OpenText.
    If you found this post useful, give it a “Like” or click on "Verify Answer" under the "More" button.

  • 0

    Hi,

    More information. We have been "upgrading" since Omniback 2.0 (2008), so I was not sure whether this latest upgrade to 24.4 (when the problem started) had carried over an old bug from the earlier versions. Therefore, I did a brand new install of Windows 2022 and installed DP 24.4. I zoned the library, which has 2 drives, for Fibre Channel. I autoconfigured the tape library, and it is a multipath device.

    I backed up the Cell Manager without selecting "Use preferred MultiPath host", and the backup worked fine. When I did select "Use preferred MultiPath host", that worked fine too.

    My next test was to back up a client, in this case a RHEL 7 machine. When I did not select "Use preferred MultiPath host", the backup worked fine. When I did select it, the backup started but did not complete. I will attach some screenshots. If anyone needs to see the logs, please let me know the command, and I will provide them.

    Thanks in advance.

    EH

  • 0

    From the Cell Server:

    C:\Users\Administrator>devbra -dev

    Tape HPE:Ultrium 8-SCSI Path: "scsi3:0:2:0C" SN: "CZ294809WC"
    Description: HPE LTO8 Drive
    Revision: Q387 Device type: lto [13] Flags: 0x0101

    Exch HP:MSL G3 Series Path: "scsi3:0:0:1" SN: "DEC91205UG"
    Description: HP StorageWorks MSL 2024 G3 Series
    Revision: 7.90 Flags: 0x0006 Slots: 24 Drives: 2
    Drive(s) SN:
    "CZ20240G50"
    "CZ294809WC"

    Tape HPE:Ultrium 8-SCSI Path: "scsi3:0:0:0C" SN: "CZ20240G50"
    Description: HPE LTO8 Drive
    Revision: Q387 Device type: lto [13] Flags: 0x0101

    Tape HPE:Ultrium 8-SCSI Path: "scsi2:0:2:0C" SN: "CZ294809WC"
    Description: HPE LTO8 Drive
    Revision: Q387 Device type: lto [13] Flags: 0x0101

    Exch HP:MSL G3 Series Path: "scsi2:0:0:1" SN: "DEC91205UG"
    Description: HP StorageWorks MSL 2024 G3 Series
    Revision: 7.90 Flags: 0x0006 Slots: 24 Drives: 2
    Drive(s) SN:
    "CZ20240G50"
    "CZ294809WC"

    Tape HPE:Ultrium 8-SCSI Path: "scsi2:0:0:0C" SN: "CZ20240G50"
    Description: HPE LTO8 Drive
    Revision: Q387 Device type: lto [13] Flags: 0x0101

    C:\Users\Administrator>
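
For anyone comparing path lists like the one above, here is a minimal Python sketch that groups the `devbra -dev` header lines by serial number, so it is easy to see how many paths lead to each device. The regular expression and the function name `paths_by_serial` are assumptions based only on the output pasted in this post, not an official parser, and the sample contains just the six header lines from that output.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the pasted output only:
#   <Tape|Exch> <model> Path: "<path>" SN: "<serial>"
DEVICE_LINE = re.compile(
    r'^\s*(Tape|Exch)\s+(.+?)\s+Path:\s*"([^"]+)"\s+SN:\s*"([^"]+)"'
)

# Header lines copied from the devbra -dev output in this thread.
SAMPLE = """
Tape HPE:Ultrium 8-SCSI Path: "scsi3:0:2:0C" SN: "CZ294809WC"
Exch HP:MSL G3 Series Path: "scsi3:0:0:1" SN: "DEC91205UG"
Tape HPE:Ultrium 8-SCSI Path: "scsi3:0:0:0C" SN: "CZ20240G50"
Tape HPE:Ultrium 8-SCSI Path: "scsi2:0:2:0C" SN: "CZ294809WC"
Exch HP:MSL G3 Series Path: "scsi2:0:0:1" SN: "DEC91205UG"
Tape HPE:Ultrium 8-SCSI Path: "scsi2:0:0:0C" SN: "CZ20240G50"
"""


def paths_by_serial(devbra_output):
    """Group device paths by serial number (hypothetical helper)."""
    devices = defaultdict(lambda: {"kind": None, "model": None, "paths": []})
    for line in devbra_output.splitlines():
        match = DEVICE_LINE.match(line)
        if not match:
            continue  # skip description, revision and drive-list lines
        kind, model, path, serial = match.groups()
        entry = devices[serial]
        entry["kind"] = kind
        entry["model"] = model
        entry["paths"].append(path)
    return dict(devices)


if __name__ == "__main__":
    for serial, info in paths_by_serial(SAMPLE).items():
        print(serial, info["kind"], info["model"], info["paths"])
```

With the sample above, each of the two drives and the exchanger shows up on two paths (one under `scsi2`, one under `scsi3`), which is consistent with the library having been autoconfigured as a multipath device.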

  • 0  

    Thanks for sharing this info Emil.

    So I guess my suspicion may be right. We can see that as long as the media agents run on the cell server, everything is fine. What I'm not sure about is whether the problem is related to the platform the MA is running on or to a communication problem between bma and uma. I was hoping you could do some additional tests.

    To be sure where bma and uma are running (we do not see all the messages in the report of the failing session), I would suggest the following tests. The configs I'm suggesting can be created in parallel to your existing config and deleted again afterwards.

    1. Configure the library on the CS and a drive on the client. Do this without multipath anywhere. Run a backup of the client. Does it work?

    2. Configure the library on the client and the drive on the CS, again without multipath. Run the test backup again to this target. Does it work?

    In addition, while you have these configs available, you could also test a backup of the CS in both cases. My assumption, though, is that the source of the backup won't matter.

    If you could share the results, that would be great and would give us an idea where to look deeper.

    Koen

  • 0  

    Oh, and by the way, when you used the "preferred path" option before, I'm not totally sure it applied to both the device and the library. I'm also not sure whether both were configured with multipath, or maybe only the devices and not the library. So as a third test I would suggest configuring both the library and a device on the client without multipath and testing a backup again.

  • 0

    Hi Koen,

    Ran the different tests that you requested, and the results are below. No multipath was used. When the session "hangs" Abort does not work. I have to reboot the machine.

    Library=CS, Drive=Client, Test=Failed

    Library=Client, Drive=CS, Test=Failed

    Library=Client, Drive=Client, Test=Pass

    Library=CS, Drive=CS, Test=Pass
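
To make the pattern in these four results explicit, here is a small Python sketch that checks whether "library and drive configured on different hosts" accounts for every failure. The data is copied from the results above; the function name `split_host_explains_failures` is made up for illustration. It only shows a correlation in the test matrix, not a root cause.

```python
# (library host, drive host, backup passed) as reported in this post.
RESULTS = [
    ("CS", "Client", False),
    ("Client", "CS", False),
    ("Client", "Client", True),
    ("CS", "CS", True),
]


def split_host_explains_failures(results):
    """Return True if a test passed exactly when library and drive
    were configured on the same host (hypothetical helper)."""
    return all(
        passed == (library_host == drive_host)
        for library_host, drive_host, passed in results
    )


if __name__ == "__main__":
    for library_host, drive_host, passed in RESULTS:
        layout = "same host" if library_host == drive_host else "split hosts"
        outcome = "Pass" if passed else "Failed"
        print(f"Library={library_host}, Drive={drive_host}: {outcome} ({layout})")
    print("Split hosts account for all failures:",
          split_host_explains_failures(RESULTS))
```

For these results the check holds: both failures are the split-host configurations, and both same-host configurations pass, regardless of whether that host is the CS or the client.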

    Thanks,

    EH

  • 0  

    OK, I know you have a support case for this. I would recommend pasting the results there too. Please also generate debugs of one of the failing sessions and upload them to the case. I can monitor the case and provide some guidance to the engineer.

  • 0

    I have uploaded a debug file to the case. 

  • 0

    The following seems to be similar to our issue.

    portal.microfocus.com/.../KM000037772

  • 0  

    Great. I saw you provided debugs earlier too, but from different sessions. Now that you have generated new ones, please do me a favor and also pull the related bma debugs from the client system (/tmp). This will most likely need to go to our labs. I'll instruct the engineer assigned to your case to elevate it.

    Regarding the "reconnect" issue, the behavior does look similar, but I'm not sure the root cause will be exactly the same. Good to know though, and we'll keep it in mind. Thanks for your kind cooperation.