Issue with gateway selection when creating a Deduplication Store

When configuring (adding) a Deduplication Store device on a new server, I could create the store, but when I have to select a "client" for the gateway, the machine hosting the store isn't listed; instead, machines that still run StoreOnce are listed. I don't understand this, but I'm new to Deduplication Store. It also seems I cannot continue creating the device unless I add at least one gateway. Is this a software bug, or a misunderstanding on my side?

This is with Data Protector 24.4, and the machine that is to host the Deduplication Store has the Deduplication Store and the Disk Agent components installed; does it need a Media Agent, too? I thought the Deduplication Store is a kind of Media Agent.

Parents
  • Suggested Answer

    To expand a little more on this ... The "Deduplication Store" and the "StoreOnce Software Deduplication" packages are basically the engines, the servers. In addition to that you need one or more gateways, either on the same host or on one or more other hosts. That's the same for both software implementations (DPD and SOS) as well as for all supported hardware deduplication devices. And that gateway is included in a Media Agent.
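
    To picture the relationship, here is a minimal Python sketch (all names are illustrative, not a Data Protector API): one store engine on a server, plus one or more gateways, each of which lives inside a Media Agent on some host.

        from dataclasses import dataclass, field

        @dataclass
        class Gateway:
            host: str   # the host whose Media Agent embeds this gateway
            kind: str   # "target-side", "server-side" or "source-side"

        @dataclass
        class DedupStore:
            server_host: str                                # host running the engine (DPD or SOS)
            gateways: list = field(default_factory=list)    # at least one is required

            def add_gateway(self, host, kind="target-side"):
                self.gateways.append(Gateway(host, kind))

        # The engine and a gateway may share a host, but they are separate roles:
        store = DedupStore("dedup01.example.net")
        store.add_gateway("dedup01.example.net")                   # local gateway on the store host
        store.add_gateway("client01.example.net", "server-side")   # remote gateway on another client

    This is also why the device wizard insists on at least one gateway: without a Media Agent host attached, the store engine alone cannot move any data.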

    Although I am an OpenText employee, I am speaking for myself and not for OpenText.
    If you found this post useful, give it a “Like” or click on "Verify Answer" under the "More" button.

  • Suggested Answer

    I assume the general concept of interaction between the CS, DA and MA is known. The additional link in a B2D (backup-to-disk) device scenario is between the gateway (MA) and the B2D device itself. This can be a fiber link (only with hardware devices) or a network link.

    When you talk about a "local gateway", I assume you mean a gateway residing on the DPD or SOS system. That will indeed avoid the additional network traffic between the deduplication server and the gateway. There is, however, another aspect to keep in mind, and that's the CPU required for deduplication. By using a remote gateway, part of the resources required for deduplication moves to the remote gateway system. So it's not only about network bandwidth, but also about CPU power.

    In general we talk about low-bandwidth and high-bandwidth data transfers. A high-bandwidth data transfer is established with a target-side gateway in Data Protector. In this case all data is transferred between gateway and device, and the deduplication happens entirely on the device itself. A low-bandwidth transfer is established using a server-side or source-side gateway. In this case the deduplication happens mainly on the gateway system, which basically means less network traffic to the device but more resources needed on the gateway system. The difference between source-side and server-side is that the first is implicitly defined (it always runs on the DA system) while the second is explicitly defined (on a specific DP client).
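
    As a back-of-the-envelope illustration (purely conceptual, not Data Protector code) of what the two transfer types mean for the wire:

        def bytes_on_the_wire(data_size, dedup_ratio, gateway_type):
            """Rough model: dedup_ratio is the fraction of chunks (0..1)
            the device already stores."""
            if gateway_type == "target-side":
                # High bandwidth: the full stream goes to the device,
                # which does all the deduplication itself.
                return data_size
            if gateway_type in ("source-side", "server-side"):
                # Low bandwidth: the gateway deduplicates, so only the
                # chunks the device does not yet know have to travel.
                return data_size * (1 - dedup_ratio)
            raise ValueError(gateway_type)

        size = 1_000_000_000_000  # a 1 TB backup
        print(bytes_on_the_wire(size, 0.9, "target-side"))   # ~1 TB
        print(bytes_on_the_wire(size, 0.9, "source-side"))   # ~100 GB

    The network traffic saved in the low-bandwidth case is paid for in CPU cycles on the gateway host, which is exactly the trade-off described above.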

    Let's go back to the scenario of a software deduplication server (SOS or DPD). Having a gateway on the deduplication host itself may not always be the best choice, as that system, although well equipped, may still run out of resources easily. The most obvious choice may be to have the gateway on the DA host, but that will only work when that host has enough resources. So in some cases it may help to have it on a separate host (a server-side gateway), which offloads the work from the DA host but means additional network traffic.

    Although I am an OpenText employee, I am speaking for myself and not for OpenText.
    If you found this post useful, give it a “Like” or click on "Verify Answer" under the "More" button.

  • Verified Answer

    An important detail around deduplication is that the deduplication engine is not only available on the deduplication server; it is also built into the APIs provided by the dedupe device vendors and into our Media Agent serving as a gateway. The low-bandwidth data transfer I already mentioned is not our invention, but rather a general deduplication concept. We provide two different gateway types for low bandwidth (source- and server-side), but the only difference is the way they are defined. If you use a server-side gateway on your application or DA host, then you have exactly the same as with a source-side gateway.

    And yes, the way deduplication works is: it buffers the data, chunks and hashes it, looks for a match on the device, compresses the chunks that are not available yet, and sends them over to the store. In the high-bandwidth case, the only thing done on the gateway is buffering the data and sending over the full blocks; all other tasks occur on the device itself. In the low-bandwidth case, most of the tasks happen on the gateway. The device is basically contacted only for matching the hash list and for storing the missing chunks, already compressed on the gateway side. Note again that this is all general deduplication knowledge, not specific to Data Protector.
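
    In the low-bandwidth case, that sequence can be sketched in a few lines of Python (the device object and its calls are hypothetical; real engines use variable-size chunking and vendor APIs):

        import hashlib
        import zlib

        CHUNK_SIZE = 64 * 1024  # fixed-size chunks for simplicity

        def low_bandwidth_backup(data, device):
            # 1. Buffer and chunk the incoming stream on the gateway.
            chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
            # 2. Hash every chunk.
            hashes = [hashlib.sha256(c).digest() for c in chunks]
            # 3. One round trip: ask the device which hashes are new to it.
            missing = device.match_hashes(hashes)   # returns a set of unknown hashes
            # 4. Compress only the new chunks on the gateway and ship them.
            for chunk, digest in zip(chunks, hashes):
                if digest in missing:
                    device.store_chunk(digest, zlib.compress(chunk))
            # 5. The device reassembles the object from the complete hash list.
            device.commit(hashes)

    In the high-bandwidth case, steps 2 to 4 would all run on the device; the gateway would only buffer and forward the raw blocks.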

    I believe most of your questions should be clear if the above is understood.

    Although I am an OpenText employee, I am speaking for myself and not for OpenText.
    If you found this post useful, give it a “Like” or click on "Verify Answer" under the "More" button.

Children
  •

    Yes, I probably should have a better background, but OTOH I'm old enough to remember that once (in the good old days of hand-written documentation) there was a "Concepts Guide" where, as the name suggests, the concepts were explained. I'm glad I did not delete it; it's still valuable.

  •

    Cannot deny it was nice to have the Concepts Guide. On the other hand, it's good to have an online documentation portal now, which can be updated on the fly. And I can tell you it is being updated on a regular basis. And while there's still a lot of work to do, I see improvements all the time.

    Although I am an OpenText employee, I am speaking for myself and not for OpenText.
    If you found this post useful, give it a “Like” or click on "Verify Answer" under the "More" button.