Idea ID: 1638649

Backup a single file system with multiple streams

Status: Accepted

Brief description:

While file systems continue to grow and the underlying disk subsystems become faster, backing up large file systems is still one of the major concerns in traditional backup environments. The Data Protector Disk Agent should support multiple streams on a single file system, configurable similar to device concurrency, to drastically speed up both backup and restore.
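To make "configurable similar to device concurrency" a little more concrete, here is a minimal Python sketch of one way a stream count could be turned into work units. It assumes the agent already has a list of (path, size) pairs from a tree walk and balances total bytes per stream with a greedy largest-first heuristic. `split_into_streams` and the balancing strategy are hypothetical illustrations, not existing Data Protector functionality.

```python
import heapq
from typing import List, Tuple


def split_into_streams(files: List[Tuple[str, int]], streams: int) -> List[List[str]]:
    """Hypothetical helper: assign files to `streams` buckets with roughly equal total bytes.

    Greedy largest-first heuristic: sort files by size (descending) and always
    place the next file into the bucket that currently holds the fewest bytes.
    """
    if streams < 1:
        raise ValueError("streams must be >= 1")
    buckets: List[List[str]] = [[] for _ in range(streams)]
    # Min-heap of (bytes assigned so far, bucket index); the lightest bucket pops first.
    heap = [(0, i) for i in range(streams)]
    heapq.heapify(heap)
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return buckets


if __name__ == "__main__":
    files = [("/data/a", 70), ("/data/b", 50), ("/data/c", 40), ("/data/d", 30), ("/data/e", 10)]
    for n, bucket in enumerate(split_into_streams(files, 2)):
        print(n, bucket)
```

With this sample input, stream 0 gets `/data/a` and `/data/d`, stream 1 gets `/data/b`, `/data/c` and `/data/e`, so each carries 100 units of data. A real agent would more likely split at directory boundaries and rebalance while running; the sketch only shows the static balancing step.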

Picture1.png

Existing Enhancement Requests:

QCCR2A62413: Support multiple disk agents per volume

Benefit:

Reduce the complexity of manually subdividing file systems into smaller pieces. Optimize backup and restore performance by using multiple streams, even on large file systems with millions of files and folders.

Tags:

  •  
    We are actually seeing performance increases of 2-3 times over a standard DA backup. That's in the ballpark of multi-streamed approaches. The goal is to make backup and restore faster, and that does not necessarily depend on the number of streams.

    However, we are looking at a multi-stream option for BBB.

  • Will Block-Based-Backup do more than a single, sequential read? If not, you're really not addressing the problem. Yes, skipping the OS filesystem layer will yield a performance improvement, but not of the magnitude needed. A block-based, single-stream, sequential read of an object measured in dozens of TB is still untenable. Whether block-based or filesystem-based, the agent needs added intelligence to analyze the object being backed up and dynamically execute a divide-and-conquer strategy. I've been preaching this for more than a decade. Thanks.

  •  

    This will be addressed within Block-Based-Backup

  • Hi Shishir, Hi antaln,

    Thanks for your questions. I had to think about this for a few days.

     is right: a multi-thread-only implementation for backing up data has benefits. It would let us keep a single object for each drive or mount point, which can be presented directly in the GUI for restore and taken into account for Object Copy and Consolidation. The biggest disadvantage is that it does not help with restore, where a multi-threaded operation offers no advantage because the backup media is usually sequential.

    Only a multi-stream solution benefits both backup and restore, since Data Protector can restore from multiplexed media or from multiple drives at the same time. Having multiple sub-objects would also allow Object Copy and Consolidation to scale with the number of objects. In any case, this must be transparent to the user: the individual parts of a multi-stream backup object must be mapped into a single logical object (e.g. by using the same description for each sub-object).

    Regards,
    Sebastian Koehler

  • I would think that a multi-threaded read would be effectively transparent and leave the single object intact. That is to say, if a VBDA for WINFS object "F:/Users" or a VBDA for FILESYSTEM object "/var" decides to multi-thread for read efficiency during backup, the objects "F:/Users" and "/var" should remain monolithic objects in the IDB for the purposes of object copy, restore, and verification. Otherwise, you get what you see now with manual divide and conquer: a restore context presents multiple "F:/Users" or multiple "/var" objects, and you have to pray that you used good object descriptions in the backup spec to give yourself a clue which rabbit hole to jump down to restore a specific folder or file.
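Both of the last two comments agree that, however many streams are used, the user should still see one logical object. The following is a minimal Python sketch of that mapping. It assumes each sub-object carries the shared description (as Sebastian suggests) and a list of the files it contains. `SubObject`, `logical_objects` and `locate` are hypothetical names, and this does not reflect how the Data Protector IDB actually stores objects.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SubObject:
    """One part of a multi-stream backup object (hypothetical model, not the IDB schema)."""
    description: str  # shared by every part of the same logical object, e.g. "/var"
    stream: int  # index of the stream that wrote this part
    files: Set[str] = field(default_factory=set)


def logical_objects(parts: List[SubObject]) -> Dict[str, Dict[str, int]]:
    """Merge parts that share a description into one catalog per logical object.

    Returns {description: {file path: stream that holds the file}}, so a restore
    view can list "/var" once and still know where each file lives.
    """
    catalog: Dict[str, Dict[str, int]] = defaultdict(dict)
    for part in parts:
        for path in part.files:
            catalog[part.description][path] = part.stream
    return dict(catalog)


def locate(catalog: Dict[str, Dict[str, int]], description: str, path: str) -> Optional[int]:
    """Return the stream holding `path` inside the logical object, or None if absent."""
    return catalog.get(description, {}).get(path)


if __name__ == "__main__":
    parts = [
        SubObject("/var", 0, {"/var/log/messages"}),
        SubObject("/var", 1, {"/var/lib/db.dat"}),
        SubObject("F:/Users", 0, {"F:/Users/a.txt"}),
    ]
    catalog = logical_objects(parts)
    print(sorted(catalog))  # ['/var', 'F:/Users']
    print(locate(catalog, "/var", "/var/lib/db.dat"))  # 1
```

Keying the merge on the description is the same mechanism that makes manual divide and conquer fragile; the difference in this sketch is that the agent, not the user, would assign the shared description, so the restore view can present one "/var" object.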