Tips for uploading large docs

Our department uploads zip files/documents as large as 23 GB or more.

We are currently using CM9.4 and moving to CM23.4.

Any advice on how to help end users upload large documents?

  •

    Hi Anna,

    Documents of that size do transfer, but they can take quite a while depending on where they are uploaded from and on some of your internal CM settings. Some CM versions look like they 'lock up' while the transfer takes place, so if the user closes CM at that point it can stop the transfer.

    For documents THAT size, it would be best to use a desktop client that is close to the server, not 'at home' on a VPN connection or via the web client, and potentially do the upload during reduced business hours (post 5 PM).

    Caching Options:
    During transfer, the items that are uploaded are typically cached along the way. There is a 'user estore cache' and a Workgroup Server cache before the item gets to the document store.

    This means an uploaded item can be 'stored' at several spots along the way to aid faster retrieval later (especially if you have 'local' and 'regional' workgroups). That can have a negative effect when sizes get this big, as a single 23 GB file can be stored several times along the way.

    For Workgroup Server caching, you can either disable caching at the Workgroup Server level (possibly on just a single server that is used for these large file uploads) or make sure the extra space is there. However, Workgroup Server caching is normally capped at 'x' GB, so an item this large may cause other items to fall out of the cache prematurely.

    I'd like to think that where the cache has been set to, say, 10 GB, an item over that size would skip the cache altogether, but it may be worth checking just in case.
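
    Purely as an illustration of that idea (not how CM implements its caching internally; the cap and sizes are made-up values), a size-aware cache check boils down to something like the sketch below, where a 23 GB item would simply bypass the cache rather than pushing everything else out.

    ```python
    from pathlib import Path

    # Hypothetical cache cap, e.g. the 10 GB Workgroup Server cache mentioned above.
    CACHE_CAP_BYTES = 10 * 1024**3

    def should_cache(file_path: str, cap_bytes: int = CACHE_CAP_BYTES) -> bool:
        """Return True if the file is small enough to be worth caching.

        A single item larger than the whole cache would only evict everything
        else, so it should go straight to the document store instead.
        """
        size = Path(file_path).stat().st_size
        return size <= cap_bytes

    # Example: a 23 GB upload would skip the cache, a 50 MB report would not.
    ```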

    Async transfer:
    Document transfers can happen as either a synchronous process or an asynchronous process.

    On the workgroup server in Enterprise Studio, under document transfer options, there is a 'synchronous document store transfers' option.

    'Synchronous Document Store transfers - select for this Workgroup Server to transfer documents to the document store directly, bypassing the folder C:\Micro Focus Content Manager\ServerLocalData\TRIM\Pending. This makes all the other options in this tab unavailable. When this option is not selected, the file transfer to the document store is asynchronous.'

    This option may speed up the transfer, as the file doesn't have to land in the pending folder on the server first before going to the store.

    Under your dataset system options there is also a 'Transfer documents asynchronously to the workgroup server' option.

    'Transfer documents asynchronously to the Content Manager Workgroup Server - select to have Content Manager transfer documents to the Workgroup Server when there is time, not necessarily when a user checks them in.'

    For large documents, I'd have this second option off; otherwise it can take quite a while for the 'when there is time' to come around for the transfer to begin, and the transfer can potentially be interrupted.
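
    As a rough sketch of the difference between the two modes (illustrative only, not CM's actual implementation, and the folder paths are placeholders), synchronous mode writes straight to the store while asynchronous mode parks the file in a pending folder for a later step to move 'when there is time':

    ```python
    import shutil
    from pathlib import Path

    # Placeholder paths; the real pending folder is the ServerLocalData\TRIM\Pending
    # folder mentioned in the option description above.
    PENDING_DIR = Path("pending")
    STORE_DIR = Path("document_store")

    def transfer_synchronous(source: Path) -> Path:
        """Copy the document straight into the store, bypassing the pending folder."""
        STORE_DIR.mkdir(exist_ok=True)
        return Path(shutil.copy2(source, STORE_DIR / source.name))

    def transfer_asynchronous(source: Path) -> Path:
        """Drop the document into the pending folder; a later step moves it on."""
        PENDING_DIR.mkdir(exist_ok=True)
        return Path(shutil.copy2(source, PENDING_DIR / source.name))

    def drain_pending() -> None:
        """The 'when there is time' step: move anything pending into the store."""
        STORE_DIR.mkdir(exist_ok=True)
        for item in PENDING_DIR.glob("*"):
            shutil.move(str(item), str(STORE_DIR / item.name))
    ```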

    Hashing Features:

    Due to the large size of the files, and depending on the stability of your network, enabling document transfer hashing validation can help ensure a successful transfer for extremely large files (which may be more prone to not transferring correctly). This option will slow down the transfer though, as at each hop (client to server, server to store) the file is double-checked using a hash to make sure the transfer is 100% accurate. It doesn't really impact small files much, but it does add to large transfers, which can already be slow before this extra safety check is enabled.
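
    To show what that validation is doing conceptually (CM's own checks may differ in the details; this is just a sketch), each hop re-hashes the file and compares it with the hash taken at the previous hop. Hashing in chunks means even a 23 GB file never has to fit in memory:

    ```python
    import hashlib

    def file_hash(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
        """Hash a file in chunks so very large files never need to fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_transfer(source: str, destination: str) -> bool:
        """Return True only if the copy at the destination matches the source exactly."""
        return file_hash(source) == file_hash(destination)
    ```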

    Record Type Settings:

    For files that large, once they are uploaded and verified, it may be worthwhile preventing either viewing (which would kick off a 23 GB transfer of the file to the end user's machine) or editing (which may create another 23 GB revision). Some organisations do this by having a 'LARGE DOCUMENT' record type where 'allow replace' and 'allow revisions' are off. This can also be done with access controls on a normal record type. It might depend on how often these large items are uploaded.

    Document Store Tier:

    As these large items eventually end up in the document store for safe-keeping, the 'tier' of the store may need to be considered. Having a 23 GB file sitting on expensive document storage can increase costs when applied across lots of large files. Using a cheaper document store tier (HDD instead of SSD) may be an option for files that are purely archival and not for routine access (as I can't imagine a 23 GB file needing to be accessed all that much).
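
    As a rough sketch of the tiering idea (the threshold, paths, and tier names below are invented for the example; in CM this is a document store configuration rather than code), large archival files could be routed to a cheaper store while everyday documents stay on the fast tier:

    ```python
    from pathlib import Path

    # Invented threshold and tier locations, purely for illustration.
    LARGE_FILE_BYTES = 10 * 1024**3
    FAST_TIER = Path(r"D:\Stores\SSD_Primary")     # expensive, routinely accessed
    ARCHIVE_TIER = Path(r"E:\Stores\HDD_Archive")  # cheaper, rarely accessed

    def choose_store_tier(file_path: str) -> Path:
        """Pick a store tier based on file size: big archival items go to cheap storage."""
        size = Path(file_path).stat().st_size
        return ARCHIVE_TIER if size >= LARGE_FILE_BYTES else FAST_TIER
    ```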

    Unfortunately there isn't a 'one size fits all' when it comes to large file transfers. 23 GB is going to take a while to save, so the main thing, regardless of options, is to be patient with the transfer and make sure the end user's PC, the server, and the store have the available space along the way.

    Also curious about lessons learnt by others within the community.

    -Scotty

  • Suggested Answer

    It may also be worth adding that most organisations tend not to allow ZIPs in their system, as ZIPs can be tricky to dispose of: you'd need to sentence the ZIP using the most severe schedule of the contained items.

    (Too many times I've seen a whole 'team drive' ZIPped up and saved into CM in the belief that the recordkeeping obligations have been met.)

    Also, if the ZIP becomes unreadable at any point, none of the items inside it can be accessed any more.

    If it is being done purely to 'save space' because ZIPs compress items, compression can instead be enabled at the document store level.

    For a ZIP that size, I'd question its value as a single record and would possibly suggest it be broken up into smaller components that are easier to save, and also easier to schedule from a recordkeeping perspective.

    -Scotty

  •

    Hi Scotty

    I agree that zip files are tricky to dispose of, but I can also see it from the user's perspective when saving multiple photos in one location, etc.

    I will review your solutions and see if we can implement some of your suggestions.

  •

    I can certainly understand it from the user's perspective as well (as I always like to 'reduce the clicks', so to speak).

    If they have quite a 'chunk' of photos that are going into the one container, a Document Queue is an alternate way to go.

    That way they can point the 'network folder' containing the photos at the 'container' and they should all be ingested. Make sure to have the 'new record' form suppressed if you can.

    Alternatively, there are scripts that can do this as well, though that may have a bit more of a technical hurdle to overcome in some instances.
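
    As an example of the scripted route, here is a minimal sketch that walks a folder of photos and hands each file to an upload routine. The upload_to_container function is a stand-in for whatever SDK or API call your environment uses (it just prints here), and the folder path and container ID are made up:

    ```python
    from pathlib import Path

    def upload_to_container(file_path: Path, container_id: str) -> None:
        """Stand-in for the real upload call; just logs what it would do."""
        size = file_path.stat().st_size
        print(f"Would upload {file_path.name} ({size} bytes) into container {container_id}")

    def ingest_folder(folder: str, container_id: str) -> int:
        """Upload every file found in the folder into the given container."""
        count = 0
        for item in sorted(Path(folder).iterdir()):
            if item.is_file():
                upload_to_container(item, container_id)
                count += 1
        return count

    # Example (made-up path and container number):
    # ingest_folder(r"\\fileserver\team\photos", "REC-1234")
    ```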

    -Scotty