“My backups/restores are too slow” … have you heard that statement before? Or are you involved in a performance-tuning exercise at the moment? Well, this blog may help you find the right spots to look at.
First of all: be open and look at all involved partners! Many administrators tend to look at the backup software first, since it was the one highlighting the issue. That may not be the right approach. Data Protector has several capabilities that help you find possible bottlenecks, and depending on what you find, you can apply the appropriate tuning.
There are 4 major partners to look at:
- The backup client (disk storage, CPU/RAM …)
- The infrastructure (LAN, SAN …)
- The backup target (device types, features …)
- The backup software (parallelism, block size, optimization …)
In the end there are many more items, but let’s start with the usual suspects. The direction of investigation is source -> infra -> target -> software.
The backup clients
You gave it lots of CPU, RAM and disk, so how can it be slow during a backup or restore? This is mostly about read performance for backup and write performance for restore. The theoretical maximum of your read performance can be measured with disk tools, and that gives you a figure no backup can ever exceed. If you don’t like that result, you can stop right here and tune the storage system: give it more or faster spindles per RAID set, or even consider solid-state storage. The same applies to restore, except that there you have to look at write performance, where a change of RAID level might be the answer (RAID 5/6 writes slower than RAID 1/0). Now, the issue with disk tools is that they only move data between disk and memory, and that is not the complete backup/restore flow. That’s why real backup/restore performance stays well below the theoretical maximum: for a backup, the server has to read from disk into memory and then write the same data out to a network interface, so the data travels your server’s I/O twice.
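Typical tools for this are vendor utilities or dd; as a rough stand-in, here is a minimal Python sketch of the same measurement, assuming a large test file at a hypothetical path. The file should be much bigger than RAM, otherwise the OS page cache will flatter the result.

```python
import time

# Minimal sketch: sequential read throughput of one large file.
# PATH is a placeholder; point it at a multi-GB file on the volume under test.
PATH = "/data/bigfile.bin"
BLOCK = 1024 * 1024  # read in 1 MiB chunks

def sequential_read_speed(path: str) -> float:
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:  # unbuffered, so we see real reads
        while chunk := f.read(BLOCK):
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / (1024 * 1024) / elapsed  # MiB/s

if __name__ == "__main__":
    print(f"{sequential_read_speed(PATH):.1f} MiB/s sequential read")
```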
One way of simulating true backup/restore I/O is to create a local “Null” device. That’s usually “/dev/null” on Linux/UNIX or “NUL” on Windows. With this device set up, you can create a backup job that reads local data and writes it to that local Null device. The result is a backup performance figure that includes all the backup overhead and is therefore closer to real life. This number will be lower than what you saw with the disk tools. Now, let’s say this number is good enough and even leaves some room for data growth. In that case you have identified that your client is not the bottleneck.
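To get a feel for what such a run measures before you configure it in DP, here is a minimal Python sketch along the same idea (this is not DP, just an illustration): it walks a local directory tree and writes every byte to the operating system’s null device, so the result includes read and copy overhead but no real target. The source path is a placeholder.

```python
import os
import time

# Minimal sketch: read a local tree and discard it into the null device
# (os.devnull is /dev/null on Linux/UNIX, NUL on Windows).
SOURCE = "/data/to/backup"  # placeholder: the data you would back up
BLOCK = 1024 * 1024

def null_backup_speed(root: str) -> float:
    total = 0
    start = time.monotonic()
    with open(os.devnull, "wb") as null_dev:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    with open(os.path.join(dirpath, name), "rb") as f:
                        while chunk := f.read(BLOCK):
                            null_dev.write(chunk)
                            total += len(chunk)
                except OSError:
                    continue  # skip unreadable files, as a backup agent might
    elapsed = time.monotonic() - start
    return total / (1024 * 1024) / elapsed  # MiB/s

if __name__ == "__main__":
    print(f"{null_backup_speed(SOURCE):.1f} MiB/s to the null device")
```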
A “Null” device can provide features like concurrency (the number of streams per device) and block size. Use it to find the sweet spot for your setup. You can also create multiple “Null” devices and test the minimum/maximum number of devices needed. In many cases four drives seem to give the best results. Note that going too high can make performance degrade again.
Find the maximum for your configuration and stop there. The principle is not “the more, the better”. You’ll find documentation about creating a Null device here.
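As an illustration of that sweep, here is a minimal Python sketch that drives a varying number of parallel streams into the null device and prints aggregate throughput. The payload is synthetic in-memory data and the stream counts are arbitrary, so it only demonstrates the measurement pattern; a real test should read your actual data through DP devices.

```python
import concurrent.futures
import os
import time

# Minimal sketch: sweep the stream count and watch where throughput flattens.
BLOCK = b"\x00" * (1024 * 1024)  # synthetic 1 MiB payload, not real data
BLOCKS_PER_STREAM = 2048         # 2 GiB per stream

def one_stream() -> None:
    with open(os.devnull, "wb") as null_dev:
        for _ in range(BLOCKS_PER_STREAM):
            null_dev.write(BLOCK)

def aggregate_speed(streams: int) -> float:
    start = time.monotonic()
    # Threads are enough to demo the pattern; a real tool might use processes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=streams) as pool:
        for _ in range(streams):
            pool.submit(one_stream)
    elapsed = time.monotonic() - start
    return streams * BLOCKS_PER_STREAM / elapsed  # MiB/s (1 MiB per block)

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} streams: {aggregate_speed(n):.0f} MiB/s aggregate")
```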
The infrastructure
Your LAN connection may be dedicated to backup/restore, or it may be the link used by everybody else. In the latter case your backup/restore performance will vary with the available bandwidth of your LAN. Note: backup/restore is one of the most I/O-intensive operations in IT!
This means that if you need predictable backup/restore performance that also supports your SLAs, you need a dedicated backup LAN. Or even better: make use of your SAN. You may already have one in place for disk storage, so why not hook your backup device into the SAN and use its fantastic bandwidth?
OK, let’s find out if your infrastructure is part of your performance issue. We are going to make use of a Null device again, but this time it is placed on your media server (that is, the host with a DP Media Agent installed). Create a backup job that reads from the client and stores to the remote Null device. This way your result includes the backup overhead plus the network bandwidth. If your client-only backup was fine but including the LAN now shows a dramatic drop in performance, you have identified your LAN as the issue. That means you may have to switch from 1Gb to 10Gb, or use LAN adapter teaming to increase bandwidth (which also gives you some redundancy at the same time).
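For a quick, DP-independent sanity check of the raw LAN path, a throwaway sender/receiver pair can also help. This minimal Python sketch assumes an arbitrary free port (5001 here); the receiver runs on the media server and discards everything it gets, acting like a network-side null device.

```python
import socket
import sys
import time

# Minimal sketch: run "python nettest.py receiver" on the media server,
# then "python nettest.py sender <mediaserver>" on the client.
PORT = 5001                        # arbitrary free port
PAYLOAD = b"\x00" * (1024 * 1024)  # 1 MiB chunks
TOTAL_MIB = 2048                   # send 2 GiB per run

def receiver() -> None:
    with socket.create_server(("", PORT)) as srv:
        conn, _addr = srv.accept()
        with conn:
            while conn.recv(1024 * 1024):
                pass  # discard, like /dev/null

def sender(host: str) -> None:
    start = time.monotonic()
    with socket.create_connection((host, PORT)) as conn:
        for _ in range(TOTAL_MIB):
            conn.sendall(PAYLOAD)
    elapsed = time.monotonic() - start
    print(f"{TOTAL_MIB / elapsed:.0f} MiB/s over the wire")

if __name__ == "__main__":
    receiver() if sys.argv[1] == "receiver" else sender(sys.argv[2])
```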
The targets
From a testing point of view, the backup jobs you already have will do, since they already include source, infrastructure and target. In most cases you will see the numbers decline from source -> infra -> target; that’s expected. But you still get results you can base your tuning on. Let’s say the client and infra tests were OK, well above the minimum you need, but including the target suddenly drops your performance below that minimum: then you can assume the target is your bottleneck.
How can that be, when the datasheet says the device is well above whatever you need? One reason could be the device type. For instance, if you need extremely good single-stream performance (your data source may only deliver one stream at a time), you should consider tape (yes, tape, the good old medium everybody has considered dead for decades …). A deduplication appliance may not give you good single-stream performance. Also, if your backup device is disk-based, the good old rules apply here as well: use more spindles, use faster spindles. Upgrade your spindle performance even if capacity is not your concern.
Summary up to here
Now you have the numbers for client, infrastructure and target. If any of those numbers is way out of whack, apply tuning to your bottleneck(s) and test again, and again. Keep the numbers for later comparison, because performance will decline over time. Why? Data gets fragmented, making it slower to read, and your environment will grow, which extends backup/restore runtimes.
| Client standalone perf. | Client + Infra perf. | Client + Infra + Target perf. |
| ----------------------- | -------------------- | ----------------------------- |
|                         |                      |                               |
|                         |                      |                               |
|                         |                      |                               |
The backup software (Data Protector of course)
First of all, DP supports you with a number of device and client settings. One of them is the device block size. The default is 256KB, which was chosen for compatibility with most devices on the market; not all devices support block sizes greater than 256KB. For devices that do support it, however, we recommend changing the device block size to 1024KB; that’s true for most dedupe appliances and DPD, for instance. If you copy objects between devices, make sure both devices use the same block size, since the block sizes have to match.
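To illustrate why larger blocks help, here is a minimal Python sketch comparing 256KB and 1024KB blocks when pushing the same amount of data to the null device. It only shows per-call overhead on the host; on a real device the difference is usually larger, and these numbers say nothing about what a given device or DP will actually deliver.

```python
import os
import time

# Minimal sketch: same data volume, two block sizes, fewer calls with bigger
# blocks. On real devices each block also carries protocol/device overhead.
TOTAL = 4 * 1024 * 1024 * 1024  # move 4 GiB per test

def write_speed(block_size: int) -> float:
    block = b"\x00" * block_size
    start = time.monotonic()
    with open(os.devnull, "wb") as null_dev:
        for _ in range(TOTAL // block_size):
            null_dev.write(block)
    elapsed = time.monotonic() - start
    return TOTAL / (1024 * 1024) / elapsed  # MiB/s

if __name__ == "__main__":
    for kib in (256, 1024):
        print(f"{kib:4d} KiB blocks: {write_speed(kib * 1024):.0f} MiB/s")
```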
Check out capabilities like drive concurrency, which is the number of streams you can send to a given device at one time. Typical settings are 3 or 5 concurrent streams, but you can change that, of course. For instance, if you have 4 devices and each of them is set up with a concurrency of 3, you can have 12 streams running in parallel. If your media server can take more load, increase the concurrency; if your media servers are saturated, set up more devices (on different servers).
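The stream arithmetic from above as a tiny sketch, using the example values (these are illustrative, not DP defaults):

```python
# Maximum parallel streams = number of devices x concurrency per device.
def total_streams(devices: int, concurrency: int) -> int:
    return devices * concurrency

print(total_streams(4, 3))  # 4 devices at concurrency 3 -> 12 streams
```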
If your backup client (mostly the application on it) supports multi-streaming, make sure DP has picked that up and is using it. For some application integrations you can set this on the DP Options tab or area. Check with your application vendors about support for multiple streams, prerequisites and limitations. For file systems you can set up multiple Disk Agents manually.
DP supports many ways of multi-pathing and multi-streaming. When running a backup job, please verify that those options were actually used; otherwise, check the backup settings again.
Some backup devices can be reached via LAN and SAN at the same time. Make sure your device configuration uses the desired path. This may not be obvious when scrolling through the backup monitor output quickly.
Comparing results
Make sure you measure the same way across all the tests mentioned above. There is a distinction between backup runtime and backup performance: backup runtime is the total time for a backup job, including all preparation and clean-up steps. That is not what we need for performance comparison. You need to look at the individual stream performance printed in the DP Monitor window.
There is another distinction between DP performance and third-party performance. For example, when a VM is backed up, we usually need a snapshot of that VM first to keep the data consistent during the backup. The time it takes to create the snapshot is not DP performance; that work is done by the hypervisor. If that step is slow, so is your backup performance (and runtime). If this happens, check with your hypervisor vendor.
Summary
Finding bottlenecks and applying the right level of tuning is not easy. This guide should help you turn it into a structured approach that is comparable and reproducible. Obviously, there are more items to look at, and they might be picked up in another update of this blog.
We hope this blog also shows the necessity of planning your backup/restore performance when laying out a new IT infrastructure. Otherwise, you’ll get what you paid for. SLAs like RPO and RTO are a function of infrastructure investment, not something you can just turn on or off.
Do the above procedures apply to cloud backup/restore? Yes, of course. The challenge with cloud backup is that you may not have the access rights to fine-tune clients, infrastructure and targets. You may not even have set RPO/RTO for cloud backups, because you assume this is the provider’s responsibility now. Well, let me kindly ask you to think about that again …
OpenText
Uli Wallscheid
Data Protection Evangelist
Explore how OpenText Data Protector can help your organization to ensure data integrity and data protection.
Request a free trial of OpenText Data Protector
Learn more about Cloud data backup and restore
Already a Data Protector customer? Learn what is new in the latest version.
Read about data backup and resiliency
Read what analysts say about Data Protector
Read what a customer is saying about Data Protector