Copying Data: Are We There Yet?
I am sure this will sound familiar: you are copying a large amount of data – either locally or over the network – and you are wondering how long it will take and if there is a way to make things go faster. You may be surprised, but it does matter what type of files you are copying: 1 GB worth of many small files will take considerably longer to copy than two 500 MB files. The hardware you are using is an important consideration, but it’s not the only factor limiting data transfer speed.
Here’s one scenario: you are copying 100 GB of data from one partition to another partition of the same disk. The disk is a 7200 RPM, 3 Gbit/s SATA-II drive in an external USB 2.0 enclosure. Theoretically, this disk supports up to 300 MB/s data transfer speed. However, since you are reading and writing on the same disk, the speed of data transfer will be only 25% of what the disk supports, or 75 MB/s. The USB 2.0 interface supports a transfer rate of up to 480 Mbit/s, which is about 60 MB/s in theory. This speed will also be cut to a quarter, since you are reading from and writing to the same disk. In this example, the absolute best data copy speed you can expect to see is about 15 MB/s, but you’d be lucky to get half of this.
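The arithmetic in this scenario can be written down as a small shell sketch. The function names are made up for illustration, and the divisor of 4 is simply the article's rule of thumb for reading and writing on the same disk, not a measured value.

```shell
# Convert a nominal link speed in Mbit/s to MB/s (8 bits per byte).
mbit_to_mbyte() {
    echo $(( $1 / 8 ))
}

# Rule-of-thumb ceiling for a copy between two partitions of one disk:
# a quarter of the nominal rate (assumption taken from the text above).
same_disk_estimate() {
    echo $(( $1 / 4 ))
}

mbit_to_mbyte 480        # USB 2.0: 60 MB/s
same_disk_estimate 60    # behind USB 2.0: 15 MB/s
same_disk_estimate 300   # on SATA-II directly: 75 MB/s
```

The smaller of the two ceilings wins, which is why the USB enclosure, and not the disk, sets the limit here.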
So, let’s take a look at the actual disk performance for this example:
iostat -xk 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.00    0.00   27.00   57.00    0.00    0.00

Device:   rrqm/s  wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda         0.00    0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda1        0.00    0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2        0.00    0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5        0.00    0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda6        0.00    0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb         0.00    0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc         0.00    8.00   98.00   84.00  6145.00  8092.00   156.45    23.83   86.20   5.34  97.20
sdc1        0.00    0.00   98.00    0.00  6145.00     0.00   125.41     3.02   27.76   9.47  92.80
sdc2        0.00    8.00    0.00   84.00     0.00  8092.00   192.67    20.82  154.38   9.19  77.20
The data is being copied from the /dev/sdc1 partition to the /dev/sdc2 partition. The average read speed is about 6 MB/s (6145 kB/s in the rkB/s column). Thus, it may take up to 5 hours for the copy process to complete. Is there a way to speed things up? One option would be to take the disk out of the USB enclosure and connect it internally to the SATA-II interface. This alone will cut the copy time down to about 30 minutes, which justifies the time you would spend on moving the disk. To speed things up even further, copy the data from your SATA-II drive to another internal drive (preferably on a different controller) and then copy it back to a different partition on the original disk. This will cut the copy time down to about 15 minutes.
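The "up to 5 hours" figure follows directly from the data size and the observed rate. A minimal sketch of that estimate, using integer arithmetic and a hypothetical helper name:

```shell
# Estimate copy time in whole minutes.
# $1: amount of data in GB; $2: sustained transfer rate in MB/s
copy_eta_minutes() {
    size_mb=$(( $1 * 1024 ))
    echo $(( size_mb / $2 / 60 ))
}

copy_eta_minutes 100 6    # prints 284, a little under 5 hours
```

Substituting the rate you actually observe in iostat gives a quick sanity check before deciding whether moving the disk is worth the effort.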
Copying data over the network normally doesn’t stress the hard drives, unless you have an HPC cluster with an InfiniBand network or something of that nature. Running the “iostat” command will show you I/O on the disk, but this is not the best way of estimating transfer rates when moving large amounts of data over the network. A simple tool for looking at real-time network upload/download speed is “bmon”. This small but useful application runs in your terminal window and displays detailed network stats for each NIC, as well as a cool ASCII graph.
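If bmon is not installed, a rough per-interface rate can be derived by sampling the kernel's byte counters twice. The sketch below assumes a Linux system that exposes counters in /proc/net/dev; the function names are hypothetical, and like bmon it counts all traffic on the interface, not only your copy.

```shell
# Print "rx_bytes tx_bytes" for one interface.
# $1: interface name; $2: counters file (defaults to /proc/net/dev, Linux only)
iface_bytes() {
    sed 's/:/ /' "${2:-/proc/net/dev}" |
        awk -v ifc="$1" '$1 == ifc { print $2, $10 }'
}

# Sample the counters twice and print the average receive/transmit rate.
# $1: interface name; $2: sampling interval in seconds (default 10)
iface_rate() {
    ifc=$1
    interval=${2:-10}
    before=$(iface_bytes "$ifc")
    if [ -z "$before" ]; then
        echo "unknown interface: $ifc" >&2
        return 1
    fi
    sleep "$interval"
    after=$(iface_bytes "$ifc")
    echo "$before $after" | awk -v t="$interval" '{
        printf "rx %.1f KB/s, tx %.1f KB/s\n",
               ($3 - $1) / 1024 / t, ($4 - $2) / 1024 / t
    }'
}
```

For example, `iface_rate eth0 10` would report the average rate over ten seconds, where eth0 is a placeholder for your own interface name.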
However, with bmon there is no way to differentiate between network traffic created by your copy process and all the other network traffic on the system. When moving data over the network, you would normally use FTP for best performance, but you may also use NFS, Samba, or even HTTP. There are many tools that allow you to test network performance. One of the most common tools is Bonnie++. While not a network testing application, Bonnie++ performs a series of read/write tests on a filesystem of your choice. If that filesystem happens to be NFS- or Samba-mounted, then the test results will show you NFS or Samba performance (unless you have an extremely high-performance network that exceeds the performance of your storage system).
In the following example we run bonnie++ – a popular filesystem testing utility for Linux and Unix – on a system with 512 MB of RAM:
deathstar:~ # bonnie++ -n 0 -u 0 -r 512 -s 1024 -f -b -d /backups
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version  1.01d      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
deathstar        1G           16917   5  9500   2           24482   3 115.7   0
deathstar,1G,,,16917,5,9500,2,,,24482,3,115.7,0,,,,,,,,,,,,,
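Explanation of the parameters:
“-n 0” Skip the file creation tests
“-u 0” Run under root UID
“-r 512” This system has 512 MB of RAM
“-s 1024” File size for the test in MB, should be at least twice the amount of RAM
“-f” Fast mode: skip the per-character I/O tests (which is why the Per Chr columns above are empty)
“-b” No write buffering: fsync() after every write
“-d /backups” Directory on the filesystem to test.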
It is important to understand that Bonnie++ does not test the hard drive or the network. It tests filesystem performance. If, for example, you run a test on a local disk and see performance lower than expected, it does not mean your disk is going bad. It may be just that your CPU is overloaded, you are running out of RAM, or there may be an OS issue. Therefore, if you are using Bonnie++ to compare performance of different hard drives, you need to make sure that all other system parameters during your testing remain unchanged.
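The last line of the Bonnie++ output above is a comma-separated summary, which is easier to post-process than the table. As a sketch, the helper below (a hypothetical name) converts the block write, rewrite, and block read figures from K/sec to MB/s. The field positions are inferred from the sample output in this article and may differ in other Bonnie++ versions.

```shell
# Summarize one bonnie++ CSV result line.
# Assumed fields (from the sample above): 5 = block write, 7 = rewrite,
# 11 = block read (all in K/sec), 13 = random seeks per second.
bonnie_summary() {
    echo "$1" | awk -F, '{
        printf "write %.1f MB/s, rewrite %.1f MB/s, read %.1f MB/s, seeks %s/s\n",
               $5 / 1024, $7 / 1024, $11 / 1024, $13
    }'
}

bonnie_summary "deathstar,1G,,,16917,5,9500,2,,,24482,3,115.7,0,,,,,,,,,,,,,"
# prints: write 16.5 MB/s, rewrite 9.3 MB/s, read 23.9 MB/s, seeks 115.7/s
```

Collecting these one-line summaries from several runs makes it easier to compare results, provided the other system parameters stay unchanged between runs.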
You can read more about Bonnie++ and see additional usage examples here.
Let’s say you are copying a large amount of data from the /disk1 filesystem to /disk2. You started your copy process – cp, rsync, tar, whatever you decided to use – and now you need to know how long the copy process will take. Below is a simple Korn shell script that will look at the source directory and the target directory and will try to estimate the remaining time. The usage for this example would be: copy_progress.ksh /disk1 /disk2
#!/bin/ksh
# ------------------
# CONFIGURATION
# ------------------
# Both the source and the target directory are required
if [ -z "$1" ] || [ -z "$2" ]
then
    echo "Usage: copy_progress.ksh /source /target"
    exit 1
elif [ "$1" = "$2" ]
then
    echo "Error: Source and target directories must be different"
    exit 1
else
    source="$1"
    target="$2"
fi

# ------------------
# FUNCTIONS
# ------------------
# du -sk reports sizes in kilobytes, so all sizes below are in KB

analyze_source() {
    echo "Calculating size of source"
    source_size=$(du -sk "$source" | awk '{print $1}')
}

analyze_target() {
    echo "Calculating size of target"
    target_size=$(du -sk "$target" | awk '{print $1}')
}

analyze_transfer() {
    echo "Analyzing transfer parameters"
    # Measure the target twice, one minute apart
    analyze_target
    start_size=$target_size
    start_time=$SECONDS
    echo "Sleeping 1 minute"
    sleep 60
    analyze_target
    end_size=$target_size
    end_time=$SECONDS
    size_delta=$(echo "scale=0;$end_size - $start_size" | bc -l)
    time_delta=$(echo "scale=0;$end_time - $start_time" | bc -l)
    # Avoid dividing by zero when nothing was copied during the interval
    if [ "$size_delta" -le 0 ]
    then
        echo "Error: target did not grow during the sampling interval"
        exit 1
    fi
    # Rates are in kilobytes and megabytes per second
    transfer_rate_kbps=$(echo "scale=2;$size_delta / $time_delta" | bc -l)
    transfer_rate_mbps=$(echo "scale=2;$transfer_rate_kbps / 1024" | bc -l)
    size_remaining=$(echo "scale=0;$source_size - $target_size" | bc -l)
    time_remaining_sec=$(echo "scale=0;$size_remaining / $transfer_rate_kbps" | bc -l)
    time_remaining_min=$(echo "scale=2;$time_remaining_sec / 60" | bc -l)
    time_remaining_hr=$(echo "scale=2;$time_remaining_min / 60" | bc -l)
}

show_results() {
    cat << EOF
Current transfer rate: $transfer_rate_mbps Mb/s
Time remaining: $time_remaining_min min
EOF
}

# ------------------
# RUNTIME
# ------------------
analyze_source
analyze_transfer
show_results
And sample output of the script:
icebox:/var/adm/bin # ./copy_progress.ksh /disk1 /disk2
Calculating size of source
Analyzing transfer parameters
Calculating size of target
Calculating size of target
Current transfer rate: 22.89 Mb/s
Time remaining: 1.00 min
In the script you can modify the “sleep 60” wait time. If the script “sleeps” for five minutes instead of one, the result will be more accurate. Keep in mind that this script will not work if you are moving files instead of copying them.