Copying lots of small files from one Linux host to another can take a long time, much longer than the amount of data alone would suggest. I've often wondered, when I've had to copy lots of files, what the fastest way to do this is.
Let’s find out!
To start the tests I created 100,000 files of 64 bytes of random data, all contained in a directory called 100k.
This directory was 825M on disk.
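The post doesn't show how the test files were created; one way to generate a set like this (the file names here are my own invention) is:

```shell
#!/bin/sh
# Sketch: create many small files of random data.
# The directory name 100k matches the post; file names are hypothetical.
mkdir -p 100k
for i in $(seq 1 100000); do
    head -c 64 /dev/urandom > "100k/file_$i"
done
```

Spawning `head` 100,000 times is slow, but the shape of the loop is the point; any method that produces lots of tiny files will do.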
The two Linux systems were two Debian 10 DigitalOcean droplets connected over a VLAN.
The two commands were:
rsync -a 100k email@example.com:~/
scp -r 100k firstname.lastname@example.org:~/
The results were:
rsync was around 11 times faster. Dump scp if you are still using it out of habit.
Can we do better?
What if we make a tar archive out of the files?
tar -cf 100kfiles.tar 100k
The resultant archive, 100kfiles.tar, was 152M. This is because the smallest filesystem block size is 4 KiB, so every file, even though it holds just 64 bytes of data, takes up 4 KiB of disk space. The archive only includes the data the files actually contain.
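You can see this block-size overhead directly by comparing on-disk usage with the apparent (content) size; GNU du supports both views:

```shell
# On-disk usage: each 64-byte file still occupies at least one 4 KiB block.
du -sh 100k
# Apparent size: just the bytes the files actually contain.
du -sh --apparent-size 100k
```

The gap between the two numbers is exactly the slack the tar archive avoids carrying.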
Copying these produced the following results:
If you’ve got the space to create a local archive then do this before making the copy.
What if you don’t have the local space?
No problem. Use tar to stream the archive directly over SSH and un-tar it at the other end. This gives you the benefits of using an archive without needing any local space.
This is the command I used:
tar cf - 100k | ssh email@example.com "tar -xf - -C /destination/"
This method gives us the winner at 0m8.542s.
The - in the first tar command instructs tar to write the archive to stdout instead of a file, and the tar -x at the other end reads it back from stdin.
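You can see the streaming behaviour without a second host by running the same pipe locally (the paths here are illustrative):

```shell
# Set up a tiny source tree.
mkdir -p src dest
echo "hello" > src/a.txt
# Pack src to stdout and unpack the stream into dest, no archive file on disk.
tar cf - src | tar -xf - -C dest
cat dest/src/a.txt   # prints "hello"
```

Replacing the second tar with `ssh host "tar -xf - -C /destination/"` turns this local pipe into the remote copy shown above.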