The Best Way To Copy Lots Of Small Files

Copying lots of small files from one Linux host to another can take a long time, much longer than the amount of data alone would suggest.

Whenever I’ve had to copy lots of files I’ve wondered: what is the fastest way to do this?

Let’s find out!

The Setup

To set up these tests I created 100K 64-byte files of random data, all contained in a directory called 100k. This directory was 825M on disk.
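
For reference, a directory like this can be generated with a short shell loop. This is just a sketch of one way to do it, assuming bash with seq, head and /dev/urandom available, not necessarily how the test data here was built:

mkdir 100k
for i in $(seq 1 100000); do head -c 64 /dev/urandom > 100k/file_$i; done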

The two Linux systems were two Debian 10 DigitalOcean droplets connected over a VLAN.

The Results

First, I simply copied the directory using rsync and scp.

The two commands were:

rsync -a 100k joe@10.1.0.1:~/
scp -r 100k joe@10.1.0.1:~/
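
The figures below look like output from the shell’s time builtin, so to reproduce them you could wrap each command like this (a sketch of the measurement, not necessarily the exact invocation used):

time rsync -a 100k joe@10.1.0.1:~/
time scp -r 100k joe@10.1.0.1:~/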

The results were:

Command    Time
rsync      0m21.064s
scp        3m56.392s

rsync was around 11 times faster. Dump scp if you are still using it out of habit.

Can we do better?

What if we make a tar archive out of the files?

tar -cf 100kfiles.tar 100k

The resultant archive, 100kfiles.tar, was 152M. The directory is so much larger on disk because the filesystem allocates space in 4 KiB blocks, so every file, even though it holds only 64 bytes of data, takes up 4 KiB. A tar archive stores each file’s contents plus a small per-file header, without that block-size overhead.
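
You can see the block-size overhead directly by comparing the allocated size of the directory with the apparent size of the data it holds (assuming GNU coreutils du, which supports --apparent-size):

du -sh 100k
du -sh --apparent-size 100k

The first figure counts whole 4 KiB blocks per file; the second is the sum of the files’ actual contents.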

Copying these produced the following results:

Command    Time
rsync      0m1.182s
scp        0m1.271s

If you’ve got the space to create a local archive then do this before making the copy.
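
One thing to keep in mind: after the copy the files still have to be unpacked on the destination, with something like this (assuming the archive was copied into joe’s home directory):

ssh joe@10.1.0.1 "tar -xf 100kfiles.tar"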

What if you don’t have the local space?

No problem. Use tar to dump the archive directly into SSH and un-tar it at the other end. This gives you the benefits of using an archive without needing any local space.

This is the command I used:

tar -cf - 100k | ssh joe@10.1.0.1 "tar -xf - -C /destination/"

This method clocked in at 0m8.542s, making it the overall winner once you factor in not having to create (and clean up) a local archive first.

Note: the - passed to -f in the first tar command tells tar to write the archive to stdout instead of to a file, and the matching -f - on the remote end tells tar to read the archive from stdin.
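
The same pattern works in reverse if you need to pull files down from the remote host instead. A sketch, assuming the same hosts and that /destination/ exists locally:

ssh joe@10.1.0.1 "tar -cf - 100k" | tar -xf - -C /destination/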