How To Use GNU Parallel To Create A Supercomputer

I’ve written about GNU Parallel in a previous post because it’s a really amazing tool. Well, it’s even more amazing than I thought.

Parallel has the built-in ability to send jobs to remote servers, use all of their cores to work on something, and return the results to the current, local directory.

The easiest way to get a handle on how this works is to walk through an example. I’m going to use two remote servers to compress some files. The remote servers are:

root@single-core.server.com (1 core)
root@multi-core.server.com (4 cores)

Parallel logs in over ssh, so passwordless (key-based) logins to both servers need to be set up in advance.

First, I’ll create a 1MB file full of zeros with dd:

dd if=/dev/zero of=bigfile bs=1MB count=1
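Note that dd’s MB suffix is decimal (10^6 bytes), not the 2^20 of 1M, so the file comes out at exactly 1,000,000 bytes; a quick local check:

```shell
# Create the file and confirm its size: bs=1MB (decimal megabyte)
# times count=1 gives exactly 1,000,000 bytes.
dd if=/dev/zero of=bigfile bs=1MB count=1 2>/dev/null
wc -c < bigfile   # prints 1000000
```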

Then make 50 copies:

for i in {1..50}; do cp bigfile{,$i}; done
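Counting bigfile itself, that leaves 51 files to hand out as jobs. Both steps can be verified in one go (assuming an otherwise empty directory):

```shell
# Build the test data from scratch and count it: the original
# bigfile plus copies bigfile1..bigfile50 makes 51 files.
dd if=/dev/zero of=bigfile bs=1MB count=1 2>/dev/null
for i in {1..50}; do cp bigfile{,$i}; done
ls bigfile* | wc -l   # prints 51
```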

Now that we’ve got something to work with, let’s look at parallel. Here is the entire command:

ls bigfile* | parallel -S 1/root@single-core.server.com -S 4/root@multi-core.server.com --eta --trc {}.gz 'gzip -9 {}'

This breaks down as follows:

- ls bigfile* feeds the file names to parallel as job arguments, one job per file.
- -S 1/root@single-core.server.com adds a remote server; the 1/ prefix limits it to one job at a time.
- -S 4/root@multi-core.server.com adds the second server, allowed up to four simultaneous jobs.
- --eta prints a running progress estimate while the jobs execute.
- --trc {}.gz is shorthand for --transfer --return {}.gz --cleanup: transfer each input file to the remote server, return the result named {}.gz to the current, local directory, and delete both files from the remote machine when the job finishes.
- 'gzip -9 {}' is the command each job runs, with {} replaced by the file name.

The --eta option prints the following status information whilst the jobs are running:

Computers / CPU cores / Max jobs to run
1:root@single-core.server.com / 1 / 1
2:root@multi-core.server.com / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 21s Left: 43 AVG: 0.50s  root@single-core.server.com:1/0/7%/0.0s  root@multi-core.server.com:4/8/92%/1.0s
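If you want to check the mechanics without any remote servers, the same work can be reproduced locally and sequentially; each remote job is doing exactly this gzip -9 step, just with the file shipped over ssh and back:

```shell
# Local, sequential stand-in for the distributed run: compress every
# file with gzip -9, then confirm one archive round-trips intact.
dd if=/dev/zero of=bigfile bs=1MB count=1 2>/dev/null
for i in {1..50}; do cp bigfile{,$i}; done
for f in bigfile*; do gzip -9 "$f"; done
gunzip -c bigfile1.gz | wc -c   # prints 1000000
```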

If you need more information about using parallel, take a look at the GNU Parallel tutorial, which is full of examples.