I’ve written about GNU Parallel in a previous post because it’s a really amazing tool. Well, it’s even more amazing than I thought.
Parallel has the built-in ability to send jobs to remote servers, use all of their cores to work on something, and return the results to the current, local directory.
The easiest way to get a handle on how this works is to walk through an example. I’m going to use two remote servers to compress some files. The remote servers are:
- single-core.server.com - 1 core
- multi-core.server.com - 4 cores
First, I’ll create a 1MB file full of zeros with dd:
dd if=/dev/zero of=bigfile bs=1MB count=1
Then make 50 copies:
for i in {1..50}; do cp bigfile{,$i}; done
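A quick aside on the dd invocation: bs=1MB is the SI megabyte (1,000,000 bytes), whereas bs=1M would be the binary mebibyte (1,048,576 bytes). A minimal check, assuming GNU dd:

```shell
# bs=1MB writes exactly 1,000,000 bytes (SI unit); bs=1M would write 1,048,576
dd if=/dev/zero of=bigfile bs=1MB count=1 2>/dev/null
wc -c < bigfile    # 1000000 bytes
```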
Now that we’ve got something to work with, let’s look at parallel. Here is the entire command:
ls bigfile* | parallel -S 1/root@single-core.server.com -S 4/root@multi-core.server.com --eta --trc {}.gz 'gzip -9 {}'
This breaks down as follows:
- ls bigfile* - Generates the file list for parallel.
- -S 1/root@single-core.server.com - A remote SSH server to use. The 1/ at the beginning tells parallel to run a single job at a time on that server; the 4-core server gets 4/, so four jobs run there concurrently.
- --eta - Print status information as the jobs are run.
- --trc {}.gz - Transfer the files to the remote server, Return the completed files and Cleanup the remote copies. The files to return are referenced as {}.gz because the original files, {}, have had .gz appended by gzip when it compressed them.
- 'gzip -9 {}' - The command to run on the remote servers.
The --eta option prints the following status information while the jobs are running:
Computers / CPU cores / Max jobs to run
1:root@single-core.server.com / 1 / 1
2:root@multi-core.server.com / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 21s Left: 43 AVG: 0.50s root@single-core.server.com:1/0/7%/0.0s root@multi-core.server.com:4/8/92%/1.0s
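Once the compressed files land back in the local directory, it’s worth a sanity check before deleting anything: gzip -t verifies a file’s integrity without writing the decompressed output to disk. A self-contained sketch (creating its own bigfile1 so it runs standalone; in practice you’d point it at the returned .gz files):

```shell
# create and compress a sample file locally, then verify its integrity
dd if=/dev/zero of=bigfile1 bs=1MB count=1 2>/dev/null
gzip -9 bigfile1
gzip -t bigfile1.gz && echo OK
```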
If you need more information about using parallel take a look at the tutorial with examples.