How To Use Xargs To Execute Processes In Parallel

We frequently have to run many processes from a script. By default, those processes (or jobs) run serially: each new process starts only when the one before it finishes.

On modern multi-core systems, this wastes a lot of CPU resources and time.

xargs is a versatile tool, and one of its features is the ability to run processes in parallel.

The general structure of parallel processing with xargs is as follows:

xargs -P0 <COMMAND>

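-P = the maximum number of processes to run simultaneously. The default is 1; a value of 0 tells xargs to run as many processes as possible at one time. -P only has an effect when xargs starts more than one command, so it is combined with something that splits the input across commands, such as -n or the {} replacement used in the examples below.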
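To make the effect of -P visible, here is a minimal sketch using a made-up workload of four one-second sleep jobs (the workload and the variable names are illustrative assumptions, not part of the original example). With -n 1, xargs starts one sleep per input line, and -P 4 allows all four to run together, so the whole thing should take roughly one second rather than four.

```shell
# Hypothetical workload: four jobs that each sleep for one second.
start=$(date +%s)

# -n 1 gives each sleep exactly one argument, so xargs starts four commands;
# -P 4 lets up to four of them run at the same time.
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep

end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

Dropping -P 4 from the command runs the same jobs one after another, which is an easy way to compare the two behaviors on your own machine.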

The one crucial thing you must remember is that:

YOU HAVE TO PASS XARGS ALL THE OBJECTS

You can't call xargs inside the body of a for loop, because each iteration would pass it only a single object and nothing would run in parallel. Generate all the objects first (files, for example), and then pass them all to a single xargs.
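The difference can be sketched as follows. The item names a, b, c and the echo command are placeholders chosen for illustration. The commented-out loop starts a separate xargs per item, so nothing overlaps; the second form lets the loop only print the items and hands the whole stream to one xargs.

```shell
# Serial in practice: every iteration starts its own xargs with one item.
# for f in a b c; do echo "$f" | xargs -I {} -P 4 echo "processing {}"; done

# Parallel: the loop only generates the items; a single xargs receives them all.
out=$(
    for f in a b c; do
        echo "$f"
    done | xargs -I {} -P 4 echo "processing {}"
)

# The order of the lines can differ between runs because the jobs run concurrently.
printf '%s\n' "$out"
```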

Using find

In this example, we will simply rename 10 text files.

$ ls
10.txt  1.txt  2.txt  3.txt  4.txt  5.txt  6.txt  7.txt  8.txt  9.txt

First, build the list of files we need to modify using find:

$ find . -type f -name "*.txt"

Next, pipe that into xargs:

$ find . -type f -name "*.txt" | xargs -i -P0 mv "{}" "{}.bac"
$ ls
10.txt.bac  2.txt.bac  4.txt.bac  6.txt.bac  8.txt.bac
1.txt.bac   3.txt.bac  5.txt.bac  7.txt.bac  9.txt.bac

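Here, -i lets us use {} as a placeholder for each object passed to xargs. Note that GNU xargs documents -i as deprecated in favor of -I {}, which is also the form specified by POSIX.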
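A variant of the same rename that copes with spaces in file names might look like the sketch below. It works in a throwaway directory created with mktemp, and the file names are invented for the demonstration. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), and uses -I {} with a fixed limit of four processes instead of -i -P0.

```shell
# Scratch directory with sample files (names are illustrative only).
workdir=$(mktemp -d)
touch "$workdir/1.txt" "$workdir/2.txt" "$workdir/my notes.txt"

# -print0 and -0 separate names with NUL bytes, so spaces and quotes are safe.
# -I {} substitutes each name into the command; -P 4 runs up to four mv at once.
find "$workdir" -type f -name "*.txt" -print0 |
    xargs -0 -I {} -P 4 mv "{}" "{}.bac"

ls "$workdir"
```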

Use a file that stores the file names

Instead of piping from find directly, you can store the objects (file names, in this case) in a text file and pass its contents to xargs.

First, generate the list of file names (with a full path):

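$ find . -type f -name "*.txt" ! -name "file-paths.txt" -exec realpath {} \; >file-paths.txt

realpath prints the absolute path of each file. The ! -name test is needed because the shell creates file-paths.txt before find runs, and that file would otherwise match *.txt and end up in its own list.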

Next, simply cat the file and pipe it into xargs:

$ cat "file-paths.txt" | xargs -i -P0 mv "{}" "{}.bac"
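The list-file approach can be tried end to end with the sketch below. The scratch directory, the sample file names, and the list and workdir variables are assumptions made for the demonstration. The list is written to a separate temporary file so it cannot match the search pattern, find is given an absolute starting directory so it prints full paths without realpath, and the list is fed to xargs by redirection rather than cat.

```shell
# Scratch directory with sample files, plus a list file stored outside it.
workdir=$(mktemp -d)
list=$(mktemp)
touch "$workdir/1.txt" "$workdir/2.txt" "$workdir/3.txt"

# Step 1: generate the complete list of file names, one per line.
find "$workdir" -type f -name "*.txt" > "$list"

# Step 2: hand the whole list to a single xargs, up to four mv at a time.
xargs -I {} -P 4 mv "{}" "{}.bac" < "$list"

ls "$workdir"
```

Because the list is read line by line here, this form assumes file names without embedded newlines or quote characters.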