How To Use Xargs To Execute Processes In Parallel

We frequently have to run many processes from a script. By default, these processes or jobs run serially: each new process starts only when the one before it finishes.

On modern multi-core systems, this wastes a lot of CPU resources and time.

xargs is a versatile tool that can, among other things, run processes in parallel.

The general structure of parallel processing with xargs is as follows:

xargs -P0 <COMMAND>

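-P = the maximum number of processes xargs runs at the same time. The default is 1, which means serial execution; 0 starts as many as possible at one time. Note that -P only helps when xargs starts more than one command, so combine it with an option such as -n or -I that limits how many items each command receives.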
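To make the effect of -P concrete, here is a minimal sketch that times the same three jobs run one at a time and then three at a time. The sleep commands and the variable names (start, serial, parallel) are placeholders chosen for illustration, not part of the original example.

```shell
#!/bin/sh
# Three jobs that each take one second.
# -n1 makes xargs start one sleep per input item.

# Serial run: -P1 allows only one process at a time.
start=$(date +%s)
printf '%s\n' 1 1 1 | xargs -n1 -P1 sleep
serial=$(( $(date +%s) - start ))

# Parallel run: -P3 allows all three processes to run together.
start=$(date +%s)
printf '%s\n' 1 1 1 | xargs -n1 -P3 sleep
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

The serial run should take roughly three seconds and the parallel run roughly one, although the exact figures depend on the machine.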

The one crucial thing you must remember is that:

YOU HAVE TO PASS XARGS ALL THE OBJECTS

You can’t call xargs inside a for loop, because each iteration then passes xargs only a single object and nothing runs in parallel. Generate all the objects first, e.g. a list of files, and then pass them to xargs in one go.
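The difference can be sketched as follows. The item names a, b, c and the echo command are stand-ins for real work, and the example assumes an xargs that accepts -P0, as GNU xargs does.

```shell
#!/bin/sh
# Anti-pattern, shown only as a comment: xargs is started once per
# iteration and sees a single item each time, so nothing runs in parallel.
#   for f in a b c; do printf '%s\n' "$f" | xargs -n1 -P0 echo processed; done

# Better: let the loop only generate the items, and pipe the output of
# the whole loop into a single xargs.
out=$(for f in a b c; do printf '%s\n' "$f"; done | xargs -n1 -P0 echo processed)

# The order of the lines may vary because the jobs run concurrently.
printf '%s\n' "$out"
```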

Using find

In this example, we will simply rename 10 text files.

ls
10.txt  1.txt  2.txt  3.txt  4.txt  5.txt  6.txt  7.txt  8.txt  9.txt

First, build the list of files we need to modify using find:

find . -type f -name "*.txt"

Next, pipe that into xargs:

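find . -type f -name "*.txt" | xargs -I {} -P0 mv "{}" "{}.bac"
ls
10.txt.bac  2.txt.bac  4.txt.bac  6.txt.bac  8.txt.bac
1.txt.bac   3.txt.bac  5.txt.bac  7.txt.bac  9.txt.bac

Here -I {} tells xargs to run the command once per input line and to replace every {} with that line. It is the standard spelling of the GNU shorthand -i found in older scripts, which the GNU documentation marks as deprecated.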
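File names that contain spaces or quote characters can trip up xargs when it reads them line by line. One common way around this, sketched here under the assumption that your find and xargs support -print0 and -0 (GNU and BSD versions do), is to separate the names with NUL bytes. The directory and file names below are made up for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "plain.txt" "with space.txt" "it's.txt"

# -print0 ends each name with a NUL byte and -0 makes xargs split on
# NUL bytes, so spaces and quotes in names are passed through unchanged.
find . -type f -name "*.txt" -print0 | xargs -0 -P0 -I {} mv {} {}.bac

ls
```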

Use a file that stores the file names

Instead of find you can store the objects, in this case, file names, in a text file and pass those to xargs.

First, generate the list of file names (with a full path):

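find . -type f -name "*.txt" ! -name "file-paths.txt" -exec realpath {} \; >file-paths.txt

realpath prints the full path to a file. The ! -name "file-paths.txt" test is needed because the shell creates file-paths.txt before find starts, so the list would otherwise contain its own name.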

Next, simply cat the file and pipe it into xargs, using -I {} (the standard spelling of -i) to set the placeholder:

cat "file-paths.txt" | xargs -I {} -P0 mv "{}" "{}.bac"
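As a variation, xargs can read the list through input redirection, which avoids the extra cat process. The sketch below is one possible arrangement, not the only one: it keeps the list in a temporary file outside the working directory so the list is never renamed itself, and it lets find print absolute paths by searching from an absolute starting directory. The names workdir and list are hypothetical.

```shell
#!/bin/sh
# Throwaway directory with three sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1.txt 2.txt 3.txt

# Keep the list outside workdir so find does not pick it up.
list=$(mktemp)

# Starting find from an absolute path makes it print absolute paths.
find "$workdir" -type f -name "*.txt" > "$list"

# Feed the list to xargs on standard input instead of using cat.
xargs -P0 -I {} mv {} {}.bac < "$list"

ls
```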