We often need to run many processes from a script. By default, those processes (or jobs) run serially: each new process starts only after the one before it finishes.
On modern multi-core systems, this leaves most of the CPU idle and wastes a lot of time.
xargs is a versatile tool that, among other things, can run processes in parallel.
The general structure of parallel processing with xargs is as follows:
xargs -P0 <COMMAND>
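-P = the maximum number of processes to run simultaneously. The number that follows -P sets that limit; with 0, xargs runs as many processes as possible at one time. -P only helps when xargs makes more than one invocation of the command, so it is normally combined with -n, -L, or a replace-string option such as -I.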
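To make the effect of -P visible, here is a minimal timing sketch. It is only an illustration: sleep stands in for real work, and the -n1 option (one argument per invocation) is added here so that xargs has several invocations to spread across processes.

```shell
# Four one-second sleeps, one per xargs invocation (-n1).
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n1 -P1 sleep   # one at a time: roughly 4 s
serial=$(( $(date +%s) - start ))

start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n1 -P4 sleep   # four at once: roughly 1 s
parallel=$(( $(date +%s) - start ))

echo "serial=${serial}s parallel=${parallel}s"
```

The exact numbers depend on the machine, but the parallel run should finish in about the time of a single sleep.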
The one crucial thing you must remember is that:
YOU HAVE TO PASS XARGS ALL THE OBJECTS
You can’t use this in the middle of a for loop, because each iteration would pass xargs only a single object. You have to generate all the objects (e.g. files) first, and then pass them all to xargs.
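The difference can be sketched as follows. The scratch directory and the file names a.txt, b.txt and c.txt are made up for this illustration, and -I{} is used as the placeholder option.

```shell
# Scratch directory with three hypothetical files.
work=$(mktemp -d)
touch "$work/a.txt" "$work/b.txt" "$work/c.txt"

# Anti-pattern (left commented out): xargs is started once per iteration
# and receives exactly one name, so the renames still happen one at a time.
# for f in "$work"/*.txt; do
#     echo "$f" | xargs -I{} -P0 mv "{}" "{}.bac"
# done

# Generate the complete list first, then hand it to a single xargs.
printf '%s\n' "$work"/*.txt | xargs -I{} -P0 mv "{}" "{}.bac"

ls "$work"
```

Only the second form gives xargs more than one object to distribute across processes.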
Using find
In this example, we will simply rename 10 text files.
$ ls
10.txt 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt
First, build the list of files we need to modify using find:
$ find . -type f -name "*.txt"
Next, pipe that into xargs:
$ find . -type f -name "*.txt" | xargs -i -P0 mv "{}" "{}.bac"
$ ls
10.txt.bac 2.txt.bac 4.txt.bac 6.txt.bac 8.txt.bac
1.txt.bac 3.txt.bac 5.txt.bac 7.txt.bac 9.txt.bac
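Here, -i lets us use {} as a placeholder for each object passed to xargs; the command is run once per object, with {} replaced by that object. Note that -i is a GNU xargs option that is now deprecated; the standard equivalent is -I{}.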
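File names that contain spaces or quote characters can trip up xargs, which by default gives quotes and backslashes special meaning. A hedged variant of the rename above uses -I{} (the standard spelling of the placeholder option) together with find -print0 and xargs -0, which are supported by GNU and BSD tools but may be missing on older systems. The scratch directory and file names are invented for this sketch.

```shell
# Scratch directory with deliberately awkward (hypothetical) file names.
work=$(mktemp -d)
touch "$work/my file.txt" "$work/it's.txt" "$work/plain.txt"

# -print0 ends each name with a NUL byte; -0 makes xargs split on NUL only,
# so spaces and quotes inside names are taken literally.
find "$work" -type f -name '*.txt' -print0 |
    xargs -0 -I{} -P0 mv "{}" "{}.bac"

ls "$work"
```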
Use a file that stores the file names
Instead of find, you can store the objects (in this case, file names) in a text file and pass those to xargs.
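First, generate the list of file names (with a full path). The list file itself ends in .txt and is created in the directory being searched, so it is excluded from the search:
$ find . -type f -name "*.txt" ! -name "file-paths.txt" -exec realpath {} \; > file-paths.txt
realpath prints the absolute path of each file.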
Next, simply cat the file and pipe it into xargs:
$ cat "file-paths.txt" | xargs -i -P0 mv "{}" "{}.bac"
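The whole file-based workflow can be sketched end to end as below. This is an illustration under a few assumptions: the files live in a scratch directory, the list is written to a separate temporary file so that find cannot pick it up, -I{} is used as the placeholder option, and the list is fed to xargs with a shell redirect, which has the same effect as cat piped into xargs.

```shell
# Scratch directory with three hypothetical files, plus a list file
# that lives outside that directory.
work=$(mktemp -d)
list=$(mktemp)
touch "$work/1.txt" "$work/2.txt" "$work/3.txt"

# Step 1: store one absolute path per line.
find "$work" -type f -name '*.txt' -exec realpath {} \; > "$list"

# Step 2: read the stored paths and rename the files in parallel.
xargs -I{} -P0 mv "{}" "{}.bac" < "$list"

ls "$work"
```

Keeping the list in a file is handy when the same set of objects has to be processed again later or inspected before anything is changed.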