We frequently have to run many processes from a script. The default behavior is that the processes or jobs are run serially. That is to say that each new process will start when the one before finishes.
On modern multi-core systems, this wastes a lot of CPU resources and time.
xargs is a complex tool that comes with the ability to run processes in parallel.
The general structure of parallel processing with xargs is as follows:
xargs -P0 <COMMAND>
-P = max-procs, the maximum number of processes to run simultaneously. The number that follows sets how many run at the same time; 0 starts as many as possible at once.
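To make the effect of -P concrete, here is a minimal sketch that times the same four jobs run serially and then in parallel. The use of sleep as a stand-in workload and the variable names serial and parallel are assumptions for illustration only.

```shell
# Four 1-second jobs, one argument per invocation (-n1), run one after another
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n1 sleep
serial=$(( $(date +%s) - start ))

# The same four jobs, but up to four of them may run at the same time
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n1 -P4 sleep
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

The serial run should take roughly four seconds and the parallel run roughly one, since the four sleeps overlap.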
The one crucial thing you must remember is that:
YOU HAVE TO PASS XARGS ALL THE OBJECTS
You can’t use this in the middle of a for loop, as each iteration will only pass xargs a single object. You have to generate all the objects, e.g. files, and then pass them all to xargs.
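The difference between the two approaches can be sketched as follows. The items a, b and c and the variable names loop_out and list_out are hypothetical; echo stands in for whatever command you actually want to run.

```shell
# Anti-pattern: xargs is started once per iteration and only ever sees one item,
# so -P0 has nothing to run in parallel
loop_out=$(for item in a b c; do
  echo "$item" | xargs -n1 -P0 echo
done)

# Preferred: generate the whole list first, then hand it to a single xargs,
# which can start all three jobs at once (output order is not guaranteed)
list_out=$(printf '%s\n' a b c | xargs -n1 -P0 echo | sort)

echo "$loop_out"
echo "$list_out"
```

Both versions process every item, but only the second gives xargs more than one job to schedule at a time.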
Using find
In this example, we will simply rename 10 text files.
ls
10.txt 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt
First, build the list of files we need to modify using find:
find . -type f -name "*.txt"
Next, pipe that into xargs:
find . -type f -name "*.txt" | xargs -I {} -P0 mv "{}" "{}.bac"
ls
10.txt.bac 2.txt.bac 4.txt.bac 6.txt.bac 8.txt.bac
1.txt.bac 3.txt.bac 5.txt.bac 7.txt.bac 9.txt.bac
Here -I {} allows us to use {} as a placeholder for each object passed to xargs. Older examples use -i for this, which GNU xargs has deprecated in favor of -I.
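File names that contain spaces or quote characters can trip up a plain pipe into xargs. One common way to guard against this, sketched here as an optional variant, is to have find emit null-terminated names with -print0 and have xargs read them with -0. The scratch directory and the file names below are made up for the demonstration.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "plain.txt" "with space.txt"

# -print0 / -0 separate names with a null byte instead of whitespace
find . -type f -name "*.txt" -print0 | xargs -0 -I {} -P0 mv "{}" "{}.bac"

ls
```

Both files should end up with the .bac suffix, including the one with a space in its name.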
Use a file that stores the file names
Instead of find
you can store the objects, in this case, file names, in a text file and pass those to xargs
.
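First, generate the list of file names (with a full path). The list file itself ends in .txt and is created by the shell before find runs, so exclude it from the search:
find . -type f -name "*.txt" ! -name "file-paths.txt" -exec realpath {} \; >file-paths.txt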
realpath will print the full path to a file.
Next, simply cat the file and pipe it into xargs, using -I {} to set the placeholder:
cat "file-paths.txt" | xargs -I {} -P0 mv "{}" "{}.bac"
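As a variation on the same idea, the list file can be redirected straight into xargs, and the number of parallel jobs can be capped instead of left unlimited. This is only a sketch: the scratch directory, the three sample files and the limit of 4 are arbitrary choices, and find is started from "$PWD" as an alternative way to get absolute paths.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1.txt 2.txt 3.txt

# Starting find at "$PWD" makes it print absolute paths;
# the list file is excluded so it does not end up listing itself
find "$PWD" -type f -name "*.txt" ! -name "file-paths.txt" > file-paths.txt

# Feed the list to xargs via redirection and run at most 4 mv processes at once
xargs -I {} -P4 mv "{}" "{}.bac" < file-paths.txt

ls
```

A fixed limit such as -P4 can be kinder to a shared machine than -P0 when the list is long.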