How to sort and count information on the bash command line using the bash commands sort and uniq.
In Part 1 of this series, we extracted the IP addresses of machines making unauthorized SSH connections from a /var/log/secure file. The result of that exercise was a file containing a list of IP addresses, with each IP representing a failed login attempt. If you don’t want to generate that list, you can download it here.
The scenario that we will work on in these guides is a common question faced by system administrators. We will imagine that we have been asked to find out what countries unauthorized SSH login attempts are coming from and rank them according to the frequency of their attacks.
The goal of Part 2
In this part we will use the bash tools sort and uniq to sort, count and rank the IP addresses contained in IPs.txt. We will chain these commands together with the bash pipe | to produce a new command that takes an unsorted input of IP addresses and ranks them.
Step 1 – Sorting the IP addresses
The IP addresses contained in IPs.txt are recorded in the order that they were received by the SSH server and are consequently mixed up. The first command we will use is sort to fix this. By default, sort orders lines by comparing them as text, character by character; it compares numbers by value only when given the -n option. When the contents of IPs.txt are piped into sort, all identical IPs will end up grouped together. This is important for the next step. We will use the cat command to direct the contents of IPs.txt into sort:
cat IPs.txt | sort
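To see the grouping effect on a small scale, here is a minimal sketch. The sample file and its addresses are hypothetical stand-ins for IPs.txt (they come from the ranges reserved for documentation), not data from the real log.

```shell
# Hypothetical sample data standing in for IPs.txt.
sample=$(mktemp)
printf '%s\n' 203.0.113.7 192.0.2.10 203.0.113.7 198.51.100.4 192.0.2.10 203.0.113.7 > "$sample"

# sort compares the lines as text, so identical addresses end up next to each other.
sorted=$(cat "$sample" | sort)
printf '%s\n' "$sorted"

rm -f "$sample"
```

The repeated addresses now sit on consecutive lines, which is the property the next step depends on.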
Step 2 – Counting the IPs
We will use the uniq command here to count the IPs. The uniq command removes adjacent repeated lines, leaving only a single instance of the line in their place. The sort command we used in Step 1 has grouped the IPs together, so when we run uniq only a single instance of each IP will remain.
However, we also want to rank the IPs by the number of times they tried to log in. To do that we need to count the number of times they occurred in IPs.txt. uniq takes the -c option, which makes it print the number of times each line occurred at the beginning of the line containing the remaining single instance of the IP.
Adding uniq -c to the command we created in Step 1 looks like:
cat IPs.txt | sort | uniq -c
This produces the output:
14 18.104.22.168
270 22.214.171.124
162 126.96.36.199
18 188.8.131.52
1 184.108.40.206
70 220.127.116.11
The number printed before the IP is the number of times that the IP occurred in IPs.txt.
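Because uniq only compares neighbouring lines, the sort in front of it matters. The following minimal sketch shows the difference, using a hypothetical four-line sample in place of IPs.txt in which no two identical addresses are adjacent.

```shell
# Hypothetical sample: two addresses, each appearing twice, never on adjacent lines.
sample=$(mktemp)
printf '%s\n' 203.0.113.7 192.0.2.10 203.0.113.7 192.0.2.10 > "$sample"

# Without sort, no duplicates are adjacent, so uniq collapses nothing.
# tr strips the padding that some versions of wc add around the number.
unsorted_count=$(cat "$sample" | uniq | wc -l | tr -d ' ')

# With sort, duplicates are adjacent and each address is left exactly once.
sorted_count=$(cat "$sample" | sort | uniq | wc -l | tr -d ' ')

echo "lines left without sort: $unsorted_count, with sort: $sorted_count"

rm -f "$sample"
```

The same applies to uniq -c: without the preceding sort, the counts would describe runs of adjacent lines rather than totals per address.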
Step 3 – Ranking the IPs
The final step is to sort the lines by the number of times each IP occurred in the log file. This is done with the sort command again, but this time using the -n option. The -n option makes sort compare the number at the beginning of each line by its numeric value, and therefore ranks the IPs by the number of login attempts they made.
Adding sort -n to our command gives us:
cat IPs.txt | sort | uniq -c | sort -n
Which produces the following output:
1 18.104.22.168
14 22.214.171.124
18 126.96.36.199
70 188.8.131.52
162 184.108.40.206
270 220.127.116.11
If you want the order reversed, i.e. from most to least, add the -r option to the final sort, giving sort -nr.
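As a minimal sketch of the reversed ranking, the example below runs the full chain with sort -nr on a hypothetical sample (documentation-range addresses, not real log data) and then picks out the address with the most attempts. The use of head and awk to extract that address is an illustrative addition, not part of the original command.

```shell
# Hypothetical sample standing in for IPs.txt: three distinct addresses
# appearing 3, 2 and 1 times respectively.
sample=$(mktemp)
printf '%s\n' 203.0.113.7 192.0.2.10 203.0.113.7 198.51.100.4 192.0.2.10 203.0.113.7 > "$sample"

# -n compares the leading counts numerically; -r reverses the order,
# so the address with the highest count comes first.
ranked=$(cat "$sample" | sort | uniq -c | sort -nr)
printf '%s\n' "$ranked"

# First line of the ranking, second whitespace-separated field: the busiest address.
top_ip=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
echo "most frequent: $top_ip"

rm -f "$sample"
```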
Now, we will direct this output to a file for use in Part 3:
cat IPs.txt | sort | uniq -c | sort -n >sorted-IPs.txt
You can now take a file, or indeed any input, and rank the contents of that input by the number of times each line appears. The command chain
sort | uniq -c | sort -n
is an incredibly useful command line tool that you will use frequently as a systems administrator.
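Since the chain reads from standard input, it can be wrapped in a small shell function and reused on any line-based input. This is only a sketch: the function name rank_lines and the sample words are hypothetical, chosen for illustration.

```shell
# rank_lines is a hypothetical helper name: it reads lines on standard input
# and prints each distinct line prefixed by its count, least frequent first.
rank_lines() {
    sort | uniq -c | sort -n
}

# Example input: any text with one item per line works, not only IP addresses.
ranked_words=$(printf '%s\n' sshd cron sshd kernel sshd cron | rank_lines)
printf '%s\n' "$ranked_words"
```

The function could be fed from a file in the same way as before, for example cat IPs.txt | rank_lines.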
In Part 3, we will create a one-line for loop to iterate over the ranked IPs and run a whois lookup on each one to find out which country it originated from.