#### Extracting Information From Logs - Part 2

##### How to sort and count information on the command line using the bash commands sort and uniq.

In Part 1 of this series, we extracted the IP addresses of machines making unauthorized SSH connections from a /var/log/secure file. The result of that exercise was a file containing a list of IP addresses, with each IP representing a failed login attempt. If you don’t want to generate that list yourself, you can download it here.

## The problem

The scenario that we will work on in these guides is a common question faced by system administrators. We will imagine that we have been asked to find out what countries unauthorized SSH login attempts are coming from and rank them according to the frequency of their attacks.

## The goal of Part 2

In this part we will use the bash tools sort and uniq to sort, count and rank the IP addresses contained in IPs.txt. We will chain these commands together with the bash pipe (|) to produce a single command that takes an unsorted input of IP addresses and ranks them.

## Step 1 – Sorting the IP addresses

The IP addresses contained in IPs.txt are recorded in the order that they were received by the SSH server and are consequently mixed up. The first command we will use is sort to fix this. By default, sort orders lines lexicographically, comparing them character by character. When the contents of IPs.txt are piped into sort, all identical IPs end up grouped together. This is important for the next step. We will use the cat command to direct the contents of IPs.txt into sort:

    cat IPs.txt | sort
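
If you would like to see the grouping behaviour on its own, here is a minimal sketch using hypothetical addresses from the 203.0.113.0/24 and 198.51.100.0/24 documentation ranges rather than IPs.txt:

```shell
# Hypothetical sample: two connections from one IP, one from another.
# sort places the identical lines next to each other.
printf '203.0.113.9\n198.51.100.4\n203.0.113.9\n' | sort
```

This prints 198.51.100.4 first, followed by the two identical 203.0.113.9 lines grouped together.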


## Step 2 – Counting the IPs

We will use the uniq command here to count the IPs. The uniq command removes repeated adjacent lines, leaving only a single instance of each in their place. Because the sort command we used in Step 1 has grouped identical IPs together, running uniq leaves only a single instance of each IP.
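
Note that uniq only collapses repeated lines that are adjacent, which is exactly why Step 1’s sort is required first. A minimal sketch with made-up lines:

```shell
# Unsorted: the two copies of "a" are separated, so uniq keeps both.
printf 'a\nb\na\n' | uniq          # three lines remain: a, b, a
# Sorted first: the copies of "a" become adjacent and collapse to one.
printf 'a\nb\na\n' | sort | uniq   # two lines remain: a, b
```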

However, we also want to rank the IPs by the number of times they tried to log in. To do that, we need to count the number of times each IP occurs in IPs.txt.

Fortunately, uniq takes the -c option, which prefixes each remaining line with a count of how many times that line occurred.

Adding uniq -c to the command we created in Step 1 looks like:

    cat IPs.txt | sort | uniq -c


This produces the output:

     14 116.31.116.48
    270 193.201.224.218
    162 218.65.30.123
     18 218.65.30.53
      1 78.194.172.51
     70 91.197.232.11


The number printed before the IP is the number of times that the IP occurred in IPs.txt.

## Step 3 – Ranking the IPs

The final step is to sort the lines by the number of times each IP occurred in the log file. This is done with the sort command again, but this time using the -n option. The -n option makes sort compare the leading numbers numerically, ranking the IPs by the number of login attempts they made.
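
To see why -n matters, compare a plain sort with a numeric one on a couple of made-up counts. Without -n, the string 10 sorts before 9 because the character 1 comes before 9:

```shell
printf '10\n9\n' | sort     # lexicographic: 10 before 9
printf '10\n9\n' | sort -n  # numeric: 9 before 10
```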

Adding sort -n to our command gives us:

    cat IPs.txt | sort | uniq -c | sort -n


Which produces the following output:

      1 78.194.172.51
     14 116.31.116.48
     18 218.65.30.53
     70 91.197.232.11
    162 218.65.30.123
    270 193.201.224.218


If you want the order reversed, i.e. from most to least frequent, add the -r option to the final sort.
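
For example, running the reversed variant on a few hypothetical lines puts the most frequent line first:

```shell
# "b" occurs three times, "a" twice, "c" once; sort -rn ranks b first.
printf 'b\na\nb\nb\na\nc\n' | sort | uniq -c | sort -rn
```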

Now, we will direct this output to a file for use in Part 3:

    cat IPs.txt | sort | uniq -c | sort -n > sorted-IPs.txt


## Conclusion

You can now take a file, or indeed any input, and rank its contents by the number of times each line appears. The pipeline

    sort | uniq -c | sort -n


is an incredibly useful command-line tool that you will use frequently as a systems administrator.
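
As a quick illustration of how general the chain is, the same pipeline can rank anything that repeats, for example word frequencies in a hypothetical inline string:

```shell
# Split words onto separate lines, then rank them by frequency.
printf 'ssh failed ssh root ssh root\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
```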

In Part 3, we will create a one-line for loop to iterate over the ranked IPs and run a whois lookup on each one to find out which country it originated from.