Extracting Information From Logs - Part 3

Introduction

In Part 1 and Part 2 of this series, we extracted the IP addresses that made unauthorized SSH login attempts and then sorted and ranked them. Now that we have a ranked list of IPs, we need to find out which countries these IP addresses originated from. The ranked list of IPs that we will use in this section can be downloaded from here.

The problem

The scenario that we will work on in these guides is a common question faced by system administrators. We will imagine that we have been asked to find out what countries unauthorized SSH login attempts are coming from and rank them according to the frequency of their attacks.

The goal of Part 3

We will produce a ranked list of IPs along with the country that each one originated from. We will generate this list by constructing a simple for loop on a single terminal line that iterates over our list and runs a whois lookup on each IP. We could write a bash script to do this, but that takes much longer than constructing a single-line loop. Every system administrator should be able to write a single-line for loop, as it is an invaluable skill for rapidly executing a command repeatedly over some input.

What is a for loop?

A for loop is a way to perform an action repeatedly. Some command-line tools, such as the grep and awk commands we have already used, accept standard input, so we can simply feed them the output from a previous command. However, other commands, like whois, do not read their query from standard input. They must be run and supplied with a specific string, like an IP address or domain name, as an argument. That means that we must repeatedly run a correctly structured command for each of our IPs. For this, we need to use a for loop.

A for loop when written in a bash script has the structure:

for i in X; do
	command $i
done

Here X is the input, such as our list of IPs, and i is the variable that holds each consecutive IP during each iteration of the loop. Inside the loop, the value of the variable is referenced as $i. The for loop above can be rewritten so that it occupies only a single terminal line:

for i in X; do command $i; done

We will fill out this skeleton with actual commands to get the result we need.

Here I used i as the variable name. You can use any variable name you like but, for a reason no one really knows, the convention is to use i. You will see i used in most examples online, so we will stick with the convention here.
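
To make the skeleton concrete, here is a minimal sketch that can be run as-is. It assumes a literal list of three addresses in place of X (they are borrowed from the sample output used later in this guide) and uses ip rather than i as the variable name, purely to show that the name is arbitrary. The count variable is only there for illustration.

```shell
# A literal list stands in for X. "ip" replaces "i" to show that any
# variable name works; inside the loop it is read as $ip.
count=0
for ip in 78.194.172.51 116.31.116.48 218.65.30.53; do
    echo "this iteration has: $ip"
    count=$((count + 1))
done
echo "the loop body ran $count times"
```

The body between do and done runs once per item in the list, with the variable set to a different item each time.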

Step 1 – Generating valid input

The first thing that we need to do to construct a for loop is to generate input that can be iterated over. Generally, what is needed is a list with one text string per line. The best way to start is to use the tools we have already used, i.e. grep, awk and cut, to get such a list printed to the terminal. We will then use command substitution to use this as the input for the loop.

Our input file, sorted-IPs.txt, does not have the correct format to be used unmodified in our for loop because each line has two values, a count and an IP address, e.g.:

   270 193.201.224.218

We can only supply whois with an IP address. Therefore, we need to use the awk command we used in Part 2 to extract the second value, i.e. the IP address:

cat sorted-IPs.txt | awk '{print $2}'

When this is run it returns:

78.194.172.51
116.31.116.48
218.65.30.53
91.197.232.11
218.65.30.123
193.201.224.218

Which is exactly what we need for the for loop.
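
If you want to try this step without the real file, the sketch below builds a small stand-in in a temporary file. Only the first line is quoted from this guide; the other counts are made up for illustration. It also shows that awk can read the file directly, which should behave the same as piping it in with cat.

```shell
# Build a throwaway stand-in for sorted-IPs.txt so the real file is untouched.
# Only the 270 line comes from the guide; the other counts are invented.
sample=$(mktemp)
printf '%s\n' '    270 193.201.224.218' \
              '    120 218.65.30.123' \
              '     15 78.194.172.51' > "$sample"

# awk splits each line on runs of whitespace, so the leading spaces are
# ignored and $2 is the IP address on every line.
ips=$(awk '{print $2}' "$sample")
echo "$ips"

rm -f "$sample"
```

Because awk ignores the leading padding, the varying width of the count column does not affect which field holds the IP.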

Step 2 – Inserting the command into the for loop

Now that we have a command that produces the input we need, we can use it in place of the X in our example for loop by using command substitution:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do command $i; done

When the loop is run, bash will substitute $(cat sorted-IPs.txt | awk '{print $2}') with its output, i.e. the list of IP addresses.
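
The following sketch shows the substitution on its own. It assumes a printf command in place of the cat and awk pipeline so that it runs without sorted-IPs.txt; the list and n variables are hypothetical names used only here.

```shell
# printf stands in for "cat sorted-IPs.txt | awk '{print $2}'" and
# prints one IP per line.
list=$(printf '%s\n' 78.194.172.51 116.31.116.48 218.65.30.53)

# Left unquoted, the substituted text is split on whitespace (spaces,
# tabs and newlines), so each IP becomes one item for the loop.
n=0
for i in $list; do
    n=$((n + 1))
    echo "item $n: $i"
done
```

This splitting on whitespace is why the input needs to be one plain string per line, with no spaces inside an item.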

At every stage of constructing the loop it is a good idea to check that it is working as we want. The best way to do this is to use the echo command, which prints its arguments to the terminal instead of executing them. We will substitute command with echo in the skeleton, which will print out the $i variable (an IP address) on each iteration.

Shown here is the command along with the output:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo $i; done
78.194.172.51
116.31.116.48
218.65.30.53
91.197.232.11
218.65.30.123
193.201.224.218

We can now add the whois command along with the echo and check again:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo whois $i; done
whois 78.194.172.51
whois 116.31.116.48
whois 218.65.30.53
whois 91.197.232.11
whois 218.65.30.123
whois 193.201.224.218

We now know that our for loop will run the correct command when we remove the echo.

Step 3 – Using whois and filtering the output

We can remove the echo and the for loop will run whois on each IP address.

The whois output contains many lines of information. However, we only want to find out the country that the IP address originated from.

Fortunately, the whois output includes a country: line e.g.:

country:        UA

We simply need to use grep and awk again to pick out this line and select the country code. Here we test on a single IP rather than using the for loop:

whois IP | grep -m1 country | awk '{print $2}'

The -m1 option instructs grep to only return the first instance it finds. This is necessary because the country line is sometimes repeated several times.
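
The effect of -m1 can be seen without a network connection. The sketch below assumes a made-up whois-style reply (the field values are illustrative, not a real registry record) in which the country line appears twice.

```shell
# Invented whois-style text with a repeated country line.
sample_reply='inetnum:        193.201.224.0 - 193.201.227.255
netname:        EXAMPLE-NET
country:        UA
role:           Example Role
country:        UA'

# Without -m1 every matching line is returned; with -m1 grep stops
# after the first match.
all=$(printf '%s\n' "$sample_reply" | grep country | awk '{print $2}')
first=$(printf '%s\n' "$sample_reply" | grep -m1 country | awk '{print $2}')

echo "without -m1: $(echo $all)"
echo "with -m1:    $first"
```

Note that grep is case-sensitive by default. If the replies you receive capitalise the field name, adding -i to the grep command may be needed; whether that applies depends on the registry answering the query.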

Placing this into our for loop gives us:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do whois $i | grep -m1 country | awk '{print $2}'; done
FR
CN
CN
CZ
CN
UA

This is OK, but the output would be much more useful if the IP was also printed along with the country code.

Step 4 – Adding the IP to the output

We will explore two ways of adding the IP alongside the country code in the output. The first is simpler but less pretty than the second.

The first way to add the IP to the output is to use the fact that we can run as many commands as we want during each iteration of the for loop. Each command we want to run is separated by a semicolon ;. We simply need to add echo $i before the whois command and the IP will get printed and then the country code. This is the for loop along with the output:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo $i; whois $i | grep -m1 country | awk '{print $2}'; done
78.194.172.51
FR
116.31.116.48
CN
218.65.30.53
CN
91.197.232.11
CZ
218.65.30.123
CN
193.201.224.218
UA

As you can see, the IP and the country code end up on consecutive lines. This is OK. The information is all there, but it would be much more useful if both the IP and country appeared on the same line. This is possible by utilizing command substitution again.

This can be achieved by placing whois $i | grep -m1 country | awk '{print $2}' inside a command substitution, $( ), and then using echo to print the IP followed by the result of the substitution, i.e.:

echo "$i $(whois $i | grep -m1 country | awk '{print $2}')"

When this is used in the for loop it gives us:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo "$i $(whois $i | grep -m1 country | awk '{print $2}')"; done
78.194.172.51 FR
116.31.116.48 CN
218.65.30.53 CN
91.197.232.11 CZ
218.65.30.123 CN
193.201.224.218 UA
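
To experiment with this pattern without sending real whois queries, one option is a stand-in function. In the sketch below, fake_whois is a hypothetical helper, not a real command: it prints a two-line reply in a simplified format and picks the country from a hard-coded table based on the output above.

```shell
# fake_whois is a hypothetical offline stand-in for whois. It looks the
# country up in a small hard-coded table and prints a simplified reply.
fake_whois() {
    case "$1" in
        78.194.172.51)   cc=FR ;;
        193.201.224.218) cc=UA ;;
        *)               cc=CN ;;
    esac
    printf 'netname:        EXAMPLE\ncountry:        %s\n' "$cc"
}

# Same structure as the real loop: echo prints the IP, then the result
# of the nested command substitution, on one line.
out=$(for i in 78.194.172.51 116.31.116.48 193.201.224.218; do
    echo "$i $(fake_whois "$i" | grep -m1 country | awk '{print $2}')"
done)
echo "$out"
```

Swapping fake_whois back to whois gives the loop shown above.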

This is great but not perfect. It would be even prettier if the country codes lined up.

This is not really necessary but also not difficult with all the tools available on the command line. The column command is a tool dedicated to putting information into neat columns. All we have to do is to pipe the entire for loop into column -t and it will format the output neatly.

Here is the entire for loop being piped into column -t along with the output:

for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo "$i $(whois $i | grep -m1 country | awk '{print $2}')"; done | column -t
78.194.172.51    FR
116.31.116.48    CN
218.65.30.53     CN
91.197.232.11    CZ
218.65.30.123    CN
193.201.224.218  UA

Much better. Very often being able to do something on the command line quickly is only a matter of knowing what tools are available.
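
On a system where column is not installed, awk's printf can give a similar result. This is an alternative sketch rather than the method used above, and it assumes a fixed column width of 15 characters, which is the longest a dotted IPv4 address can be. The pairs are copied from the loop output shown earlier.

```shell
# Unaligned "IP country" pairs, as produced by the loop before formatting.
pairs='78.194.172.51 FR
218.65.30.53 CN
193.201.224.218 UA'

# %-15s left-justifies the IP in a 15-character field; two spaces then
# separate it from the country code.
aligned=$(printf '%s\n' "$pairs" | awk '{printf "%-15s  %s\n", $1, $2}')
echo "$aligned"
```

Unlike column -t, this does not measure the input, so the width has to be chosen in advance.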

Conclusion

The objective of this series of guides was to walk you through using bash tools to extract and manipulate data on the command line in order to answer a common question put to system administrators. I hope you noticed that we employed the same tools wherever possible to get the result we needed. When you have become familiar with these tools you will be able to process data rapidly without ever leaving the command line.