Introduction
In Part 1 and Part 2 of this series, we extracted the IP addresses that made unauthorized SSH login attempts and then sorted and ranked them. Now that we have a ranked list of IPs, we need to find out which countries these IP addresses originated from. The ranked list of IPs that we will use in this section can be downloaded from here.
The problem
The scenario that we will work on in these guides is a common question faced by system administrators. We will imagine that we have been asked to find out what countries unauthorized SSH login attempts are coming from and rank them according to the frequency of their attacks.
The goal of Part 3
We will produce a ranked list of IPs along with the country that each one originated from. We will generate this list by constructing a simple for loop on a single terminal line that iterates over our list and runs a whois lookup on each IP. We could write a bash script to do this, but that takes much longer than constructing a single-line loop. Every system administrator should be able to write a single-line for loop, as it is an invaluable skill for rapidly executing a command repeatedly over some input.
What is a for loop?
A for loop is a way to perform an action repeatedly. Some command-line tools, such as grep and awk, which we have already used, accept standard input, so we can simply feed them the output of a previous command. However, other commands, like whois, do not read standard input. They must be run with a specific string, such as an IP address or domain name, supplied as an argument. That means we must run a correctly structured command separately for each of our IPs. For this, we need a for loop.
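The difference is easy to see with a couple of placeholder addresses: grep filters whatever arrives on standard input, whereas echo ignores standard input and prints only its arguments. This is a minimal illustration, not part of the final loop.

```shell
# grep reads the list from standard input and prints the matching line
printf '10.0.0.1\n10.0.0.2\n' | grep '10.0.0.2'

# echo ignores standard input entirely; with no arguments it prints an empty line
printf '10.0.0.1\n' | echo

# The value has to be passed as an argument instead
echo 10.0.0.1
```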
A for loop written in a bash script has the structure:
for i in X; do
    command $i
done
Here X is the input, such as our list of IPs, and i is the variable that holds each IP in turn during each iteration of the loop; inside the loop it is referenced as $i. The for loop above can be rewritten so that it occupies a single terminal line:
for i in X; do command $i; done
We will fill out this skeleton with actual commands to get the result we need.
Here I used i as the variable name. You can use any variable name you like, but the convention, probably borrowed from the use of i as an index in mathematics and early programming languages, is to use i. You will see i used in most examples online, so we will stick with the convention here.
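As a quick sanity check, the sketch below runs both forms over a short hard-coded list of placeholder addresses, with echo standing in for command. Both print the same three lines; the second loop also uses ip instead of i to show that the name is arbitrary.

```shell
# Multi-line form, as it would appear in a script
for i in 10.0.0.1 10.0.0.2 10.0.0.3; do
    echo "$i"
done

# The same loop on one line: semicolons replace the newlines.
# The variable name is arbitrary; ip works just as well as i.
for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do echo "$ip"; done
```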
Step 1 – Generating valid input
The first thing we need to do to construct a for loop is to generate input that can be iterated over. Generally, what is needed is a list with one text string per line. The best way to start is to use the tools we have already used, i.e. grep, awk and cut, to get such a list printed to the terminal. We will then use command substitution to use this list as the input for the loop.
Our input file, sorted-IPs.txt, does not have the correct format to be used unmodified in our for loop because each line has two values, e.g.:
270 193.201.224.218
We can only supply whois with an IP address. Therefore, we need to use the awk command from Part 2 to extract the second value, i.e. the IP address:
cat sorted-IPs.txt | awk '{print $2}'
When this is run it returns:
78.194.172.51
116.31.116.48
218.65.30.53
91.197.232.11
218.65.30.123
193.201.224.218
This is exactly what we need for the for loop.
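awk is a better fit than cut for this step if the count column is padded. Output from uniq -c, a common way to produce such counts, right-aligns the count with leading spaces. Whether sorted-IPs.txt contains that padding depends on how it was made in Part 2, so the line below is a hypothetical example:

```shell
# Hypothetical line in the style of `uniq -c` output, with leading padding
line='    270 193.201.224.218'

# awk splits on runs of blanks and ignores leading ones, so $2 is the IP
echo "$line" | awk '{print $2}'

# cut treats every single space as a separator, so field 2 is empty here
echo "$line" | cut -d' ' -f2
```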
Step 2 – Inserting the command into the for loop
Now that we have a command that produces the input we need, we can use it in place of the X in our example for loop by using command substitution:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do command $i; done
When the loop is run, bash will replace $(cat sorted-IPs.txt | awk '{print $2}') with its output, i.e. the list of IP addresses.
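Bash then splits the substituted output on whitespace (spaces, tabs and newlines), and each resulting word becomes one value of $i. A small offline illustration, with printf standing in for the real pipeline; the addresses are placeholders apart from the sample line shown earlier:

```shell
# printf stands in for the cat/awk pipeline and emits one address per line;
# each whitespace-separated word becomes one value of $i
for i in $(printf '10.0.0.1\n10.0.0.2\n'); do echo "got: $i"; done

# Without the awk step, the count would be split off as its own word,
# so the loop would run once for 270 and once for the IP
for i in $(printf '270 193.201.224.218\n'); do echo "got: $i"; done
```

This is why the count column has to be removed before the list is used as loop input.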
At every stage of constructing the loop, it is a good idea to check that it is working as we want. The best way to do this is to use the echo command, which prints its arguments to the terminal instead of executing them. We will replace command with echo in the skeleton, which will print out the $i variable (an IP address) on each iteration.
Shown here is the command along with the output:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo $i; done
78.194.172.51
116.31.116.48
218.65.30.53
91.197.232.11
218.65.30.123
193.201.224.218
We can now add the whois command after echo and check again:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo whois $i; done
whois 78.194.172.51
whois 116.31.116.48
whois 218.65.30.53
whois 91.197.232.11
whois 218.65.30.123
whois 193.201.224.218
We now know that our for loop will run the correct command when we remove the echo.
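A small convenience, offered as an optional variation rather than part of the method above, is to keep the echo prefix in a variable so the dry run can be switched off without editing the rest of the command. The variable name run is arbitrary, and the two addresses are taken from the list above:

```shell
# Dry run: with run=echo each command is printed, not executed.
# $run is deliberately unquoted so that an empty value disappears
# from the command line when the dry run is switched off.
run=echo
for i in 78.194.172.51 193.201.224.218; do $run whois "$i"; done
```

Setting run= (an empty value) and re-running the same line would execute whois for real.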
Step 3 – Using whois and filtering the output
We can now remove the echo, and the for loop will run whois on each IP address.
The whois output contains many lines of information. However, we only want to find out the country that the IP address originated from.
Fortunately, the whois output includes a country: line, e.g.:
country: UA
We simply need to use grep and awk again to filter out this line and select the country code. Testing on a single IP rather than inside the for loop, the command is:
whois 193.201.224.218 | grep -m1 country | awk '{print $2}'
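The -m1 option instructs grep to return only the first matching line. This is necessary because the country line is sometimes repeated several times. Note that grep is case-sensitive: some registries, ARIN for example, label the field Country:, so for such addresses grep -i country may be needed.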
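The filter can be tried without network access by feeding it a made-up excerpt in the style of RIPE whois output. The record below is illustrative only, with the country line repeated to show why -m1 matters:

```shell
# Made-up excerpt in the style of RIPE whois output; the country line repeats
sample='netname:        EXAMPLE-NET
country:        UA
descr:          made-up record
country:        UA'

# -m1 stops after the first match; awk then keeps only the second field
echo "$sample" | grep -m1 country | awk '{print $2}'

# Without -m1, both country lines would be returned
echo "$sample" | grep country
```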
Placing this into our for loop gives us:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do whois $i | grep -m1 country | awk '{print $2}'; done
FR
CN
CN
CZ
CN
UA
This is OK, but the output would be much more useful if the IP were printed along with the country code.
Step 4 – Adding the IP to the output
We will explore two ways of adding the IP alongside the country code in the output. The first is simpler but less pretty than the second.
The first way to add the IP to the output relies on the fact that we can run as many commands as we want during each iteration of the for loop, separating them with a semicolon ;. We simply need to add echo $i before the whois command, and the IP will be printed, followed by the country code. This is the for loop along with the output:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo $i; whois $i | grep -m1 country | awk '{print $2}'; done
78.194.172.51
FR
116.31.116.48
CN
218.65.30.53
CN
91.197.232.11
CZ
218.65.30.123
CN
193.201.224.218
UA
As you can see, the IP and the country code end up on consecutive lines. The information is all there, but it would be much more useful if the IP and country appeared on the same line. This is possible by using command substitution again.
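Another option, not used in the rest of this guide, is to join each pair of lines after the loop has finished. paste, when given - twice, reads two lines of standard input for every output line, and -d' ' sets a space as the separator. Here the input is the first two pairs of the two-line output above, reproduced with printf:

```shell
# paste - - consumes two input lines per output line; -d' ' joins them with a space
printf '78.194.172.51\nFR\n116.31.116.48\nCN\n' | paste -d' ' - -
# 78.194.172.51 FR
# 116.31.116.48 CN
```

Applied to the real loop, this would mean appending | paste -d' ' - - after done. It relies on every lookup printing exactly one country line; if a lookup prints nothing, the pairs shift out of step, which makes the command substitution approach the more robust choice.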
To do it with command substitution, place whois $i | grep -m1 country | awk '{print $2}' inside the $( ) brackets and then use echo to print the IP followed by the result of the substitution, i.e.:
echo "$i $(whois $i | grep -m1 country | awk '{print $2}')"
When this is used in the for loop it gives us:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo "$i $(whois $i | grep -m1 country | awk '{print $2}')"; done
78.194.172.51 FR
116.31.116.48 CN
218.65.30.53 CN
91.197.232.11 CZ
218.65.30.123 CN
193.201.224.218 UA
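To experiment with this loop without querying the whois servers every time, the lookup can be replaced by a stub. The fake_whois function below is hypothetical: it only prints a canned country: line based on the results shown above, so the rest of the pipeline runs unchanged.

```shell
# Hypothetical stand-in for whois: prints a canned "country:" line
# (values taken from the results above) so the loop works offline
fake_whois() {
    case "$1" in
        78.194.172.51)   echo "country:        FR" ;;
        193.201.224.218) echo "country:        UA" ;;
        *)               echo "country:        ??" ;;
    esac
}

# Same loop shape as above, with fake_whois in place of whois
for i in 78.194.172.51 193.201.224.218; do
    echo "$i $(fake_whois "$i" | grep -m1 country | awk '{print $2}')"
done
```

Once the output looks right, swapping fake_whois back to whois restores the real lookup.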
This is great but not perfect. It would be even prettier if the country codes lined up.
This is not really necessary, but it is not difficult either with the tools available on the command line. The column command is dedicated to arranging text into neat columns. All we have to do is pipe the entire for loop into column -t, and it will align the output into a table.
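If column is not installed on a minimal system, printf can pad the first field to a fixed width instead. This is a substitute technique, not what column does internally, and the width of 16 is an assumption chosen to fit the longest dotted-quad IPv4 address (15 characters) plus one space:

```shell
# %-16s left-aligns the IP in a 16-character field, then %s prints the code;
# printf reuses the format string until all arguments are consumed
printf '%-16s%s\n' 78.194.172.51 FR 218.65.30.53 CN 193.201.224.218 UA
# 78.194.172.51   FR
# 218.65.30.53    CN
# 193.201.224.218 UA
```

Inside the loop, the echo would become printf '%-16s%s\n' "$i" "$(whois $i | grep -m1 country | awk '{print $2}')".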
Here is the entire for loop piped into column -t, along with the output:
for i in $(cat sorted-IPs.txt | awk '{print $2}'); do echo "$i $(whois $i | grep -m1 country | awk '{print $2}')"; done | column -t
78.194.172.51    FR
116.31.116.48    CN
218.65.30.53     CN
91.197.232.11    CZ
218.65.30.123    CN
193.201.224.218  UA
Much better. Very often being able to do something on the command line quickly is only a matter of knowing what tools are available.
Conclusion
The objective of this series of guides was to walk you through using bash tools to extract and manipulate data on the command line in order to answer a common question put to system administrators. I hope you noticed that we employed the same tools wherever possible to get the result we needed. When you have become familiar with these tools you will be able to process data rapidly without ever leaving the command line.