Introduction
System administrators are expected to solve many different problems and answer diverse questions throughout their working day. A common request is to answer “How many times did X do Y”. This information is typically not stored directly, and there are no counters or pretty statistics pages for most Linux server applications. However, there are log files. Every time a web page is requested, an email sent or a login made, the event is recorded in a log file. An essential skill for any system administrator is to be to able parse through log files and extract useful information.
The following pages will walk you through extracting useful information from a log file, sorting and counting that information and finally constructing a bash one line for loop to execute a command for each line of the output.
The problem
The scenario that we will work on in these guides is a common question faced by system administrators. We will imagine that we have been asked to find out what countries unauthorized SSH login attempts are coming from and rank them according to the frequency of their attacks.
The goal of Part 1
We will begin answering this question by parsing through the log file that SSH records all connection attempts to and identify the lines that represent a failed login attempt. We will then use the commands grep
, awk
and cut
to extract only the IP addresses of these attempts and record them to a file. At the end of this section, we will have a list of IP addresses in a file with one IP address per line. Each IP address will represent a login attempt from that IP.
Obtain a log file
SSH logs every successful and unsuccessful login attempt. However, the file that this gets written to is not the same for all distributions. Ubuntu and Debian use the file /var/log/auth.log
and RedHat based distributions use /var/log/secure
.
We will work on a sample of a secure
file from a CentOS server. You can either use your own secure
or auth.log
or you can download a copy of the file we will use for these examples from here.
Homing in on the IP
The easiest way to get to the part of the log entry that you need is to remove everything that is not needed. Start with the most general filter and then keep removing from what is left until only the desired string remains.
This is very easy on the Linux as we can use the pipe |
to keep adding new filters and removing more of what we don’t need.
Step 1 – Filtering the line we need from the log file
The secure
log file contains a great deal of information. This is because all authentication events are written to this file along with CRON events and other information. In addition, each SSH login attempt writes more than one line to the log file and there is more than one type of SSH login attempt that we need to account for.
We will begin by using the grep
command to select only the lines we are interested in from the secure
log file.
The two type unsuccessful SSH login events that we are going to to have to include. These are:
- Invalid users, these are users that do not exist on the server.
- Valid users but incorrect passwords. These are users that exist but an incorrect password was supplied.
We will begin by with the first type. The only way to begin constructing our filter command is to take a look in the log file and see what is actually written. The following excerpt the entire record of a single login attempt:
Aug 23 03:47:13 centos7 sshd[3283]: Invalid user guest from 193.201.224.218
Aug 23 03:47:13 centos7 sshd[3283]: input_userauth_request: invalid user guest [preauth]
Aug 23 03:47:13 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:13 centos7 sshd[3283]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=193.201.224.218
Aug 23 03:47:15 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:16 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:17 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:18 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:20 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:24 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:25 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:26 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:27 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:27 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:29 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:29 centos7 sshd[3283]: Disconnecting: Too many authentication failures for guest [preauth]
What this shows is that IP 193.201.224.218
initiated an SSH connection as user guest. SSH recorded that this user is invalid i.e. does not exist on the server. However, 193.201.224.218
went right on and attempted to supply passwords anyway until it hit the maximum allowed attempts (6) and the SSH server terminated the connection.
Obviously, we don’t want to record 193.201.224.218
for every line in the connection attempt as this will add 16 instances for a single login attempt. We need to choose a line that is present in every login attempt with a non-existent user and only occurs once per connection attempt. The best candidate is the first line:
Aug 23 03:47:13 centos7 sshd[3283]: Invalid user guest from 193.201.224.218
Now that we have our candidate line we now need to choose some text that is unique to this line to use with grep
. The string Invalid user looks like a good fit so let’s test it grep
:
grep “Invalid user” secure
This gives us an output that looks like:
Aug 23 03:52:40 centos7 sshd[5160]: Invalid user acc from 193.201.224.218
Aug 23 03:52:45 centos7 sshd[5243]: Invalid user adam from 193.201.224.218
Aug 23 03:52:53 centos7 sshd[5312]: Invalid user adfexc from 193.201.224.218
Aug 23 03:53:45 centos7 sshd[5494]: Invalid user admin2 from 193.201.224.218
etc
This is exactly what we need. We have a line that only appears once for each unsuccessful attempt, contains that IP of the attacker and doesn’t match any other lines in the log file.
We can now do the same for the second type of login, one with a valid username but incorrect password. The following excerpt shows one such attempt:
Aug 23 05:08:08 centos7 sshd[5185]: reverse mapping checking getaddrinfo for 123.30.65.218.broad.xy.jx.dynamic.163data.com.cn [218.65.30.123] failed - POSSIBLE BREAK-IN ATTEMPT!
Aug 23 05:08:09 centos7 sshd[5185]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
Aug 23 05:08:09 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:10 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:11 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:13 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:14 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:16 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:16 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:18 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:19 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:21 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:22 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:24 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:24 centos7 sshd[5185]: Disconnecting: Too many authentication failures for root [preauth]
Again, we need to look for a line that appears once in this type of SSH connection. This line is a good candidate:
Aug 23 05:08:09 centos7 sshd[5185]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
The text that we will use for grep
will be pam_unix(sshd:auth): authentication failure. This gives us the grep
command:
grep "pam_unix(sshd:auth): authentication failure" secure
This produces the following output:
Aug 23 05:04:56 centos7 sshd[5153]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
Aug 23 05:05:18 centos7 sshd[5160]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
Aug 23 05:05:42 centos7 sshd[5164]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
Aug 23 05:06:06 centos7 sshd[5168]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
etc
This is exactly the output that we need.
Now that we have our two grep
commands we can build on them to extract only the IP addresses from the filtered lines.
Step 2 – Extracting a text string from a line of text
The only information that we need from each of the lines that we obtained above is the IP address. Removing the extraneous text is a simple matter using standard Linux text manipulation tools. These programs and they way they are used are incredibly important tools in any system administrators tool box.
Firstly, we will extract the IP from the invalid user lines. We will work on this line:
Aug 23 03:53:58 centos7 sshd[5501]: Invalid user agent from 193.201.224.218
The first thing to notice is that each piece of text in the line is separated by white spaces and each line has the same number of pieces of text. This makes it easy to separate the strings and select the string we want.
The two most common tools to do this are cut
and awk
. cut
can be a little tricky when the fields separator (or delimiter) occurs more than once between strings. This is especially true of white spaces. This makes awk
a better choice when separating on spaces as its default behaviour is to use any number of white spaces as the delimiter.
awk
is a very complicated command but most system administrators only use it in the following way:
awk ‘{print $N}’
Where $N is the number of the string in the line counting from left to right.
In our log lines, the IP address is text string number 10. This gives us the awk
command:
awk '{print $10}'
It is possible to test this by running the grep
command over the whole log file with the awk
filter appended with a |
.
However, this is impractical for very long log files. An easier approach is to take a single line and echo
it. This allows us to work on the filter without having to grep through the entire log file every time.
Using echo
and our awk
filter gives us the following command. I have included the output in the second line:
echo “Aug 23 03:53:58 centos7 sshd[5501]: Invalid user agent from 193.201.224.218” | awk ‘{print $10}’
193.201.224.218
We have successfully extracted the IP address from rest of the line. Now that we can get the IP we can append the awk
command to the first grep
command to do the same for every matching line in the log file. We will also direct the output to a file (IPs.tx) to be used in Part 2.
This gives us the following line:
grep "Invalid user" secure | awk '{print $10}' >IPs.txt
Now, we will do the same for the valid user lines. The example line we will work on is as follows:
Aug 23 05:08:33 centos7 sshd[5187]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root
As you can see this is going to require a little more work because the IP address is joined to some text i.e. rhost=218.65.30.123.
The first job is to separate the IP and text using the same awk
command we used for the invalid user. This gives us the following along with the output:
echo "Aug 23 05:08:33 centos7 sshd[5187]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.6
5.30.123 user=root" | awk '{print $14}'
rhost=218.65.30.123
Next, we need to separate rhost= from 218.65.30.123.
The text that we want to separate, rhost=218.65.30.123, is best thought of as two text strings separated by an = sign. This allows us to get the IP by telling the cut
command to use the = as the separating character and return the second piece of text.
This gives us the following:
cut -d”=” -f2
The -d option is the “delimiter” or the character that splits the text strings. And the -f is the “field” or text string number counting from left to right. When this is added after the awk
command we get:
echo "Aug 23 05:08:33 centos7 sshd[5187]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123 user=root" | awk '{print $14}' | cut -d"=" -f2
218.65.30.123
This command successfully yields only the IP address. We can now add the awk
and cut
filters after the grep
command to extract the IPs of all the valid user login attempts. We will also append all of these IPs we get to the end of the IPs.txt
file:
grep 'pam_unix(sshd:auth): authentication failure;' secure | awk '{print $14}' | cut -d"=" -f2 >>IPs.txt
Conclusion
We now have a file which contains an IP address for every unsuccessful login attempt on a new line. In Part 2 will look at how to sort and count these IP addresses so that we can find out how often they attempted to log in and rank them in order of the number of attempts they made.