Extracting Information From Logs - Part 1

Introduction

System administrators are expected to solve many different problems and answer diverse questions throughout their working day. A common request is to answer “How many times did X do Y”. This information is typically not stored directly, and there are no counters or pretty statistics pages for most Linux server applications. However, there are log files. Every time a web page is requested, an email sent or a login made, the event is recorded in a log file. An essential skill for any system administrator is to be to able parse through log files and extract useful information.

The following pages will walk you through extracting useful information from a log file, sorting and counting that information and finally constructing a bash one line for loop to execute a command for each line of the output.

The problem

The scenario that we will work on in these guides is a common question faced by system administrators. We will imagine that we have been asked to find out what countries unauthorized SSH login attempts are coming from and rank them according to the frequency of their attacks.

The goal of Part 1

We will begin answering this question by parsing through the log file that SSH records all connection attempts to and identify the lines that represent a failed login attempt. We will then use the commands grep, awk and cut to extract only the IP addresses of these attempts and record them to a file. At the end of this section, we will have a list of IP addresses in a file with one IP address per line. Each IP address will represent a login attempt from that IP.

Obtain a log file

SSH logs every successful and unsuccessful login attempt. However, the file that this gets written to is not the same for all distributions. Ubuntu and Debian use the file /var/log/auth.log and RedHat based distributions use /var/log/secure.

We will work on a sample of a secure file from a CentOS server. You can either use your own secure or auth.log or you can download a copy of the file we will use for these examples from here.

Homing in on the IP

The easiest way to get to the part of the log entry that you need is to remove everything that is not needed. Start with the most general filter and then keep removing from what is left until only the desired string remains.

This is very easy on the Linux as we can use the pipe | to keep adding new filters and removing more of what we don’t need.

Step 1 – Filtering the line we need from the log file

The secure log file contains a great deal of information. This is because all authentication events are written to this file along with CRON events and other information. In addition, each SSH login attempt writes more than one line to the log file and there is more than one type of SSH login attempt that we need to account for.

We will begin by using the grep command to select only the lines we are interested in from the secure log file.

The two type unsuccessful SSH login events that we are going to to have to include. These are:

  1. Invalid users, these are users that do not exist on the server.
  2. Valid users but incorrect passwords. These are users that exist but an incorrect password was supplied.

We will begin by with the first type. The only way to begin constructing our filter command is to take a look in the log file and see what is actually written. The following excerpt the entire record of a single login attempt:

Aug 23 03:47:13 centos7 sshd[3283]: Invalid user guest from 193.201.224.218
Aug 23 03:47:13 centos7 sshd[3283]: input_userauth_request: invalid user guest [preauth]
Aug 23 03:47:13 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:13 centos7 sshd[3283]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=193.201.224.218
Aug 23 03:47:15 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:16 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:17 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:18 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:20 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:24 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:25 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:26 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:27 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:27 centos7 sshd[3283]: pam_unix(sshd:auth): check pass; user unknown
Aug 23 03:47:29 centos7 sshd[3283]: Failed password for invalid user guest from 193.201.224.218 port 13181 ssh2
Aug 23 03:47:29 centos7 sshd[3283]: Disconnecting: Too many authentication failures for guest [preauth]

What this shows is that IP 193.201.224.218 initiated an SSH connection as user guest. SSH recorded that this user is invalid i.e. does not exist on the server. However, 193.201.224.218 went right on and attempted to supply passwords anyway until it hit the maximum allowed attempts (6) and the SSH server terminated the connection.

Obviously, we don’t want to record 193.201.224.218 for every line in the connection attempt as this will add 16 instances for a single login attempt. We need to choose a line that is present in every login attempt with a non-existent user and only occurs once per connection attempt. The best candidate is the first line:

Aug 23 03:47:13 centos7 sshd[3283]: Invalid user guest from 193.201.224.218

Now that we have our candidate line we now need to choose some text that is unique to this line to use with grep. The string Invalid user looks like a good fit so let’s test it grep:

grep “Invalid user” secure

This gives us an output that looks like:

Aug 23 03:52:40 centos7 sshd[5160]: Invalid user acc from 193.201.224.218
Aug 23 03:52:45 centos7 sshd[5243]: Invalid user adam from 193.201.224.218
Aug 23 03:52:53 centos7 sshd[5312]: Invalid user adfexc from 193.201.224.218
Aug 23 03:53:45 centos7 sshd[5494]: Invalid user admin2 from 193.201.224.218
etc

This is exactly what we need. We have a line that only appears once for each unsuccessful attempt, contains that IP of the attacker and doesn’t match any other lines in the log file.

We can now do the same for the second type of login, one with a valid username but incorrect password. The following excerpt shows one such attempt:

Aug 23 05:08:08 centos7 sshd[5185]: reverse mapping checking getaddrinfo for 123.30.65.218.broad.xy.jx.dynamic.163data.com.cn [218.65.30.123] failed - POSSIBLE BREAK-IN ATTEMPT!
Aug 23 05:08:09 centos7 sshd[5185]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root
Aug 23 05:08:09 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:10 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:11 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:13 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:14 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:16 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:16 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:18 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:19 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:21 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:22 centos7 sshd[5185]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root"
Aug 23 05:08:24 centos7 sshd[5185]: Failed password for root from 218.65.30.123 port 42034 ssh2
Aug 23 05:08:24 centos7 sshd[5185]: Disconnecting: Too many authentication failures for root [preauth]

Again, we need to look for a line that appears once in this type of SSH connection. This line is a good candidate:

Aug 23 05:08:09 centos7 sshd[5185]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root

The text that we will use for grep will be pam_unix(sshd:auth): authentication failure. This gives us the grep command:

grep "pam_unix(sshd:auth): authentication failure" secure

This produces the following output:

Aug 23 05:04:56 centos7 sshd[5153]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root
Aug 23 05:05:18 centos7 sshd[5160]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root
Aug 23 05:05:42 centos7 sshd[5164]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root
Aug 23 05:06:06 centos7 sshd[5168]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root
etc

This is exactly the output that we need.

Now that we have our two grep commands we can build on them to extract only the IP addresses from the filtered lines.

Step 2 – Extracting a text string from a line of text

The only information that we need from each of the lines that we obtained above is the IP address. Removing the extraneous text is a simple matter using standard Linux text manipulation tools. These programs and they way they are used are incredibly important tools in any system administrators tool box.

Firstly, we will extract the IP from the invalid user lines. We will work on this line:

Aug 23 03:53:58 centos7 sshd[5501]: Invalid user agent from 193.201.224.218

The first thing to notice is that each piece of text in the line is separated by white spaces and each line has the same number of pieces of text. This makes it easy to separate the strings and select the string we want.

The two most common tools to do this are cut and awk. cut can be a little tricky when the fields separator (or delimiter) occurs more than once between strings. This is especially true of white spaces. This makes awk a better choice when separating on spaces as its default behaviour is to use any number of white spaces as the delimiter.

awk is a very complicated command but most system administrators only use it in the following way:

awk ‘{print $N}’

Where $N is the number of the string in the line counting from left to right.

In our log lines, the IP address is text string number 10. This gives us the awk command:

awk '{print $10}'

It is possible to test this by running the grep command over the whole log file with the awk filter appended with a |.

However, this is impractical for very long log files. An easier approach is to take a single line and echo it. This allows us to work on the filter without having to grep through the entire log file every time.

Using echo and our awk filter gives us the following command. I have included the output in the second line:

echo “Aug 23 03:53:58 centos7 sshd[5501]: Invalid user agent from 193.201.224.218” | awk ‘{print $10}’
193.201.224.218

We have successfully extracted the IP address from rest of the line. Now that we can get the IP we can append the awk command to the first grep command to do the same for every matching line in the log file. We will also direct the output to a file (IPs.tx) to be used in Part 2.

This gives us the following line:

grep "Invalid user" secure | awk '{print $10}' >IPs.txt

Now, we will do the same for the valid user lines. The example line we will work on is as follows:

Aug 23 05:08:33 centos7 sshd[5187]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root

As you can see this is going to require a little more work because the IP address is joined to some text i.e. rhost=218.65.30.123.

The first job is to separate the IP and text using the same awk command we used for the invalid user. This gives us the following along with the output:

echo "Aug 23 05:08:33 centos7 sshd[5187]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.6
5.30.123  user=root" | awk '{print $14}'
rhost=218.65.30.123

Next, we need to separate rhost= from 218.65.30.123.

The text that we want to separate, rhost=218.65.30.123, is best thought of as two text strings separated by an = sign. This allows us to get the IP by telling the cut command to use the = as the separating character and return the second piece of text.

This gives us the following:

cut -d”=” -f2

The -d option is the “delimiter” or the character that splits the text strings. And the -f is the “field” or text string number counting from left to right. When this is added after the awk command we get:

echo "Aug 23 05:08:33 centos7 sshd[5187]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.123  user=root" | awk '{print $14}' | cut -d"=" -f2
218.65.30.123

This command successfully yields only the IP address. We can now add the awk and cut filters after the grep command to extract the IPs of all the valid user login attempts. We will also append all of these IPs we get to the end of the IPs.txt file:

grep 'pam_unix(sshd:auth): authentication failure;' secure | awk '{print $14}' | cut -d"=" -f2 >>IPs.txt

Conclusion

We now have a file which contains an IP address for every unsuccessful login attempt on a new line. In Part 2 will look at how to sort and count these IP addresses so that we can find out how often they attempted to log in and rank them in order of the number of attempts they made.