Reading And Searching Compressed Files

This guide explains how to read and search through compressed files on the Linux bash command line without decompressing them first.

gzip and bzip2 are two of the most common compression utilities on Linux and turn up almost anywhere compression is required. This is especially true of log files, because the logrotate utility is usually configured by default when a server such as Apache2 is installed. logrotate periodically takes the current working log file, gives it a version number and compresses it, leaving a new, empty log file for the application to write to.

This is great for stopping your disk filling up. However, compressed files cannot be read or searched directly with the usual command line tools such as less or grep. You could simply decompress all the files and then work on them as normal, but on a busy web or email server this can be a very large amount of data. It is also an unnecessary step, because both gzip and bzip2 provide equivalents of less, cat and grep that work directly on the compressed data.

gzip - .gz

zless
zcat
zgrep

bzip2 - .bz2

bzless
bzcat
bzgrep
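
As a rough sketch of the step these tools save, the following compares the decompress-then-search approach with searching the compressed file in place. The file name app.log.1 and its contents are made up for illustration, and the example assumes gzip and its helper scripts are installed.

```shell
# Throwaway sample data (hypothetical file name and contents).
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/app.log.1"
gzip "$workdir/app.log.1"          # produces app.log.1.gz

# Two-step approach: write a decompressed copy, then search it.
gzip -dc "$workdir/app.log.1.gz" > "$workdir/app.log.1.txt"
two_step=$(grep beta "$workdir/app.log.1.txt")

# Direct approach: search the compressed file without an extra copy.
direct=$(zgrep beta "$workdir/app.log.1.gz")

echo "two-step: $two_step"
echo "direct:   $direct"

rm -rf "$workdir"
```

Both approaches should find the same line; the direct one avoids leaving a decompressed copy on disk.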

These commands work in much the same way as their standard counterparts. For example, if I need to search through all of my rotated dpkg logs (which have the form dpkg.log.N.gz) for the package apache2, I can run the command:

zgrep apache2 dpkg.log.*.gz
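
Since zgrep hands its options on to grep, the familiar flags should generally carry over. The sketch below builds a couple of small sample logs (the log lines are invented, not real dpkg output) and tries -i, -n and -c on them; it assumes the GNU gzip version of zgrep.

```shell
# Throwaway sample data so the example runs anywhere (invented log lines).
workdir=$(mktemp -d)
printf '%s\n' \
  'install apache2:amd64 <none> 2.4.0' \
  'install curl:amd64 <none> 8.0.0' > "$workdir/dpkg.log.1"
printf '%s\n' \
  'upgrade Apache2:amd64 2.4.0 2.4.1' \
  'install vim:amd64 <none> 9.0' > "$workdir/dpkg.log.2"
gzip "$workdir/dpkg.log.1" "$workdir/dpkg.log.2"

# -i ignores case, -n shows line numbers. Each match is prefixed with
# its file name because more than one file is searched.
matches=$(zgrep -i -n apache2 "$workdir"/dpkg.log.*.gz)
echo "$matches"

# -c prints a count of matching lines per file instead of the lines.
counts=$(zgrep -c install "$workdir"/dpkg.log.*.gz)
echo "$counts"

rm -rf "$workdir"
```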

To read the file dpkg.log.5.gz:

zless dpkg.log.5.gz

If you need to use command line tools that expect uncompressed text, such as tail, simply pipe the output of zcat into them, e.g.:

zcat dpkg.log.5.gz | tail
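
The same pipe pattern works with other text tools. As a minimal sketch, the following generates a 100-line sample file (the contents are placeholders, not real dpkg log entries), compresses it, and then feeds the zcat output to tail and wc.

```shell
# Throwaway sample data: 100 placeholder lines, then compress.
workdir=$(mktemp -d)
seq 1 100 | sed 's/^/entry /' > "$workdir/dpkg.log.5"
gzip "$workdir/dpkg.log.5"

# Show only the last three lines of the compressed log.
last_lines=$(zcat "$workdir/dpkg.log.5.gz" | tail -n 3)
echo "$last_lines"

# Count the lines without writing a decompressed copy to disk.
line_count=$(zcat "$workdir/dpkg.log.5.gz" | wc -l)
echo "$line_count"

rm -rf "$workdir"
```

For .bz2 files, bzcat should slot into the same pipelines in place of zcat.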

With these tools, you can quickly find the information you need without having to unpack the files first.