A large part of using a shell is manipulating text strings with tools such as cut. System administrators often have to take a file in an unknown format as input and extract only the text strings they need.
This is usually fairly simple, until it’s not. The situation is complicated because certain characters in a string are interpreted when the text is displayed, so what you see is not the literal characters. The two most common are \n for a newline (a line feed, which starts a new line) and \t for a tab.
This can complicate some operations: a \t can look exactly like a run of blank spaces when viewed with cat (how wide depends on the terminal's tab stops), but it will affect how commands like cut work.
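As a small illustration of the point above, here is a sketch with two made-up sample lines that look alike on screen. One separates its words with a tab, the other with four spaces. The variable names are only for this example.

```shell
# Two lines that look alike on screen: one uses a tab, the other uses spaces.
# cut -f2 splits on tabs by default.
with_tab=$(printf 'alpha\tbeta\n' | cut -f2)
with_spaces=$(printf 'alpha    beta\n' | cut -f2)

# The tab-separated line yields its second field.
printf 'tab-separated:   [%s]\n' "$with_tab"
# The space-separated line has no tab, so cut passes the whole line through.
printf 'space-separated: [%s]\n' "$with_spaces"
```

The same cut command gives two different results, even though cat would show the two inputs as near-identical lines.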
It is, therefore, sometimes necessary to see exactly what some text contains.
This is easily done with the od command, which will “dump” or display input data in a different human-readable output format. Any special characters such as \t will be printed as \t and will not get interpreted and displayed as a space.
od is very simple to use. It will accept either the standard output of a command through a pipe, or input redirection, e.g.:
cat data.txt | od -tc
od -tc <data.txt
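The options that we use here are:
-t  Set what the output format will be.
-c  Set the output to display printable characters as themselves and special characters as backslash escapes such as \t. In od -tc the c is passed as the format type to -t; od -c on its own is equivalent.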
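To try these options without a data file, a short sample string can be piped straight into od. This is a minimal sketch: the sample text and the variable name sample_dump are invented for the example.

```shell
# Build a small sample containing a tab and a newline, then dump it with od.
sample_dump=$(printf 'one\ttwo\n' | od -tc)

# The dump shows the tab as \t and the newline as \n instead of as whitespace.
printf '%s\n' "$sample_dump"
```

The exact column spacing can vary between od implementations, but the tab and newline should always appear as visible backslash escapes.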
An extract from Google’s sitemap:
<url>
          <loc>https://edu.google.com/components/</loc>
     </url>
And an extract from the WordPress sitemap:
	<url>
		<loc>http://wordpress.example.com/2009/05/</loc>
	</url>
They look similar, with only the spacing and alignment differing. However, it turns out that they are formatted quite differently.
Here is what the output of
od -tc on the two examples looks like:
od -tc <google-sitemap.xml
0000000 < u r l > \n
0000020 < l o c > h t t p s : / / e
0000040 d u . g o o g l e . c o m / c o
0000060 m p o n e n t s / < / l o c > \n
0000100 < / u r l > \n
0000114
od -tc <wordpress-sitemap.xml
0000000 \t < u r l > \n \t \t < l o c > h t
0000020 t p : / / w o r d p r e s s . p
0000040 i n k t u x e d o . n e t / 2 0
0000060 0 9 / 0 5 / < / l o c > \n \t < /
0000100 u r l > \n
0000105
Google has used spaces (which od shows as blank columns) and WordPress uses tabs (shown as \t).
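Once od has shown which kind of whitespace a file uses, a command can be written to tolerate both. The sketch below is one possible approach, not the only one: extract_loc is a hypothetical helper named for this example, and the two printf strings are assumed re-creations of the extracts above, not the real files.

```shell
# Hypothetical helper: print the URL inside each <loc> element,
# whether the line is indented with spaces or with tabs.
extract_loc() {
    # [[:space:]] matches both spaces and tabs, so the indent style no longer matters.
    sed -e 's/^[[:space:]]*//' | grep '^<loc>' | cut -d'>' -f2 | cut -d'<' -f1
}

# Space-indented sample (assumed indentation).
spaces_result=$(printf '<url>\n    <loc>https://edu.google.com/components/</loc>\n</url>\n' | extract_loc)
# Tab-indented sample.
tabs_result=$(printf '\t<url>\n\t\t<loc>http://wordpress.example.com/2009/05/</loc>\n\t</url>\n' | extract_loc)

printf '%s\n%s\n' "$spaces_result" "$tabs_result"
```

Stripping the leading whitespace first means the later cut commands see the same input regardless of how the file was indented.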
od will also quickly reveal whether a text file was created on a Windows machine and uses Windows-style line endings, which include a \r (carriage return) character and cause issues on Linux machines, e.g.:
od -tc <Hello-World.txt
0000000 H e l l o W o r l d ! \r
0000016
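If od does reveal carriage returns, one common fix is to delete them with tr and then check the result with od again. The sketch below uses an assumed sample line in place of a real file, and the variable names are only for illustration.

```shell
# A sample line with a Windows-style ending (\r\n), dumped as-is.
before=$(printf 'Hello World!\r\n' | od -tc)

# tr -d '\r' deletes every carriage return from the stream before dumping.
after=$(printf 'Hello World!\r\n' | tr -d '\r' | od -tc)

printf 'before:\n%s\n\nafter:\n%s\n' "$before" "$after"
```

The first dump should contain a \r and the second should not, which confirms that the line endings were converted.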
The od command allows us to quickly find out exactly how a file is formatted and modify any scripts or commands to work with it.