How To Get A List Of URLs From A Website

A fairly frequent job for a sysadmin is benchmarking a website. The first thing you need for that is a set of URLs to give to the benchmarking tool.

I have found that getting a list of all the site’s URLs for assets like HTML pages, images and CSS files makes the test a bit more representative.

The following command will very quickly spider a website and generate a text file of all the URLs that it finds:

wget --spider --force-html -r -l5 https://<DOMAIN-NAME> 2>&1 | grep '^--' | awk '{print $3}' > urls.txt
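
For reference, here is how I read that pipeline (the original doesn’t break it down): --spider tells wget to request pages without keeping them, -r -l5 recurses through links up to five levels deep, and each request wget makes is logged on a line beginning with --, from which awk prints the third field, the URL itself. Depending on the site, the same URL can appear more than once in that output, so if you want a unique, sorted list, adding a sort -u before the redirect takes care of it:

wget --spider --force-html -r -l5 https://<DOMAIN-NAME> 2>&1 | grep '^--' | awk '{print $3}' | sort -u > urls.txt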

All you need to do is change <DOMAIN-NAME> to your site’s domain name and you’re good to go.
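
Once it finishes, a couple of standard commands give a quick sanity check on the output (nothing specific to wget here):

wc -l urls.txt
head urls.txt

The first counts how many URLs were collected, the second shows the first ten so you can spot anything odd, such as login pages or tracking URLs you might not want in the benchmark.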

Running it on this website gives a list of 316 URLs and takes around 20 seconds.
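
The post doesn’t name a particular benchmarking tool, but as one example, siege can read a file of URLs directly with its -f option and hit them with a configurable number of concurrent users for a fixed duration (the -c 10 -t 30S values below are purely illustrative):

siege -f urls.txt -c 10 -t 30S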