wget website

Whole site:

wget --mirror --convert-links --adjust-extension --page-requisites -e robots=off $WEBSITE
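$WEBSITE is assumed to already hold the target URL, for example (example.com is a placeholder):

WEBSITE=https://example.com

An alternative, more explicit form: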
wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
         www.website.org/tutorials/html/

The options are:

  • --recursive: download the entire Web site
  • --domains website.org: don't follow links outside website.org
  • --no-parent: don't follow links outside the directory tutorials/html/
  • --page-requisites: get all the elements that compose the page (images, CSS and so on)
  • --html-extension: save files with the .html extension (newer wget versions call this --adjust-extension)
  • --convert-links: convert links so that they work locally, off-line
  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well
  • --no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed)

Download only a given folder, without recreating the host directory or the two leading path components locally, and skipping the auto-generated index.html* listings (FOLDER is the URL of the directory):

wget -r -np -nH --cut-dirs=2 -R "index.html*" FOLDER
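As a sketch of what -nH and --cut-dirs=2 do here (the URL is a placeholder), a file served from two directory levels deep lands directly in the current directory instead of under a host/path hierarchy:

wget -r -np -nH --cut-dirs=2 -R "index.html*" http://example.com/a/b/
# a file at http://example.com/a/b/file.txt is saved as ./file.txt,
# not as ./example.com/a/b/file.txt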


More examples
Mirror a whole site:

wget -m http://example.com
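-m is a mirroring shorthand; the wget manual describes it as currently equivalent to recursion with timestamping, infinite depth, and kept FTP directory listings:

wget -r -N -l inf --no-remove-listing http://example.com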

Downloads things like style sheets and images, so pages will look correct offline:

--page-requisites

Stops wget from ascending into parent directories of the starting URL:

--no-parent

Mirror a whole subpage (subdirectory) as well:

wget --no-parent --page-requisites --convert-links -m http://example.com/subexample
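If the subdirectory mirror should also get the extension fix-up and robots override from the first recipe, those same flags can be combined (purely illustrative):

wget --no-parent --page-requisites --convert-links --adjust-extension -e robots=off -m http://example.com/subexample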

Download an entire folder over FTP:

wget -r ftp://example.com/folder

-r stands for recursive download.
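If the FTP server requires a login, wget's --ftp-user and --ftp-password options can supply it (USER and PASS are placeholders):

wget -r --ftp-user=USER --ftp-password=PASS ftp://example.com/folder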

To download from a list of URLs (one URL per line in the file):

wget -i download.txt
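download.txt is assumed to be a plain-text file with one URL per line, for example (placeholder URLs):

http://example.com/file1.zip
http://example.com/file2.zip

Adding -c (--continue) resumes any partially downloaded files from the list:

wget -c -i download.txt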