How to mirror a website with wget for offline viewing
Save a website to view offline
wget is a command-line utility that can mirror a website for offline viewing. Once the site is saved locally, it can be browsed without an Internet connection. wget follows links in HTML, XHTML, and CSS pages to create local versions of remote websites, fully recreating the directory structure of the original site. This is sometimes referred to as “recursive downloading.” While doing that, wget respects the Robot Exclusion Standard (/robots.txt). wget can also be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
- The -m option (--mirror) turns on recursive downloading with unlimited depth and timestamping, so wget fetches the whole site.
- The -k option (--convert-links) rewrites the links in the downloaded pages so that they point to the local files. Without -k, a link such as http://example.com/page.html in the offline copy still points to the http://example.com server, so -k is necessary for proper offline viewing.
- The -p option stands for --page-requisites. It tells wget to download the images, CSS files, and other resources that a page needs to display properly offline.
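The three options can also be written in their long forms, which are easier to read in a script. The following is a minimal sketch rather than a definitive recipe: the function name mirror_site and the DRY_RUN variable are hypothetical helpers introduced here for illustration, and http://example.com is only a placeholder URL.

```shell
#!/bin/sh
# mirror_site is a hypothetical helper name; it is not part of wget.
# Usage: mirror_site URL
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    # --mirror (-m):          recursion with timestamping and unlimited depth
    # --convert-links (-k):   rewrite links to point at the local copies
    # --page-requisites (-p): also fetch images, CSS and other page assets
    set -- wget --mirror --convert-links --page-requisites "$1"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Show what would be executed without touching the network.
        echo "$@"
    else
        "$@"
    fi
}
```

With DRY_RUN set to 1, calling mirror_site http://example.com only prints the wget command, which is a convenient way to check the options before starting a long download.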
Some important notes:
- Running “wget -mkp http://example.com” downloads the website into the current working directory, in a subdirectory named after the host (example.com).
- To find the current working directory, use the “pwd” command, which prints its path.
- To download the website into a different directory, specify the path with -P, as in “wget -mk http://wikistack.com -P /home/xyz/Downloads/”. -P stands for directory prefix.
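To find the mirrored pages afterwards, it helps to know where wget puts them: by default a recursive download is stored in a directory named after the host, created under the -P prefix. The sketch below only computes that path and does not call wget. The function name mirror_root is hypothetical, and it assumes a plain URL without a port number or credentials and a site that does not redirect to a different host name.

```shell
#!/bin/sh
# mirror_root is a hypothetical helper: it only computes a path.
# Usage: mirror_root PREFIX URL
mirror_root() {
    prefix=${1%/}      # directory given to -P, trailing slash removed
    host=${2#*://}     # drop the scheme, e.g. "http://"
    host=${host%%/*}   # drop any path after the host name
    printf '%s/%s\n' "$prefix" "$host"
}
```

For example, mirror_root /home/xyz/Downloads/ http://wikistack.com prints /home/xyz/Downloads/wikistack.com, which is the directory to open in a browser or file manager once the download has finished.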