Wayback Machine Downloader - Download sites for free
I once needed to download a site from web.archive.org. Archivarix.com and r-tools.org are unnecessary: they are the same Wayback Machine Downloader, only paid and only usable through the web. In 2024 some readers accused me of bias, arguing that Archivarix is the best tool for restoring sites. The arguments all came down to Archivarix's CMS and how comfortable it is: a separate frontend for editing HTML files is supposedly best, and the console is too complicated.
I took the complicated route; come along with me.
Wayback Machine Downloader
To download sites from the Web Archive for free, we will use the free console utility Wayback Machine Downloader. I installed it on macOS, so this guide is written for macOS as well.
Installing Wayback Machine Downloader
Open the terminal and type the command:
sudo gem install wayback_machine_downloader
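To check that the installation worked, ask the utility for its version (see the parameter list below):
wayback_machine_downloader --version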
How to download a website from the Webarchive for free
After installing Wayback Machine Downloader, enter the command:
wayback_machine_downloader http://example.com
Where http://example.com is the site you want to download.
Parameters for downloading
- -d, --directory PATH: The directory to save the downloaded files to. The default is ~/websites/ plus the domain name;
- -s, --all-timestamps: Download all snapshots of the given site;
- -f, --from TIMESTAMP: Download only files at or after the specified timestamp (e.g. 20060716231334);
- -t, --to TIMESTAMP: Download only files at or before the specified timestamp (e.g. 20100916231334);
- -e, --exact-url: Download only the specified URL, not the full site;
- -o, --only ONLY_FILTER: Restrict downloading to only the URLs that match the given filter (use // notation around the filter to treat it as a regex);
- -x, --exclude EXCLUDE_FILTER: Skip downloading URLs that match this filter (use // notation around the filter to treat it as a regex);
- -a, --all: Also download error pages (40x and 50x) and redirects (30x);
- -c, --concurrency NUMBER: Number of threads to download the site with (default is 1);
- -p, --maximum-snapshot NUMBER: Maximum number of snapshots (default is 100);
- -l, --list: Output the list of file URLs in JSON format with their archived timestamps, without downloading anything;
- -v, --version: Show the version of the Wayback Machine Downloader.
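As an illustration, a run along these lines (the domain, directory, and filter are placeholders) saves the site into ./example-site, limits the snapshots to the period from 2010 to 2015, skips URLs containing wp-admin, and downloads in 4 threads:
wayback_machine_downloader http://example.com -d ./example-site -f 20100101000000 -t 20151231235959 -x wp-admin -c 4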
Query strings in file names
Everything is great, except that some files end up with GET query strings in their names. What used to be style.css is now style.css?ver=666, and you get errors in the console. To fix this, run the following command from the directory with the downloaded site.
find . -type f -name '*\?*' -exec sh -c 'mv "$0" "${0%%\?*}"' {} \;
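If you want to preview what will be renamed first, the same command with echo in place of mv only prints the planned renames without touching anything:
find . -type f -name '*\?*' -exec sh -c 'echo "$0" "->" "${0%%\?*}"' {} \;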
It’s not working / I get errors, what do I do?
I don’t know. And I won’t provide support by email, so don’t even wait for a reply.