Peter Jason said:
Is there any software that will download a list of URLs kept in a
particular folder and then store the resulting images in that or
another folder? I need this to be automatic so I can use the off-peak
ISP rates.
Search online for "web crawler" or "web spider". These browse a site,
navigating through all the links in each page, to retrieve the pages
for that site and cache them locally. They usually have a setting that
determines how deep they navigate into a site, since a site could have
thousands of pages that you may never see or don't want to see; without
a limit you would waste time and bandwidth navigating to and retrieving
deeply buried pages that you are not likely to care about.
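As a rough illustration of what that depth setting controls, here is a
minimal sketch of a depth-limited crawler in Python (standard library
only; the start URL and the depth of 2 are just example values, not
anything a particular crawler product uses):

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects href attributes from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_depth=2):
    """Visit pages breadth-first, never following links more than
    max_depth hops away from the start page."""
    seen = set()
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        # A real crawler would save the page to a local folder here.
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            absolute = urljoin(url, link)
            # Stay on the same site as the start page.
            if urlparse(absolute).netloc == urlparse(start_url).netloc:
                queue.append((absolute, depth + 1))
    return seen

crawl("http://example.com/", max_depth=2)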
Some sites don't like these robots crawling around their web pages
because they know a human isn't the one doing the visiting. Search
engines roam through web sites to gather content for their indexes, but
there are also users who use these robots to steal content and reuse it
elsewhere. Web sites can use a robots.txt file to ask robots not to
roam through the site and copy its content, but compliance is voluntary
and some robots ignore the request.
http://www.robotstxt.org/
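If you end up rolling your own robot, checking robots.txt first is
easy; for example, Python's standard library has a parser for it (the
site and the user-agent name below are made up for illustration):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

# "MyDownloader" is a hypothetical user-agent name for your robot.
if rp.can_fetch("MyDownloader", "http://example.com/images/photo.jpg"):
    print("robots.txt permits this URL")
else:
    print("robots.txt asks robots to stay away from this URL")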
I've seen some sites that, knowing many robots will misbehave and
ignore the request, stall or delay the delivery of linked pages down to
the speed expected of a human visitor. That is, they won't deliver the
next page until some period of time has passed since delivering the
prior one: they throttle delivery of their web pages. Humans won't
notice the difference, but robots certainly will. Think of it like a
foot race between you and a robot: the robot could run a lot faster,
but it gets slowed down so it can't run any faster than you. This also
reduces the load the web server has to endure when misbehaving robots
hammer its site, and it means your robot won't be able to retrieve
their pages any faster than you could.
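If you do script your own robot, the polite workaround is to throttle
yourself and not hit the server any faster than a person would; a
minimal sketch, with the five-second pause picked arbitrarily:

import time
import urllib.request

def fetch_politely(urls, delay_seconds=5):
    """Fetch each URL in turn, pausing between requests so the
    traffic looks human-paced rather than like a robot burst."""
    pages = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url) as resp:
                pages[url] = resp.read()
        except OSError:
            pages[url] = None  # skip pages that fail to load
        time.sleep(delay_seconds)  # self-imposed throttle
    return pages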
If a search at one of the common or well-known download sites
(download.com, softpedia.com) doesn't turn up a robot that you like,
the robots help site above has a list of known robots you could look at.
I haven't used a web crawler in well over a decade. My recollection is
that it stored the site being crawled, building up a copy of it in a
local folder. Nowadays that may not work as well as you would like.
Streamed media isn't stored on your local hard disk, so you won't have
a copy of it (unless you use stream-capture software); it won't get
saved in the local copy of the web site, it will still be externally
linked, and you'll have to be online to see it even when loading the
local copies of their pages. In general, anything externally linked at
the web site may end up externally linked in your copy of its pages.
Also, many sites now employ dynamic web pages. Their scripts (some of
which are server-side scripts that you will NEVER be able to see or
retrieve) decide what content to generate on each visit, and that
content may change depending on when you visit and how you navigated
through the site. So the web crawler might capture different content
each time you have it crawl the site. Dynamic content, AJAX, and
streamed media are becoming much more popular and prevalent, so you'll
have to find out how your choice of web crawler handles them.
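Coming back to your original question (a text file full of URLs, the
resulting images saved into a folder, run unattended during off-peak
hours): if none of the ready-made robots suit you, a short script plus
your operating system's scheduler (cron on Unix, Task Scheduler on
Windows) will do it. A minimal sketch; the file name urls.txt and the
downloads folder are just example names:

import os
import urllib.request

def download_listed_images(list_file="urls.txt", dest_folder="downloads"):
    """Read one URL per line from list_file and save each file into
    dest_folder, naming it after the last part of the URL."""
    os.makedirs(dest_folder, exist_ok=True)
    with open(list_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for url in urls:
        name = url.rstrip("/").split("/")[-1] or "index.html"
        target = os.path.join(dest_folder, name)
        try:
            urllib.request.urlretrieve(url, target)
            print("saved", target)
        except OSError as err:
            print("failed:", url, err)

if __name__ == "__main__":
    # Schedule this script to start when your ISP's off-peak rates begin.
    download_listed_images()

Nothing in it is specific to images; it will save whatever the listed
URLs point at.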