WgetRexx

This manual describes WgetRexx, version 2.4 (20-Jun-2000).

WgetRexx is an ARexx script to invoke Wget from your favourite Amiga WWW browser. It is designed in a way that you can easily maintain a copy of the interesting parts of the WWW on your local hard drive.

[Spinnakinni] This allows you to browse them without having to start your network, without having to worry if documents are still in the cache of the browser, without having to figure out the cryptic name of a certain document in your cache directory and without having to deal with slow, overloaded or/and unstable connections.

WgetRexx is © Thomas Aglassinger 1998-2000 and is copyrighted freeware. Therefor you are allowed to use it without paying and can also modify it to fit your own needs. You can redistribute it as long as the whole archive and its contents remain unchanged.

The current version of this document and the program should be available from http://www.giga.or.at/~agi/wgetrexx/.

Contents
Overview
Legal Issues
Requirements
Installation
First Steps
Example Configuration
Usage of the Example Configuration
Command Line Options
Outsourcing Macros
Generating an Index for Web:
Troubleshooting
Updates and Support
History

Overview

For those going to refuse to read this whole manual, here are the interesting parts: the requirements tell you where you can get needed stuff from, an example configuration shows how your macro menu might look like (and should terrify those away who don't know how to use Wget), a chapter about troubleshooting describes some common problems, and there are also some notes on updates and support.

Legal Issues

Disclaimer

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Copyright

No program, document, data file or source code from this software package, neither in whole nor in part, may be included or used in other software packages unless it is authorized by a written permission from the author.

No Warranty

There is no warranty for this software package. Although the author has tried to prevent errors, he can't guarantee that the software package described in this document is 100% reliable. You are therefore using this material at your own risk. The author cannot be made responsible for any damage which is caused by using this software package.

Distribution

This software package is freely distributable. It may be put on any media which is used for the distribution of free software, like Public Domain disk collections, CDROMs, FTP servers or bulletin board systems.

The author cannot be made responsible if this software package has become unusable due to modifications of the archive contents or of the archive file itself.

There is no limit on the costs of the distribution, e.g. for the media, like floppy disks, streamer tapes or compact disks, or the process of duplicating.

Other Material

For more details about these packages and where to obtain them from see below.

Copyright of Web Sites

Note that unless explicitly stated to the contrary, the copyright of all files on the WWW is held by the owners of the appropriate site.

If you intend to redistribute any files downloaded from the WWW please ensure that you have the permission of the copyright holder to do so.

Requirements

You will need a WWW-browser with a reasonable ARexx-port. AWeb and IBrowse meet this criteria. Voyager does not allow to query the URI currently browsing by means of ARexx.

Of course you will need Wget. As it is distributed under the terms of the GNU General Public License, you can obtain its source code from ftp://prep.ai.mit.edu/pub. A compiled binary for AmigaOS is part of the Geek Gadgets and available from ftp://ftp.ninemoons.com/pub/geekgadgets/. You can also download the binary from aminet:dev/gg/wget-bin.lha.

You need reqtools.library, rexxreqtools.library and rexxdossupport.library to be installed in your libs: drawer. They are not included with the WgetRexx archive, but you can obtain them from aminet:util/libs/ReqToolsUsr.lha and aminet:util/rexx/rexxdossupport.lha.

As these libraries are very common, they are maybe already installed on your system. Simply check your sys:libs drawer to validate this.

Installation

Installing `Wget`

The most difficult part of the installation by far is to make Wget work. As Wget is a tool coming from the Unix-world, it is completely unusable, has 873265 command line options and a very short manual with only a few examples.

If you do not know how to install and use it, this script won't make it easier. It just makes it more efficient to invoke while browsing.

Recursive web sucking is a very powerful feature. However, if used in an improper way, it can cause harm to the network traffic, and therefore should only be used by experienced people. Successfully installing Wget on an Amiga is one requirement to call yourself experienced.

Creating Your Local Web

Before you can use this script, you have to decide where your local web should reside and create a directory for it. After that you have to create an assign named Web: to this directory.

For example, if your local web should be in Work:Web, create a drawer called Web from the Workbench or enter

makedir Work:Web

into CLI. Then add the following line to your s:user-startup:

assign Web: Work:Web

Installing `Wget.rexx`

Now you can copy the script where ever you like. As it does not depend on a certain browser, it can make sense to store it in rexx:. This spares you to specify a full path when invoking rx as it will automatically look there for it.

All you have to do now is to assign them to the ARexx macro menu by changing the preferences. See the example configuration for a few suggestions.

First Steps

Although the script will later on be started from your browser, you should first try if everything works. For that, open a CLI, and type:

cd ram:
wget http://www.cucug.org/amiga.html

This should invoke Wget and download the title page of the popular Amiga Web Directory and store it in ram:amiga.html.

If it does not, Wget does not work for you, and you can forget about the rest.

Assuming you've been lucky, now try this:

wget -x http://www.cucug.org/amiga.html

This downloads the document again, but now also creates a directory for the server structure. Now you can find the copy in ram:www.cucug.org/amiga.html.

And another step further:

wget -x --directory-prefix=/Web/ http://www.cucug.org/amiga.html

This time, the document ends up in Web:www.cucug.org/amiga.html, even if the current directory still is ram:.

Now finally let's see what Wget.rexx can do for you. Start your browser, go to http://www.cucug.org/ and wait until it displays the page. Switch to the CLI and enter:

rx wget.rexx

This downloads the document you are currently viewing in your browser to Web: and automatically displays the local copy after finishing. If everything worked fine, your browser should show file://localhost/Web:www.cucug.org/amiga.html.

And all of this is done by communicating with the browser via ARexx and then invoking Wget the same was as you saw before.

To be more specific, the script queries the URI your browser currently is displaying, tells Wget to download the stuff from there to your hard drive and put it into a reasonable place. Finally the browser is made to load the local copy.

Also many possible errors are handled by the scripts before Wget is launched, and they usually result in a requester with a descriptive message. However, once Wget is started, you are to the mercy of its non-existent or esoteric error messages. The script can only warn about errors in general if Wget returned with a non-zero exit code (which it does not for all possible errors). In such a case, analyse the output in the console for problems.

In this first few examples, only one file was downloaded. Fortunately, the underlying tool Wget can do more. With proper command line options, it can download several documents, scan them for links and images and continue downloading them recursively.

The only catch about this is: you have to know how to work with Wget. There are some examples macros here and there are several interesting command line options mentions here, but this is not enough to really know what's going on. Nearly every site needs its own command line options to get the useful part - and prevent the useless stuff to be downloaded. Very often you will have to interrupt the process and resume with more proper options.

There are some hints about this here, but: read the manual for Wget. There is no way around this.

Example Configuration

Below you will find some example macros you can assign to the ARexx menu of your browser. As there are small differences between the various browsers, there exists a configuration for every one of them.

It is recommended to use the clipboard to copy the macro texts into the preferences requester.

Read also the about the usage of the example configuration to find out how it works. For further information about passing arguments to the script, see command line options.

IBrowse

To make the script accessible from the Rexx menu, go to IBrowse's Preferences/General/Rexx and add the following entries:

Name	Macro
Copy single resource	wget.rexx
Copy page with images	wget.rexx --recursive --level=1 --accept=png,jpg,jpeg,gif --convert-links
Copy site	wget.rexx --recursive --no-parent --reject=wav,au,aiff,aif --convert-links
Copy site...	wget.rexx Ask --recursive --no-parent --reject=wav,au,aiff,aif --convert-links
Copy generic...	wget.rexx Ask

This assumes that the script has been installed to rexx: or the the browser directory. If this is not the case, you have to specify the full path name.

To change the default console the script will send its output to, modify Preferences/General/General/Output Window.

AWeb

To make the script accessible from the ARexx menu, go to AWeb's Settings/GUI Settings/ARexx and add the following entries:

Name	Macro
Copy single resource	wget.rexx >CON://640/200/Wget/CLOSE/WAIT
Copy page with images	wget.rexx >CON://640/200/Wget/CLOSE/WAIT --recursive --level=1 --accept=png,jpg,jpeg,gif --convert-links
Copy site	wget.rexx >CON://640/200/Wget/CLOSE/WAIT --recursive --no-parent --reject=wav,au,aiff,aif --convert-links
Copy site...	wget.rexx >CON://640/200/Wget/CLOSE/WAIT Ask --recursive --no-parent --reject=wav,au,aiff,aif --convert-links
Copy generic...	wget.rexx >CON://640/200/Wget/CLOSE/WAIT Ask

This assumes that the script has been installed to rexx: or the the browser directory. If this is not the case, you have to specify the full path name.

Note that you have to redirect the output of every script to a console. Otherwise you would not be able to see what Wget is currently doing. Therefor this looks a bit more confusing than the example for IBrowse.

See also the problems with AWeb for some further notes.

Usage of the Example Configuration

Here is a short description of the macros used for the example configuration.

Copy single resource

This is comparable to the download function of your browser. The difference is that it will automatically be placed in your local web in a directory depending on the location where it came from.

For example, you could point your browser to http://www.cucug.org/aminew.html and would get a single file named aminew.html being stored in the directory Web:www.cucug.org/. If such directory does not yet exist, it will be created automatically. This is the same as if you would have typed

wget -x --directory-prefix=/Web/ http://www.cucug.org/aminew.html

in the CLI. The difference is that you did not have to type a single letter.

Copy page with images

This now retrieves a page with all its images. Of course it only makes sense to call is when actually viewing a HTML-page. With other types of data like images it will act same as Copy single resource.

After this operation, inline images will still work in the local copy of the downloaded page.

Copy site

This is a very powerful macro that will copy a whole site to your local web. It starts at the document currently viewing, and will download all pages, images and some other data within the same or a deeper directory level.

The option --reject has been specified to refuse to download often unwanted music and sound data. You might want to specify further extensions here, so also movies, archives and printable documents are skipped, for example: --reject=mpg,mpeg,wav,au,aiff,aif,tgz,Z,gz,zip,lha,lzx,ps,pdf.

Copy site...

Same as before, but because of the Ask it will pop-up a requester where you can change the options for Wget before it is actually invoked. For example, you can modify the --reject that it will not refuse sound data because you once in a while want to download from a music site.

You can also add additional options like --no-clobber to continue an aborted Copy site from before or --exclude-directories because you known that there is only crap in /poetry/.

Copy generic...

This will simply pop-up a requester where you can enter options for Wget. Except for the internal -x --directory-prefix nothing is specified yet. It is useful when occasionally none of the above methods is flexible enough.

Command Line Options

As you just learned, it is possible to pass additional options to Wget.rexx. There are two different kinds of them:

Options handled by the script to influence its behavior. They are described here.
Options for Wget to tell what to download how. These are covered in the documentation included with Wget.

The complete ReadArgs() template for Wget.rexx is:

To/K,Ask/S,Further/S,Port/K,Continue/S,Clip/K,Verbose/S,Screen/K,Options/F

In most cases you will not need to specify any options except those for Wget.

Get Some Details

If you enable Verbose, Wget.rexx will tell you some details what is going on:

The version number of the program
The name of the ARexx port it is talking to
From where it is going to download
Where the file to be shown after a successful download will be located

Note that this does not influence the output of Wget itself.

Ask for Options

One might not always be satisfied with a few standard macros and would like to pass different options to Wget. On the other hand it does not make sense to clutter the ARexx menu of your browser with loads of only slightly different macros. Invoking the script like

wget.rexx Ask

will pop-up a requester where you can enter options to be passed to Wget. If you already passed other Wget options via command line, the requester will allow you to edit them before starting Wget:

wget.rexx Ask --recursive --no-parent

will bring a requester where the text "--recursive --no-parent" is already available in the input field and can be extended or reduced by the user. Sometimes it may be more convenient to use

wget.rexx Ask Further --recursive --no-parent

This also brings up the requester, but this time the input field is empty. The options already specified in the command line will be passed in any case, and you can only enter additional options. If for example you now enter --reject=mpeg this would be the same as if you would have called

wget.rexx --recursive --no-parent --reject=mpeg

The main advantage is that the input field is not already cluttered with loads of confusing options. The drawback is that you can not edit or remove options already passed from the command line.

Messing with Screens

If the browser runs on an own screen, it is possible that the Ask requester will still pop-up on the Workbench. You can avoid this by passing the name of the screen manually, for example:

wget.rexx Ask Screen="IBrowse Screen"

(This is a misfeature resulting from the fact that RexxReqTools cannot figure the screen of the console started from the browser because the task's pr_WindowPtr() seems to get lost somewhere. Normally, the requester would open on the browser screen automatically.)

However, Screen only affects the requester. If you want to redirect the console where Wget prints its progress information, you have to do so by giving a screen in the console specification. For example,

"CON://500/200/Wget/CLOSE/WAIT/SCREENIBrowse Screen"

Generally, you can specify the frontmost screen with "*". Usually, this is what you want if you start Wget.rexx from the browser:

CON://500/200/Wget/CLOSE/WAIT/SCREEN*

If you want to use blanks in the window title, you have quote the console specification. This rises a little problem, because "*" is the CLI escape character. To actaully get a "*", you have to use "**" (Read the AmigaDOS manual for more details on this possibly confusing issue):

"CON://500/200/Wget Output/CLOSE/WAIT/SCREEN**"

Thus, an example macro for your browser might look like:

Name	Macro
Copy site..	wget.rexx Ask Screen="IBrowse Screen" --recursive --no-parent >"CON://500/200/Wget/CLOSE/WAIT/SCREEN**"

Specify Options for `Wget`

The last part of the command line can contain additional options to be passed to Wget.

Important: You must not pass any Wget.rexx specific options after the first option for Wget. For example,

wget.rexx Ask --quiet

tells Wget.rexx to pop-up the options requester and Wget to not display download information. But on the other hand,

wget.rexx --quiet Ask

will pass both --quiet Ask to Wget, which of course does not really know what to do about the Ask.

Specify a Different Download Directory

If you do not want the downloaded data to end up in Web:, you can use To to specify a different download directory. For example:

wget.rexx To=ram:t

The value denotes a path in AmigaOS format. Internally it will be converted to the ixemul-style before it is passed to Wget. Fortunately you do not have to know anything about that.

With a little CLI magic, you can let a file requester ask you for the download directory:

wget.rexx To=`RequestFile DrawersOnly SaveMode NoIcons Title="Select Download Directory"`

Note that the quote after To= is a back-quote, not a single quote. You can find it below Esc on your keyboard.

Select the Browser ARexx Port

Normally you do not want to do this because Wget.rexx figures out the ARexx port to use by itself.

First it assumes that the host it was started from was a browser. In such a case, it will continue to talk to this host no matter how many other browsers are running at the same time.

If this turns out to be wrong (e.g. because it was started in the CLI), it tries to find one of the supported browsers at its default port. If any such browser is running, it will use this.

If no browser is running, the script does not work at all for apparent reasons.

The only possible problem can be that several supported browsers are running at the same time, and you do not start the script directly from one of them. In such a rare case the browser checked first will be used, which is not necessarily the one the user prefers. Therefore you can use

wget.rexx Port=IBROWSE

in CLI, even if AWeb is also running.

Continue an Interrupted Download

Especially when copying whole sites, it often happens that Wget ends up downloading stuff that you do not want. The usual procedure then is to interrupt Wget, specify some additional options to reject some stuff and restart again. For example, you found some site and started to download it:

wget --recursive --no-parent http://www.interesting.site/~stuff/

But soon you notice that there are loads of redundant PDF documents which only reproduce the information you are just obtaining in HTML. Therefor you interrupt and start again with more options:

wget --recursive --no-parent --no-clobber --reject=pdf http://www.interesting.site/stuff/

To your further annoyance it turns out that the directory /stuff/crap entirely hold things you are not interested in. So therefor you restart again:

wget --recursive --no-parent --no-clobber --reject=pdf --exclude-directories=/stuff/crap/ http://www.interesting.site/stuff/

And so on. As you can see, it can take quite some effort before you find proper options for a certain site.

So how can the above procedure be performed with Wget.rexx? Apparently, there is no history function like in the CLI, where you can switch back to the previous call and add additional options to it.

However, you can make Wget.rexx to store the options entered in the requester when Ask was specified in a ARexx-Clip. This will be preserved and can be read again and used as default value in the requester. To achieve that, enable Continue.

Now that sounds confusing, but let's see how it works in practice, using an extended version of the Copy site... macro from before:

Name	Macro
Copy site..	wget.rexx Clip=wget-site Ask Further --recursive --no-parent
Continue copy site...	wget.rexx Clip=wget-site Ask Further Continue --recursive --no-parent --no-clobber

The macro Copy site... will always pop up an empty requester, where you can specify additional options like --reject. The options you enter there will be stored in an ARexx clip called "wget-site". It does not really matter how you call this clip, the only important thing is that you use the same name for the macro that reads the clip.

And this is exactly what Continue copy site... does: because of Continue it does not clear the clip. Instead, the clip is only read and its value used as default text in the string requester. The additional parameter --no-clobber just tells Wget not to download again the files you already got.

So how does the above example session look with Wget.rexx?

First, you browse to http://www.interesting.site/stuff/ and select Copy site. Soon you have to interrupt it, select Copy site... and enter --reject=pdf into the upcoming requester. Until now, there is nothing you could not have already done with the old macros.

But when it turns out that the --reject was not enough, you only have to select Continue copy site..., and the --reject=pdf is already available in the requester. You just have to extend the text to --reject=pdf --exclude-directories=/stuff/crap/ and can go on.

And if later on some MPEG animations show up, you should know what to do by reselecting Continue copy site... again...

Specify a URI

You can also specify a URI for Wget when invoking the script. In this case, the document currently viewed in the browser will be ignored.

Specifying more than one URI or a protocol different to http:// will result in an error message. This is a difference to Wget, which allows multiple URIs and also supports ftp://.

URIs have to be specified in there full form, which means you for example can not use www.cucug.org. Instead, the complete http://www.cucug.org/ is required.

Among other things, this can be useful for creating a macro to update a local copy of an earlier downloaded site. For example with IBrowse you could use:

Name	Macro
Update biscuits	wget.rexx --timestamping --recursive --no-parent --exclude-directories=/~hmhb/fallnet http://www.btinternet.com/~hmhb/hmhb.htm

This basically acts like if http://www.btinternet.com/~hmhb/hmhb.htm would have been viewed in the browser, and you would have selected Copy site. Because of --timestamping, only new data are actually copied. Because of the --exclude-directories=/~hmhb/fallnet some stuff that was considered to be unwanted at a former download is skipped.

Putting this into the macro menu spares you to remember which options you used two months ago.

Outsourcing Macros

If you have many macros like Update biscuits from before, it does not make sense any more to put them into the macro menu of the browser. Such macros are not called very often and mostly serve the purpose of "remembering" options you used to copy a site.

Fortunately there are many different places where you can put such macros.

A straight forward approach would be to put them in shell scripts. For example, s:wget-biscuits could hold the following line:

rx wget.rexx --timestamping --recursive --no-parent --exclude-directories=/~hmhb/fallnet http://www.btinternet.com/~hmhb/hmhb.htm

If you set the script protection bit by means of

protect s:wget-biscuits ADD s

you can simply open a CLI and type

wget-biscuits

to invoke the macro.

But this merely is useful to outline the idea. Who wants to deal with CLI, if it can be avoided? If you have some experience, you can easily store such a call in a button for DirectoryOpus or ToolManager (available from util/wb/ToolManager#?.lha. That way, you can even create dock hierarchies and sub-menus for your download macros. Refer to the manuals of these applications for more details.

Generating an Index for Web:

If you often download web sites, you will find it inconvenient to clutter your bookmarks. After all, many of them will only be for reference purpose, and not be visited regularly.

The included rexx script make_web_index.rexx allows a different approach: it scans all directories in Web: and creates a document in Web:index.html that links all these directories. Probably you want to include this particular document in your bookmarks or hot-link buttons.

One problem with this is that sometimes the name of the directory/site doesn't tell much about its contents. In that case, you can add a comment to the directory. From the Workbench, select the directory, choose "Icons/Information..." and enter whatever you find appropriate in the "Comment" field. In CLI, use the FileNote command, for example:

filenote web:www.westnet.com Weddoes

The script automatically scans all directory levels for documents suitable as welcome page (usually named index.html, welcome.html or something similar). Consequently, if it can't find a www.westnet.com/index.html, but a directory web:www.westnet.com/weddoes, it will continue there. If it cannot find any welcome document, it will pick the first HTML document.

In most cases this is what you want. If not, you can manually specify which document to choose as welcome page in a file named welcome.web. For example, start your favourite editor like

ed web:www.westnet.com/welcome.web

and enter the single line

weddoes/faq.html

Start the script, open Web:index.html in your browser, click on www.westnet.com and - voilà! - you can see the FAQ instead of the welcome page.

The same procedure applies if the script cannot find a suitable welcome page, e.g. because you downloaded only a single image from a site. In that case, just specify the location of the image in welcome.web.

Troubleshooting

Here are some common problems and some hints how to solve them or where to get further information from.

Problems with `Wget.rexx`

The browser displays the local copy as plain text. I can actually see the HTML code (yuk)!

This happens when the downloaded file is a HTML document, but doesn't have a filename suffix recognized by the browser.

Probably you downloaded some dynamic web page ending .cgi, .asp, cfm, .php, or something similar meaningless. Probably the whole URI looks weird and contains heaps of ampersands (&).

The reason why a page ending e.g. in .cfm can look good on the web, but turns to ugly HTML code in the local copy is that on the web, the web server sending the document told the browser "This is actually a HTML document, despite ending in .php". But, when the browser just loads it from the disk, nobody tells it what it is.

Fortunately, there is a solution: you can tell the browser by specifying the suffix as HTML document in the MIME settings. Your browser should have a menu like "Settings/MIME" or "Settings/External viewer". Probably, there already is an entry for "html htm shtml". Just add the suffix of the current document to this list. When you try to reload it, the browser should now recognize it as HTML document, and display it accordingly.

My browser is not supported!

Supporting new browsers should be easy, assuming that its ARexx port is powerful enough. In detail, the following data are needed:

Name of default ARexx port,
ARexx command to retrieve the name of the URI currently viewing
ARexx command to load a new URI

You should be able to find this information in the manual of your browser. If you submit it to me your browser will probably be supported within the next update of WgetRexx.

Wget.rexx does not find Wget, although it is in the Workbench search path!

This problem can occur if you started your browser from ToolManager or similar utilities. The original search path is not passed to the browser. There are several ways to work around this:

Sometimes the tool which starts the browser allows to specify the search path manually. This has the least impact on your system stability.
The easiest solution is to copy Wget to c:, which is always in the search path. This is not very elegant, but even works if the first suggestion is not applicable.
The most hackish approach is to change the line wget_command = '...' at the beginning of the script. If you don't care about elegance and future update problems, but need an ego rush and want to prove to your friends how mighty you are, this is the way to go.

When interrupting Wget.rexx by pressing Control-C, my browser quits!

Sad but true. Contact your technical support and try to convince them that this sucks.

The macro Copy page with pictures does not seem to work with frames!

Of course not, because the technical implementation of the whole frame concept sucks.

As a workaround, you can view every frame in its own window and apply the macro to it. At the end, you will have all frames the pages is consisting of and can finally also copy the frame page itself.

Now what is case-sensitive and what is not?

WgetRexx uses several different mechanisms, and unfortunately they differ concerning case-sensitivity:

Names of ARexx ports are case sensitive (like IBROWSE). Usually they are all-uppercase.
Names of ARexx clips are case sensitive (passed as value to the Clip option)
CLI options for Wget.rexx are case insensitive (like Ask), but its values not necessarily (like for Port, where the value specifies an ARexx port).
CLI options for Wget are case sensitive (like --recursive).

For URI values, it becomes even more confusing, as it depend on the server. Nevertheless filenames are case insensitive as soon as they are on your hard drive.

Problems with `Wget`

I can't make Wget work at all, not even from the command line!

Bad luck. Not my problem.

Wget basically works, but I don't know how to ... with it.

Refer to the Wget manual for that. There is a command line option for nearly everything, so usually you only have to search long enough.

Wget starts and connects as intended, but when it comes to writing the downloaded data to disk, it fails!

Due to problems in ixemul.library Wget can not write to partitions handled by XFH or similar compressing file systems. Use a directory on a normal partition as destination.

Wget works nice, but it also download loads of bullshit I don't want to have on my hard drive!

There are several command line options to prevent Wget from downloading certain data, with the most important ones being:

--reject and --accept resp. -R and -A
--exclude-directories resp. -X
--exclude-domains

Refer to the manual how to use them and pass them to the requester of Copy Site... macro.

Wget downloads a site, but the links on my hard disk still all refer to the original in the WWW!

There are two possible reasons for such problems:

The web author has decided to use global links (which always have the full http:// stuff) instead of relative links which only specify the filename. See --convert-links to find out what to do about that.
The web author has decided to use the base tag, which specifies a kind-of prefix being applied to all local URIs. In practice, this means that all local URIs automatically expand to a global one pointing to the original site in the web. Unfortunately, Wget does not offer any options to convert such sites.

Wget refuses to download everything from a site!

First you should check, if the options you passed to Wget don't make it to reject some stuff you possibly want to have. For example, with --no-parent "global" images like logos accessed by several sites on the same server are ignored. However, not specifying --no-parent when copying sites usually lets you end up in the deep mud of ways to reject directories, domains and file patterns so that it is usually not worth the trouble.

If more missing than just a couple of images, this can have several other reasons:

The web author was a complete idiot, wrote some severely corrupted HTML code, never validated it and thinks everything is ok just because his favourite browser can display it.
The web author was a complete idiot and does not use normal HTML links for navigation but for example JavaScript. (Of course pages using JavaScript only for some stupid effects are no problem, except for the annoyance.)
The web author was a complete idiot and thinks you need a certain browser to download his site.
The web author was a complete idiot and did something else very stupid not mentioned here.
The web author decided that he does not want you to download his site and has specified so in his robots.txt. You can detect such a case by seeing that Wget finished immediately after downloading robots.txt. Look at the manual to learn more about robots.txt, and how Wget deals with it. But usually people have good reasons preventing you from downloading certain data.
The recursion level of Wget was not high enough. This is very rare, as the internal default is usually enough. See --level to learn more about that.

In most cases you can assume that the web author was a complete idiot. As a matter of fact, most of them are. Unfortunately, there is not much you can do about that.

Wget always requests me to "insert volume usr: in any drive"!

As Wget comes from the Unix-world, the AmigaOS binary uses the ixemul.library. It expects a Unix-like directory tree, and looks for an optional configuration file in usr:etc/.wgetrc.

It won't hurt if you assign usr: to wherever (e.g. t:) and do not provide this configuration file at all, as internal defaults will then be used. If you have an AssignWedge-alike installed, you can also deny this assign without facing any negatives consequences for Wget.

However, you should remember that other Unix ports you might install in future could require usr: to point to some reasonable location. For example, when installing Geek Gadgets, it adds its own assign for usr: to your s:user-startup, and you should not overwrite this later on.

Problems with AWeb

AWeb does not allow me to modify the ARexx menu!

The full version does.

AWeb does not allow me to enter such long macros as Copy site with pictures!

It seems that at least for AWeb 3.1, the maximum length of the input fields has an unreasonable small value. (CLI accepts commands upto 512 characters).

This problem has been reported to the technical support and might be fixed in a future version. Until then, you can use the following workaround:

Do not enter the macro text in the settings requester, but instead, write it to a script file and store for example in s:wget-page-with-images. This script would consist of only a single line saying:

wget.rexx --recursive --level=1 --accept=png,jpg,jpeg,gif --convert-links

In the macro menu, you now only add:

Name	Macro
Copy page with images	s:wget-page-with-images >CON://640/200/Wget/CLOSE/WAIT

Again, this is only a workaround and not the way the things should be.

Updates and Support

New versions of WgetRexx will be uploaded to aminet:comm/www/WgetRexx.lha. Optionally you can also obtain them from the WgetRexx Support Page at http://www.giga.or.at/~agi/wgetrexx/.

If you found a bug or something does not work as described, you can reach the author via e-mail at this address: Thomas Aglassinger <agi@sbox.tu-graz.ac.at>.

But before you contact me, look if your problem has not already been covered in the chapter about troubleshooting.

When reporting problems, please include the name of the WWW browser and version number of Wget.rexx you are using. You can find this out by taking a look at the source code or by typing version wget.rexx into CLI.

And please don't bother me with problems like to how to use a certain command line option of Wget to achieve a certain result on a certain site. This program has its own manual.

History

Version 2.4, 20-Jun-2000

Fixed bug: URI's containing a tilde (~) caused endless loop.

Version 2.3, 18-Jun-2000

Fixed bug: The URI to download is now quoted before passed to wget via CLI. This should remove trouble with dynamic web pages generated by CGI scripts or stuff like PHP. Without techno-fuzz: this should remove problems with forms, web-based discussion groups, or more general: URIs that contain heaps of ampersands (&). Probably you have to extend you MIME settings with these suffixes, otherwise you might only see the actual HTML code in the browser.
Included script to automatically generate an index for Web:.

Version 2.2, 10-Nov-1999

Fixed bug (again): Pressing "Cancel" in the Ask requester still did not abort the script. The problem seemed to be with RexxReqTools, which didn't set my custom result variable ok. The manual of RexxReqTools is a bit unclear in that one, but you have to specify the result variable name as a string("ok" instead of just ok. Anyway, now the script uses the internal default variable rtresult.
Fixed a bug in the "First Steps" section of the manual (used --prefix-directory instead of --directory-prefix)
Added CLI option Screen to allow redirecting the Ask requester to the browser screen.
Added a couple of new questions and hints to the troubleshooting section, mostly addressing problems people have reported since the previous release.
Added possibility to change some of the more obscure internal variables of the script at the beginning. Nevertheless, usually there should not be any need to do so.
Slightly reworded some of the error messages.

Version 2.1, 13-May-1998

Added support to automatically convert URIs beginning with file://localhost/Web: to one with http:// if the script is activated while browsing in your local Web:.
Added --convert-links to the Copy site macros.
Changed title of requesters from "Wget Request" to "Wget.rexx Request".
Fixed bug: Pressing "Cancel" in the requester for Wget options did not abort the script (resulting from an unexpected behavior of rexxreqtools.library which seems to set the resultname of rtGetString() only if "Ok" is selected.)
Fixed bug: When opening libraries failed, the script viewed an error message (as it is supposed to), but did not exit afterwards. The result was a run time error when accessing a library function later.
Fixed bug: with Verbose, download information was viewed twice.
Cleaned up code a little.

Version 2.0, 8-May-1998

Added support for non-wget specific CLI options. As this is done via ReadArgs() of rexxdossupport.library, this library is now also required.
Added several CLI options.
Added more details how to use it with AWeb, especially how to get Wget to output to a console window.
Added handler to catch Control-C and make a somewhat more pleasant output then the standard ARexx handler. (This however does not fix the "browser quits on Control-C" problem)
Added handler to catch uninitialized variables.
Changed the default output of the script so that only the command call to Wget is viewed. Use Verbose for more details.
Changed from one script for every browser to a single script that figures out the browser by itself.
Changed installation notes and recommended to store the script in rexx: instead of the browser directory.
Changed the script wget-ask.#? to CLI option Ask.
Fixed a couple of minor bugs concerning uninitialized variables and a missing CALL statement.
Fixed some quirks in the manual.
Clarified that rexxreqtools.library is already included in ReqToolsUsr.lha and there is no need to download the separate archive.

Version 1.1, 30-Apr-1998

First public release.

Version 1.0, 23-Mar-1998

Initial internal release.

Privacy statement | Legal information | Contact