http://mydomain.com/robots.txt
If you cannot access your server's root location, you will not be able to use a
robots.txt file to exclude pages from your index.
The robots.txt is a TEXT file (not HTML!) which has a section for each robot to be
controlled. Each section has a user-agent line, which names the robot to be controlled, followed by
a list of "disallows" and "allows". Each disallow prevents any address that starts
with the disallowed string from being accessed. Similarly, each allow permits any
address that starts with the allowed string to be accessed. The (dis)allows are
scanned in order, and the last match encountered determines whether an address may
be used or not. If there are no matches at all, the address will be used.
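The matching rule just described can be sketched in a few lines of Python. This is an illustrative sketch of the rule as stated on this page, not FreeFind's actual code: the rules are assumed to be (directive, prefix) pairs in file order, and the example paths are hypothetical. Note that Python's standard urllib.robotparser applies the first matching rule rather than the last, which is why the comparison is written by hand here.

```python
def is_allowed(rules, path):
    """Evaluate (dis)allow rules against a path, last match wins.

    rules: list of (directive, prefix) pairs in file order, e.g.
    ("disallow", "/cgi-bin/"). Illustrative sketch only.
    """
    allowed = True  # no match at all: the address will be used
    for directive, prefix in rules:
        # An empty value (e.g. a bare "disallow:") is treated as matching nothing.
        if prefix and path.startswith(prefix):
            # Each match overrides the previous one, so the last match decides.
            allowed = directive == "allow"
    return allowed


# Hypothetical rules and paths, for illustration only.
rules = [("disallow", "/private/"), ("allow", "/private/public.html")]
print(is_allowed(rules, "/private/secret.html"))  # False
print(is_allowed(rules, "/private/public.html"))  # True
print(is_allowed(rules, "/index.html"))           # True (no rule matches)
```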
Here's an example:
user-agent: FreeFind
disallow: /mysite/test/
disallow: /mysite/cgi-bin/post.cgi?action=reply
disallow: /a
In this example, the following addresses would be ignored by the spider:
http://adomain.com/mysite/test/index.html
http://adomain.com/mysite/cgi-bin/post.cgi?action=reply&id=1
http://adomain.com/mysite/cgi-bin/post.cgi?action=replytome
http://adomain.com/abc.html
and the following ones would be allowed:
http://adomain.com/mysite/test.html
http://adomain.com/mysite/cgi-bin/post.cgi?action=edit
http://adomain.com/mysite/cgi-bin/post.cgi
http://adomain.com/bbc.html
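The rules are compared against the part of each address after the domain name, including the query string (that is why the disallow for post.cgi?action=reply works). The Python sketch below checks the eight addresses above against the three disallows using urllib.parse.urlsplit. It assumes addresses are compared exactly as written, with no percent-encoding normalization; the helper names are hypothetical.

```python
from urllib.parse import urlsplit

# The three disallow lines from the example above.
DISALLOWS = ["/mysite/test/", "/mysite/cgi-bin/post.cgi?action=reply", "/a"]


def request_target(url):
    """Return the path plus any query string; scheme and host are dropped."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def is_ignored(url):
    # With only disallows, any prefix match means the spider skips the address.
    target = request_target(url)
    return any(target.startswith(prefix) for prefix in DISALLOWS)


# Prints "ignored" for the first four addresses and "allowed" for the last four.
for url in [
    "http://adomain.com/mysite/test/index.html",
    "http://adomain.com/mysite/cgi-bin/post.cgi?action=reply&id=1",
    "http://adomain.com/mysite/cgi-bin/post.cgi?action=replytome",
    "http://adomain.com/abc.html",
    "http://adomain.com/mysite/test.html",
    "http://adomain.com/mysite/cgi-bin/post.cgi?action=edit",
    "http://adomain.com/mysite/cgi-bin/post.cgi",
    "http://adomain.com/bbc.html",
]:
    print(url, "ignored" if is_ignored(url) else "allowed")
```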
It is also possible to use an "allow" in addition to disallows. For example:
user-agent: FreeFind
disallow: /cgi-bin/
allow: /cgi-bin/Ultimate.cgi
allow: /cgi-bin/forumdisplay.cgi
This robots.txt file prevents the spider from accessing any address under /cgi-bin/
except Ultimate.cgi and forumdisplay.cgi. Because the allows come after the disallow, they
are the last match for those two scripts.
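Since the last match wins, the order of the lines matters. The Python sketch below evaluates the example as written and a hypothetical reordering with the disallow moved to the end; in the reordered file the disallow is always the last match, so both scripts are blocked. The path /cgi-bin/search.cgi is a hypothetical stand-in for "any other cgi-bin script". Crawlers that follow RFC 9309 (such as Googlebot) use the longest matching rule instead, so for them the order does not matter.

```python
def is_allowed(rules, path):
    # Last matching rule wins; no match means allowed (the rule described above).
    allowed = True
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            allowed = directive == "allow"
    return allowed


# The example above, allows after the disallow.
as_written = [
    ("disallow", "/cgi-bin/"),
    ("allow", "/cgi-bin/Ultimate.cgi"),
    ("allow", "/cgi-bin/forumdisplay.cgi"),
]
# Hypothetical reordering: the same lines with the disallow moved last.
reordered = as_written[1:] + as_written[:1]

for path in ["/cgi-bin/Ultimate.cgi", "/cgi-bin/forumdisplay.cgi", "/cgi-bin/search.cgi"]:
    print(path, "as written:", is_allowed(as_written, path),
          "reordered:", is_allowed(reordered, path))
```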
Using allows can often simplify your robots.txt file.
Here's another example, showing a robots.txt with two sections: one for "all" robots and
one for the FreeFind spider:
user-agent: *
disallow: /cgi-bin/
user-agent: FreeFind
disallow:
In this example, all robots except the FreeFind spider will be prevented from accessing files in the
cgi-bin directory, because FreeFind follows the section that names it instead of the "*" section.
FreeFind will be able to access all files (a disallow with nothing after it means "allow everything").
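Putting the pieces together, a robot first picks the section that names it (falling back to the "*" section) and then applies that section's (dis)allows. The Python sketch below does this for the two-section example above. It is a simplified sketch under stated assumptions: each section has exactly one user-agent line, robot names are compared case-insensitively, and "SomeOtherBot" is a hypothetical robot name.

```python
def parse_sections(text):
    """Split robots.txt text into {user-agent (lower-case): [(directive, value), ...]}.

    Assumes one user-agent line per section, as in the examples on this page.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current = sections.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current is not None:
            current.append((field, value))
    return sections


def rules_for(sections, robot):
    # A robot obeys the section that names it; otherwise it falls back to "*".
    return sections.get(robot.lower(), sections.get("*", []))


def is_allowed(rules, path):
    allowed = True  # no match at all: the address will be used
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):  # a bare "disallow:" matches nothing
            allowed = directive == "allow"  # last match wins
    return allowed


ROBOTS_TXT = """\
user-agent: *
disallow: /cgi-bin/
user-agent: FreeFind
disallow:
"""

sections = parse_sections(ROBOTS_TXT)
for robot in ["FreeFind", "SomeOtherBot"]:
    print(robot, is_allowed(rules_for(sections, robot), "/cgi-bin/post.cgi"))
```

With this file, FreeFind's only rule is the empty disallow, so every address is allowed, while any other robot falls back to the "*" section and is kept out of /cgi-bin/.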