robots.txt - what does it do?

  • frank visser
    Message 1 of 2 , May 29, 2005
      hi tilman,

      while experimenting with Fast Link Checker (WebTweakTools.Net) i
      noticed this link was checked by default: /robots.txt

      there isn't any /robots.txt link on any intel page, why would that
      spider detect it?

      there's also an option "ignore robots.txt file" in that program.
      what would be the purpose of that?

      is this spider related or search engine or both?

      this is the content of that particular robots.txt file:


      # robots.txt exclusion for www.intel.com
      #
      # for all agents, keep them out of the /cgi directory

      User-agent: *
      Disallow: /cgi
      Disallow: /iaweb/
      Disallow: /cpc/vision/
      Disallow: /intel/june297/
      Disallow: /cpc/eps/
      Disallow: /design/june297/
      Disallow: /cpc/archive/
      Disallow: /cpc/dia/
      Disallow: /cpc/ecs/
      Disallow: /cpc/fcs/
      Disallow: /cpc/gif/
      Disallow: /cpc/OptContent/
      Disallow: /cpc/pix/
      Disallow: /cpc/sound/
      Disallow: /cpc/feature/

      does not make any sense to me ;-)

      frank
    • Tilman Hausherr
      Message 2 of 2 , May 29, 2005
        http://www.searchengineworld.com/robots/robots_tutorial.htm should
        answer all of that. Xenu does not bother with that file, although
        this is asked often.

        Tilman
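
For readers who want to see the mechanism: by convention (the Robots Exclusion Protocol), a crawler requests /robots.txt from the site root before fetching anything else, which is why the file is checked even though no page links to it. The sketch below feeds the rules quoted in this thread to Python's standard urllib.robotparser and shows what an "ignore robots.txt" switch would amount to. The agent name "ExampleLinkChecker" and the helpers build_parser and allowed are hypothetical, made up for illustration; this is not how Fast Link Checker or Xenu is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in this thread (www.intel.com, May 2005).
ROBOTS_TXT = """\
# robots.txt exclusion for www.intel.com
#
# for all agents, keep them out of the /cgi directory

User-agent: *
Disallow: /cgi
Disallow: /iaweb/
Disallow: /cpc/vision/
Disallow: /intel/june297/
Disallow: /cpc/eps/
Disallow: /design/june297/
Disallow: /cpc/archive/
Disallow: /cpc/dia/
Disallow: /cpc/ecs/
Disallow: /cpc/fcs/
Disallow: /cpc/gif/
Disallow: /cpc/OptContent/
Disallow: /cpc/pix/
Disallow: /cpc/sound/
Disallow: /cpc/feature/
"""


def build_parser(text):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, path, agent="ExampleLinkChecker", ignore_robots=False):
    """Return True if a (hypothetical) checker may fetch the given path.

    With ignore_robots=True the rules are skipped entirely, which is
    presumably what an "ignore robots.txt file" option does.
    """
    if ignore_robots:
        return True
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ("/index.htm", "/cgi/search", "/cpc/vision/page.htm"):
        print(path, allowed(rules, path))
```

"User-agent: *" applies the rules to every robot, and each Disallow line is a path prefix the robot is asked not to fetch. The file is only a request: a well-behaved search engine spider or link checker honors it, while a tool that ignores it can still fetch those pages.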

        On Sun, 29 May 2005 18:05:01 -0000, frank visser wrote:

        >hi tilman,
        >
        >while experimenting with Fast Link Checker (WebTweakTools.Net) i
        >noticed this link was checked by default: /robots.txt
        >
        >there isn't any /robots.txt link on any intel page, why would that
        >spider detect it?
        >
        >there's also an option "ignore robots.txt file" in that program.
        >what would be the purpose of that?
        >
        >is this spider related or search engine or both?
        >
        >this is the content of that particular robots.txt file:
        >
        >
        ># robots.txt exclusion for www.intel.com
        >#
        ># for all agents, keep them out of the /cgi directory
        >
        >User-agent: *
        >Disallow: /cgi
        >Disallow: /iaweb/
        >Disallow: /cpc/vision/
        >Disallow: /intel/june297/
        >Disallow: /cpc/eps/
        >Disallow: /design/june297/
        >Disallow: /cpc/archive/
        >Disallow: /cpc/dia/
        >Disallow: /cpc/ecs/
        >Disallow: /cpc/fcs/
        >Disallow: /cpc/gif/
        >Disallow: /cpc/OptContent/
        >Disallow: /cpc/pix/
        >Disallow: /cpc/sound/
        >Disallow: /cpc/feature/
        >
        >does not make any sense to me ;-)
        >
        >frank