Re: Webcrawlers and Yahoo Groups search

  • smutterbuggler
    Message 1 of 9 , Nov 8, 2004
      --- In webalizer@yahoogroups.com, Enric Naval <enventa2000@y...> wrote:
      > About the yahoo groups search problem. I couldn't
      > reproduce your problem. I got the right results when I
      > tried it out. I followed these steps (so you can
      > compare them with yours): I went to groups.yahoo.com,
      > then clicked on the "webalizer" group, then clicked in
      > the search box just above the calendar and typed "web
      > crawlers" (without the double quotes), and pressed
      > enter. It sent me to this page. You can go there and
      > press "next" for more results.
      >
      >
      > http://groups.yahoo.com/group/webalizer/messagesearch?query=web%20crawlers

      Hmm..... O.K. the problem seems to have been with the subsequent
      search of the page returned :-(

      There is a 'webcrawler' entry in webalizer.conf, so I get a hit when
      anyone includes a listing of the webalizer.conf with a query.

      For some reason I couldn't find the string with the 'search' function
      of my browser.

      My bad, no doubt.

      Visitor issues in another response :-)
    • smutterbuggler
      Message 2 of 9 , Nov 8, 2004
        --- In webalizer@yahoogroups.com, Enric Naval <enventa2000@y...> wrote:
        <snip>
        > About the strange visits: Hmm, so, if I have
        > understood correctly, your site is empty, yet you are
        > receiving many visits from weird places?
        >
        > There are two possibilities:
        >
        > 1- automated programs scanning for vulnerabilities,
        > and not finding one. They search for things like
        > "vti_bin", "vti_inf" or "command.exe". This wouldn't
        > explain the climb in visits.
        >
        > 2- one of those automated programs has already found a
        > vulnerability, and more and more people are using your
        > server as a proxy or something similar, as your IP is
        > being propagated through the underground proxy lists....
        >
        >
        > Could you copy & paste the top URL list into a message?
        > A quick look would show whether the visitors
        > are malicious or not. The Top KB list could also be
        > useful.
        >
        > Also: Kostiki is a Russian word. I don't know its
        > meaning (I think it is a made-up word). When I
        > searched for it in Google I found no results related
        > to web crawlers; instead I found a few results related
        > to the Counter-Strike videogame and a few other
        > results in Russian, some of them lists of members.
        > Someone going by the nickname "Kostiki" seems to have
        > made his own client, or maybe he has changed the
        > User-Agent line of an existing client.
        >
        > Also: The Top IP visiting you is based in Singapore.
        > Is this normal?
        >
        > # whois 203.118.42.188
        > [Querying whois.apnic.net]
        > [whois.apnic.net]
        > [...]
        > netname: STARHUBINTERNET-SG
        > descr: 19 Taiseng Drive
        > descr: SINGAPORE 535222
        > [...]
        >
        >
        <snip>

        What are the risks (if any) of publishing the 'usage' pages on the web
        server for outside viewing?

        To turn this on for a day (say) would save a lot of cutting and
        pasting etc.
      • Enric Naval
        Message 3 of 9 , Nov 8, 2004
          --- smutterbuggler <wibble@...> wrote:

          >
          > --- In webalizer@yahoogroups.com, Enric Naval
          > <enventa2000@y...> wrote:
          > <snip>
          >
          > What are the risks (if any) of publishing the
          > 'usage' pages on the web
          > server for outside viewing?

          A malicious user can look up the list of Top Users, to
          learn about the usernames that are used to access the
          protected parts.

          The Top URL list could list private pages that you
          access for administration.

          You may get log-spammed, where a visitor to your
          page fakes its referrer so that it holds a URL to a
          commercial page. When Google parses the usage files,
          that commercial page's PageRank increases because
          Google believes that your site is linking to it.
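One rough way to see whether a log is already being spammed like this is to count the external referrer hosts before publishing anything. The Python sketch below assumes the server writes Apache's combined log format; the function name `referrer_hosts`, the sample lines and the host names are all made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Tail of a combined-format line: "request" status bytes "referrer" "user-agent"
LOG_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')

def referrer_hosts(lines, own_host):
    """Count referrer hostnames that are not our own site."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line.strip())
        if not match:
            continue  # skip lines that are not in combined format
        host = urlparse(match.group("referrer")).hostname
        if host and host != own_host:
            counts[host] += 1
    return counts

# Hypothetical sample lines, not taken from any real log.
SAMPLE = [
    '203.0.113.5 - - [08/Nov/2004:10:00:00 +0000] "GET / HTTP/1.0" 200 512 '
    '"http://spam.example/buy" "Mozilla/4.0"',
    '203.0.113.5 - - [08/Nov/2004:10:00:05 +0000] "GET / HTTP/1.0" 200 512 '
    '"http://spam.example/cheap" "Mozilla/4.0"',
    '198.51.100.7 - - [08/Nov/2004:10:01:00 +0000] "GET /a.html HTTP/1.0" 200 99 '
    '"http://www.example.org/" "Mozilla/4.0"',
    '198.51.100.9 - - [08/Nov/2004:10:02:00 +0000] "GET / HTTP/1.0" 200 512 '
    '"-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for host, hits in referrer_hosts(SAMPLE, "www.example.org").most_common():
        print(host, hits)
```

A host that shows up many times as a referrer for a site nobody links to is a likely candidate for log spam.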

          A good security measure is disabling the "All URLs",
          "All Users", etc. lists.

          For temporary publishing, the safest approach is to copy
          just one HTML page (a one-month stats page), place it in
          an empty folder, then alias that folder as "usage".
          You can hand-delete the confidential information if
          necessary, as it is only one page.

          Alias /usage "/var/www/html/usage"
          Alias /usage/ "/var/www/html/usage/"


          > To turn this on for a day (say) would save a lot of
          > cutting and pasting etc.
          >

          I thought it was actually easy: select the files in
          your browser, then paste them here. It's dirty, and
          the results look ugly, but it works.


          =====
          Enric Naval
          Management Computing student at the UdL (Lleida)
          GRIHO webalizer.conf
          http://griho.udl.es/webalizer/webalizer.conf.txt

        • waldo kitty
          Message 4 of 9 , Nov 8, 2004
            smutterbuggler wrote:
            >
            > <snip>
            >
            > What are the risks (if any) of publishing the 'usage' pages on the web
            > server for outside viewing?

            what are the risks? logfile spamming for one thing... that's where folk spam stuff not to your site but to your log
            files for the search engine spiders to find... if they can get enough hits to their site, they will climb in the search
            engine rankings... the higher their rankings, the more money they can make...

            > To turn this on for a day (say) would save a lot of cutting and
            > pasting etc.

            for a day or so? i couldn't say... i wouldn't unless i had a good idea when the spiders would be around... as an
            example, google is a regular on my site but M$'s new search engine spider has been a real nuisance since going online as
            it walks my site most every day...

            --
            _\/
            (@@) Waldo Kitty, Waldo's Place USA
            __ooO_( )_Ooo_____________________ telnet://bbs.wpusa.dynip.com
            _|_____|_____|_____|_____|_____|_____ http://www.wpusa.dynip.com
            ____|_____|_____|_____|_____|_____|_____ ftp://ftp.wpusa.dynip.com
            _|_Eat_SPAM_to_email_me!_YUM!__|_____|_____ wkitty42 -at- alltel.net
          • Enric Naval
            Message 5 of 9 , Nov 9, 2004
              > > To turn this on for a day (say) would save a lot
              > > of cutting and pasting etc.
              >
              > for a day or so? i couldn't say... i wouldn't unless
              > i had a good idea when the spiders would be
              > around... as an example, google is a regular on my
              > site but M$'s new search engine spider has been a
              > real nuisance since going online as it walks my
              > site most every day...

              You can create a robots.txt file in your root folder
              to prevent robots from crawling certain pages. Most
              robots obey this standard, including Google and MSN.
              Some bots will instead use it as an index of the
              pages you don't want them to see, and crawl them on
              purpose, but there are very few of them. For just a
              day, there is very little risk.

              User-agent: *
              Disallow: /usage
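
The two rules above can be checked before relying on them. This is a minimal Python sketch using the standard library's `urllib.robotparser`; the helper name `is_allowed` and the example paths are assumptions for illustration, and it only shows how a well-behaved robot would read the file.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /usage
"""

def is_allowed(robots_txt, path, agent="*"):
    """Return True if a robot that obeys robots.txt may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

if __name__ == "__main__":
    # Hypothetical paths: one inside the stats folder, one outside it.
    for path in ("/usage/index.html", "/index.html"):
        print(path, is_allowed(ROBOTS_TXT, path))
```

Robots that ignore the standard are not affected by this check at all.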


            • Enric Naval
              Message 6 of 9 , Nov 9, 2004
                > > To turn this on for a day (say) would save a lot
                > > of cutting and pasting etc.
                >
                > for a day or so? i couldn't say... i wouldn't unless
                > i had a good idea when the spiders would be
                > around... as an example, google is a regular on my
                > site but M$'s new search engine spider has been a
                > real nuisance since going online as it walks my
                > site most every day...

                Silly of me... There is a very safe way to do it. You
                can protect the directory with a password, then
                publish the password in this list. This will stop
                crawlers, bots, etc.

                You can copy & paste the text below into httpd.conf,
                inside the appropriate "Directory" container, or into a
                .htaccess file in the directory you want to protect.
                If you use a .htaccess file then you need an
                AllowOverride line in the appropriate "Directory"
                container in httpd.conf, or Apache will refuse to obey
                the .htaccess instructions. If you haven't added any
                Directory containers of your own, that would be between
                these two lines (they are very near to each other):
                <Directory />
                </Directory>


                this is the line to add to httpd.conf:

                AllowOverride AuthConfig Limit


                TEXT TO COPY&PASTE
                #*******************************


                AuthType Basic
                AuthName "Usage page"
                AuthUserFile /tmp/.htpasswd_usage
                require user LOGIN

                # To generate a new password execute:
                # htpasswd -c /tmp/.htpasswd_usage LOGIN
                # then type the password you want to use.


                #*******************************
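
For anyone scripting access to the protected folder, this is roughly what a client sends once `AuthType Basic` is in place. It is a minimal Python sketch: `basic_auth_header` is a made-up helper name, and the password `secret` is a placeholder, not anything from this thread.

```python
import base64

def basic_auth_header(login, password):
    """Build the Authorization header value used by HTTP Basic auth."""
    raw = f"{login}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"

if __name__ == "__main__":
    # LOGIN matches the "require user LOGIN" line; "secret" is a placeholder.
    print("Authorization:", basic_auth_header("LOGIN", "secret"))
```

Base64 is an encoding rather than encryption, so a password published for a day is best treated as throwaway.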
