
Re: [webalizer] Re: Webcrawlers and Yahoo Groups search

  • Enric Naval
    Message 1 of 9 , Nov 8, 2004
      --- smutterbuggler <wibble@...> wrote:

      >
      > --- In webalizer@yahoogroups.com, Enric Naval
      > <enventa2000@y...> wrote:
      > <snip>
      > > About the strange visits: Hum, so, if I have
      > > understood correctly, your site is empty, yet you
      > > are receiving many visits from weird places?
      > >
      > > There are two possibilities:
      > >
      > > 1- automated programs scanning for vulnerabilities,
      > > and not finding one. They search for things like
      > > "vti_bin", "vti_inf" or "command.exe". This
      > > wouldn't explain the climb in visits.
      > >
      > > 2- one of those automated programs has already
      > > found a vulnerability, and more and more people
      > > are using your server as a proxy or something
      > > similar, as your IP is being propagated in the
      > > underground proxy lists....
      > >
      > > Could you copy & paste the Top URL list in a
      > > message? A quick look would show whether the
      > > visitors are malicious or not. The Top KB list
      > > could also be useful.
      > >
      > > Also: Kostiki is a Russian word. I don't know its
      > > meaning (I think it is a made-up word). When I
      > > searched for it in Google I found no results
      > > related to web crawlers; instead I found a few
      > > results related to the Counter-Strike videogame
      > > and a few other results in Russian, some of them
      > > lists of members. Someone going by the nickname
      > > "Kostiki" seems to have made his own client, or
      > > maybe he has changed the User-Agent line of an
      > > existing client.
      > >
      > > Also: The Top IP visiting you is based in
      > > Singapore. Is this normal?
      > >
      > > # whois 203.118.42.188
      > > [Querying whois.apnic.net]
      > > [whois.apnic.net]
      > > [...]
      > > netname: STARHUBINTERNET-SG
      > > descr: 19 Taiseng Drive
      > > descr: SINGAPORE 535222
      > > [...]
      > >
      > <snip>
      >
      > What are the risks (if any) of publishing the
      > 'usage' pages on the web server for outside viewing?
      >
      > What are the risks (if any) of publishing the
      > 'usage' pages on the web
      > server for outside viewing?

      A malicious user can look up the list of Top Users, to
      learn about the usernames that are used to access the
      protected parts.

      The Top URL list could list private pages that you
      access for administration.

      You may get log-spammed, where a visitor to your
      pages fakes its Referer header so that it holds a URL
      to a commercial page. When Google parses the usage
      files, that commercial page's PageRank increases,
      because Google believes that your site is linking to
      it.
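
      As an illustration only, a spammer can fake that
      referral with a single request; the domain names
      below are made up:

      # forging the Referer header against your site (illustration only)
      curl -e "http://commercial-spam.example/" http://www.yoursite.example/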

      A good security measure is disabling the "All URLs",
      "All Users", etc. lists.

      For temporary publishing, the safest approach is to
      copy just one HTML page (a one-month stats page) into
      an empty folder, then alias that folder as "usage".
      You can hand-delete the confidential information if
      necessary, as it is only one page.

      Alias /usage "/var/www/html/usage"
      Alias /usage/ "/var/www/html/usage/"
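
      The copy step itself would be something like this
      (just a sketch; /var/www/stats is a made-up path,
      check the OutputDir line in your webalizer.conf to
      see where the reports really live):

      mkdir -p /var/www/html/usage
      cp /var/www/stats/usage_200411.html /var/www/html/usage/index.html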


      > To turn this on for a day (say) would save a lot of
      > cutting and
      > pasting etc.
      >

      I thought it was actually easy: select the files in
      your browser, then paste them here. It's dirty, and
      the results look ugly, but it works.


      =====
      Enric Naval
      Management Computing student at the UdL (Lleida)
      GRIHO webalizer.conf
      http://griho.udl.es/webalizer/webalizer.conf.txt



    • waldo kitty
      Message 2 of 9 , Nov 8, 2004
        smutterbuggler wrote:
        >
        > <snip>
        >
        > What are the risks (if any) of publishing the 'usage' pages on the web
        > server for outside viewing?

        what are the risks? logfile spamming for one thing... that's where folk spam stuff not to your site but to your log
        files for the search engine spiders to find... if they can get enough hits to their site, they will climb in the search
        engine rankings... the higher their rankings, the more money they can make...

        > To turn this on for a day (say) would save a lot of cutting and
        > pasting etc.

        for a day or so? i couldn't say... i wouldn't unless i had a good idea when the spiders would be around... as an
        example, google is a regular on my site but M$'s new search engine spider has been a real nuisance since going online as
        it walks my site most every day...

        --
        _\/
        (@@) Waldo Kitty, Waldo's Place USA
        __ooO_( )_Ooo_____________________ telnet://bbs.wpusa.dynip.com
        _|_____|_____|_____|_____|_____|_____ http://www.wpusa.dynip.com
        ____|_____|_____|_____|_____|_____|_____ ftp://ftp.wpusa.dynip.com
        _|_Eat_SPAM_to_email_me!_YUM!__|_____|_____ wkitty42 -at- alltel.net
      • Enric Naval
        Message 3 of 9 , Nov 9, 2004
          > > To turn this on for a day (say) would save a lot
          > of cutting and
          > > pasting etc.
          >
          > for a day or so? i couldn't say... i wouldn't unless
          > i had a good idea when the spiders would be
          > around... as an
          > example, google is a regular on my site but M$'s new
          > search engine spider has been a real nusiance since
          > going online as
          > it walks my site most every day...

          You can create a robots.txt file in your root folder
          to prevent robots from crawling certain pages. Most
          robots obey this standard, including Google and MSN.
          Some bots will instead use it as an index of the
          pages you don't want them to see, and crawl them on
          purpose, but there are very few of those. For just a
          day, there is very little risk.

          User-agent: *
          Disallow: /usage


          =====
          Enric Naval
          Management Computing student at the UdL (Lleida)
          GRIHO webalizer.conf
          http://griho.udl.es/webalizer/webalizer.conf.txt



        • Enric Naval
          Message 4 of 9 , Nov 9, 2004
            > > To turn this on for a day (say) would save a lot
            > of cutting and
            > > pasting etc.
            >
            > for a day or so? i couldn't say... i wouldn't unless
            > i had a good idea when the spiders would be
            > around... as an
            > example, google is a regular on my site but M$'s new
            > search engine spider has been a real nusiance since
            > going online as
            > it walks my site most every day...

            Silly of me... There is a very safe way to do it. You
            can protect the directory with a password, then
            publish the password in this list. This will stop
            crawlers, bots, etc.

            You can copy&paste the text below in httpd.conf,
            inside the appropiate "Directory" container, or in a
            .htaccess file in the directory you want to protect.
            If you use a .htaccess file then you need to have an
            AllowOverride line in the apropiate "Directory"
            container in httpd.conf, or apache will refuse to obey
            the .htaccess instructions. If you didn't add any
            directory, that would be between these two lines (they
            are very near to each other):
            <Directory />
            </Directory>


            this is the line to add to httpd.conf:

            AllowOverride AuthConfig Limit
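
            For example, if the usage folder is
            /var/www/html/usage as in my previous message
            (adapt the path to your own setup), the
            container would look something like this:

            <Directory "/var/www/html/usage">
                AllowOverride AuthConfig Limit
            </Directory>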


            TEXT TO COPY&PASTE
            #*******************************


            AuthType Basic
            AuthName "Usage page"
            AuthUserFile /tmp/.htpasswd_usage
            require user LOGIN

            # To generate a new password, execute:
            # htpasswd -c /tmp/.htpasswd_usage LOGIN
            # then type the password you want to use.


            #*******************************

            =====
            Enric Naval
            Management Computing student at the UdL (Lleida)
            GRIHO webalizer.conf
            http://griho.udl.es/webalizer/webalizer.conf.txt
