
Re: [webalizer] inktomisearch causes lots of fake visits in my stats

  • gary hall
    Message 1 of 6, May 14, 2005
      Hi Eric,

      Currently my webmaster has installed "Webalizer Version 2.01 (with Geolizer patch)".

      When looking at "total referrers" I see Google (528 hits), Yahoo (155) and AOL (100) spelled out, but the traffic for the 25th/26th looks "normal". I don't see any spikes like the ones you were showing. We show a total of 3392 visits and 22,500 hits, which looks normal.

      I will pass this on to "Dave" and when I get his response (because he is the smart one of this effort) I will send you the results.

      Thanks.

      Warm regards,

      Gary

      Enric Naval wrote:
      --- gary hall <gary.chris@...> wrote:
      
        
      Dave,
      
      We don't seem to have this problem, but it is
      something to think about.
      
      Gary
          
      You aren't crawled by inktomisearch? Are you sure?
      This domain is used by Slurp (the Yahoo! bot), so this
      would mean that your site won't appear in any search
      from Yahoo! because they don't crawl your site. For
      commercial sites, that's a bad thing.
      
      If you are sure you have no visit from them (for
      example, because you have an intranet), then please
      forget the rest of this email.
      
      
      You should grep your access_log files, looking for
      visits from inktomisearch, since you will probably
      have visits from them. I believe that it is not
      possible to see the problem in the stats unless you
      look directly at the logfiles. This command lists all
      visits from inktomisearch. Could you run it and tell
      us if it worked? (Remember that each line you see
      is counted as a separate visit.)
      
      grep '^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com' access_log
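A slightly fuller version of the same check, as a sketch: it assumes the access_log has resolved hostnames in its first field (as in the sample entries quoted further down), and `inktomi_summary` is just an illustrative name. It prints the number of Inktomi hits and the number of distinct crawler hostnames behind them; if the two numbers are close, almost every hit is arriving from a new host.

```shell
#!/bin/sh
# Inktomi crawler hostnames look like "lj1124.inktomisearch.com":
# two letters, four digits, then the inktomisearch.com domain.
INKTOMI_RE='^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com'

# Read log lines on stdin; print "<hits> <distinct hosts>" for Inktomi.
inktomi_summary() {
    grep "$INKTOMI_RE" |
        awk '{ hits++; if (!seen[$1]++) hosts++ }
             END { printf "%d %d\n", hits, hosts }'
}

# Usage: inktomi_summary < access_log
```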
      
      
      You see, I use this line in webalizer.conf:
      
      "GroupSite *inktomisearch.com Inktomi"
      
      In the Top Site list I had 148 visits from
      inktomisearch.com both before AND after using the
      script BUT the total number of visits had gone down
      from 41292 to 31512!
      
      So:
      
      Using_script  Total_visits  Visits_from_inktomi
      No            41292         148
      Yes           31512         148
      
      All other totals in the "Top Sites" list remain the
      same.
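The size of the effect can be worked out from the two totals given above (41292 visits without the script, 31512 with it). A small shell sketch of the arithmetic; the variable names are just illustrative:

```shell
#!/bin/sh
before=41292   # total visits, raw log
after=31512    # total visits, log passed through the sed script

removed=$((before - after))
# Shell arithmetic is integer-only, so work in tenths of a percent and
# round to the nearest tenth by adding half the divisor first.
pct_x10=$(( (removed * 1000 + before / 2) / before ))
echo "removed $removed visits ($((pct_x10 / 10)).$((pct_x10 % 10))% of the total)"
```

This prints "removed 9780 visits (23.7% of the total)": since the script only rewrites Inktomi lines, close to a quarter of the reported visits came from Inktomi's changing hostnames.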
      
      Mind you, this is the expected and absolutely
      correct behaviour.
      
      "Top Site list" and "total visits" are not related to
      each other and use different algorithms and get very
      different results....
      
      
      
      
        
      Enric Naval wrote:
      
          
      (this is a long email, sorry)
      
      
      I have a problem in my stats:
      
      Inktomi uses a different IP for each hit. So each hit
      from Inktomi counts as a separate visit, instead of
      many hits counting as only one visit.
      
      For example, these two entries are counted as two
      different visits, even though it is the same user
      agent fetching the same file from the same domain
      less than 30 minutes apart:
      
      lj1124.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

      lj2545.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
      
      
      Every month, on the 25th and 26th between 17:00 and
      18:00, inktomisearch makes most of the visits to my
      server, and Webalizer counts 500 to 1000 more visits
      than usual on each of those days.
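To check when the crawler is busiest on your own server, something like the following sketch can help. It assumes the Common/Combined Log Format (timestamp in the fourth whitespace-separated field, as in the sample entries above) and the same two-letters-four-digits hostname pattern; `inktomi_by_hour` is a made-up name.

```shell
#!/bin/sh
# Count Inktomi hits per "day:hour" of the month, busiest first.
# Field 4 of a log line looks like "[01/May/2005:00:14:07".
inktomi_by_hour() {
    grep '^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com' |
        awk '{
            split($4, t, ":")            # t[1]="[01/May/2005", t[2]=hour
            day = substr(t[1], 2, 2)     # day of month, e.g. "01"
            print day ":" t[2]
        }' |
        sort | uniq -c | sort -rn
}

# Usage: inktomi_by_hour < access_log | head
```

The busiest day:hour pairs come first, so a monthly deep crawl like the one described here should stand out at the top of the list.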
      
      This causes a strange camel hump in my graphs and
      leads me to believe that I had more visits than usual
      for some reason, when it was only inktomisearch
      crawling the sites on the server. It also makes the
      other bars smaller, so it's harder to see trends in
      visits.
      
      Usually it's not noticeable in individual sites,
      because it gets lost in the noise, but I can see it
      when I merge the logs for every site on the server.
      
      
      If you look at this image, you will see that on days
      25 and 26 I'm getting 30% more visits than on days 27
      and 28, but all four days have about the same number
      of hits and sites, which is highly suspicious. This
      happened in February, March and April, so something
      had to be wrong there. I have marked the
      suspicious-looking part in red (this is April).
      
            
      http://griho.udl.es/naval/webalizer/inktomisearch.gif
          
      
      
      webalizer.conf has no option to prevent this from
      happening. If I use, for example:

      GroupReferrer .inktomisearch.com  Stupid inktomi

      then Webalizer will still count each hit as a visit.
          
      
      
      To solve this I wrote a one-line sed command, and now
      I run it on my logfiles before feeding them to
      Webalizer (explained below):

      sed 's/^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com/inktomisearch.com/' access_log > access_log_sed
      
      
      This rewrites every Inktomi hostname this way. From:
      
      "lj2534.inktomisearch.com" 
      
      or
      
      "fj3612.inktomisearch.com" 
      
      to
      
      "inktomisearch.com"
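Wrapped as a reusable shell function, the same substitution might look like this (a sketch; `collapse_inktomi` is an illustrative name). The sed expression is single-quoted so the shell does not try to interpret `^`, `[` or `]` itself.

```shell
#!/bin/sh
# Rewrite "lj2534.inktomisearch.com"-style hostnames at the start of each
# log line to plain "inktomisearch.com"; every other line passes through.
collapse_inktomi() {
    sed 's/^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com/inktomisearch.com/'
}

# The two example hostnames from above both become "inktomisearch.com":
printf '%s\n' lj2534.inktomisearch.com fj3612.inktomisearch.com | collapse_inktomi

# Usage on a real log:
#   collapse_inktomi < access_log > access_log_sed
```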
      
      
      This way, all hits from inktomisearch.com that fall
      within the visit timeout get counted as a single
      visit, and you get a more realistic count of visits.
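To see why collapsing the hostnames changes the total, here is a rough model of visit counting, written as a sketch: a hit starts a new visit if its host has not been seen for more than 30 minutes (Webalizer's visit timeout is set by its `VisitTimeout` option and defaults to 1800 seconds). It is a simplification, not Webalizer's actual code, it assumes the log lines are in time order, and `count_visits` is a made-up name. Run on the two sample entries quoted earlier, it counts 2 visits with the original hostnames and 1 after the sed rewrite, because the hits are only 28 minutes 11 seconds apart.

```shell
#!/bin/sh
# Rough model of visit counting: a hit opens a new visit when its host
# has been idle for longer than the timeout (1800 s = 30 minutes).
# Simplification: every month is treated as 31 days long, so a gap that
# crosses a month boundary comes out too large. Fine for a one-month log.
count_visits() {
    awk -v timeout=1800 '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
        for (i = 1; i <= n; i++) mon[m[i]] = i
    }
    {
        # $4 looks like "[01/May/2005:00:14:07"; drop the "[" and split
        # into day, month, year, hour, minute, second.
        split(substr($4, 2), t, "[/:]")
        secs = ((((t[3] * 12 + mon[t[2]]) * 31 + t[1]) * 24 + t[4]) * 60 + t[5]) * 60 + t[6]
        if (!($1 in last) || secs - last[$1] > timeout) visits++
        last[$1] = secs
    }
    END { print visits + 0 }'
}

# Usage: compare the raw log with the rewritten one.
#   count_visits < access_log
#   count_visits < access_log_sed
```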
      
      
      
      This is a comparison of the daily visits graphs from
      my server stats for April. As you can see, Inktomi was
      raising the maximum number of visits and flattening
      all days to the same level. After using the script,
      it's easier to see that the server receives far fewer
      visits on weekends.
      
            
      http://griho.udl.es/naval/webalizer/inktomi_difference.gif
          
      
      
      
      Notes for the sed command:

      s/      means "substitute"

      ^       means start of a line

      [a-z]   means any letter from a to z

      [0-9]   means any digit from 0 to 9

      [.]     the dot character has a special meaning by
      itself (it matches any character), so I surround it
      with square brackets to match a literal dot
      
      Enric Naval
      Business Computing student at the UdL (Lleida)
        
    • Enric Naval
      Message 2 of 6, May 14, 2005
        --- gary hall <gary.chris@...> wrote:

        > Hi Eric,
        >
        > Currently my webmaster has installed "*Webalizer
        > Version 2.01*
        > <http://www.mrunix.net/webalizer/> (with *Geolizer*
        > <http://sysd.org/proj/log.php#glzr> patch)".
        >
        > When looking at "total referrers" I show Google (528
        > hits), Yahoo (155)
        > and AOL(100) spelled out, but the traffic for the 25
        > / 26 is "normal"
        > looking. Don't see any spikes like you were showing.
        > We show a total of
        > 3392 visits, 22,500 hits - looks normal.


        My sites get crawled by Inktomi on days 25 and 26. For
        your site, it will probably be on different days. I
        guess that Inktomi crawls every day of the month, and
        days 25 and 26 are my turn to be re-crawled in depth,
        or the crawler happens to stumble upon a big website on
        those days within its monthly cycle.


        The referrers won't show you this problem, because
        inktomisearch (Yahoo! Slurp) sends an empty referrer,
        "-". The referrers normally show which engines
        visitors used to reach you, but they won't show
        whether the engines' bots have crawled your site,
        because the bots often send empty referrers.

        To find the search engine bots' activity, you have to
        look in "Total Sites" or in "Total User Agents".

        For Inktomi, look for the string "inktomisearch.com"
        in "Total Sites" and the string "Slurp" in "Total
        User Agents".
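The same check can be made straight from the log. A sketch, assuming the Combined Log Format: splitting each line on double quotes puts the referrer in field 4 and the user agent in field 6. `slurp_referrers` (an illustrative name) prints how many Slurp hits there are and how many of them carry the empty referrer "-"; when the two numbers match, the referrer reports cannot show the crawl at all.

```shell
#!/bin/sh
# Combined Log Format, split on the double-quote character:
#   $2 = request, $4 = referrer, $6 = user agent
slurp_referrers() {
    awk -F'"' '$6 ~ /Slurp/ { total++; if ($4 == "-") empty++ }
               END { printf "%d %d\n", total, empty }'
}

# Usage: slurp_referrers < access_log
```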


        >
        > I will pass this on to "Dave" and when I get his
        > response (because he is
        > the smart one of this effort) I will send you the
        > results.
        >
        > Thanks.


        OK, thanks to you, too. Send this email to "Dave", if
        you can.




        Enric Naval
        Business Computing student at the UdL (Lleida)
        GRIHO webalizer.conf
        http://griho.udl.es/webalizer/webalizer.conf.txt


