
inktomisearch causes lots of fake visits in my stats

  • Enric Naval
    Message 1 of 6 , May 14, 2005
      (This is a long email, sorry.)

      I have a problem in my stats:

      Inktomi uses a different IP for each hit. So each hit from inktomi
      counts as a separate visit, instead of many hits counting as only
      one visit.

      For example, these two entries are counted as two different visits,
      even though it is the same user agent fetching the same file from
      the same domain less than 30 minutes apart:

      lj1124.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

      lj2545.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
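To get a rough feel for how many separate crawler hosts are behind those entries, one option is to count the distinct inktomisearch host names in the log. This is only a sketch: the temporary file, the third log line, and the address 192.0.2.10 are made up for illustration, and in practice LOG would point at the real access_log.

```shell
# Hypothetical sample log; replace with the path to your own access_log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
lj1124.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
lj2545.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
192.0.2.10 - - [01/May/2005:00:43:00 +0200] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"
EOF

# The first space-separated field is the client host.
# Keep only inktomisearch hosts, drop repeats, and count what is left.
distinct_hosts=$(cut -d' ' -f1 "$LOG" |
  grep '[.]inktomisearch[.]com$' |
  sort -u |
  wc -l |
  tr -d ' ')

echo "distinct inktomisearch hosts: $distinct_hosts"
rm -f "$LOG"
```

If every crawler hit comes from a different host name, this number will be close to the number of crawler hits, which matches the inflated visit count described above.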


      On the 25th and 26th of every month, between 17:00 and 18:00,
      inktomisearch makes most of the visits to my server, and webalizer
      counts 500 to 1000 more visits than usual on each of those days.

      This causes a weird camel-back shape in my graphs and leads me to
      believe that I had more visits than usual for some reason, when it
      was only inktomisearch crawling the sites on the server. It also
      makes the other bars smaller, so it's more difficult to see trends
      in visits.

      Usually it is not noticeable on individual sites, because it gets
      lost in the noise, but I can see it when I combine the logs for
      every site on the server.


      If you look at this image, you will see that on days 25 and 26 I'm
      getting 30% more visits than on days 27 and 28, but they all have
      about the same number of hits and sites, which is highly
      suspicious. This happened in February, March and April, so there
      had to be something wrong there. I have marked the
      suspicious-looking part in red (this is April).

      http://griho.udl.es/naval/webalizer/inktomisearch.gif




      webalizer.conf has no option to prevent this from
      happening. If I use, for example:

      GroupReferrer .inktomisearch.com Stupid inktomi

      then webalizer will still count each hit as a visit.




      To solve this I have made a one-line sed command, and now I run it
      on my logfiles before feeding them to webalizer (I explain it
      below):

      sed 's/^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com/inktomisearch.com/' access_log > access_log_sed


      This transforms all inktomi host names this way. From:

      "lj2534.inktomisearch.com"

      or

      "fj3612.inktomisearch.com"

      to

      "inktomisearch.com"


      This way, the hits from inktomisearch.com are counted as coming
      from a single site instead of each one starting its own visit, and
      you get a more realistic count of visits.
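A quick way to check the substitution before running it on a whole logfile is to try it on a couple of lines. The sketch below uses a made-up two-line sample (shortened log entries, not real data) and quotes the sed expression so the shell does not try to expand the brackets itself.

```shell
# Hypothetical two-line sample standing in for access_log.
sample='lj2534.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873
fj3612.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873'

# Apply the same substitution as in the message, with the expression quoted.
rewritten=$(printf '%s\n' "$sample" |
  sed 's/^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com/inktomisearch.com/')

# After the rewrite both lines should share one host name.
hosts=$(printf '%s\n' "$rewritten" | cut -d' ' -f1 | sort -u)
echo "$hosts"
```

If the pattern matched both lines, the only host name printed is inktomisearch.com.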



      This is a comparison of the daily visits graphs from my server
      stats for April. As you can see, inktomi was raising the maximum
      number of visits and leveling all days at the same height. After
      using the script, it's easier to see that the server receives far
      fewer visits on weekends.

      http://griho.udl.es/naval/webalizer/inktomi_difference.gif





      Notes for the sed command:

      s/ means "substitute"

      ^ means the start of a line

      [a-z] means any one letter from a to z

      [0-9] means any one digit from 0 to 9

      [.] the dot character has a special meaning by itself (it matches
      any character), so I surround it with square brackets to match a
      literal dot
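The repeated [a-z] and [0-9] groups can also be written with repetition counts. The following is a sketch of an equivalent, shorter expression using the POSIX basic-regex interval syntax \{n\}; the sample input lines are invented for the comparison, and only the first of them has the two-letters-plus-four-digits shape that should be rewritten.

```shell
# Long form, as in the message, and a shorter form using \{n\} intervals.
long='s/^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com/inktomisearch.com/'
short='s/^[a-z]\{2\}[0-9]\{4\}[.]inktomisearch[.]com/inktomisearch.com/'

# Hypothetical input: one crawler-style host and two hosts that must not match.
input='lj2534.inktomisearch.com - - "GET / HTTP/1.0"
abc1234.inktomisearch.com - - "GET / HTTP/1.0"
www.example.com - - "GET / HTTP/1.0"'

out_long=$(printf '%s\n' "$input" | sed "$long")
out_short=$(printf '%s\n' "$input" | sed "$short")

# Both expressions should produce exactly the same output.
if [ "$out_long" = "$out_short" ]; then
  echo "both forms agree"
fi
printf '%s\n' "$out_short"
```

Which form to use is a matter of taste; the long form is easier to read for someone new to regular expressions.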


      Enric Naval
      Student of Management Informatics at the Udl (Lleida)
      GRIHO webalizer.conf
      http://griho.udl.es/webalizer/webalizer.conf.txt



    • gary hall
      Message 2 of 6 , May 14, 2005
        Dave,

        We don't seem to have this problem, but it is something to think about.

        Gary

      • gary hall
        Message 3 of 6 , May 14, 2005
          Oops!

          Sorry, I hit "reply" by mistake.

          Warm regards,

          Gary

        • Enric Naval
          Message 4 of 6 , May 14, 2005
            --- gary hall <gary.chris@...> wrote:

            > Dave,
            >
            > We don't seem to have this problem, but it is
            > somrthing to think about.
            >
            > Gary

            You aren't crawled by inktomisearch? Are you sure?
            This domain is used by Slurp (the Yahoo! bot), so this
            would mean that your site won't appear in any search
            from Yahoo! because they don't crawl your site. For
            commercial sites, that is a bad thing.

            If you are sure you have no visits from them (for
            example, because you have an intranet), then please
            forget the rest of this email.


            You should grep your access_log files, looking for
            visits from inktomisearch, since you will probably
            have visits from them. I believe that it is not
            possible to see the problem in the stats unless you
            look directly at the logfiles. This command lists all
            hits from inktomisearch. Could you run it and tell
            us if it worked? (Remember that each line you see
            is counted as a separate visit.)

            grep '^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com' access_log
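If the list of matching lines is too long to read, grep can count them instead. The sketch below assumes host names shaped like lj1124.inktomisearch.com (two letters, four digits, no dot in between), as in the log excerpt from the first message. Note that grep takes the pattern first and the file second. The temporary file, its third line, and the address 192.0.2.10 are invented for illustration.

```shell
# Hypothetical sample log; in practice use your real access_log instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
lj1124.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
lj2545.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
192.0.2.10 - - [01/May/2005:00:43:00 +0200] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"
EOF

# -c prints the number of matching lines instead of the lines themselves.
crawler_hits=$(grep -c '^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com' "$LOG")

echo "hits from inktomisearch hosts: $crawler_hits"
rm -f "$LOG"
```

Since each of those hits may be counted as its own visit, this number gives an upper bound on how many visits the crawler adds to the total.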


            You see, I use this line in webalizer.conf:

            "GroupSite *inktomisearch.com Inktomi"

            In the Top Site list I had 148 visits from
            inktomisearch.com both before AND after using the
            script, BUT the total number of visits had gone down
            from 41292 to 31512!

            So:

            Using_script  Total_visits  Visits_from_inktomi
            No            41292         148
            Yes           31512         148

            All other totals in the "Top Sites" list remain the
            same.

            Mind you, this behaviour is the expected and absolutely
            correct behaviour.

            "Top Site list" and "total visits" are not related to
            each other; they use different algorithms and get very
            different results.
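To put the two totals side by side, here is the arithmetic as a small sketch. The two numbers are the ones quoted in this message (41292 total visits without the script, 31512 with it); the variable names are only for illustration, and the percentage is rounded down because the shell does integer division.

```shell
# Totals quoted in the message.
visits_without_script=41292
visits_with_script=31512

# Visits that disappear once all crawler hosts share one name.
extra_visits=$((visits_without_script - visits_with_script))

# Share of the original total, rounded down (integer arithmetic).
percent=$((extra_visits * 100 / visits_without_script))

echo "$extra_visits extra visits, about $percent% of the original total"
```

So 9780 of the 41292 visits, roughly 23%, went away after the host names were rewritten.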




          • gary hall
            Message 5 of 6 , May 14, 2005
              Hi Eric,

              Currently my webmaster has installed "Webalizer Version 2.01 (with Geolizer patch)".

              When looking at "total referrers" I show Google (528 hits), Yahoo (155) and AOL (100) spelled out, but the traffic for the 25th and 26th looks "normal". I don't see any spikes like the ones you were showing. We show a total of 3392 visits and 22,500 hits, which looks normal.

              I will pass this on to "Dave" and when I get his response (because he is the smart one of this effort) I will send you the results.

              Thanks.

              Warm regards,

              Gary

            • Enric Naval
              Message 6 of 6 , May 14, 2005
                --- gary hall <gary.chris@...> wrote:

                > Hi Eric,
                >
                > Currently my webmaster has installed "*Webalizer
                > Version 2.01*
                > <http://www.mrunix.net/webalizer/> (with *Geolizer*
                > <http://sysd.org/proj/log.php#glzr> patch)".
                >
                > When looking at "total referrers" I show Google (528
                > hits), Yahoo (155)
                > and AOL(100) spelled out, but the traffic for the 25
                > / 26 is "normal"
                > looking. Don't see any spikes like you were showing.
                > We show a total of
                > 3392 visits, 22,500 hits - looks normal.


                My sites get crawled by inktomi on days 25 and 26. For
                your site, it will be different days. I guess that
                Inktomi crawls every day of the month, and days 25 and
                26 are my turn to be re-crawled in depth, or the
                crawler happens to stumble upon a big website on that
                day of its monthly cycle.


                The referrers won't show you this problem, because
                inktomisearch (Yahoo! Slurp) sends an empty referrer
                "-". The referrers will normally show you which engines
                the visitors have used to reach you, but they won't
                show whether the engines' bots have crawled your site,
                because the bots often use empty referrers.

                To find the engines' bot activity, you have to look
                in "Total Sites" or in "Total User Agents".

                For Inktomi, you should look for the string
                "inktomisearch.com" in "Total Sites" and the string
                "Slurp" in "Total User Agents".
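The same check can be done straight on the logfile. This sketch assumes the log is in the combined format shown in the first message, where the request, the referrer and the user agent are each wrapped in double quotes; splitting each line on the double-quote character then puts the referrer in field 4 and the user agent in field 6. The sample file, the third line, and the address 192.0.2.10 are invented for illustration.

```shell
# Hypothetical sample log in combined format; use your real access_log instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
lj1124.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
lj2545.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
192.0.2.10 - - [01/May/2005:00:43:00 +0200] "GET /index.html HTTP/1.0" 200 1024 "http://www.example.com/" "Mozilla/4.0"
EOF

# Count hits whose user agent (field 6 when splitting on ") mentions Slurp.
slurp_hits=$(awk -F'"' '$6 ~ /Slurp/ { n++ } END { print n + 0 }' "$LOG")

# List the distinct referrers (field 4) sent with those hits.
slurp_referrers=$(awk -F'"' '$6 ~ /Slurp/ { print $4 }' "$LOG" | sort -u)

echo "Slurp hits: $slurp_hits"
echo "referrers used by Slurp: $slurp_referrers"
rm -f "$LOG"
```

In this sample the only referrer printed for Slurp is "-", which is why the crawler never shows up in the referrer reports.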


                >
                > I will pass this on to "Dave" and when I get his
                > response (because he is
                > the smart one of this effort) I will send you the
                > results.
                >
                > Thanks.


                OK, thanks to you, too. Send this email to "Dave", if
                you can.


                >
                > Warm regards,
                >
                > Gary
                >
                > Enric Naval wrote:
                >
                > >--- gary hall <gary.chris@...> wrote:
                > >
                > >
                > >
                > >>Dave,
                > >>
                > >>We don't seem to have this problem, but it is
                > >>somrthing to think about.
                > >>
                > >>Gary
                > >>
                > >>
                > >
                > >You aren't crawled by inktomisearch? Are you sure?
                > >This domain is used by Slurp (the Yahoo! bot), so this
                > >would mean that your site won't appear in any search
                > >from Yahoo!, because they don't crawl your site. For
                > >commercial sites, that is a bad thing.
                > >
                > >If you are sure you have no visits from them (for
                > >example, because you have an intranet), then please
                > >forget the rest of this email.
                > >
                > >
                > >You should grep your access_log files, looking for
                > >visits from inktomisearch, since you will probably
                > >have visits from them. I believe that it is not
                > >possible to see the problem in the stats unless you
                > >look directly at the log files. This command lists all
                > >visits from inktomisearch. Could you run it and tell
                > >us if it worked? (Remember that each line you see
                > >is counted as one separate visit.)
                > >
                > >grep '^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com' access_log
                > >
                > >
                > >You see, I use this line in webalizer.conf:
                > >
                > >"GroupSite *inktomisearch.com Inktomi"
                > >
                > >In the Top Site list I had 148 visits from
                > >inktomisearch.com both before AND after using the
                > >script BUT the total number of visits had gone down
                > >from 41292 to 31512!
                > >
                > >So:
                > >
                > >Using_script  Total_visits  Visits_from_inktomi
                > >No            41292         148
                > >Yes           31512         148
                > >
                > >All other totals in the "Top Sites" list remain the
                > >same.
                > >
                > >Mind you, this is the expected and absolutely
                > >correct behaviour.
                > >
                > >The "Top Sites" list and "total visits" are not
                > >related to each other: they use different algorithms
                > >and get very different results.
                > >
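
To make the effect of the sed step concrete, here is a minimal sketch on a tiny sample log. The file names sample_access_log and sample_access_log_sed and the client.example.org entry are made up for illustration, and the user agent strings are shortened. Counting distinct hostnames is only a rough proxy for how a per-host visit counter behaves; it is not Webalizer's actual algorithm.

```shell
# Hypothetical sample: two Slurp hits 28 minutes apart plus one ordinary visitor.
cat > sample_access_log <<'EOF'
lj1124.inktomisearch.com - - [01/May/2005:00:14:07 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
client.example.org - - [01/May/2005:00:20:00 +0200] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"
lj2545.inktomisearch.com - - [01/May/2005:00:42:18 +0200] "GET /robots.txt HTTP/1.0" 200 873 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
EOF

# Collapse every crawler hostname (two letters + four digits) into one name.
sed 's/^[a-z][a-z][0-9][0-9][0-9][0-9][.]inktomisearch[.]com/inktomisearch.com/' \
    sample_access_log > sample_access_log_sed

# Count distinct client hostnames (first field) before and after the rewrite.
hosts_before=$(cut -d' ' -f1 sample_access_log | sort -u | wc -l | tr -d ' ')
hosts_after=$(cut -d' ' -f1 sample_access_log_sed | sort -u | wc -l | tr -d ' ')

echo "distinct hosts before: $hosts_before, after: $hosts_after"
```

The ordinary visitor's line is left untouched; only the two crawler hostnames merge into one, which is why a counter that keys visits on the hostname sees fewer visits afterwards.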


                Enric Naval
                Student of Informática de Gestión at the UdL (Lleida)
                GRIHO webalizer.conf
                http://griho.udl.es/webalizer/webalizer.conf.txt


