webalizer-2.01-10 patches for your consideration

  • Landon Noll
    Message 1 of 1 , Jun 29, 2006
      We have been successfully using modified webalizer-2.01-10 extensively on
      multiple sites, from the very large to the very small for some time now.
      A message was posted around Jul 29, 2004 that described these patches.
      This posting is a revised version of that posting that contains an
      additional patch.

      We have made a number of mods to the standard webalizer-2.01-10
      distribution. We have also built a number of tools to process multiple
      virtual sites and to create summary/roll-up stats for all of the
      virtual sites on a given server.

      The topic of this posting is the set of patches that we have applied
      to the webalizer-2.01-10 distribution. The URL:

      http://www.isthe.com/chongo/src/webalizer-patch/

      contains 5 patches. I recommend the first 4 to all webalizer
      users; the 5th is a re-package of the geolizer patch.

      For those who remember the Jul 29, 2004 posting, the patches
      0.basic.patch, 1.64bit.patch, and 2.hist.patch are identical except
      for a few minor comment changes. The patch 3.abs_url.patch is new.
      The patch 3.geolizer.patch has been renamed 4.geolizer.patch.

      =-=

      The 0.basic.patch:

      http://www.isthe.com/chongo/src/webalizer-patch/0.basic.patch

      Does the following:

      * ability to process very large log files (> 2GB in size)

      * countries patch

      Some of the entries on the list are not countries. In some
      cases the nation state status is contested. In other cases
      the entry is related to a territory that does not claim to
      be a country. In some cases what some claim is a country is
      in dispute by another country. And things like .arpa are
      not a country.

      I recommend that one use the term 'location' instead of
      'Nation' or 'Country' to avoid the whole mess. ;-)

      Added are some missing locations (from the ISO UN codes and
      from GeoIP's list). Some location names have been corrected
      or changed to their official name. Added some more TLDs.

      * avoid referrer spamming (IMPORTANT)

      Spammers and other low-life forms have been stuffing the
      "top N referrer" table in order to get webalizer to generate
      links to their sites ... (perhaps because they think this
      will improve their search engine placement or perhaps because
      they wish to direct people to a poisoned web page in an effort
      to exploit some browser bug?). Whatever the reason, we don't
      need to give them their links.

      This patch turns the entries in the "top N referrer" table into
      plain text values instead of links (A tags).

      * correctly process log entries made during a leap second

      * long referrer and search patch

      Quite a few referrer and search strings are between 128
      and 256 chars in length. Avoid truncating them.
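
      To get a rough idea of how many referrer strings in your own logs
      are longer than 128 chars, something like the following sketch may
      help. It assumes the Apache combined log format, where the referrer
      is the second double-quoted field. The function name and the 128
      default are only illustrative, and a request line that itself
      contains a double quote will be miscounted.

```shell
# count_long_referrers [MIN]: read an access log on stdin and print the
# number of entries whose referrer is longer than MIN chars (default 128).
# Assumes combined log format: splitting on '"' makes the referrer field 4.
count_long_referrers() {
    awk -F'"' -v min="${1:-128}" 'length($4) > min { n++ } END { print n + 0 }'
}
```

      Usage would be along the lines of: count_long_referrers < access_log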

      How to apply this patch:

      # place a copy of the original webalizer-2.01-10 source under:
      #
      # ./webalizer-2.01-10

      patch -p0 < 0.basic.patch

      =-=

      The 1.64bit.patch:

      http://www.isthe.com/chongo/src/webalizer-patch/1.64bit.patch

      Does the following:

      * avoid 32 bit counter overflow

      For very busy sites, 32 bit signed counters can overflow.
      This is particularly likely when using webalizer to cover a long
      span of time. This patch converts a few values to
      u_int64_t to avoid these numeric overflow problems.
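
      As a back-of-the-envelope illustration of why this matters, the
      sketch below estimates how many days a signed 32 bit counter lasts
      at a given daily growth rate. The function name and the rate used
      in the example are hypothetical; the posting does not say which
      counters the patch converts.

```shell
# Largest value a signed 32 bit counter can hold.
INT32_MAX=2147483647

# days_to_overflow RATE: whole days a counter growing by RATE per day
# stays at or below INT32_MAX (integer division, rounds down).
days_to_overflow() {
    echo $(( INT32_MAX / $1 ))
}

# Hypothetical site adding 5,000,000 to a counter per day:
days_to_overflow 5000000    # prints 429
```

      At that rate the counter would overflow in a little over 14 months,
      which a multi-year report could easily exceed.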

      How to apply this patch:

      # be sure you have applied the patch:
      #
      # 0.basic.patch

      patch -p0 < 1.64bit.patch

      =-=

      The 2.hist.patch:

      http://www.isthe.com/chongo/src/webalizer-patch/2.hist.patch

      Does the following:

      * extend the summary page for longer than 12 months

      By default, webalizer only keeps the last 12 months of data.
      And at the start of a month, the oldest month is discarded
      resulting in only 11+ months of data.

      This code gets around the 12 month limit by maintaining a
      history of older months in a parallel directory ../history.

      See the webalizer page:

      http://www.isthe.com/site/isthe/webalizer/usage/

      for an example of this effect.

      How to apply this patch:

      # be sure you have applied the patches:
      #
      # 0.basic.patch
      # 1.64bit.patch

      patch -p0 < 2.hist.patch

      NOTE: After the 2.hist.patch has been applied, the
      track-hist tool:

      http://www.isthe.com/chongo/src/webalizer-patch/track_hist

      should be run on a monthly basis. See the comments
      in the 2.hist.patch file as well as the track-hist tool
      itself for details.

      =-=

      The 3.abs_url.patch:

      http://www.isthe.com/chongo/src/webalizer-patch/3.abs_url.patch

      Does the following:

      * Adds a -z option to cause webalizer to convert log entries with
      absolute URLs into relative URLs.

      For example, if a log shows the access of the URL

      http://www.example.com/some/path.html

      the -z flag will cause webalizer to read it as:

      /some/path.html

      The reason why you want to use -z is so that webalizer will
      not produce web pages with links that refer to external URLs.
      Link spammers might attempt to inject bogus URLs into the
      logs in order to get webalizer to create links to
      their site (say because they think their site will go up
      in search page rank if they create links to their site).
      Many web servers, such as apache, will by default treat the
      absolute URL in an HTTP request as a local URL. Consider the request:

      GET http://example.net/index.html HTTP/1.1
      Host: www.example.com

      Even though the apache web server is set up for
      serving web pages for www.example.com only, the URL for
      http://example.net/index.html is treated as a local URL and
      is resolved as:

      /index.html

      And thus apache will give a 200 code if /index.html is
      accessible (which it usually is). However apache logs the
      URL access as the absolute http://example.net/index.html
      URL even though the local URL /index.html is returned!

      A URL spammer can fetch these absolute URLs often enough to
      push their site's absolute URL into the top N URLs in a
      webalizer report.

      In addition, system cracker tools that probe for open
      web proxies and exploit URLs will issue HTTP requests with
      absolute URLs. So they are another way that strange absolute
      URLs can wind up in access logs.

      By using the -z flag, webalizer will strip out the
      method://host.name and effectively convert the absolute URL
      into the relative local URL that the web server actually
      processed. Thus with -z, webalizer will only produce URLs
      that are relative to the local web server.

      * Adds -Z option so that webalizer will ignore any URL in
      an access log that is NOT an absolute URL.

      The reason why one might want to use this option is when
      one is combining logs from multiple web sites where each
      site's log entries are converted into absolute URLs for that site.
      By ignoring all non-absolute URLs, only those converted log
      entries will be processed.

      Normally one would not use the -Z (upper case) flag.
      If one does use -Z, one should NOT use -z.
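
      The effect of the two options on a single URL can be sketched in
      shell as follows. This is only an illustration of the behavior
      described above, not the code in the patch: the regular expression,
      the function names, and the choice to turn a URL with no path
      into "/" are all assumptions of this sketch.

```shell
# Pattern for a leading method://host.name (an assumption, not the patch's code).
ABS_RE='^[A-Za-z][A-Za-z0-9+.-]*://[^/]*'

# strip_abs_url URL: roughly what -z is described as doing. Prints the
# local part; a URL with no path at all becomes "/" (this sketch's choice).
strip_abs_url() {
    local_part=$(printf '%s\n' "$1" | sed -E "s#${ABS_RE}##")
    printf '%s\n' "${local_part:-/}"
}

# is_abs_url URL: succeeds only for absolute URLs, the ones -Z would keep.
is_abs_url() {
    printf '%s\n' "$1" | grep -Eq "$ABS_RE"
}

strip_abs_url 'http://www.example.com/some/path.html'   # prints /some/path.html
```

      A URL that is already relative, such as /index.html, passes through
      strip_abs_url unchanged and is rejected by is_abs_url.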

      How to apply this patch:

      # be sure you have applied the patches:
      #
      # 0.basic.patch
      # 1.64bit.patch
      # 2.hist.patch

      patch -p0 < 3.abs_url.patch
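
      With this patch, the four recommended patches are 0.basic.patch,
      1.64bit.patch, 2.hist.patch and 3.abs_url.patch, applied in that
      order. A minimal sketch of scripting that is shown below. It
      assumes GNU patch (for the --dry-run option), that the patch files
      sit in the current directory next to ./webalizer-2.01-10, and the
      function names are made up for this example.

```shell
# Names of the four recommended patches, in the order the posting gives.
patch_order() {
    printf '%s\n' 0.basic.patch 1.64bit.patch 2.hist.patch 3.abs_url.patch
}

# Apply each patch from the directory holding ./webalizer-2.01-10.
# Each patch is first tried with --dry-run (a GNU patch option), so a
# patch that would not apply cleanly stops the run before it changes
# any file. Patches applied earlier in the loop stay applied.
apply_patches() {
    for p in $(patch_order); do
        if [ ! -f "$p" ]; then
            echo "missing patch file: $p" >&2
            return 1
        fi
        patch -p0 --dry-run < "$p" > /dev/null || {
            echo "dry run failed for $p" >&2
            return 1
        }
        patch -p0 < "$p" || return 1
    done
}
```

      Calling apply_patches from the directory that contains both the
      patch files and the webalizer-2.01-10 source tree would then run
      the four steps in sequence.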

      NOTE: After applying this patch, I recommend always running webalizer
      with the -z (lower case) option to avoid URL spamming.
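
      To see whether absolute URLs are showing up in your own access
      logs, a quick check along these lines may be useful. It is a
      sketch that assumes the Apache common or combined log format,
      where the requested URL is the 7th whitespace-separated field;
      the function name is made up, and entries whose user field
      contains spaces will be misread.

```shell
# count_abs_requests: read an access log on stdin and print how many
# requests used an absolute (method://host.name/...) URL.
# Assumes common/combined log format, where the URL is field 7.
count_abs_requests() {
    awk '$7 ~ /^[A-Za-z][A-Za-z0-9+.-]*:\/\// { n++ } END { print n + 0 }'
}
```

      Usage would be along the lines of: count_abs_requests < access_log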

      =-=

      The optional 4.geolizer.patch patch:

      http://www.isthe.com/chongo/src/webalizer-patch/4.geolizer.patch

      NOTE:

      Apply if AND ONLY IF you use one of the MaxMind
      (http://www.maxmind.com/) GeoIP databases. It is just a
      reapplication of the geolizer.patch that works for Un*x / Linux /
      GNU-Linux systems after the first 4 patches have been applied.

      How to apply this patch:


      # be sure you have applied the patches:
      #
      # 0.basic.patch
      # 1.64bit.patch
      # 2.hist.patch
      # 3.abs_url.patch

      patch -p0 < 4.geolizer.patch

      =-=

      chongo (http://www.isthe.com/chongo/ -- Landon Curt Noll) /\oo/\
      Share and enjoy! :-)