Browsers, Search Engines and Phrases -- where?

  • jlyandco
    Message 1 of 7 , Apr 5, 2004
      PLEASE HELP AN IDIOT!

      I know that I've asked this -- and been told -- but I STILL cannot get
      any data collected/reported on Browsers used, Search Engines used, or
      the Word/Phrase they used to find our site. Let me show you everything
      I have, and I will also gladly answer ANY questions that you may have
      for me.

      I'm using my server to host a number of "virtual" domains, so I have
      the following types of logs in my /logs directory:

      access_log (one file; appears to be the access log for my "primary"
      domain; currently active);

      agent_log (one file; appears to be the error log for my "primary"
      domain; currently active);

      referer_log (one file; does not appear to have had any activity since
      last summer... possibly when I "fixed(?)" my previous reporting
      errors!)

      virtualhostdomainname-access_log (many files.. I appear to have an
      access log for each "virtual" domain; currently active); and

      virtualhostdomainname-error_log (many files.. I appear to have an
      error log for each "virtual" domain; currently active)


      I am including the Log section of my httpd.conf file (I read here
      somewhere that I need to add a "%q" -- but I'm not exactly sure where
      to add it!).

      I'm also adding my webalizer.conf, as well.

      PLEASE help me find what I am obviously missing!!!

      Thanks,
      Jeff
      ================
      ================
      From httpd.conf:

      HostnameLookups On
      #
      ErrorLog logs/error_log
      #
      # LogLevel: Control the number of messages logged to the error_log.
      # Possible values include: debug, info, notice, warn, error, crit,
      # alert, emerg.
      #
      LogLevel warn
      #
      # The following directives define some format nicknames for use with
      # a CustomLog directive (see below).
      #
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
      LogFormat "%h %l %u %t \"%r\" %>s %b" common
      LogFormat "%{Referer}i -> %U" referer
      LogFormat "%{User-agent}i" agent
      #
      # The location and format of the access logfile (Common Logfile Format).
      # If you do not define any access logfiles within a <VirtualHost>
      # container, they will be logged here. Contrariwise, if you *do*
      # define per-<VirtualHost> access logfiles, transactions will be
      # logged therein and *not* in this file.
      #
      #CustomLog /usr/local/etc/httpd/logs/access_log common
      #
      # If you would like to have agent and referer logfiles, uncomment the
      # following directives.
      #
      #CustomLog /usr/local/etc/httpd/logs/referer_log referer
      #CustomLog /usr/local/etc/httpd/logs/agent_log agent
      #
      # If you prefer a single logfile with access, agent, and referer information
      # (Combined Logfile Format) you can use the following directive.
      #
      CustomLog /usr/local/etc/httpd/logs/access_log combined

      ============
      ============
      From webalizer.conf

      AllSites yes
      AllURLs yes
      AllReferrers yes
      AllSearchStr yes
      CountryGraph yes
      GMTTime no
      GraphLegend yes
      GraphLines 2
      #GroupAgent
      GroupHighlight yes
      #GroupReferrer
      GroupShading yes
      #GroupSite
      #GroupURL
      #HideAgent
      HideReferrer *timberlinechurch.org/
      HideReferrer timberlinechurch.org/cpanel*
      HideReferrer *timberlinechurch.org/html/
      HideReferrer timberlinechurch.org/webalizer*
      HideSite *timberlinechurch.org
      HideSite localhost
      HideURL */cpanel
      HideURL */images
      HideURL *.gif
      HideURL *.GIF
      HideURL *.jpg
      HideURL *.JPG
      HideURL *.png
      HideURL *.PNG
      HideURL *.ra
      HideUser cpanel
      HideUser timberli
      HourlyGraph yes
      HourlyStats yes
      #IgnoreAgent
      #IgnoreReferrer
      #IgnoreSite
      #IgnoreURL
      #IncludeAgent
      #IncludeReferrer
      #IncludeSite
      #IncludeURL
      IndexAlias index
      LogFile /www/logs/timberlinechurch.org-access_log
      MangleAgents DEFAULT
      SearchEngine yahoo.com p=
      SearchEngine altavista.com q=
      SearchEngine google.com q=
      SearchEngine eureka.com q=
      SearchEngine lycos.com query=
      SearchEngine hotbot.com MT=
      SearchEngine msn.com MT=
      SearchEngine infoseek.com qt=
      SearchEngine webcrawler searchText=
      SearchEngine excite search=
      SearchEngine netscape.com search=
      SearchEngine mamma.com query=
      SearchEngine alltheweb.com query=
      SearchEngine northernlight.com qr=
      PageType *.htm
      PageType *.html
      PageType *,.cgi
      ReportTitle Usage Statistics for
      TopAgents 15
      TopCountries 50
      TopEntry 10
    • enventa2000
      Message 2 of 7 , Apr 6, 2004
        There is not enough info yet.

        It looks like you are using a "common" formatted log, not a "combined"
        one. This would explain the missing info, since "common" logs don't
        include it, and webalizer would never get to actually see the info in
        order to process it.

        Please post two or three sample lines of the log file mentioned in
        your webalizer.conf.

        Please post also some lines from one of your
        "virtualhostdomainname-access_log" files you have in your logs
        directory, so we can check the logs format.

        When you have solved this, maybe you want to copy&paste an updated
        SearchEngine list in your webalizer.conf to achieve more accurate
        results (shameless self-promotion):

        http://griho.udl.es/webalizer/webalizer.conf.txt


        (Also, I would comment out the MangleAgents line. I don't believe
        DEFAULT is a valid value. To get the default behaviour, just comment
        out the line.)

        (Also, you seem to have a typo. It should say "PageType *.cgi",
        without the comma ",".)

        (Also, you can comment out the IndexAlias line, or put "index."
        instead of "index".)

      • jlyandco
        Message 3 of 7 , Apr 7, 2004
          --- In webalizer@yahoogroups.com, "enventa2000" <enventa2000@y...> wrote:
          > There is not info enough.
          >
          > It looks like you are using a "common" formatted log, not a "combined"
          > one. This would explain the missing info, since "common" logs don't
          > include it, and webalizer would never get to actually see the info in
          > order to process it.
          >
          > Please post two or three sample lines of the log file mentioned in
          > your webalizer.conf.
          >
          > Please post also some lines from one of your
          > "virtualhostdomainname-access_log" files you have in your logs
          > directory, so we can check the logs format.


          These are from "timberlinechurch.org-access_log" (one of the many
          "virtual domains" I meant by "virtualhostdomainname-access_log"):
          12-252-40-5.client.attbi.com - - [24/Apr/2003:03:32:00 -0400] "GET /html/outreach/over.html HTTP/1.1" 200 9725
          213.78.109.69 - - [24/Apr/2003:04:21:57 -0400] "GET / HTTP/1.1" 200 480
          ac91547b.ipt.aol.com - - [24/Apr/2003:08:29:17 -0400] "GET /html/word/042203.html HTTP/1.1" 200 13793
          1cust220.tnt1.fort-collins.co.da.uu.net - - [24/Apr/2003:08:31:18 -0400] "GET / HTTP/1.1" 200 480



          The lines from my "primary"(?) log (access_log) do look different,
          and more info is collected there:
          squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] "GET /icons/blank.gif HTTP/1.0" 200 148 "http://www.the-print-broker.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
          squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] "GET /icons/back.gif HTTP/1.0" 200 216 "http://www.the-print-broker.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
          h00022d591170.ne.client2.attbi.com - - [01/Aug/2003:18:59:11 -0400] "GET / HTTP/1.1" 200 2471 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)"
          h00022d591170.ne.client2.attbi.com - - [01/Aug/2003:19:08:11 -0400] "GET / HTTP/1.1" 200 2471 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)"




          My "referrer.log" looks like it started to record, but then stopped:
          http://search.dogpile.com/texis/search?method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web -> /
          http://www.hai-colo.com/ -> /images/logo.jpg
          http://search.dogpile.com/texis/search?method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web -> /
          http://www.hai-colo.com/ -> /images//log_analysis_screen_info.gif
          http://search.dogpile.com/texis/search?method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web -> /
          http://www.hai-colo.com/ -> /images//log_analysis_screen_info.gif
          referer
          referer
          referer
          (and on and on for a long time)



          My "agent_log" does the same thing:
          Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
          Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
          Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
          Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
          Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
          Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
          agent
          agent
          agent
          (and on and on for a long time)


          The differences between these logs are still pretty Greek to me... if
          you could enlighten me, I'd greatly appreciate it!

          Please let me know if I can provide anything else to help in your
          "diagnosis"...



          > When you have solved this, maybe you want to copy&paste an updated
          > SearchEngine list in your webalizer.conf to achieve more accurate
          > results (shameless self-promotion):
          >
          > http://griho.udl.es/webalizer/webalizer.conf.txt

          NOTHING wrong with promoting yourself...



          > (Also, I would comment the MangleAgents line. I don't believe DEFAULT
          > is a valid value. To have the default value, just comment the line.)
          >
          > (Also, you seem to have a typo. It should say "PageType *.cgi",
          > without the comma ",")
          >
          > (Also, you can comment IndexAlias line, or put "index." instead of
          > "index")
          >

          Thanks... I'll correct these, as well!

          I REALLY appreciate your help.... I've been "winging it" for a while,
          and I'd like to do better. Thanks for the mentoring!

          Regards,
          Jeff
        • enventa2000
          Message 4 of 7 , Apr 10, 2004
            This is a LONG explanation, but worth reading, I promise. You'll gain
            insight, and you may later impress other people :)


            Ok, here you are just defining the information appearing in each log
            format. This is right:

            For the "combined" format:

            > > > LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\"
            > > > \"%{User-Agent}i\"" combined

            For the "common" format:

            > > > LogFormat "%h %l %u %t \"%r\" %>s %b" common

            And for the "referer" and "agent" ones:

            > > > LogFormat "%{Referer}i -> %U" referer
            > > > LogFormat "%{User-agent}i" agent

            Here, you see? You have commented out the lines that tell Apache to
            write your referer and agent logs, and that's why they stopped being
            logged. Just delete the "#" at the start of each line, and reload or
            restart the Apache server:

            > > > #CustomLog /usr/local/etc/httpd/logs/referer_log referer
            > > > #CustomLog /usr/local/etc/httpd/logs/agent_log agent


            Here we have the main log file, defined as a "combined" log. This is
            right.

            > > > CustomLog /usr/local/etc/httpd/logs/access_log combined


            Now, the main problem. Go to your httpd.conf file (make sure it is
            actually the httpd.conf used by Apache, and not some copy in another
            directory). Search for "VirtualHost". You should find a sample
            VirtualHost container (if you didn't delete it, of course):

            #<VirtualHost *>
            # ServerAdmin webmaster@...
            # DocumentRoot /www/docs/dummy-host.example.com
            # ServerName dummy-host.example.com
            # ErrorLog logs/dummy-host.example.com-error_log
            # CustomLog logs/dummy-host.example.com-access_log common
            #</VirtualHost>

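            You see the "*"? It stands for the address the container answers on,
            and "*" means every address on the server. With several containers
            like this, Apache picks the one whose ServerName matches the
            requested host name, and the settings inside it (including
            CustomLog) apply only to that virtual domain.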

            YOU NEED TO CHANGE THE CUSTOMLOG DIRECTIVE. If you just uncommented
            this, then you have just told apache to log every subdomain in a file
            called dummy-host.example.com-access_log, WHICH IS IN COMMON FORMAT,
            THAT'S IT, NO REFERRER AND NO AGENT.

            By your description, I guess you have made a different VirtualHost
            container for each domain, so you need to go over ALL of them one
            by one, changing the format in each one from "common" to "combined".
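
If there are many containers, a small script can list which ones still say "common". This is only a rough sketch in Python: the sample config text and the function name are made up for illustration, and it assumes every CustomLog line uses a format nickname (like "common" or "combined") rather than an inline format string.

```python
# Hypothetical sample; in practice you would read your real httpd.conf instead.
SAMPLE_CONF = """
<VirtualHost *>
    ServerName www.example.org
    ErrorLog logs/example.org-error_log
    CustomLog logs/example.org-access_log common
</VirtualHost>
<VirtualHost *>
    ServerName www.example.net
    CustomLog logs/example.net-access_log combined
</VirtualHost>
"""


def vhost_log_formats(conf_text):
    """Return (server_name, log_path, nickname) for each CustomLog in a VirtualHost."""
    results = []
    in_vhost = False
    server_name = None
    pending = []  # CustomLog lines seen in the current container
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out directives
        lower = line.lower()
        if lower.startswith("<virtualhost"):
            in_vhost, server_name, pending = True, None, []
        elif lower.startswith("</virtualhost"):
            results.extend((server_name, path, fmt) for path, fmt in pending)
            in_vhost = False
        elif in_vhost:
            parts = line.split()
            if parts[0].lower() == "servername" and len(parts) >= 2:
                server_name = parts[1]
            elif parts[0].lower() == "customlog" and len(parts) >= 3:
                pending.append((parts[1], parts[2]))
    return results


for name, path, fmt in vhost_log_formats(SAMPLE_CONF):
    if fmt != "combined":
        print(f"{name}: {path} is logged as '{fmt}'")
```

With your own file you would pass `open("httpd.conf").read()` (using the real path) instead of the sample text.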

            After doing this, save the file and reload or restart Apache so it
            starts logging in the new format. Notice that the existing logs will
            then have some entries in common format and some in combined format,
            which is a potential source of problems: webalizer may be unable to
            guess whether the log is in common or combined format, and may
            decide it is common format and drop the referrer and agent info
            altogether.

            To prevent this, you could stop the server, rotate the logs and then
            restart the server. It should take less than one minute! Now some
            files will be in common format and others in combined format, but
            this is a different problem.
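
To see whether a given log file already mixes both formats, something like the following Python sketch could help. It is a crude heuristic of my own, not what webalizer does internally: it simply counts the quoted fields in each line (one for common, three for combined) and assumes there are no escaped quotes inside the fields.

```python
import re
from collections import Counter

# A quoted field: the request, the referrer or the agent.
QUOTED_FIELD = re.compile(r'"[^"]*"')


def guess_format(line):
    """Guess the log format of one line from its number of quoted fields."""
    quoted = len(QUOTED_FIELD.findall(line))
    if quoted == 1:
        return "common"
    if quoted == 3:
        return "combined"
    return "unknown"


def count_formats(lines):
    """Count how many lines look like each format."""
    return Counter(guess_format(line) for line in lines)


# Two lines taken from the logs posted earlier in this thread.
SAMPLE_LINES = [
    '12-252-40-5.client.attbi.com - - [24/Apr/2003:03:32:00 -0400] '
    '"GET /html/outreach/over.html HTTP/1.1" 200 9725',
    'squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] '
    '"GET /icons/blank.gif HTTP/1.0" 200 148 '
    '"http://www.the-print-broker.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
]

print(count_formats(SAMPLE_LINES))
```

If a file reports both "common" and "combined" lines, it is one of the mixed logs described above.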

            Please post here if you could solve it, and what happened later. You
            could also post one of your VirtualHost containers before you change
            it, so to see if that was the problem. You could change the names on
            it to avoid privacy problems.




            LONG INFO ABOUT THE "LOOKS LIKE GREEK TO ME" LOG FILES :)


            [...]
            > These are from "timberlinechurch.org-access_log" (one of the many
            > "virtual domains" I meant by "virtualhostdomainname-access_log"):

            > 12-252-40-5.client.attbi.com - - [24/Apr/2003:03:32:00 -0400] "GET /html/outreach/over.html HTTP/1.1" 200 9725
            [...]


            This is a "common" format log: it lacks the referrer and the agent
            at the end of each line. Compare it with the one below.




            > The lines from my "primary"(?) log (access_log) do look different,
            > and more info is collected there:

            > squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] "GET /icons/blank.gif HTTP/1.0" 200 148 "http://www.the-print-broker.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
            [...]


            Visitor host (resolved from the IP): squid1.gvea.com
            Ident and user name (both unused here): - -
            Date: [01/Aug/2003:17:57:18 -0400]
            Request type: GET
            URL requested: /icons/blank.gif
            HTTP version: HTTP/1.0
            Response code: 200
            Response bytes: 148
            Referrer: http://www.the-print-broker.com/
            Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)


            This visitor was visiting your "the-print-broker.com" website, and
            their Internet Explorer browser (version 6.0) was retrieving the
            blank.gif image, as part of fetching all the images in the page. If
            you look at the context, you'll see how from the same IP (or resolved
            address) there is first a request for the page, and then successive,
            very fast requests for all the elements on that page.

            BOUNDARIES

            Notice how the date is enclosed by [], and how GET, the page
            requested and the HTTP version are all enclosed together by "". This
            allows programs like webalizer to identify the data. Referrer and
            agent are each enclosed by "" as well, mainly because they may have
            blank spaces in the middle, which confuses parsers that assume blank
            spaces always separate fields.
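
To make the boundaries concrete, here is a rough Python sketch that splits one line into the fields listed above, using the [] and "" delimiters. The regular expression and the field names are my own choices for illustration, not webalizer's parser, and it assumes no escaped quotes inside the quoted fields.

```python
import re

# One access log line: the referrer and agent part is optional, so the
# same pattern accepts both common and combined lines.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


# The combined-format line discussed above.
SAMPLE = (
    'squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] '
    '"GET /icons/blank.gif HTTP/1.0" 200 148 '
    '"http://www.the-print-broker.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'
)

for field, value in parse_line(SAMPLE).items():
    print(f"{field}: {value}")
```

For a common-format line the same function returns `None` in the referrer and agent fields, which is exactly the info webalizer never gets to see.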

            REQUEST TYPES:

            GET most common one. It can also send information entered in forms,
            as part of the URL. You may see spam relay probes like
            "GET /cgi-bin/formmail.pl?email=fake_name@...&recipient=throw_away@...&subject=www.griho.udl.es/cgi-bin/formmail.pl HTTP/1.0"
            The spammer will later check the throw_away address to see if the
            probe went through, and then relay the heck out of your server. See
            how the subject itself has the probe address, so the spammer doesn't
            even need to check the body of the message. Use:
            "grep -e formmail /var/log/access_log*" to find them.

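            POST used for sending information from forms. The data travels in the
            request body instead of the URL, so it does not show up in the access
            log, which makes it a safer way than "GET" to send form info.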

            CONNECT probably an attempt at a proxy connection, most surely a
            spammer trying to relay spam, or a hacker looking for free proxies. I
            see a lot of: "CONNECT IP:PORT HTTP/1.0". Make sure the server is
            answering 405 "Method not allowed"! If it answers with a code saying
            "accepted" or something similar (a 2xx code), then your server may be
            being used to relay spam! Try "grep -e CONNECT /var/log/access_log*"
            in the command line to find this abuse.

            DELETE I don't know how this is supposed to work, but it sure looks
            like a bad thing to allow visitors to do.

            OPTIONS website publisher programs (I don't know what this is for).
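
The two grep checks above (formmail and CONNECT) can also be done in one pass, looking at the response code at the same time. This is just a sketch in Python: the sample lines are invented (with documentation-only IP addresses), and treating any 2xx answer to CONNECT as "worth a look" is my reading of the advice above, not an official rule.

```python
import re

# Request method, target and response code from one access log line.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<status>\d{3})\b'
)


def find_probes(lines):
    """Return (kind, status, worth_a_look) for formmail and CONNECT requests."""
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        status = int(match.group("status"))
        accepted = 200 <= status < 300  # the server said yes to the request
        if match.group("method") == "CONNECT":
            hits.append(("CONNECT", status, accepted))
        elif "formmail" in match.group("target").lower():
            hits.append(("formmail", status, accepted))
    return hits


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Apr/2004:10:00:00 -0400] "CONNECT 198.51.100.7:25 HTTP/1.0" 405 305',
    '192.0.2.11 - - [10/Apr/2004:10:01:00 -0400] "CONNECT 198.51.100.7:25 HTTP/1.0" 200 0',
    '192.0.2.12 - - [10/Apr/2004:10:02:00 -0400] "GET /cgi-bin/formmail.pl?subject=test HTTP/1.0" 404 290',
    '192.0.2.13 - - [10/Apr/2004:10:03:00 -0400] "GET /index.html HTTP/1.1" 200 480',
]

for kind, status, worth_a_look in find_probes(SAMPLE_LINES):
    note = "CHECK THIS" if worth_a_look else "refused, fine"
    print(f"{kind} answered {status}: {note}")
```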



            RESPONSE CODES

            Most common Response codes:

            200 "OK"

            304 "Not Modified" (the browser asks whether it needs to refresh this
            file in its cache, and the server answers "304, no, it hasn't
            changed, just use your cached version instead of pulling the file
            again"). Normally you see "304 0", because the server sent just a
            response code and zero bytes of data.

            404 "Not Found"

            405 "Method not allowed" (usually answers to spam abuse CONNECT and
            also to PROPFIND from publisher programs like webDAV, also appears to
            POST if you forbid the method by accident or on purpose)

            500 "Server error" Your server is having problems.


            These response codes appear when you move things around and tell the
            server to redirect people to the new URL instead of sending 404.

            301 "Moved Permanently"

            302 "Found" (a temporary move; older servers call it "Moved Temporarily")

            307 "Temporary Redirect"



            Here you have the response code list from the Apache 2 source code
            (about line #431) and the list of request types (about line #516).
            There are lots of codes you will rarely see and don't need to worry
            about. It's just the usual over-engineering in internet protocols:
            throwing features in "just in case".

            http://lxr.webperf.org/source.cgi/include/httpd.h
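To see which of these response codes actually show up in a log, a quick tally helps. The following is a minimal sketch in Python, assuming common or combined format lines; the sample lines are invented, and the reason phrases come from Python's own `http.HTTPStatus` table rather than from the Apache source.

```python
from collections import Counter
from http import HTTPStatus
import re

# The status code is the three-digit number right after the quoted request.
STATUS_RE = re.compile(r'" (\d{3}) ')

def count_statuses(lines):
    """Count how often each response code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

def describe(code):
    """Return the standard reason phrase, or 'unknown' for unlisted codes."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "unknown"

# Invented sample lines in common format (hypothetical hosts and paths).
sample = [
    'host1 - - [24/Apr/2003:03:32:00 -0400] "GET /a.html HTTP/1.1" 200 9725',
    'host1 - - [24/Apr/2003:03:32:01 -0400] "GET /logo.gif HTTP/1.1" 304 0',
    'host2 - - [24/Apr/2003:03:33:10 -0400] "GET /old.html HTTP/1.1" 404 210',
    'host3 - - [24/Apr/2003:03:34:55 -0400] "GET /b.html HTTP/1.0" 200 512',
]

for code, hits in sorted(count_statuses(sample).items()):
    print(code, describe(code), hits)
```

A sudden rise in 404 or 500 counts is usually the first thing worth chasing.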





            > My "referrer.log" looks like it started to record, but then stopped:
            > http://search.dogpile.com/texis/search?method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web -> /
            [...]

            Ok, the referrer format is:

            http://server/page/user/comes/from -> /page_requested_in_your_server

            When the visitor has typed the address directly, or has copied and
            pasted it, or has clicked on one of their favorites, there is no
            referrer and the line shows a direct request:

            - -> /page_requested_in_your_server

            This is a direct request to http://my_domain.com/ (perhaps someone
            trying your company name in the address bar? perhaps someone clicking
            on My Favorites?)

            - -> /

            This is a direct request to http://my_domain.com/articles/new.html

            - -> /articles/new.html
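Since the original question was about search phrases, here is a small sketch of how the phrase can be pulled out of a referrer line in this format. It is written in Python and rests on one assumption: that the search engine puts the phrase in a query parameter called `q`, which holds for the dogpile line quoted above but not for every engine. The function name is made up.

```python
from urllib.parse import parse_qs, urlsplit

def parse_referer_line(line, phrase_param="q"):
    """Split 'referrer -> /path' and pull the search phrase out of the referrer.

    Returns (referrer, requested_path, phrase). referrer and phrase are None
    for direct requests; phrase is None when the parameter is absent.
    """
    referrer, _, requested = line.strip().rpartition(" -> ")
    if referrer in ("", "-"):
        return None, requested, None
    query = parse_qs(urlsplit(referrer).query)
    phrase = query.get(phrase_param, [None])[0]
    return referrer, requested, phrase

# The referrer line quoted earlier in this thread, joined back onto one line.
dogpile = (
    "http://search.dogpile.com/texis/search?"
    "method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web -> /"
)

print(parse_referer_line(dogpile)[2])
print(parse_referer_line("- -> /articles/new.html"))
```

Other engines use other parameter names, so pass a different `phrase_param` for them.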






            > My "agent_log" does the same thing:
            > Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
            [...]

            This one is simple: "Program/version (details)". The details come in a
            certain order, but I don't know the exact rules. The MangleAgents
            keyword in webalizer.conf controls how much of that detail webalizer
            keeps when it groups the agents in its report.
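The "Program/version (details)" shape can be taken apart with a short regular expression. This is a minimal sketch in Python, not how webalizer itself does it; the function name is made up, and agent strings that do not follow this shape simply return None.

```python
import re

# Program/version, optionally followed by one parenthesised detail list.
AGENT_RE = re.compile(
    r'^(?P<program>[^/\s]+)/(?P<version>\S+)(?:\s+\((?P<details>[^)]*)\))?'
)

def split_agent(agent):
    """Split an agent string into program, version and a list of details."""
    match = AGENT_RE.match(agent)
    if not match:
        return None
    details = match.group("details")
    return {
        "program": match.group("program"),
        "version": match.group("version"),
        "details": [d.strip() for d in details.split(";")] if details else [],
    }

# The agent_log line quoted above.
info = split_agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)")
print(info["program"], info["version"], info["details"])
```

Note that for Internet Explorer the real browser name and version ("MSIE 6.0") sit inside the details, not in the "Program/version" part.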
          • enventa2000
            Message 5 of 7 , Apr 10, 2004
              Ahem, I made a small mistake. I said that there is "filler" in the
              log, but it is instead the remote logname and the remote user. We
              don't use those on our server, so every entry in our logs has "- -"
              instead of user names, because there are no users.

              Complete information about how to configure the LogFormat directive
              and all the "%" thingies is here:

              http://httpd.apache.org/docs-2.0/mod/mod_log_config.html



              So, this line would be:

              squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] "GET /icons/blank.gif
              HTTP/1.0" 200 148 "http://www.the-print-broker.com/" "Mozilla/4.0
              (compatible; MSIE 6.0; Windows NT 5.0)"


              Visitor IP (or resolved name): squid1.gvea.com
              Remote logname (identd): -
              Remote user (auth): -
              Date: [01/Aug/2003:17:57:18 -0400]
              Request type: GET
              URL requested: /icons/blank.gif
              HTTP version: HTTP/1.0
              Response code: 200
              Response bytes: 148
              Referrer: http://www.the-print-broker.com/
              Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

              I'm sure that I am missing more things, so you may want to check
              the list in the link for better info and compare it with your
              LogFormat lines. The logs can be customised far more than you will
              ever need.
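The same breakdown can be done by a program. The following Python sketch splits a combined format line into the fields listed above; the regular expression is my own and assumes the standard "combined" LogFormat, so it will not match lines written with a customised format.

```python
import re

# One named group per field of the standard "combined" LogFormat.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return the fields of a combined log line as a dict, or None."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

# The log line from this thread, joined back onto one line.
line = (
    'squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] '
    '"GET /icons/blank.gif HTTP/1.0" 200 148 '
    '"http://www.the-print-broker.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'
)

fields = parse_combined(line)
for name, value in fields.items():
    print(f"{name}: {value}")
```

A line in common format has no referrer or agent at the end, so `parse_combined` returns None for it; that makes it a quick way to check which format a log file is really in.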
            • jlyandco
              Message 6 of 7 , Apr 19, 2004
                May I ask you a question regarding the "Virtual Host" containers?

                In my httpd.conf file, here is an example of what one of these animals
                looks like:

                <VIRTUALHOST 216.117.171.112:80>
                ServerName wheelchairhouseco.com
                ServerAlias wheelchairhouseco.com www.wheelchairhouseco.com
                ServerAdmin junglrot@...
                DocumentRoot /usr/local/etc/httpd/htdocs/wheelchairhouseco
                ErrorLog logs/wheelchairhouseco.com-error_log
                TransferLog logs/wheelchairhouseco.com-access_log
                </VIRTUALHOST>

                There is NO reference in this "container" to either "common" or
                "combined" log -- as you show in your attached (and wonderfully
                insightful) explanation.

                Is this something that I can modify (to reflect as your example shows)
                and it will begin capturing all of the access, error, agent and
                referral information? Is there something else that I need to do?

                (This is still pretty new to me, but I want to know what I'm doing --
                because I want to do this right!)

                Thanks for your help! Let me know if I can provide you with anything else!

                Regards,
                Jeff
              • enventa2000
                Message 7 of 7 , Apr 20, 2004
                  --- In webalizer@yahoogroups.com, "jlyandco" <jlyandco@y...> wrote:
                  > May I ask you a question regarding the "Virtual Host" containers?
                  >
                  > In my httpd.conf file, here is an example of what one of these
                  > animals looks like:
                  >
                  > <VIRTUALHOST 216.117.171.112:80>
                  > ServerName wheelchairhouseco.com
                  > ServerAlias wheelchairhouseco.com www.wheelchairhouseco.com
                  > ServerAdmin junglrot@w...
                  > DocumentRoot /usr/local/etc/httpd/htdocs/wheelchairhouseco
                  > ErrorLog logs/wheelchairhouseco.com-error_log
                  > TransferLog logs/wheelchairhouseco.com-access_log
                  > </VIRTUALHOST>
                  >
                  > There is NO reference in this "container" to either "common" or
                  > "combined" log -- as you show in your attached (and wonderfully
                  > insightful) explanation.
                  >
                  > Is this something that I can modify (to reflect as your example
                  > shows) and it will begin capturing all of the access, error, agent
                  > and referral information? Is there something else that I need to do?
                  > [...]


                  First of all, you have to make sure that you have at the start of your
                  httpd.conf the definition of the different log format types. It should
                  be like this:

                  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
                  LogFormat "%h %l %u %t \"%r\" %>s %b" common
                  LogFormat "%{Referer}i -> %U" referer
                  LogFormat "%{User-agent}i" agent

                  Notice that every one of these lines ends with a nickname (combined,
                  common, referer, agent). This is important: a LogFormat line with a
                  nickname only defines that nickname, it does not change the default
                  log format.


                  In your file you're using the TransferLog directive. It takes no
                  format name. It uses the format set by the last LogFormat line that
                  has no nickname or, if there is no such line, the common format, which
                  is hard-coded somewhere inside the apache code.

                  In this example TransferLog will create logs in common format, which
                  is why your virtual host logs have no referrer and no agent.

                  You have two solutions:




                  1: Add a LogFormat line with the combined format and no nickname,
                  before the first VirtualHost container (you won't have to alter
                  anything else in the file). Check that it is the only LogFormat line
                  without a nickname in the file.

                  LogFormat "%h %l %u %t \"%r\" %>s %b" common
                  LogFormat "%{Referer}i -> %U" referer
                  LogFormat "%{User-agent}i" agent
                  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
                  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""




                  2: Use the CustomLog directive (instead of TransferLog):

                  CustomLog logs/wheelchairhouseco.com-access_log combined

                  You'll need to replace every TransferLog line with a CustomLog line.
                  If you add the CustomLog and leave TransferLog in place, with the same
                  file for both, then everything will be logged twice to the same file.

                  (Remember that to use CustomLog with a nickname, the "combined" format
                  has to be defined with a LogFormat line. If you didn't alter the
                  default configuration, it will already be defined.)
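With many virtual hosts, solution 2 means editing one TransferLog line per container, which is easy to get wrong by hand. The following Python sketch shows one way to do the substitution on the text of a config file. The function name is made up, it only handles plain "TransferLog file" lines (piped logs or quoted paths are left untouched), and it is meant to be run on a copy of httpd.conf, never on the live file.

```python
import re

# A TransferLog directive with a single, unquoted file argument.
TRANSFERLOG_RE = re.compile(r'^(\s*)TransferLog\s+(\S+)\s*$', re.IGNORECASE)

def use_custom_log(conf_text, nickname="combined"):
    """Replace each 'TransferLog file' line with 'CustomLog file nickname'."""
    out = []
    for line in conf_text.splitlines():
        match = TRANSFERLOG_RE.match(line)
        if match:
            indent, target = match.groups()
            line = f"{indent}CustomLog {target} {nickname}"
        out.append(line)
    return "\n".join(out)

# The container from the question, shortened.
conf = """<VirtualHost 216.117.171.112:80>
ServerName wheelchairhouseco.com
ErrorLog logs/wheelchairhouseco.com-error_log
TransferLog logs/wheelchairhouseco.com-access_log
</VirtualHost>"""

print(use_custom_log(conf))
```

After checking the result by eye, copy it into place and reload or restart apache so the new format takes effect.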