
Re: Browsers, Search Engines and Phrases -- where?

  • enventa2000
    Message 1 of 7 , Apr 10 7:38 AM
      Ahem, I made a small mistake. I said that there is "filler" in the
      log, but it is instead the remote logname and the remote user. We
      don't use those on our server, so every entry in our logs has "- -"
      instead of user names because there are no users.

      Here is the complete information about how to configure the
      LogFormat directive and all the "%" thingies:

      http://httpd.apache.org/docs-2.0/mod/mod_log_config.html



      So, this line would be:

      squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] "GET /icons/blank.gif
      HTTP/1.0" 200 148 "http://www.the-print-broker.com/" "Mozilla/4.0
      (compatible; MSIE 6.0; Windows NT 5.0)"


      Visitor IP: squid1.gvea.com
      Remote logname (identd): -
      Remote User (auth): -
      Date: [01/Aug/2003:17:57:18 -0400]
      Request type: GET
      URL requested: /icons/blank.gif
      HTTP version: HTTP/1.0
      Response code: 200
      Response bytes: 148
      Referrer: http://www.the-print-broker.com/
      Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

      I'm sure that I am missing some things, so you may want to check
      the list in the link for better info and compare it to your LogFormat
      lines. The logs can be customised far more than you will need.
    • jlyandco
      Message 2 of 7 , Apr 19 1:20 AM
        May I ask you a question regarding the "Virtual Host" containers?

        In my httpd.conf file, here is an example of what one of these animals
        looks like:

        <VIRTUALHOST 216.117.171.112:80>
        ServerName wheelchairhouseco.com
        ServerAlias wheelchairhouseco.com www.wheelchairhouseco.com
        ServerAdmin junglrot@...
        DocumentRoot /usr/local/etc/httpd/htdocs/wheelchairhouseco
        ErrorLog logs/wheelchairhouseco.com-error_log
        TransferLog logs/wheelchairhouseco.com-access_log
        </VIRTUALHOST>

        There is NO reference in this "container" to either "common" or
        "combined" log -- as you show in your attached (and wonderfully
        insightful) explanation.

        Is this something that I can modify (to reflect as your example shows)
        and it will begin capturing all of the access, error, agent and
        referral information? Is there something else that I need to do?

        (This is still pretty new to me, but I want to know what I'm doing --
        because I want to do this right!)

        Thanks for your help! Let me know if I can provide you with anything else!

        Regards,
        Jeff
        =============================================






        --- In webalizer@yahoogroups.com, "enventa2000" <enventa2000@y...> wrote:
        > This is a LONG explanation, but worth reading, I promise. You'll gain
        > insight, and you may later impress other people :)
        >
        >
        > Ok, here you are just defining the information appearing in each log
        > format. This is right:
        >
        > For the "combined" format:
        >
        > > > > LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\"
        > > > > \"%{User-Agent}i\"" combined
        >
        > For the "common" format:
        >
        > > > > LogFormat "%h %l %u %t \"%r\" %>s %b" common
        >
        > And for the "referer" and "agent" ones:
        >
        > > > > LogFormat "%{Referer}i -> %U" referer
        > > > > LogFormat "%{User-agent}i" agent
        >
        > Here, you see? You have commented out the lines defining your referer
        > and agent logs, that's why they stopped being logged. Those are the
        > lines that instruct Apache to actually produce them. Just delete the
        > "#" at the start of each line, and reload or restart the apache server:
        >
        > > > > #CustomLog /usr/local/etc/httpd/logs/referer_log referer
        > > > > #CustomLog /usr/local/etc/httpd/logs/agent_log agent
        >
        >
        > Here we have the main log file, defined as a "combined" log. This is
        > right.
        >
        > > > > CustomLog /usr/local/etc/httpd/logs/access_log combined
        >
        >
        > Now, the main problem. Go to your httpd.conf file (make sure it is
        > actually the httpd.conf used by apache, and not some copy in another
        > directory). Search for "VirtualHost". You should find a sample
        > VirtualHost container (if you didn't delete it, of course):
        >
        > #<VirtualHost *>
        > # ServerAdmin webmaster@d...
        > # DocumentRoot /www/docs/dummy-host.example.com
        > # ServerName dummy-host.example.com
        > # ErrorLog logs/dummy-host.example.com-error_log
        > # CustomLog logs/dummy-host.example.com-access_log common
        > #</VirtualHost>
        >
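        > You see the "*"? It stands for the IP address: this container matches
        > requests arriving on any address of the server, and the ServerName
        > inside it says which site it serves. Each site (say,
        > "the-print-broker.com") gets its own container with its own
        > ServerName, and the directives there only apply to that site.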
        >
        > YOU NEED TO CHANGE THE CUSTOMLOG DIRECTIVE. If you just uncommented
        > this, then you have just told apache to log every subdomain in a file
        > called dummy-host.example.com-access_log, WHICH IS IN COMMON FORMAT,
        > THAT'S IT, NO REFERRER AND NO AGENT.
        >
        > By your description, I guess you have made a different VirtualHost
        > container for each subdomain, so you need to go over ALL of them one
        > by one, changing in each one the format from "common" to "combined".
        >
        > After doing this, save the file and reload or restart apache so it
        > starts logging in the new format. Notice that this will leave the logs
        > with some entries in common format and some entries in combined
        > format, a potential source of problems: webalizer may be unable
        > to guess whether the log is in common or combined format, and may
        > treat it as common format, dropping the referrer and agent info
        > altogether.
        >
        > To prevent this, you could stop the server, rotate the logs and then
        > restart the server. It should take less than one minute! Now some
        > files will be in common format and others will be in combined format,
        > but that is a different problem.
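
A rough way to check whether one log file mixes the two formats is to classify every line and count. This is only a sketch in Python: the patterns assume the stock "common" and "combined" LogFormat strings quoted earlier in this thread, and classify and count_formats are hypothetical helper names.

```python
import re
from collections import Counter

# Fields shared by both formats: host, logname, user, [date], "request", status, bytes.
COMMON_PREFIX = r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)'
COMMON = re.compile(COMMON_PREFIX + r'$')
# Combined adds two quoted fields at the end: referrer and agent.
COMBINED = re.compile(COMMON_PREFIX + r' "[^"]*" "[^"]*"$')


def classify(line):
    """Return 'combined', 'common' or 'unknown' for one log line."""
    line = line.rstrip("\n")
    if COMBINED.match(line):
        return "combined"
    if COMMON.match(line):
        return "common"
    return "unknown"


def count_formats(lines):
    """Count how many lines fall into each format."""
    return Counter(classify(line) for line in lines)


# The two sample lines quoted in this thread, each joined into one line.
sample = [
    '12-252-40-5.client.attbi.com - - [24/Apr/2003:03:32:00 -0400] '
    '"GET /html/outreach/over.html HTTP/1.1" 200 9725',
    'squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] '
    '"GET /icons/blank.gif HTTP/1.0" 200 148 '
    '"http://www.the-print-broker.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
]

counts = count_formats(sample)
```

To check a real file, pass the open file object to count_formats. If both "common" and "combined" come back with non-zero counts, the file is mixed and is worth rotating before running webalizer on it.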
        >
        > Please post here if you could solve it, and what happened later. You
        > could also post one of your VirtualHost containers before you change
        > it, so we can see if that was the problem. You could change the names
        > in it to avoid privacy problems.
        >
        >
        >
        >
        > LONG INFO ABOUT THE "LOOKS LIKE GREEK TO ME" LOG FILES :)
        >
        >
        > [...]
        > > These are from "timberlinechurch.org-access_log" (one of the many
        > > "virtual domains" I meant by "virtualhostdomainname-access_log"):
        >
        > > 12-252-40-5.client.attbi.com - - [24/Apr/2003:03:32:00 -0400] "GET
        > > /html/outreach/over.html HTTP/1.1" 200 9725
        > [...]
        >
        >
        > This is "common" format log, it lacks the referrer and the agent at
        > the end of each line. Look below.
        >
        >
        >
        >
        > > The log reports from my "primary"? log (access_log) do appear
        > > different, and more info is collected there:
        >
        > > squid1.gvea.com - - [01/Aug/2003:17:57:18 -0400] "GET /icons/blank.gif
        > > HTTP/1.0" 200 148 "http://www.the-print-broker.com/" "Mozilla/4.0
        > > (compatible; MSIE 6.0; Windows NT 5.0)"
        > [...]
        >
        >
        > Visitor IP: squid1.gvea.com
        > Filler: - -
        > Date: [01/Aug/2003:17:57:18 -0400]
        > Request type: GET
        > URL requested: /icons/blank.gif
        > HTTP version: HTTP/1.0
        > Response code: 200
        > Response bytes: 148
        > Referrer: http://www.the-print-broker.com/
        > Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
        >
        >
        > This visitor was visiting your "the-print-broker.com" website, and
        > its Internet Explorer browser (version 6.0) was retrieving the
        > blank.gif image, in order to render all images in the page. If you look
        > at the context, you'll see how from the same IP (or resolved address)
        > there is first a request for the page, and then later, successive,
        > very fast requests for all elements on that page.
        >
        > BOUNDARIES
        >
        > Notice how the date is enclosed by [], and how GET, page request and
        > HTTP version are all enclosed together by "". This allows programs like
        > webalizer to identify the data. Referrer and agent are each
        > enclosed by "" (mainly because they may have blank spaces in the
        > middle, which confuses program parsers, since those tend to believe
        > that blank spaces always separate fields).
        >
        > REQUEST TYPES:
        >
        > GET most common one. Can also send information entered in forms. You
        > may see spam relay probes like:
        > "GET /cgi-bin/formmail.pl?email=fake_name@m...&recipient=throw_away@h...&subject=www.griho.udl.es/cgi-bin/formmail.pl HTTP/1.0"
        > The spammer will later check the throw_away address to see if the
        > probe went through, and then relay the heck out of your server. See
        > how the subject itself has the probe address, so the spammer doesn't
        > even need to check the body of the message. Use:
        > "grep -e formmail /var/log/access_log*" to find them.
        >
        > POST used for sending information from forms. The data travels in the
        > request body instead of the URL, so unlike with "GET" it does not show
        > up in the access log.
        >
        > CONNECT probably an attempt at a proxy connection. Most surely a
        > spammer trying to relay spam, or a hacker looking for free proxies. I
        > see a lot of: "CONNECT IP:PORT HTTP/1.0". Make sure the server is
        > answering 405 "Method not allowed"! If it answers 200 or some other
        > code meaning "accepted", then your server may be being used to relay
        > spam! Try "grep -e CONNECT /var/log/access_log*" in the command line
        > to find this abuse.
        >
        > DELETE I don't know how this is supposed to work, but it sure looks
        > like a bad thing to allow visitors to do
        >
        > OPTIONS website publisher programs (I don't know what this is for).
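
The two grep commands above can be combined into one small Python check that also shows how the server answered each probe. It is only a sketch: find_probes and accepted are names invented here, and the sample lines use made-up documentation addresses (192.0.2.x, example.com), they are not real log entries.

```python
import re

# Grab the method, the request target and the status code that follows the
# quoted request field.
REQUEST = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) '
)


def find_probes(lines):
    """Yield (reason, status, line) for CONNECT requests and formmail hits."""
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue
        status = int(match.group("status"))
        if match.group("method") == "CONNECT":
            reason = "CONNECT"
        elif "formmail" in match.group("target").lower():
            reason = "formmail"
        else:
            continue
        yield reason, status, line.rstrip("\n")


def accepted(probes):
    """Keep only the probes that the server answered with a 2xx status."""
    return [probe for probe in probes if 200 <= probe[1] < 300]


# Hypothetical sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [01/Aug/2003:18:00:00 -0400] '
    '"CONNECT 192.0.2.25:25 HTTP/1.0" 405 300',
    '192.0.2.11 - - [01/Aug/2003:18:01:00 -0400] '
    '"GET /cgi-bin/formmail.pl?recipient=someone@example.com HTTP/1.0" 200 512',
    '192.0.2.12 - - [01/Aug/2003:18:02:00 -0400] '
    '"GET /index.html HTTP/1.1" 200 2048',
]

probes = list(find_probes(sample))
worrying = accepted(probes)
```

Anything that ends up in worrying is a probe the server did not refuse, so those are the lines to look at first.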
        >
        >
        >
        > RESPONSE CODES
        >
        > Most common Response codes:
        >
        > 200 "OK"
        >
        > 304 "Not Modified" (the browser asks whether it needs to refresh this
        > file in its cache, and the server answers "304, no it hasn't changed,
        > just use your cached version instead of pulling the file again")
        > (normally you see 304 0 because the server sent just a response code,
        > so it has sent zero bytes of data)
        >
        > 404 "Not Found"
        >
        > 405 "Method not allowed" (usually answers to spam abuse CONNECT and
        > also to PROPFIND from publisher programs like webDAV, also appears to
        > POST if you forbid the method by accident or on purpose)
        >
        > 500 "Server error" Your server is having problems.
        >
        >
        > These response codes appear when you move things around and tell the
        > server to redirect people to the new URL instead of sending 404.
        >
        > 301 "moved permanently"
        >
        > 302 "moved temporarily"
        >
        > 307 "temporary redirect"
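
To see which of these codes your own server is actually sending, a quick tally helps. This Python sketch assumes common or combined format lines. status_counts is a hypothetical helper and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# The status code is the three-digit number right after the quoted request,
# followed by the byte count (or "-").
STATUS = re.compile(r'" (\d{3}) (?:\d+|-)')


def status_counts(lines):
    """Count how often each response code appears."""
    counts = Counter()
    for line in lines:
        match = STATUS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [01/Aug/2003:18:00:00 -0400] "GET / HTTP/1.1" 200 5120',
    '192.0.2.1 - - [01/Aug/2003:18:00:01 -0400] "GET /logo.gif HTTP/1.1" 304 0',
    '192.0.2.7 - - [01/Aug/2003:18:02:10 -0400] "GET /old.html HTTP/1.0" 404 210',
    '192.0.2.1 - - [01/Aug/2003:18:05:00 -0400] "GET /logo.gif HTTP/1.1" 304 -',
]

counts = status_counts(sample)
```

A sudden jump in 404 or 500 counts from one day to the next is usually the first sign that something moved or broke.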
        >
        >
        >
        > Here you have the response code list from the Apache 2 source code
        > (about line #431) and list of request types (about line #516). There
        > are lots of unused codes you don't need to worry about. It's just the
        > usual over-engineering in internet protocols. Throwing features in
        > "just in case".
        >
        > http://lxr.webperf.org/source.cgi/include/httpd.h
        >
        >
        >
        >
        >
        > > My "referrer.log" looks like it started to record, but then stopped:
        > > http://search.dogpile.com/texis/search?method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web
        > > -> /
        > [...]
        >
        > Ok, the referrer format is:
        >
        > http://server/page/user/comes/from -> /page_requested_in_your_server
        >
        > When the visitor has typed the address directly, or has copied and
        > pasted it, or has clicked on his favorites, it is a direct request:
        >
        > - -> /page_requested_in_your_server
        >
        > This is a direct request to http://my_domain.com/ (perhaps someone
        > trying your company name in the address bar? perhaps someone clicking
        > in My Favorites?)
        >
        > - -> /
        >
        > This is a direct request to http://my_domain.com/articles/new.html
        >
        > - -> /articles/new.html
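
The "referer -> page" layout above is easy to split in a script. A minimal Python sketch, assuming the referer LogFormat from this thread ("%{Referer}i -> %U"). parse_referer_line is a made-up name.

```python
def parse_referer_line(line):
    """Split a 'referer -> path' line.

    Returns (referer, path); referer is None for a direct request ("-").
    """
    # Split on the LAST " -> " in case the referring URL itself contains one.
    referer, sep, path = line.rstrip("\n").rpartition(" -> ")
    if not sep:
        raise ValueError("not a referer log line: %r" % line)
    return (None if referer == "-" else referer), path


# A direct request for the home page.
direct = parse_referer_line("- -> /")

# The search engine hit quoted in this thread, joined into one line.
search = parse_referer_line(
    "http://search.dogpile.com/texis/search?"
    "method=&top=1&brand=dogpile&q=healing+arts+fort+collins&cat=web -> /"
)
```

Here direct comes back with None as the referer, while search keeps the whole dogpile URL, including the q= part that holds the search phrase.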
        >
        >
        >
        >
        >
        >
        > > My "agent_log" does the same thing:
        > > Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
        > [...]
        >
        > This one is trivial. "Program/version (details)". I know the details
        > are in a certain order, but I don't know about it. The MangleAgents
        > keyword in webalizer.conf can be tuned to reduce the detail
        > level.
      • enventa2000
        Message 3 of 7 , Apr 20 2:28 PM
          --- In webalizer@yahoogroups.com, "jlyandco" <jlyandco@y...> wrote:
          > May I ask you a question regarding the "Virtual Host" containers?
          >
          > In my httpd.conf file, here is an example of what one of these
          > animals looks like:
          >
          > <VIRTUALHOST 216.117.171.112:80>
          > ServerName wheelchairhouseco.com
          > ServerAlias wheelchairhouseco.com www.wheelchairhouseco.com
          > ServerAdmin junglrot@w...
          > DocumentRoot /usr/local/etc/httpd/htdocs/wheelchairhouseco
          > ErrorLog logs/wheelchairhouseco.com-error_log
          > TransferLog logs/wheelchairhouseco.com-access_log
          > </VIRTUALHOST>
          >
          > There is NO reference in this "container" to either "common" or
          > "combined" log -- as you show in your attached (and wonderfully
          > insightful) explanation.
          >
          > Is this something that I can modify (to reflect as your example
          > shows) and it will begin capturing all of the access, error, agent
          > and referral information? Is there something else that I need to do?
          > [...]


          First of all, you have to make sure that you have at the start of your
          httpd.conf the definition of the different log format types. It should
          be like this:

          LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
          LogFormat "%h %l %u %t \"%r\" %>s %b" common
          LogFormat "%{Referer}i -> %U" referer
          LogFormat "%{User-agent}i" agent

          Notice that each of these lines ends with a nickname ("combined",
          "common", "referer", "agent"). This is important: a LogFormat line
          with a nickname only defines that nickname. It does not change the
          default format.


          In your file you're using the TransferLog directive. It cannot name a
          format, so it takes the one set by the most recent LogFormat directive
          that does not define a nickname or, if there is none, the common
          format, which is hard-coded somewhere inside the apache code.

          In this example TransferLog will create logs in "common" format,
          with no referrer and no agent.

          You have two solutions:




          1: Add a LogFormat line that has only the "combined" nickname, after
          the four definitions (you won't have to alter anything else in the
          file). It makes "combined" the default format that TransferLog uses.
          Check that no other single-argument LogFormat line comes later in the
          file.

          LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
          LogFormat "%h %l %u %t \"%r\" %>s %b" common
          LogFormat "%{Referer}i -> %U" referer
          LogFormat "%{User-agent}i" agent
          LogFormat combined




          2: Use the CustomLog directive (instead of TransferLog):

          CustomLog logs/wheelchairhouseco.com-access_log combined

          You'll need to replace every TransferLog line with a CustomLog line.
          If you add the CustomLog and leave TransferLog in place, with the same
          file for both, then everything would be logged twice to the same file.

          (Remember that to use CustomLog you need to define the "combined"
          format before the CustomLog line. If you didn't alter the default
          configuration, it will be already defined.)
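
If there are many VirtualHost containers, doing the TransferLog to CustomLog swap by hand gets tedious. Here is a rough Python sketch of the substitution. It assumes every TransferLog line should become a CustomLog line with the "combined" nickname, and transferlog_to_customlog is a made-up helper name. Work on a copy of httpd.conf and read the result before putting it in place.

```python
import re

# Directive names are case-insensitive in httpd.conf. Lines starting with "#"
# do not match, so commented-out directives are left alone.
TRANSFERLOG = re.compile(
    r'^(?P<indent>\s*)TransferLog\s+(?P<target>.+?)\s*$', re.IGNORECASE
)


def transferlog_to_customlog(conf_text, nickname="combined"):
    """Return conf_text with each TransferLog line rewritten as CustomLog."""
    out = []
    for line in conf_text.splitlines():
        match = TRANSFERLOG.match(line)
        if match:
            line = "%sCustomLog %s %s" % (
                match.group("indent"), match.group("target"), nickname)
        out.append(line)
    return "\n".join(out)


# The container posted earlier in this thread (ServerAdmin line left out).
vhost = """<VIRTUALHOST 216.117.171.112:80>
ServerName wheelchairhouseco.com
ServerAlias wheelchairhouseco.com www.wheelchairhouseco.com
DocumentRoot /usr/local/etc/httpd/htdocs/wheelchairhouseco
ErrorLog logs/wheelchairhouseco.com-error_log
TransferLog logs/wheelchairhouseco.com-access_log
</VIRTUALHOST>"""

converted = transferlog_to_customlog(vhost)
```

Only the TransferLog line changes. ErrorLog and everything else in the container stay exactly as they were.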