3615 RE: [webalizer] A question
- Apr 19, 2007
Just some additional info for you… Generating stats like that could take days or weeks. A few people have tried, and on my log files (400-600 MB per day) I quit processing them manually after 18 hours because I wanted to do something else by that point.
You can filter out some stats, like googlebot, by using the hidden tags in the config file. (I’m not sure whether this completely removes all googlebot stats, but it would at least remove it from the agents section.)
#2 is the same as the previous paragraph.
#3 & #4: this could be a hog of a report to generate if you get tons of users.
#5: you can filter by extension too.
Btw… Visits are like ‘Sessions’: a visit is one user hitting your pages one or more times, with no more than X minutes (30 by default, I think) between hits. If the user comes back 2 hours later, that counts as a second visit.
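To make the visit idea concrete, here is a minimal Python sketch. It assumes the 30-minute timeout mentioned above, measured from the user's previous hit, and it identifies a "user" purely by IP address. The function name and the sample addresses are made up for illustration; this is not Webalizer's own code.

```python
from datetime import datetime, timedelta

def count_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits per IP. `hits` is an iterable of (ip, datetime) pairs."""
    last_seen = {}  # ip -> time of that IP's most recent hit
    visits = {}     # ip -> number of visits counted so far
    for ip, when in sorted(hits, key=lambda h: h[1]):
        previous = last_seen.get(ip)
        # First hit ever, or a gap longer than the timeout: new visit.
        if previous is None or when - previous > timeout:
            visits[ip] = visits.get(ip, 0) + 1
        last_seen[ip] = when
    return visits

# Made-up sample data: the first IP returns two hours after its second hit.
hits = [
    ("10.0.0.1", datetime(2007, 4, 19, 9, 0)),
    ("10.0.0.1", datetime(2007, 4, 19, 9, 10)),
    ("10.0.0.1", datetime(2007, 4, 19, 11, 10)),
    ("10.0.0.2", datetime(2007, 4, 19, 9, 5)),
]
print(count_visits(hits))  # {'10.0.0.1': 2, '10.0.0.2': 1}
```

Three hits from the first address count as two visits because the last one arrives well after the timeout.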
As someone else suggested before I finished writing this, you should probably do your own analysis on top of what Webalizer gives you… I’d recommend shoving the logs into SQL, running queries on them there, and displaying the results as HTML.
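As a rough sketch of the "logs into SQL" approach, the Python below loads a few lines into an in-memory SQLite database and counts hits per IP and URL. It assumes the log is in Common Log Format. The regular expression, table layout, function names and sample lines are my own illustration, so adjust them to whatever your server actually writes.

```python
import re
import sqlite3

# Assumed Common Log Format; trailing referrer/agent fields are ignored.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def load_log(conn, lines):
    """Parse log lines into a `hits` table; returns the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(ip TEXT, time TEXT, url TEXT, status INTEGER)"
    )
    rows = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # silently skip lines that do not look like log entries
            rows.append((m["ip"], m["time"], m["url"], int(m["status"])))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def hits_per_ip_and_url(conn):
    """Return (ip, url, count) rows, busiest combinations first."""
    return conn.execute(
        "SELECT ip, url, COUNT(*) AS n FROM hits "
        "GROUP BY ip, url ORDER BY n DESC, ip, url"
    ).fetchall()

# Made-up sample lines for illustration.
sample = [
    '10.0.0.1 - - [19/Apr/2007:09:00:00 +0000] "GET /foo.html HTTP/1.1" 200 1234',
    '10.0.0.1 - - [19/Apr/2007:09:01:00 +0000] "GET /foo.html HTTP/1.1" 200 1234',
    '10.0.0.2 - - [19/Apr/2007:09:02:00 +0000] "GET /bar.html HTTP/1.1" 200 99',
]
conn = sqlite3.connect(":memory:")
load_log(conn, sample)
for ip, url, n in hits_per_ip_and_url(conn):
    print(ip, url, n)
```

Once the data is in a table, limiting to the top or bottom N is just a matter of adding `LIMIT` and flipping the sort order, and the rows can be written out as an HTML table with whatever tool you prefer.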
I wonder if the developers are listening. I have some ideas for improvement I would dearly love to see implemented. I hope nobody minds me expressing those ideas here.
I would love to be able to do the following:
1) Filter out bots from the stats. A pattern match on, for example, "*googlebot*" would do the trick.
2) Filter out specific domains and/or IP addresses (in particular, I want to filter out myself, as I'm responsible for about 90% of the traffic. I suspect this is not unusual during development and/or right around launch time - or at least until real traffic builds. I don't need to know where I've been, I already know by virtue of having been there). This would also help filter out certain bots, and/or certain users, and/or useless information from aggregated users - like AOL for instance if they're all using the same IP.
3) See reports that show me which IP addresses are hitting/visiting/entering/exiting which pages. Currently you get a summary, but it doesn't tell me that IP 18.104.22.168 entered on page foo.html and exited on page bar.html, and also clicked through to foobar.html and fubar.html. You're summarizing URL info, as in URL x.html was visited 1000 times (without corresponding IP info). I want to see a summary of IP info, as in IP 22.214.171.124 visited URL x.html 4 times, y.html 6 times and z.html 9 times. Don't limit me to the top 10 or top 50 or top 100, unless I ask to limit it. Show me all of them if I want to see all of them, or just the bottom N IPs or top N IPs, etc. Give me some flexibility in what I choose to see.
4) In addition to item 3 above, I would like to see the exact time & date they visited those pages.
5) I'd like to be able to filter out specific files from being reported. For example, I know I have images on specific pages and that I use stylesheets, so I don't need to know that image.jpg and/or style.css was hit when they visited index.html and/or page.html. The fact that they visited index.html and/or page.html is sufficient for me to know that the images and stylesheets on those pages were hit, and providing that superfluous information doesn't add any value. In fact, it substantially decreases the value, because the information I want is overwhelmed by this other pointless info that is packed into the "top 10".
It tells me that foo.html was hit 500 times which is nice to know, but it doesn't tell me that IP 126.96.36.199 hit (visited?) foo.html file 100 times, and that IP 188.8.131.52 hit foo.html file 400 times. That however, is the information I really want to know. That shows me only 2 IPs are responsible for all of my 500 hits/visits and the 50 other sites listed were apparently hitting other files (which would also show up in the stats). Now I can decide if those 500 hits have any true value to me and/or what that value is, based on the IP reported.
All of this info. is in the raw log files. It simply needs to be organized differently than is currently reported by webalizer.
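As an illustration of that reorganization (done outside Webalizer, not a Webalizer feature), here is a minimal Python sketch covering items 1 to 5. It assumes each hit has already been parsed into an (ip, timestamp, url, user agent) tuple with sortable timestamps. The ignore lists, names and sample data are hypothetical placeholders.

```python
import re
from collections import defaultdict
from os.path import splitext

BOT_AGENTS = re.compile(r"googlebot", re.IGNORECASE)  # item 1 (example pattern)
IGNORED_IPS = {"192.0.2.10"}             # item 2: e.g. your own address (placeholder)
IGNORED_EXTENSIONS = {".jpg", ".css"}    # item 5: files not worth reporting

def keep(hit):
    """Return True if the hit should appear in the report."""
    ip, when, url, agent = hit
    if ip in IGNORED_IPS or BOT_AGENTS.search(agent):
        return False
    path = url.split("?", 1)[0]  # drop any query string before checking the extension
    return splitext(path)[1].lower() not in IGNORED_EXTENSIONS

def pages_by_ip(hits):
    """Items 3 and 4: map each IP to its (timestamp, url) hits in time order."""
    report = defaultdict(list)
    for ip, when, url, agent in sorted(filter(keep, hits), key=lambda h: h[1]):
        report[ip].append((when, url))
    return dict(report)

# Made-up sample data for illustration.
hits = [
    ("10.0.0.1", "2007-04-19 09:00:00", "/foo.html", "Mozilla/5.0"),
    ("10.0.0.1", "2007-04-19 09:00:01", "/style.css", "Mozilla/5.0"),
    ("10.0.0.1", "2007-04-19 09:05:00", "/bar.html", "Mozilla/5.0"),
    ("66.249.66.1", "2007-04-19 09:01:00", "/foo.html", "Googlebot/2.1"),
    ("192.0.2.10", "2007-04-19 09:02:00", "/foo.html", "Mozilla/5.0"),
]
for ip, pages in pages_by_ip(hits).items():
    print(ip, "entered on", pages[0][1], "and exited on", pages[-1][1])
    for when, url in pages:
        print("   ", when, url)
```

The first and last entries for each IP give the entry and exit pages, and nothing is cut off at a top 10 unless you slice the result yourself.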
Please note, as a simple user subject to my webhost's restrictions, I have no control over the compile-time characteristics of the program, nor even its startup characteristics. These features would have to be accessible from the web page that webalizer prints when it sums up the stats (or a separate runtime configuration page if necessary). I should be able to "lock in" my choices so I don't have to specify them each and every time I run webalizer.
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Southerland, Adam
Sent: Thursday, April 19, 2007 11:28 AM
Subject: RE: [webalizer] A question
Have you looked at these documents yet?
Stats Explained in general: http://www.webalizer.org/simpleton.html
This has the stats in depth: http://www.mrunix.net/webalizer/webalizer_help.html (Including what the words like Visits and Hits represent)
So far I cannot find any understandable information on what the stats mean. I'm told that 'visits' are important, but can someone please tell me what visits are and what hits are? How can I tell what is going on? Is there any concise information available?
Telephone 07785 941781