RE: [webalizer] A question
- Apr 19, 2007
I know there are other web analysis programs out there that do what you want; I forget whether they are free or not. For my purposes, Webalizer is just what I need/want. You, on the other hand, may need something else, and as long as you can get your log files off the server (most web hosts I've seen allow this), you can try the other applications… (Having the logs also means you can run webalizer locally to produce the same stats and then tweak the config.)
WebDruid (a web site with an orange color scheme), which looks like it is based on webalizer, shows the flow of users through a site. There may be others out there that do what you want without any programming involved. (I've looked at several, but it was about a year ago and I don't remember the names. A web search should give good results.)
I understand your concerns, and they do differ from mine due to site size. If I need other stats, I go and calculate them by hand (using tools).
Just additional info for you… Running stats like that could take days or weeks to generate. A few people have tried it, and on my log files (400-600 MB per day), well, I quit processing them manually after 18 hours because I wanted to do something else at that point.
I can understand that on a file of that size. However, you could break it down further by hour and process just part of the file. It's been my experience that most programs attempt to suck the entire file into memory, and therefore spend all the CPU's time paging and swapping instead of actually analyzing. If webalizer (or whatever program) instead broke the file up into manageable chunks, chances are it would go substantially faster.
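The streaming approach described above can be sketched in a few lines of Python (the hour-bucketing is my own illustration, not anything webalizer does): read the log line by line, so memory use stays flat no matter how large the file grows.

```python
from collections import Counter

def hits_per_hour(log_path):
    """Stream an Apache-style access log line by line, counting hits
    per hour. Memory use is constant regardless of file size, because
    the file is never loaded whole."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # The timestamp looks like [19/Apr/2007:11:28:05 +0000];
            # the 14 chars after '[' give the day-plus-hour bucket.
            start = line.find("[")
            if start == -1:
                continue
            counts[line[start + 1:start + 15]] += 1
    return counts
```

Because each line is handled and discarded immediately, a 600 MB file costs no more RAM than a 6 KB one; only the (small) per-hour counter grows.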
However, I'm not really thinking about your scenario, because 1) my log files are very small since it's a brand new website. Newer sites are more concerned with these things than older sites with an established presence. When I reach log files of that size, I too will probably find this extra information less useful. 2) You don't HAVE to process them in this manner; it's an option, not a requirement. 3) Faster hardware is always coming out. What used to take months years ago now takes days. Software should never be handicapped by hardware limitations, other than by writing it more efficiently.
You can filter out some stats, like googlebot's, by using the Hide directives in the config file. (I'm not sure whether this completely removes googlebot from all stats, but it would remove it from the agents section.)
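For anyone who does have access to webalizer.conf, my understanding (worth checking against the sample config shipped with webalizer) is that the Hide* directives only suppress entries from the report tables while the totals still count them, whereas the Ignore* directives drop matching records entirely:

```
# webalizer.conf fragment -- directive names as in the stock sample config
HideAgent   Googlebot        # hide from the Top User Agents table only
IgnoreSite  *googlebot.com   # drop matching hits from all stats entirely
```

That distinction matters for the poster's question below: HideAgent alone would not remove googlebot's hits from the totals.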
Key point: to the best of my knowledge at this time, I do not have access to the config file. I'm just a plain old user on a webhost that controls most everything I do. Just about every task requires that I click on an icon. No shell access. Sucks to be me. Further, unless it does in fact remove it from all stats, it doesn't help me.
Btw… Visits are like 'sessions': a visit is a user who hit your pages one or more times within X minutes of each hit (30 by default, I think). If the user comes back two hours later, that counts as a second visit.
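That definition of a visit is easy to express as code. A minimal sketch, assuming a 30-minute timeout like the one described above (the function name and interface are mine, not webalizer's):

```python
def count_visits(timestamps, timeout=30 * 60):
    """Count visits ('sessions') for one client, given that client's hit
    times in seconds. A new visit starts whenever the gap since the
    previous hit exceeds `timeout` seconds (30 minutes by default)."""
    visits = 0
    last = None
    for t in sorted(timestamps):
        if last is None or t - last > timeout:
            visits += 1
        last = t
    return visits
```

So three hits at 09:00, 09:10, and 11:00 count as two visits: the first two fall inside one 30-minute window, and the third opens a new session.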
Useful to know thank you, but not as useful as what I outlined earlier - in my opinion.
As someone else suggested before I finished this, you should probably do your own analysis on top of what webalizer gives you… I'd recommend shoving the logs into SQL, running queries on them there, and displaying the results as HTML.
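That logs-into-SQL suggestion is straightforward to prototype. A minimal sketch using Python's built-in sqlite3 (the table layout and the simplified common-log-format regex are my assumptions; real logs may need a richer pattern):

```python
import re
import sqlite3

# Simplified Apache common log format: ip ident user [timestamp] "METHOD url ..." status
LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" (\d{3})')

def load_log(lines, conn):
    """Parse matching log lines into a hits(ip, ts, url, status) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (ip TEXT, ts TEXT, url TEXT, status INTEGER)"
    )
    rows = (m.groups() for m in map(LINE.match, lines) if m)
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)

def hits_per_ip_per_url(conn):
    """The kind of per-IP, per-URL breakdown asked for below."""
    return conn.execute(
        "SELECT ip, url, COUNT(*) AS n FROM hits GROUP BY ip, url ORDER BY n DESC"
    ).fetchall()
```

Once the data is in a table, every report in the wishlist that follows is just another GROUP BY or WHERE clause, and rendering the rows as HTML is trivial.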
I guess. I find that I usually have to "roll my own" for virtually everything and I'm getting too old to keep on slicing and dicing. It would just be nice if for once somebody's program had features I found truly useful. The data is there. The people/person who wrote the software did a really nice job of presenting some of it. They clearly know how to write nice software. Can't there be some thought & consideration given to presenting the data in a different way that might be more meaningful to some people?
I wonder if the developers are listening. I have some ideas for improvement I would dearly love to see implemented. I hope nobody minds me expressing those ideas here.
I would love to be able to do the following:
1) Filter out bots from the stats. A pattern match on, for example, "*googlebot*" would do the trick.
2) Filter out specific domains and/or IP addresses (in particular, I want to filter out myself, as I'm responsible for about 90% of the traffic. I suspect this is not unusual during development and/or right around launch time - or at least until real traffic builds. I don't need to know where I've been, I already know by virtue of having been there). This would also help filter out certain bots, and/or certain users, and/or useless information from aggregated users - like AOL for instance if they're all using the same IP.
3) See reports that show me which IP addresses are hitting/visiting/entering/exiting which pages. Currently you get a summary, but it doesn't tell me that IP 22.214.171.124 entered on page foo.html and exited on page bar.html, and also clicked through to foobar.html and fubar.html. You're summarizing URL info, as in: URL x.html was visited 1000 times (without corresponding IP info). I want to see a summary of IP info, as in: IP 126.96.36.199 visited URL x.html 4 times, y.html 6 times, and z.html 9 times. Don't limit me to the top 10 or top 50 or top 100 unless I ask for that limit. Show me all of them if I want to see all of them, or just the bottom N IPs or top N IPs, etc. Give me some flexibility in what I choose to see.
4) In addition to item 3 above, I would like to see the exact time & date they visited those pages.
5) I'd like to be able to filter specific files out of the reports. For example, I know I have images on specific pages and that I use stylesheets; I don't need to know that image.jpg and/or style.css was hit when someone visited index.html and/or page.html. The fact that they visited index.html and/or page.html is sufficient for me to know that the images and stylesheets on those pages were hit, and reporting that superfluous information doesn't add any value. In fact, it substantially decreases the value, because the information I want is overwhelmed by this other pointless info packed into the "top 10".
It tells me that foo.html was hit 500 times which is nice to know, but it doesn't tell me that IP 188.8.131.52 hit (visited?) foo.html file 100 times, and that IP 184.108.40.206 hit foo.html file 400 times. That however, is the information I really want to know. That shows me only 2 IPs are responsible for all of my 500 hits/visits and the 50 other sites listed were apparently hitting other files (which would also show up in the stats). Now I can decide if those 500 hits have any true value to me and/or what that value is, based on the IP reported.
All of this info. is in the raw log files. It simply needs to be organized differently than is currently reported by webalizer.
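As the poster says, everything needed for this is already in the raw logs. A sketch of wishlist items 1, 2, and 5 combined, assuming the Apache "combined" log format with a user-agent field (the bot patterns, IPs, and file extensions below are placeholders, not webalizer options):

```python
import re
from collections import Counter, defaultdict

BOT_AGENTS = re.compile(r"googlebot|slurp|bingbot", re.IGNORECASE)  # item 1
MY_IPS = {"203.0.113.7"}                                            # item 2
STATIC = re.compile(r"\.(?:jpe?g|png|gif|css|js|ico)$", re.IGNORECASE)  # item 5

# Combined format: ip ident user [ts] "METHOD url ..." status bytes "referer" "agent"
LINE = re.compile(
    r'(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def per_ip_report(lines):
    """Return {ip: Counter(url -> hits)} after dropping bot traffic,
    your own IPs, and static-asset hits."""
    report = defaultdict(Counter)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        ip, url, agent = m.groups()
        if ip in MY_IPS or BOT_AGENTS.search(agent) or STATIC.search(url):
            continue
        report[ip][url] += 1
    return report
```

The result is exactly the "IP x visited URL y N times" view described in item 3, with no top-N cutoff: iterate over the whole dict, or slice it however you like.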
Please note, as a simple user subject to my webhosts restrictions, I have no control over the compile time characteristics of the program, nor even startup characteristics. These features would have to be accessible from the web page that webalizer prints when it sums up the stats (or a separate runtime configuration page if necessary). I should be able to "lock in" my choices so I don't have to specify them each and every time I run webalizer.
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Southerland, Adam
Sent: Thursday, April 19, 2007 11:28 AM
Subject: RE: [webalizer] A question
Have you looked at these documents yet?
Stats Explained in general: http://www.webalizer.org/simpleton.html
This has the stats in depth: http://www.mrunix.net/webalizer/webalizer_help.html (Including what the words like Visits and Hits represent)
I cannot find so far any understandable information on what the stats mean. I'm told that 'visits' are important, but can someone please tell me: what are visits and what are hits? How can I tell what is going on? Is there any concise information available?