Re: Data Mining From Server Logs
- As soon as you get them into a database, you can sit dozens of visual tools on top. I like Qlik desktop: it is free and amazing, and you can create all sorts of visual representations of the data on the fly. I learned the tool in about a day or less through the videos on their site (www.qlik.com). Another one I have used, which is fantastic, is Tableau (tableau.com), but it has a cost associated.
The database is really the only hang-up: getting all the data imported. I would imagine it is only one table mapped to the fields of the log files. Most of the major web analytics (W/A) vendors are architected this way initially; they then abstract this data into cubes for the variety of reports users need for segmentation, etc.
It really isn't difficult.
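For anyone who wants to try the one-table approach before picking a visual tool, here is a minimal Python sketch. It assumes the files are in the W3C extended log format (a `#Fields:` header line followed by space-separated values, with `-` for empty fields), which is worth confirming against a few of your own files, and that every file shares the same `#Fields:` line. It uses SQLite from the Python standard library in place of MySQL so it runs without a server; the table name `hits` and the helpers `column_name` and `load_w3c_log` are made up for illustration. The final `GROUP BY` query is only a tiny stand-in for the kind of roll-up a vendor would precompute into a cube.

```python
import re
import sqlite3

def column_name(field):
    """Turn a W3C field name such as 'cs(User-Agent)' into a SQL-safe 'cs_user_agent'."""
    return re.sub(r"[^0-9A-Za-z]+", "_", field).strip("_").lower()

def load_w3c_log(lines, conn, table="hits"):
    """Load W3C extended-format log lines into one table and return the row count.

    Columns are taken from the '#Fields:' directive, so the table maps 1:1 to
    the fields of the log file. Every value is stored as TEXT, and the W3C
    placeholder '-' becomes NULL. Assumes all files share one '#Fields:' line.
    """
    columns = None
    inserted = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            columns = [column_name(f) for f in line[len("#Fields:"):].split()]
            col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
            continue
        if line.startswith("#") or columns is None:
            continue  # other directives (#Version, #Date, ...) or data before #Fields
        values = line.split()
        if len(values) != len(columns):
            continue  # skip malformed lines rather than guess at alignment
        values = [None if v == "-" else v for v in values]
        col_list = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        conn.execute(f"INSERT INTO {table} ({col_list}) VALUES ({marks})", values)
        inserted += 1
    conn.commit()
    return inserted

# Made-up sample lines; real WebTrends logs will have more fields.
sample = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-uri-stem sc-status cs(User-Agent)",
    "2010-08-30 11:42:01 10.0.0.1 /index.html 200 Mozilla/4.0+(compatible)",
    "2010-08-30 11:43:10 10.0.0.2 /buy.html 200 Googlebot/2.1",
    "2010-08-31 09:00:00 10.0.0.1 /missing.html 404 -",
]

conn = sqlite3.connect(":memory:")  # pass a file path such as "logs.db" to keep the data
print(load_w3c_log(sample, conn))  # 3

# A tiny "cube": hits rolled up by day and status code.
query = 'SELECT "date", sc_status, COUNT(*) FROM hits GROUP BY "date", sc_status ORDER BY "date", sc_status'
for row in conn.execute(query):
    print(row)
```

To load the real files, open each decompressed log in turn and pass the file object as `lines` (for example `with open(path) as f: load_w3c_log(f, conn)`), so all of them land in the same table.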
--- In email@example.com, Ravi Pathak <ravipathak1@...> wrote:
> > I would import them into a MySQL database and use R (statistical software),
> > maybe with Hadoop, to analyze the data.
> > Of late there have been considerable improvements in R from the standpoint of
> > visualization and its ability to perform data analysis. I strongly agree
> > with the comments provided by Patrick that with Excel, or maybe even a log
> > analyzer, you might be limiting what you can see/visualize/analyze and
> > the insights you can find!
> > Best,
> > Ravi
> > On Tue, Aug 31, 2010 at 12:12 AM, mspsysage <leticia.colon01@...> wrote:
> >> Hi
> >> I need to data mine through hundreds of server log files, all containing
> >> WebTrends visitor/cookie information. I have already decompressed the files
> >> and opened a few of them in Excel. I wonder if there is a
> >> program that can help streamline the analysis of these server logs.
> >> Thanks!
> >> L
- Another point to make would be that mining raw server logs is going to be a painful starting place.
If this is a one-off data mining project for some specific question, then it is probably affordable to shoulder the pain once. But if this is supposed to be an ongoing project for many questions that are yet to be determined, then the raw logs (even with the cookies) will require a ton of extra steps to get to business-level details:
* Which campaigns / business initiatives referred each visit
* What calls to action were viewed vs. completed
* Extracting search keywords from the many different search engines' referring URLs
* Excluding traffic from robots/spiders and employees
* Mapping IP addresses to geo locations
* Translating dynamic parameter codes into friendly names
* Profiling behavior across multiple rows of the log files, e.g. "visitor stayed at least 3 minutes on the site" or "viewed more than X pages" or "visited more than 5 times this week"
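A few of the steps in that list can be sketched in Python to show what the extra work looks like. This is a minimal illustration under simplified assumptions, not how any vendor does it: the search-engine parameter table (`q` for Google and Bing, `p` for Yahoo), the robot keyword list, and the 30-minute inactivity timeout used to split visits are all assumptions, and the names `search_keywords`, `is_robot` and `visitor_profiles` are hypothetical. Each hit is assumed to already be a `(cookie ID, timestamp, user agent)` tuple pulled from the log data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from urllib.parse import urlparse, parse_qs

# Illustrative only: real rule sets cover many more engines and robots.
SEARCH_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}
ROBOT_MARKERS = ("bot", "spider", "crawler", "slurp")

def search_keywords(referrer):
    """Return the search phrase from a search-engine referrer URL, or None."""
    if not referrer:
        return None
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for engine, param in SEARCH_PARAMS.items():
        if engine in host:
            values = parse_qs(parsed.query).get(param)  # parse_qs decodes '+' to space
            return values[0] if values else None
    return None

def is_robot(user_agent):
    """Crude robot/spider test based on User-Agent substrings."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in ROBOT_MARKERS)

def visitor_profiles(hits, timeout=timedelta(minutes=30)):
    """Roll hit rows up into one profile per visitor (cookie ID), skipping robots.

    A new visit starts after `timeout` of inactivity (an assumed convention).
    Visit length is first-to-last hit, so time on the final page is unknown.
    """
    by_visitor = defaultdict(list)
    for visitor_id, ts, user_agent in hits:
        if not is_robot(user_agent):
            by_visitor[visitor_id].append(ts)

    profiles = {}
    for visitor_id, times in by_visitor.items():
        times.sort()
        visits = []  # [start, end] pairs
        for ts in times:
            if visits and ts - visits[-1][1] <= timeout:
                visits[-1][1] = ts  # same visit: extend its end time
            else:
                visits.append([ts, ts])  # gap too long: start a new visit
        longest = max(end - start for start, end in visits)
        profiles[visitor_id] = {
            "page_views": len(times),
            "visits": len(visits),
            "stayed_3_min": longest >= timedelta(minutes=3),
        }
    return profiles

t0 = datetime(2010, 8, 30, 11, 0)
hits = [
    ("cookie-A", t0, "Mozilla/4.0"),
    ("cookie-A", t0 + timedelta(minutes=4), "Mozilla/4.0"),
    ("cookie-A", t0 + timedelta(hours=2), "Mozilla/4.0"),
    ("cookie-B", t0, "Googlebot/2.1"),
]
print(search_keywords("http://www.google.com/search?q=web+analytics"))  # web analytics
print(visitor_profiles(hits))
```

A rule such as "visited more than 5 times this week" then becomes a filter on the `visits` count once the hits are restricted to that week.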
There are additional details that won't be found in server log files at all:
* User login names
* Shopping behavior
* Form abandonment
* Flash and RIA behavior
All these things can be taken care of by a web analytics solution. So a data warehouse underlying the web analytics solution should offer you a much more compelling starting place for your mining project than raw server logs. Ask your vendor.
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of mspsysage
Sent: Monday, August 30, 2010 11:42 AM
Subject: [webanalytics] Data Mining From Server Logs
I need to data mine through hundreds of server log files, all containing WebTrends visitor/cookie information. I have already decompressed the files and opened a few of them in Excel. I wonder if there is a program that can help streamline the analysis of these server logs.