
processing a large archive of logs

  • Mark Steudel
    Message 1 of 6, Feb 9, 2005
      I have a large archive of logs that I would like to process. Can I
      process the logs out of order? I'm just going to write a little script to
      list the contents of a directory, then loop through each file and process it.
      What are the best options for this? I'm a little unclear about the
      ramifications of ignoring history, and of using the preserve (incremental)
      option or not. I've moved the files around, so sometimes the timestamps on the
      files themselves are all the same. Does that matter, or does webalizer just
      look inside the files? Any advice would be appreciated.

      Thanks, Mark
    • Gerald de la Pascua
      Message 2 of 6, Feb 9, 2005
        Mark,

        I'm a new user, so don't give what I say too much weight,
        but I save the logs with names that sort in date order,
        so I can just run ls | while read nextfile
        and always process the logs in the correct order.

        I use

        yyyy-mm-dd-http.gz
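
A minimal sketch of that approach, assuming every log follows the yyyy-mm-dd-http.gz naming above. process_in_order is a hypothetical helper name, and the command to run is passed in as arguments so that the sketch does not assume any particular webalizer options. It relies on the shell expanding a glob in sorted order, which for zero-padded dates is also chronological order.

```shell
#!/bin/sh
# Sketch only: feed date-named logs to a command, oldest first.
# process_in_order is a hypothetical helper, not part of webalizer.
process_in_order() {
    dir=$1
    shift
    # Globs expand in sorted order, so yyyy-mm-dd-* names come out
    # in date order without parsing the output of ls.
    for f in "$dir"/*-http.gz; do
        [ -e "$f" ] || continue      # no matching files in the directory
        "$@" "$f" || return 1        # stop at the first failure
    done
}
```

It could be called as process_in_order /path/to/logfiles webalizer, provided your webalizer build reads gzip-compressed logs directly; otherwise the logs would need to be decompressed first.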

        I have had problems where half a month is lost if I incrementally
        process from the start of the month. I don't know whether this was a
        quirk or I did something wrong. I am sure it's possible to process
        logs daily.

        I am using FreeBSD.

        kind regards,

        Gerald



        Webalizer homepage: http://www.webalizer.org
      • Enric Naval
        Message 3 of 6, Feb 9, 2005
          Webalizer just looks inside the files. It will ignore
          the file's timestamp.

          "Ignoring history" and "setting incremental to off"
          are the same thing.

          Ignoring history means that if you process the same
          logfile twice, you will probably destroy your stats or
          distort them beyond recognition. Even a single line
          processed twice may cause strange effects in your
          stats.

          Ignoring history always processes every line in the
          logfiles, which makes it slower.

          On the other hand, keeping history means that you can
          process the same logfile as many times as you want,
          and only the new lines will be taken into account;
          the old ones are ignored.

          Keeping history is also faster, because the old lines
          are not processed again. The problem appears when the
          history file breaks. You should always keep backups of
          your old logfiles, on CDs or other media, so you can
          reprocess them if you can't recover your
          history file.
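
Alongside the logfile backups, the state files themselves can be copied before each run. The sketch below assumes the state files use the names webalizer.hist and webalizer.current and live in the output directory; check your own configuration, since both names and locations can differ. backup_state and its two directory arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: copy webalizer's state files into a timestamped
# backup directory before a run. backup_state is a hypothetical helper.
backup_state() {
    outdir=$1       # directory where webalizer writes its output
    backupdir=$2    # where the backups should be kept
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$backupdir/$stamp" || return 1
    # File names are assumed; adjust them to match your setup.
    for name in webalizer.hist webalizer.current; do
        if [ -f "$outdir/$name" ]; then
            cp -p "$outdir/$name" "$backupdir/$stamp/" || return 1
        fi
    done
    # Print the backup location so the caller can record it.
    echo "$backupdir/$stamp"
}
```

For example, backup_state /path/to/output /path/to/backups prints the directory that now holds the copies.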

          If you keep history, then changes in your
          webalizer.conf file will only apply to the new lines
          processed, not the old ones already counted as
          processed in the history file. To get the changes
          applied to all your stats, you have to ignore history
          and reprocess all logfiles from the start, so the
          stats get re-done from scratch, then keep history
          again.



          To loop over the logs, if they are named so that they
          appear sorted when doing "ls", you can use:

          #!/bin/bash
          #
          # process_log.sh
          # process a logfile directory for webalizer
          #
          # The glob expands in sorted order, so no "ls" is needed.
          for i in /path/to/logfiles/*; do
              echo "Processing: $i"
              # -more-options is a placeholder for your own options
              webalizer "$i" -more-options
          done


          Hum, I wrote a veeeery long email again :)


          =====
          Enric Naval
          Student of Management Informatics at the UdL (Lleida)
          GRIHO webalizer.conf
          http://griho.udl.es/webalizer/webalizer.conf.txt



        • Bradford L. Barrett
          Message 4 of 6, Feb 9, 2005
            > "Ignoring history" and "setting incremental to off"
            > are the same thing.

            This is incorrect. Ignoring history will lose all previous months'
            saved data, so you wind up with a main index showing only the month
            you just processed and none prior. It causes the program to ignore
            any existing 'webalizer.hist' file.

            Incremental mode allows you to process a month using multiple,
            partial log files. Setting it to off forces you to process
            whole months in a single log, and causes the program to ignore
            any existing 'webalizer.current' file.

            For the original question, as long as your log files are named
            correctly (i.e., they list in chronological order,
            typically named like YYYYMMDD-something), then you can use
            incremental mode and process all the logs like:

            for i in /path/to/logs/*; do webalizer "$i"; done

            Make sure your config file is set correctly for the output
            directory, hostname, etc... or specify them on the command
            line above.
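
A slightly fuller sketch of the same loop, with the options given on the command line. It assumes -p (preserve state, i.e. incremental mode) and -o (output directory) behave as described in the webalizer man page; confirm with webalizer -h on your system. run_archive and the WEBALIZER override variable are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch only: process every log in a directory, oldest first,
# in incremental mode. run_archive is a hypothetical helper.
run_archive() {
    logdir=$1
    outdir=$2
    # Globs expand in sorted order, so YYYYMMDD-something names
    # are visited chronologically.
    for log in "$logdir"/*; do
        [ -f "$log" ] || continue
        # -p: preserve state (incremental); -o: output directory.
        # WEBALIZER lets you point at a different binary.
        "${WEBALIZER:-webalizer}" -p -o "$outdir" "$log" || return 1
    done
}
```

Stopping at the first failure is a deliberate choice here: continuing would feed later logs while an earlier one is still unprocessed, which breaks the chronological order that incremental mode needs.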

            [...]

            Specifics:

            > > I have a large archive of of logs that I would like
            > > to process. Can I
            > > process the logs out of order?

            No, you cannot process out of order as long as you use
            incremental mode. They must be in chronological order.

            > > I'm just going to
            > > write a little script to
            > > list the content of a directory then loop through
            > > each file and process it.
            > > What are the best options for this?

            See above..

            > > I'm a little
            > > unclear as to what the
            > > ramifications of ignoring history and using the
            > > preseve incremental or not
            > > using it.

            You should NEVER ignore history, and as long as you have partial
            logs (not full months), then you must use incremental mode.


            > > I've moved the files around so sometimes
            > > the timestamps on the
            > > files themselves are all the same, does that matter
            > > or does webalizer just
            > > look inside the files. Anyway any advice would be
            > > appreciated.

            The timestamps on the files themselves don't matter, unless
            you were going to rely on them to tell you what order to feed
            the files. If you named them correctly, then you already know,
            based on filename, the order in which the files must be processed.
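
If neither the file timestamps nor the names can be trusted, the order can be recovered from the log contents. The sketch below assumes uncompressed logs in Common/Combined Log Format, where each line carries a field like [09/Feb/2005:10:30:06 -0800]; it ignores the time zone offset and assumes file names without tabs or newlines. first_stamp and list_by_content are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: order log files by the first timestamp found inside them.

# Print YYYYMMDDHHMMSS for the first [dd/Mon/yyyy:hh:mm:ss ...] field.
first_stamp() {
    awk '
      match($0, /\[[0-9][0-9]\/[A-Za-z][A-Za-z][A-Za-z]\/[0-9][0-9][0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
          s = substr($0, RSTART + 1, RLENGTH - 1)   # drop the leading "["
          split(s, p, "[/:]")                       # dd Mon yyyy hh mm ss
          m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3
          printf "%s%02d%s%s%s%s\n", p[3], m, p[1], p[4], p[5], p[6]
          exit
      }' "$1"
}

# Print the given files, one per line, oldest first.
list_by_content() {
    for f in "$@"; do
        stamp=$(first_stamp "$f")
        if [ -z "$stamp" ]; then
            echo "skipping $f: no timestamp found" >&2
            continue
        fi
        printf '%s\t%s\n' "$stamp" "$f"
    done | sort | cut -f2-
}
```

Running list_by_content /path/to/logs/* gives an order that could then be used to rename the files into a YYYYMMDD-something scheme.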

            --
            Bradford L. Barrett brad@...
            A free electron in a sea of neutrons DoD#1750 KD4NAW

            The only thing Micro$oft has done for society, is make people
            believe that computers are inherently unreliable.
          • Mark Steudel
            Message 5 of 6, Feb 11, 2005
              Thanks a bunch! One last question: a few of my log files were
              empty after being downloaded to my local server. Will that cause a
              problem?
            • Bradford L. Barrett
              Message 6 of 6, Feb 11, 2005
                On Fri, 11 Feb 2005, Mark Steudel wrote:
                >
                > Thank's a bunch! One last question, I have a few logs that when they got
                > downloaded to my local server the log files were empty, will that cause a
                > problem?

                No.. it will just tell you that no records were processed for that log,
                and no activity will be reported for that time period.
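
Since an empty log silently leaves a gap in the reports, it may be worth listing the empty files up front so they can be downloaded again. A minimal sketch, assuming uncompressed logs (a gzip-compressed empty log is not zero bytes, so it would not be caught); list_empty_logs is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: print every zero-length regular file in a directory.
list_empty_logs() {
    for f in "$1"/*; do
        # -f: regular file; -s: size greater than zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_empty_logs /path/to/logs prints one path per empty file and nothing if every log has content.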

                --
                Bradford L. Barrett brad@...
                A free electron in a sea of neutrons DoD#1750 KD4NAW

                How do you give Microsoft the benefit of the doubt when you
                know that if you were to throw it in a room with truth, you'd
                risk a matter/anti-matter explosion? -- Nicholas Petreley IDG