
split up very large files

  • Lay, Coy
    Message 1 of 5 , Dec 9, 2002
      Somebody help me get started on this script. I get very large log files and
      I clean them up to import into a spreadsheet. My clean up script now takes
      as long as several hours so I want to split the big file into smaller files
      before cleaning them up. That way I can run the cleanup only on the parts of
      the file I'm interested in. I want to write the first 10,000 lines to a
      disk file, write the next 10,000 to a different file, and so forth. These
      files are 20 to 50 MB, so I'd like to avoid as much selecting and deleting as
      I can. If I have to select the text first, I want to select all the lines
      with JAN- into one file and FEB- into another, etc. Is there a function I
      can use to just write to a disk file based on line numbers or some other
      easily testable parameter?

      Thanks in advance.

      -- Coy
    • hugo_paulissen <h.paulissen@facburfdcw.u
      Message 2 of 5 , Dec 10, 2002
        --- In ntb-clips@yahoogroups.com, "Lay, Coy" <coy_lay@n...> wrote:
        > Somebody help me get started on this script. I get very large log files and
        > I clean them up to import into a spreadsheet. My clean up script now takes

        Coy,

        A similar question has been asked before. Read the post of Dick
        Gascoigne in the archive... (message 8255). Maybe you have to find
        related messages as well. He is still active on the list, so maybe
        you can ask him for the complete clip and use that as a start.

        http://groups.yahoo.com/group/ntb-clips/message/8255

        BTW, I think it must be possible to make the clip (much) faster.

        Hugo
      • thefrank <tf@thefrank.com>
        Message 3 of 5 , Dec 10, 2002
          Hi Coy,

          Since these are logfiles, I am thinking you may have server access.
          If this is the case it may be much simpler to split the files
          directly on your server, then download the split files into NoteTab
          to deal with. I have split huge logfiles and monster data files into
          manageable chunks in less than a second.

          The functions are OS related. The following is for a BSD platform:

          -----

          split [options] [infile] [outfile]

          For example, suppose bigfile contains 4000 lines.

          $ split bigfile smf
          will create four files: smfaa, smfab, smfac, smfad.
          Split options are shown below.

          Option       Description
          -l n         Specify the number of lines in each output file. For
                       example, "-l 80" splits infile into files with 80 lines
                       each. The default is 1000. Note that the last file may
                       have fewer than n lines. The -b and -l options should
                       not be used together.
                       Note: On some older systems this option is specified
                       as -n. For example, "split -100 myfile" splits myfile
                       into files with 100 lines each.

          -b n[k | m]  Specify the size of output files. For example,
                       "-b 1024" splits infile into files of 1024 bytes each.
                       Append k to specify the size in kilobytes or m to
                       specify the size in megabytes. For example, "-b 1m"
                       splits infile into 1-megabyte output files. The -b and
                       -l options should not be used together.

          -a n         Use n characters in the output filename suffix. For
                       example, "-a 3" would append aaa, aab, ... to output
                       filenames. The default is 2. Note: This option is not
                       available on all Unix systems.

          FreeBSD and OpenBSD provide an additional and extremely useful
          option, -p, that allows you to split a file at every line that
          matches a string or regular expression. See the special note for
          FreeBSD and OpenBSD users for more information.

          -----

          This should work for your requirement:

          split -l 10000 biglogfilename logchunk

          After running the command on a 100,000-line logfile you should find
          10 additional files named

          logchunkaa
          logchunkab
          logchunkac
          logchunkad
          logchunkae
          logchunkaf
          logchunkag
          logchunkah
          logchunkai
          logchunkaj
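
For anyone without shell access to the server, the same line-count split can be sketched in Python. This is only a rough equivalent of "split -l 10000", not a reimplementation of it: the function name split_by_lines and its parameters are made up for this illustration, and it assumes the two-letter aa, ab, ... suffixes shown above are enough (at most 676 output files).

```python
import itertools
import string
from pathlib import Path


def split_by_lines(source, prefix, lines_per_file=10000):
    """Copy `source` into files named prefix+aa, prefix+ab, ... (hypothetical helper).

    Each output file receives up to `lines_per_file` consecutive lines.
    Returns the list of paths written, in order.
    """
    # Two-letter suffixes in alphabetical order: aa, ab, ..., zz (676 in total).
    # Running out of suffixes raises StopIteration; raise the suffix length
    # if a file needs more chunks than that.
    suffixes = ("".join(pair) for pair in
                itertools.product(string.ascii_lowercase, repeat=2))
    written = []
    # Binary mode keeps the original bytes and line endings untouched.
    with open(source, "rb") as infile:
        while True:
            # Only one chunk of lines is held in memory at a time.
            chunk = list(itertools.islice(infile, lines_per_file))
            if not chunk:
                break
            out_path = Path(str(prefix) + next(suffixes))
            with open(out_path, "wb") as outfile:
                outfile.writelines(chunk)
            written.append(out_path)
    return written
```

Calling split_by_lines("biglogfilename", "logchunk") would write logchunkaa, logchunkab, and so on into the current directory, reading the big file once from start to finish with no selecting or deleting.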

          For other server platforms, consult their help files. They should
          have something similar.
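
The other half of the original question, sending all the JAN- lines to one file and all the FEB- lines to another, can be sketched the same way. The actual log format is not shown in this thread, so this assumes each line of interest contains an uppercase three-letter month followed by a hyphen (JAN-, FEB-, ...) and that lines without such a marker can be skipped. The name split_by_month and the ".txt" output naming are invented for this example.

```python
import re

# Assumed marker: an uppercase month abbreviation followed by a hyphen.
MONTH_RE = re.compile(rb"(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-")


def split_by_month(source, prefix):
    """Write each line of `source` to prefix+MONTH+'.txt' (hypothetical helper).

    The first month marker found on a line decides which file it goes to.
    Lines with no marker are skipped. Returns the sorted month names seen.
    """
    outputs = {}  # month name -> open output file
    try:
        with open(source, "rb") as infile:
            for line in infile:
                match = MONTH_RE.search(line)
                if match is None:
                    continue  # no month marker on this line
                month = match.group(1).decode("ascii")
                if month not in outputs:
                    outputs[month] = open(f"{prefix}{month}.txt", "wb")
                outputs[month].write(line)
    finally:
        # Close every output file even if reading fails part way through.
        for handle in outputs.values():
            handle.close()
    return sorted(outputs)
```

The file is read once, line by line, so memory use stays small even for a 50 MB log. At most twelve output files are open at the same time.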

          Regards,

          tf
          http://thefrank.com
        • Manuel123
          Message 4 of 5 , Jan 11, 2003
            Here is an XML idea. How can I extract, into a new document, the
            names of the people whose list is notetab from this text file?

            Ideally the clip would ask me which label I want to extract.
            Each <friend>...</friend> block is one record for me. Thanks in advance.

            friends.txt
            <friend>
            <name>Jody</name>
            <list>notetab</list>
            </friend>

            <friend>
            <name>Manuel</name>
            <list>clip</list>
            </friend>
            ---
            Courses for the blind
            Windows,Iexplorer,Outlook,Html,Word,Excel,Access
            Subscribe/unsubscribe to course notifications at http://www.solotxt.com
          • Alan C.
            Message 5 of 5 , Jan 11, 2003
              Hi Manuel123,

              If your text is consistently formatted on lines like the sample below the clip, then this next clip will work. The clip uses the Jump clip command to move between lines, which was quick to assemble; there are other, more elaborate, ways to extract the data you want.

              Perhaps someone else can help you with the wizards. In the meantime, you could make a copy of this entire clip and, in the copy, rename the header to (for example) H="list_clip_xml_parse"

              and also alter these next two lines from

              ^!Set %collection%=list notetab^p^p
              ^!IfSame "^$GetSelection$" "<list>notetab</list>" make

              to these, with notetab changed to clip:

              ^!Set %collection%=list clip^p^p
              ^!IfSame "^$GetSelection$" "<list>clip</list>" make

              With those three changes (the header and the two altered lines), you'd then also have a clip to parse a different list, i.e. list_clip_xml_parse.

              H="list_ntab_xml_parse"
              ^!SetScreenUpdate OFF
              ^!Jump DOC_START
              ^!Set %collection%=list notetab^p^p
              :next_item
              ^!Find "<friend>" IST
              ^!IfError publish
              ^!Jump +2
              ^!Select EOL
              ^!IfSame "^$GetSelection$" "<list>notetab</list>" make
              ^!Goto next_item

              :make
              ^!Jump -1
              ^!Set %gotline%=^$GetLine$^%NL%
              ^!Append %collection%=^%gotline%
              ^!Jump +2
              ^!Goto next_item

              :publish
              ^!Menu File/New
              ^!InsertText ^%collection%
              ^!Replace "<name>" >> "^%EMPTY%" AWIS
              ^!Replace "</name>" >> "^%EMPTY%" AWIS
              ^!ClearVariable %collection%
              ; ---end clip---

              <friend>
              <name>Steve</name>
              <list>notetab</list>
              </friend>

              <friend>
              <name>Jody</name>
              <list>notetab</list>
              </friend>

              <friend>
              <name>Manuel</name>
              <list>clip</list>
              </friend>

              <friend>
              <name>Ralph</name>
              <list>notetab</list>
              </friend>
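
For comparison outside NoteTab, the same extraction can be sketched in Python. This is not a translation of the clip above, just an illustration of the same idea under the same assumption: each record sits between <friend> and </friend> and holds one <name> and one <list>. The function name names_on_list and the SAMPLE variable are invented for this example. Passing the list name as an argument covers the request to choose which list to extract.

```python
import re

# One record: everything between <friend> and </friend>, across lines.
FRIEND_RE = re.compile(r"<friend>(.*?)</friend>", re.DOTALL)

SAMPLE = """\
<friend>
<name>Steve</name>
<list>notetab</list>
</friend>

<friend>
<name>Jody</name>
<list>notetab</list>
</friend>

<friend>
<name>Manuel</name>
<list>clip</list>
</friend>

<friend>
<name>Ralph</name>
<list>notetab</list>
</friend>
"""


def names_on_list(text, wanted_list):
    """Return the <name> of every <friend> record whose <list> equals wanted_list."""
    names = []
    for record in FRIEND_RE.findall(text):
        name = re.search(r"<name>(.*?)</name>", record, re.DOTALL)
        which = re.search(r"<list>(.*?)</list>", record, re.DOTALL)
        # Skip records that are missing either tag.
        if name and which and which.group(1).strip() == wanted_list:
            names.append(name.group(1).strip())
    return names


if __name__ == "__main__":
    print(names_on_list(SAMPLE, "notetab"))  # prints ['Steve', 'Jody', 'Ralph']
```

Regular expressions are used here rather than an XML parser because the sample file has several top-level <friend> elements and no single root element, so a strict XML parser would reject it as written.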

              --
              Alan.