
Re: split up very large files

  • hugo_paulissen <h.paulissen@facburfdcw.u
    Message 1 of 5 , Dec 10, 2002
      --- In ntb-clips@yahoogroups.com, "Lay, Coy" <coy_lay@n...> wrote:
      > Somebody help me get started on this script. I get very large log files and
      > I clean them up to import into a spreadsheet. My clean up script now takes

      Coy,

      A similar question has been asked before. Read the post of Dick
      Gascoigne in the archive... (message 8255). Maybe you have to find
      related messages as well. He is still active on the list, so maybe
      you can ask him for the complete clip and use that as a start.

      http://groups.yahoo.com/group/ntb-clips/message/8255

      BTW, I think it must be possible to make the clip (much) faster.

      Hugo
    • thefrank <tf@thefrank.com>
      Message 2 of 5 , Dec 10, 2002
        Hi Coy,

        Since these are logfiles, I am thinking you may have server access.
        If this is the case it may be much simpler to split the files
        directly on your server, then download the split files into NoteTab
        to deal with. I have split huge logfiles and monster data files into
        manageable chunks in less than a second.

        The exact command depends on the OS. The following is for a BSD platform:

        -----

        split [options] [infile] [outfile]

        For example, suppose bigfile contains 4000 lines.

        $ split bigfile smf
        will create four files: smfaa, smfab, smfac, smfad.
        Split options are shown below.

        Option       Description
        -l n         Specify the number of lines in each output file. For
                     example, "-l 80" splits infile into files with 80 lines
                     each. The default is 1000. Note that the last file may
                     have fewer than n lines. The -b and -l options should
                     not be used together.
                     Note: On some older systems this option is specified as
                     -n. For example, "split -100 myfile" splits myfile into
                     files with 100 lines each.
        -b n[k | m]  Specify the size of output files. For example, "-b 1024"
                     splits infile into files of 1024 bytes each. Append k to
                     specify the size in kilobytes or m to specify the size
                     in megabytes. For example, "-b 1m" splits infile into
                     1-megabyte output files. The -b and -l options should
                     not be used together.
        -a n         Use n characters in the output filename suffix. For
                     example, "-a 3" would append aaa, aab, ... to output
                     filenames. The default is 2. Note: This option is not
                     available on all Unix systems.

        FreeBSD and OpenBSD provide an additional and extremely useful
        option, -p, that allows you to split a file at every line that
        matches a string or regular expression. See the special note for
        FreeBSD and OpenBSD users for more information.

        -----
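If you have no shell access to the server, the -b behaviour described above can be approximated locally. The following is only a rough Python sketch, not a replacement for split itself: the function names parse_size and split_by_bytes are made up for illustration, and it assumes that k means 1024 bytes and m means 1024 * 1024 bytes.

```python
import itertools
import string


def suffixes(width=2):
    """Yield 'aa', 'ab', ..., 'zz', mirroring the default two-letter suffixes."""
    for letters in itertools.product(string.ascii_lowercase, repeat=width):
        yield "".join(letters)


def parse_size(spec):
    """Turn '1024', '64k' or '1m' into a byte count.

    Assumption: k = 1024 bytes and m = 1024 * 1024 bytes.
    """
    multipliers = {"k": 1024, "m": 1024 * 1024}
    spec = spec.strip().lower()
    if spec and spec[-1] in multipliers:
        return int(spec[:-1]) * multipliers[spec[-1]]
    return int(spec)


def split_by_bytes(infile, prefix, size_spec):
    """Write infile out as prefix+'aa', prefix+'ab', ... in fixed-size pieces.

    Returns the list of file names that were created.
    """
    chunk_size = parse_size(size_spec)
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    created = []
    names = suffixes()
    with open(infile, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            suffix = next(names, None)
            if suffix is None:
                raise RuntimeError("ran out of two-letter suffixes")
            out_name = prefix + suffix
            with open(out_name, "wb") as dst:
                dst.write(chunk)
            created.append(out_name)
    return created
```

Calling split_by_bytes("bigfile", "smf", "1m") would be the rough equivalent of "split -b 1m bigfile smf". Note that splitting by bytes can cut a log line in half, which is why the line-based option is usually the better fit for logfiles.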

        This should work for your requirement:

        split -l 10000 biglogfilename logchunk

        After running the command on a 100,000-line logfile you should find
        10 additional files named

        logchunkaa
        logchunkab
        logchunkac
        logchunkad
        logchunkae
        logchunkaf
        logchunkag
        logchunkah
        logchunkai
        logchunkaj

        For other server platforms consult their helpfiles. They should have
        something similar.
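If the server has no split command, or the logfile is already on your own machine, the same line-based split can be sketched in Python. This is a minimal illustration under stated assumptions rather than a drop-in replacement: split_by_lines is a made-up name, only two-letter suffixes are supported (at most 676 chunks), and the file is read in binary mode so the bytes of each line pass through unchanged.

```python
import itertools
import string


def suffixes(width=2):
    """Yield 'aa', 'ab', ..., 'zz', mirroring the default two-letter suffixes."""
    for letters in itertools.product(string.ascii_lowercase, repeat=width):
        yield "".join(letters)


def split_by_lines(infile, prefix, lines_per_file=1000):
    """Write consecutive groups of lines to prefix+'aa', prefix+'ab', ...

    Returns the list of file names that were created. The last file may
    hold fewer than lines_per_file lines.
    """
    if lines_per_file <= 0:
        raise ValueError("lines_per_file must be positive")
    created = []
    names = suffixes()
    with open(infile, "rb") as src:
        while True:
            # islice pulls at most lines_per_file lines without loading
            # the whole logfile into memory.
            chunk = list(itertools.islice(src, lines_per_file))
            if not chunk:
                break
            suffix = next(names, None)
            if suffix is None:
                raise RuntimeError("ran out of two-letter suffixes")
            out_name = prefix + suffix
            with open(out_name, "wb") as dst:
                dst.writelines(chunk)
            created.append(out_name)
    return created
```

Calling split_by_lines("biglogfilename", "logchunk", 10000) is meant to mirror "split -l 10000 biglogfilename logchunk": for a 100,000-line file it returns ten names, logchunkaa through logchunkaj.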

        Regards,

        tf
        http://thefrank.com
      • Manuel123
        Message 3 of 5 , Jan 11, 2003
          Here is an XML idea. How can I extract, into a new document, the
          names of the people whose list is notetab from this text file?

          Would it be possible for the clip to ask me which label I want to
          extract? Each <friend>...</friend> block is one record for me.
          Thanks in advance.

          friends.txt
          <friend>
          <name>Jody</name>
          <list>notetab</list>
          </friend>

          <friend>
          <name>Manuel</name>
          <list>clip</list>
          </friend>
          ---
          Courses for the blind
          Windows, Iexplorer, Outlook, Html, Word, Excel, Access
          SUBSCRIBE/UNSUBSCRIBE to course notifications at http://www.solotxt.com
        • Alan C.
          Message 4 of 5 , Jan 11, 2003
            Hi Manuel123,

            If your text is consistently formatted, line by line, like the sample below the clip, then the following clip will work. It uses the ^!Jump command to move between lines, which was quick to assemble; there are other, more elaborate ways to extract the data you want.

            Perhaps someone else can help you with the wizards. In the meantime, you could make a copy of this entire clip and, in the copy, rename the header to (for example) H="list_clip_xml_parse"

            and also alter these next two lines from

            ^!Set %collection%=list notetab^p^p
            ^!IfSame "^$GetSelection$" "<list>notetab</list>" make

            change the notetab in them to clip

            ^!Set %collection%=list clip^p^p
            ^!IfSame "^$GetSelection$" "<list>clip</list>" make

            With those three changes (the header and the two altered lines), you would then also have a clip that parses a different list, i.e. list_clip_xml_parse.

            H="list_ntab_xml_parse"
            ^!SetScreenUpdate OFF
            ^!Jump DOC_START
            ^!Set %collection%=list notetab^p^p
            :next_item
            ^!Find "<friend>" IST
            ^!IfError publish
            ^!Jump +2
            ^!Select EOL
            ^!IfSame "^$GetSelection$" "<list>notetab</list>" make
            ^!Goto next_item

            :make
            ^!Jump -1
            ^!Set %gotline%=^$GetLine$^%NL%
            ^!Append %collection%=^%gotline%
            ^!Jump +2
            ^!Goto next_item

            :publish
            ^!Menu File/New
            ^!InsertText ^%collection%
            ^!Replace "<name>" >> "^%EMPTY%" AWIS
            ^!Replace "</name>" >> "^%EMPTY%" AWIS
            ^!ClearVariable %collection%
            ; ---end clip---

            <friend>
            <name>Steve</name>
            <list>notetab</list>
            </friend>

            <friend>
            <name>Jody</name>
            <list>notetab</list>
            </friend>

            <friend>
            <name>Manuel</name>
            <list>clip</list>
            </friend>

            <friend>
            <name>Ralph</name>
            <list>notetab</list>
            </friend>
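For comparison, the same extraction can be sketched outside NoteTab. The Python below is only an illustration and not a translation of the clip: names_on_list and field are made-up helper names, and it uses regular expressions instead of an XML parser because friends.txt has no single root element. Passing the list name as a parameter covers the wish to choose which label to extract.

```python
import re

# Sample data in the same shape as friends.txt above.
SAMPLE = """\
<friend>
<name>Steve</name>
<list>notetab</list>
</friend>

<friend>
<name>Jody</name>
<list>notetab</list>
</friend>

<friend>
<name>Manuel</name>
<list>clip</list>
</friend>

<friend>
<name>Ralph</name>
<list>notetab</list>
</friend>
"""

# One match per <friend>...</friend> record; DOTALL lets '.' span lines.
FRIEND_RE = re.compile(r"<friend>(.*?)</friend>", re.DOTALL | re.IGNORECASE)


def field(record, tag):
    """Return the text inside <tag>...</tag> within one record, or None."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    match = re.search(pattern, record, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def names_on_list(text, wanted_list):
    """Collect the <name> of every record whose <list> equals wanted_list."""
    names = []
    for record in FRIEND_RE.findall(text):
        if field(record, "list") == wanted_list:
            name = field(record, "name")
            if name is not None:
                names.append(name)
    return names


if __name__ == "__main__":
    print("\n".join(names_on_list(SAMPLE, "notetab")))
```

Unlike the clip, this sketch does not depend on <name> and <list> sitting on fixed lines inside each record, so it tolerates blank lines or reordered fields.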

            --
            Alan.