Re: [PBML] regular expression help.

  • Alex Cordero
    Message 1 of 7 , May 1, 2006
      > Is there any possible way you can access the data in its original
      > form? That is, ask your boss. The format you present is
      > human-readable, not a bad thing unless you want to do further
      > processing on it.

      No, unfortunately I can't modify the output--it's a long story laced
      with office politics, sex scandals and high drama... Ok, I'm
      exaggerating... but there are a lot of company politics involved and I
      can't modify the Perl program that produces the output.

      The other option I thought of was to try and remove the <LF>
      characters when a particular match is made--but I couldn't get that
      to work either as I fear that I don't understand how to use it.

      I've only been using Perl for two months so there's *a lot* I don't
      understand.

      The <LF> character is "/f" but can you remove it? What's the
      character option for?

      thanks,

      -Alex
    • Bob Kardell
      Message 2 of 7 , May 5, 2006
        It may be a bit too late for this suggestion, but even
        if you can't get the original data, you can turn it
        into something usable. I hesitate to speak because
        most of the other people in the group are much better
        at Perl and could script this better, but I have two
        ideas:

        First, try to remove all the new lines (\n) with
        something like

        s/\n//g

        from the text, and you would end up with one big
        string. Then, if the second lines are always in the same
        format, try to capture and replace them with

        s/(\d\d\d\d\d\d-\d\d)/$1\n/g

        and you should now have a new line only where you want
        it.
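A rough sketch of the first idea, assuming the whole report fits in memory and that every wrapped second line looks like six digits, a hyphen, two digits and a trailing hyphen (as in the 345656-67- sample). The sample records and the names $report, $joined and @kept are made up for illustration. Two caveats: the line feed is written \n in Perl (\f is a form feed), and the pattern as posted can also match inside the first line's part number (342343-45 sits inside 2342343-4543), so this version refuses a match that is preceded by a digit and also captures the trailing hyphen.

```perl
use strict;
use warnings;

# Hypothetical sample: two records, each wrapped onto a second line.
my $report = "2342343-4543 WRENCH, LARGE GENERAL USE\n"
           . "345656-67-\n"
           . "1112223-4444 BOLT, HEX HEAD\n"
           . "987654-32-\n";

# Step 1: drop every line feed so the report becomes one long string.
(my $joined = $report) =~ s/\n//g;

# Step 2: put a newline back after each wrapped fragment.
# (?<!\d) stops the pattern from matching inside the longer
# seven-digit part number at the start of a record.
$joined =~ s/(?<!\d)(\d{6}-\d\d-)/$1\n/g;

# Each record is now a single line, so the original filter applies.
# split /^/ breaks the string into lines and keeps their newlines.
my @kept = grep { !/wrench|hammer|nail/i } split /^/, $joined;
print @kept;
```

Because the line feeds are deleted rather than replaced, the wrapped fragment ends up glued to the end of the description; substituting a space instead of nothing in step 1 would keep them apart.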

        Second, try to do a capture and replace on the first
        line with what I will call a record separator:

        s/(\d\d\d\d\d\d\d-\d\d\d\d)/=====Record Split=====\n$1/g

        Then do the same for the second line, but put the
        split after the capture this time:

        s/(\d\d\d\d\d\d-\d\d)/$1\n=====Record Split=====\n/g

        Now the format should be:

        =====Record Split=====
        2342343-4543 WRENCH, LARGE GENERAL USE
        345656-67-
        =====Record Split=====
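Here is one way the two marker substitutions might look when run on a small sample. This is only a sketch: the sample records and the $report and $marker names are invented, and it assumes the part number always starts a line. Anchoring each pattern with ^ under the /m modifier is a change from the patterns as posted; without it, the six-digit pattern can also match inside the seven-digit number on the first line.

```perl
use strict;
use warnings;

# Hypothetical sample: two records, each wrapped onto a second line.
my $report = "2342343-4543 WRENCH, LARGE GENERAL USE\n"
           . "345656-67-\n"
           . "1112223-4444 BOLT, HEX HEAD\n"
           . "987654-32-\n";

my $marker = "=====Record Split=====\n";

# Put a marker in front of every first line (seven digits, hyphen, four digits).
$report =~ s/^(\d{7}-\d{4})/$marker$1/mg;

# Put a marker after every wrapped second line (six digits, hyphen,
# two digits, trailing hyphen, end of line).
$report =~ s/^(\d{6}-\d\d-\n)/$1$marker/mg;

print $report;
```

With both substitutions applied, two markers end up back to back between neighbouring records, so whatever reads the result should be prepared to skip empty chunks.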

        Then search the file by paragraph instead of by line by
        setting the record separator as

        $/ = "=====Record Split=====";
        ## then run your match
        if (! m/wrench|hammer|nail/i){
            print OUTREPORT;
        }
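To show the record-separator step end to end, here is a small self-contained sketch. It assumes the report has already been marked up roughly as described above, reads it from an in-memory filehandle instead of a real file, and collects the surviving records in @kept rather than printing to OUTREPORT. The sample data and the names $sep, $marked and $in are hypothetical, and the separator here includes the trailing newline so that chomp removes the marker cleanly.

```perl
use strict;
use warnings;

# Hypothetical marked-up report: a marker line between records.
my $sep    = "=====Record Split=====\n";
my $marked = $sep
           . "2342343-4543 WRENCH, LARGE GENERAL USE\n345656-67-\n"
           . $sep
           . "1112223-4444 BOLT, HEX HEAD\n987654-32-\n"
           . $sep;

# Read from the string as if it were a file (in-memory filehandle).
open my $in, '<', \$marked or die "Cannot open in-memory report: $!";

my @kept;
{
    local $/ = $sep;    # each read now returns one whole record
    while (my $record = <$in>) {
        chomp $record;  # chomp strips the current $/, i.e. the marker line
        next if $record !~ /\S/;                   # skip empty chunks
        next if $record =~ /wrench|hammer|nail/i;  # drop unwanted records
        push @kept, $record;
    }
}
close $in;

print @kept;
```

Since the match is tested against the whole record, the wrapped second line is dropped together with the first line that contains WRENCH. Using local inside a block restores the normal line-by-line behaviour of $/ for the rest of the program.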



        Just two ideas.

        Bob



        --- Alex Cordero <alex.cordero@...> wrote:

        > Thanks for the help Shawn.
        >
        > The number sequence is leftover data from a filtered list. The code
        > I use is something like this:
        >
        > if (! m/wrench|hammer|nail/i){
        > print OUTREPORT;
        > }
        >
        > The simple "if" statement will print everything except what's in the
        > query. But it will leave behind a row of numbers in the second line.
        > So:
        >
        > INPUT ---
        > 2342343-4543 WRENCH, LARGE GENERAL USE
        > 345656-67- (<--this is wrapped)
        >
        > OUTPUT ---
        > 345656-67- (<--this line remains)
        >
        > I tried using the continuing line command but I couldn't get it to
        > work. Basically, my program needs to suppress the second line which
        > is wrapped.
        >
        > It will match "WRENCH" and suppress its output. But it leaves behind
        > the wrapped portion.
        >
        > --- In perl-beginner@yahoogroups.com, "Mr. Shawn H. Corey"
        > <shawnhcorey@...> wrote:
        > >
        > > On Mon, 2006-01-05 at 19:10 +0000, Alex Cordero wrote:
        > > > I need a regular expression that will match a series of numbers,
        > > > any numbers then followed by blank spaces. For example:
        > > >
        > > > "185-3459928060956-56- "
        > > >
        > > > what would the RE look like to match that string?
        > >
        > > /\d+\-\d+\-\d+\-/;
        > >
        > > What do you want to do with this match? Are the numbers of fixed
        > > length? If so, /\d{3}\-\d{13}\-\d{2}\-/; A few more details would
        > > give you a better answer.
        > >
        > >
        > > --
        > > __END__
        > >
        > > Just my 0.00000002 million dollars worth,
        > > --- Shawn
        > >
        > > "For the things we have to learn before we can do them, we learn
        > > by doing them."
        > > Aristotle
        > >
        > > * Perl tutorials at http://perlmonks.org/?node=Tutorials
        > > * A searchable perldoc is at http://perldoc.perl.org/