
Re: [Clip] changing line length of OCR scanned text

  • Mike Breiding
    Message 1 of 12 , Apr 23 7:11 AM
      Jeff Scism wrote:
      > How about Selecting the section and using Ctrl+J (Join lines)?
      >
      > Mike Breiding wrote:
      >
      >> Greetings,
      >> I have OCR scanned docs where line lengths vary.
      >> I would like to have each paragraph be unbroken.
      >> Is this a clip solution?
      Hi Jeff,
      I did not know about Ctrl+J (Join lines).
      This works! But is there a way to automate the process so it joins each
      paragraph separately?
      When there is a block of text like the one below, it is easy for me to
      distinguish paragraphs, but how would NoteTab find them for processing, I wonder?
      Thanks,
      -Mike

      "I have come to rely upon, or more aptly put, resorted to, are: (1)
      cemeteries and: (2)
      railroad right-of-ways.
      For instance, when I was in military service during World War II, I
      learned a good
      place to look for birds was in the bigger and older cemeteries of the
      larger towns and
      cities. Many of the larger cemeteries are like and oasis surrounded by
      all types of
      urbanization. The older ones usually attract birds because of the
      variety and stages of
      plant life.
      The older cemeteries are usually in or near the better residential
      sections which
      generally are landscaped with some types of trees and shrubs that
      provide food and cover
      for birds and other wildlife."
    • Don - HtmlFixIt.com
      Message 2 of 12 , Apr 23 7:15 AM
        Mike Breiding wrote:
        > Jeff Scism wrote:
        >> How about Selecting the section and using Ctrl+J (Join lines)?
        >>

        select paragraph
        join
        jump to next paragraph (may need to be done with a jump select end)
        may need to check whether it is a blank line, as that isn't an end of paragraph

        With a little fiddling, this should be easy to do, I think.

        I use Ctrl+J often.

        It works well on emailed content that gets line wrapped with hard
        returns inserted.
      • loro
        Message 3 of 12 , Apr 23 7:46 AM
          Mike Breiding wrote:
          >Jeff Scism wrote:
          > > How about Selecting the section and using Ctrl+J (Join lines)?

          >This works! But, is there a way to automate the process so it joins each
          >paragraph separately?

          Ctrl+A Ctrl+J

          As long as there is at least one blank line between the blocks, that is.

          Lotta
        • hsavage
          Message 4 of 12 , Apr 23 7:46 AM
            Mike Breiding wrote:
            > Greetings,
            >
            > I have OCR scanned docs where line lengths vary.
            > I would like to have each paragraph be unbroken.
            >
            > Is this a clip solution?
            >
            > Thanks,
            > -Mike

            Mike,

            Here's a short clip that should solve your problem if it's formatted as
            in your example.

            -------------------
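            H="FormatLines"
            ; Remember the current word-wrap setting, then turn wrap off
            ^!Set %ww%=^$IsWordWrap$
            ^!SetWordWrap 0
            ; Protect blank-line paragraph breaks with a placeholder
            ^!Replace "^p^p" >> "zxzx" TIWSA
            ; Select everything and join the remaining line breaks
            ^!Select ALL
            ^!Menu Modify/Lines/Join Lines
            ; Turn the placeholders back into paragraph breaks
            ^!Replace "zxzx" >> "^p^p" TIWSA
            ; Restore the original word-wrap setting
            ^!SetWordWrap ^%ww%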
            -------------------
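
For anyone who wants to try the same idea outside NoteTab, here is a minimal Python sketch of what the clip does. It is not part of the clip: the function name join_paragraphs is made up for illustration, and it splits on blank lines instead of using a "zxzx" placeholder. Like the clip, it assumes paragraphs are separated by at least one blank line.

```python
import re


def join_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by at least one blank line,
    the same precondition the FormatLines clip relies on.
    """
    # Split the text wherever a blank (or whitespace-only) line occurs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, replace the line breaks with single spaces.
    joined = [" ".join(line.strip() for line in p.splitlines()) for p in paragraphs]
    # Put one blank line back between paragraphs.
    return "\n\n".join(joined)
```

Calling join_paragraphs on text with no blank lines returns everything as one long line, which is the same limitation the clip has.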


            ·············································
            ºvº SL_114 created_2008.04.23_02.14.25

            Measure of SUCCESS:
            • At age 50 is...
            Having money.
            € hrs € hsavage € pobox € com
          • Mike Breiding
            Message 5 of 12 , Apr 23 8:00 AM
              hsavage wrote:
              > Mike Breiding wrote:
              > > Greetings,
              > >
              > > I have OCR scanned docs where line lengths vary.
              > > I would like to have each paragraph be unbroken.
              > >
              > > Is this a clip solution?
              > >
              > > Thanks,
              > > -Mike
              >
              > Mike,
              >
              > Here's a short clip that should solve your problem if it's formatted as
              > in your example.
              > -------------------
              > H="FormatLines"
              > ^!Set %ww%=^$IsWordWrap$
              > ^!SetWordWrap 0
              > ;
              > ^!Replace "^p^p" >> "zxzx" TIWSA
              > ^!Select ALL
              > ^!Menu Modify/Lines/Join Lines
              > ^!Replace "zxzx" >> "^p^p" TIWSA
              > ;
              > ^!SetWordWrap ^%ww%
              > -------------------
              This works, but only on docs with a blank line between paragraphs. I was
              afraid this might be a problem.

              Thanks for sending the clip!
              -Mike
            • Mike Breiding
              Message 6 of 12 , Apr 23 8:01 AM
                loro wrote:
                > Mike Breiding wrote:
                >
                >> Jeff Scism wrote:
                >>
                >>> How about Selecting the section and using Ctrl+J (Join lines)?
                >> This works! But, is there a way to automate the process so it joins each
                >> paragraph separately?
                >>
                >
                > Ctrl+A Ctrl+J
                > As long as there is at least one blank line between the blocks, that is.
                > Lotta
                Unfortunately, no blank lines between paragraphs.
                Thanks,
                -Mike
              • Don - HtmlFixIt.com
                Message 7 of 12 , Apr 23 8:02 AM
                  Mike Breiding wrote:
                  > loro wrote:
                  >> Mike Breiding wrote:
                  >>
                  >>> Jeff Scism wrote:
                  >>>
                  >>>> How about Selecting the section and using Ctrl+J (Join lines)?
                  >>> This works! But, is there a way to automate the process so it joins each
                  >>> paragraph separately?
                  >>>
                  >> Ctrl+A Ctrl+J
                  >> As long as there is at least one blank line between the blocks, that is.
                  >> Lotta
                  > Unfortunately, no blank lines between paragraphs.
                  > Thanks,
                  > -Mike

                  Mike, can you send me a text file directly with a sample in it?

                  What distinguishes a paragraph? A return followed by a capital in
                  most/all cases?

                  Having a look at the sample may do it.
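
The "return followed by a capital" rule can be tried out with a short Python sketch. This is only an illustration of the heuristic, not a NoteTab clip, and the function name join_unless_capital is hypothetical. It assumes a new paragraph always starts with an ASCII capital letter, so a wrapped line that happens to begin with a capital (a name, or "I") will be split off by mistake.

```python
import re


def join_unless_capital(text: str) -> str:
    """Join line breaks unless the next line starts with a capital letter.

    Assumption: every paragraph begins with an ASCII capital (A-Z).
    """
    # (?![A-Z]) is a negative lookahead: the line break is replaced by a
    # space only when the following character is not a capital letter.
    return re.sub(r"\n(?![A-Z])", " ", text.strip())
```

On the sample block posted earlier in the thread, this rule keeps the breaks before "For instance" and "The older cemeteries" and joins the rest.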
                • Jeff Scism
                  Message 8 of 12 , Apr 23 8:15 AM
                    If all your paragraphs end in a period followed by the line break (.^P),
                    you can have the replace command replace all .^P with .^P^P. That makes
                    two "returns" follow each paragraph. Then run Ctrl+A and Ctrl+J to join
                    them all.


                    ^!REPLACE ".^P" >> ".^P^P" BW
                    ^!KEYBOARD CTRL+A CTRL+J

                    The BW code at the end of the first line indicates that the search
                    starts from the BOTTOM of the doc and goes UP, and the W tells it to do
                    all it finds.
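
The same "a period at the end of a line ends the paragraph" rule can be sketched in Python for testing on a sample file. This is an illustration under that one assumption, not a translation of the NoteTab commands, and the function name join_on_period is made up. A line that merely happens to end in a period mid-paragraph will be treated as a paragraph end, and a paragraph ending in something else (such as ." with a closing quote) will run on into the next one.

```python
def join_on_period(text: str) -> str:
    """Join wrapped lines, ending a paragraph at each line that ends in a period.

    Assumption: a period immediately before a line break marks the end
    of a paragraph.
    """
    paragraphs = []
    current = []  # lines collected for the paragraph being built
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines; they carry no text
        current.append(line)
        if line.endswith("."):
            # Period at end of line: close the paragraph here.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Leftover lines that never reached a final period.
        paragraphs.append(" ".join(current))
    # Separate paragraphs with one blank line, as the two "returns" would.
    return "\n\n".join(paragraphs)
```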


                    Jeff



                    --


                    Jeffery G. Scism, IBSSG
                    ~~

                    "Proponents of each side are vying with determination to prove their ignorance is greater than the other."

                    President Andrew Jackson, discussing a bill going through the US Congress.



                    Visit http://ibssg.org/
                    For The Blacksheep website, MORE...

                    Putnam County Indiana Biographies and Obituaries
                    http://ingenweb.org/inputnam/bios/

                    Montgomery County Indiana Biographies and Obituaries
                    http://ingenweb.org/inmontgomery/bios/

                    Fountain County Indiana Biographies and Obituaries
                    http://ingenweb.org/infountain/vitals/bios/
                  • Mike Breiding
                    Message 9 of 12 , Apr 23 8:38 AM
                      Jeff Scism wrote:
                      > If all your Paragraphs end in a period followed by the line break (.^P)
                      > you can have the replace command replace all .^P with .^P^P that makes
                      > two "returns" follow each paragraph, then Run CTRL+A and Ctrl+J to Join
                      > them all.
                      >
                      >
                      > ^!REPLACE ".^P" >> ".^P^P" BW
                      > ^!KEYBOARD CTRL+A CTRL+J
                      >
                      > The BW code at the end of the first line indicates that the search
                      > starts from the BOTTOM of the doc and goes UP, and the W tells it to do
                      > all it finds. Jeff
                      Ah-ha!! I missed an obvious S&R opportunity there.
                      I did the S&R (replace ".^P" with ".^P^P") and then ran the
                      FormatLines clip from "hsavage" (what is your first name, "hsavage"?)
                      and it got 90% of them.
                      There are some chopped paragraphs from the sloppy OCR, but I can fix
                      those manually.

                      The OCRs I have are from all kinds of documents of all ages, fonts,
                      paper qualities, etc. So there is a mixed bag of how the docs ended up
                      being formatted. Some are going to be easy, some a pain in the a**.
                      With the solutions I have now, and maybe more from Don on the way, this
                      will hopefully get most of it cleaned up.

                      As always, thanks for the help!!
                      -Mike
                    • hsavage
                      Message 10 of 12 , Apr 23 8:41 AM
                        Mike Breiding wrote:
                        > "hsavage" ( what is your first name "hsavage"?)
                        >
                        > -Mike
                        >
                        Harvey
