
Re: [Clip] changing line length of OCR scanned text

  • Jeff Scism
    Message 1 of 12 , Apr 23 5:44 AM
      How about Selecting the section and using Ctrl+J (Join lines)?



      Mike Breiding wrote:
      >
      >
      > Greetings,
      >
      > I have OCR scanned docs where line lengths vary.
      > I would like to have each paragraph be unbroken with.
      >
      > Is this a clip solution?
      >
      > Thanks,
      > -Mike
      > ===============
      > SAMPLE 1
      > West Virginia's allotment from the Land and Water Conservation Fund
      > (LWCF) the Bureau of Outdoor Recreation (BOR) is disbursed at a ratio of
      > approximately 60% for state-operated projects. W for community
      > recreational
      > facilities.
      >
      > SAMPLE 2
      > During 50 years of bird watching I have seen the gradual diminishing and
      > at times
      > cataclysmic destruction of the natural environment[consistent
      > destruction of the habitat]. Aside from all the poisons spewed into the
      > atmosphere and waters plus the havoc that has been wreaked upon the
      > landscape, [and consistent destruction of habitat]there has been a
      > pervasive shrinking of living space for wild creatures.
      >
      >


      --


      Jeffery G. Scism, IBSSG
      ~~

      "Proponents of each side are vying with determination to prove their ignorance is greater than the other."

      President Andrew Jackson, discussing a bill going through the US Congress.



      Visit http://ibssg.org/
      For The Blacksheep website, MORE...

      Putnam County Indiana Biographies and Obituaries
      http://ingenweb.org/inputnam/bios/

      Montgomery County Indiana Biographies and Obituaries
      http://ingenweb.org/inmontgomery/bios/

      Fountain County Indiana Biographies and Obituaries
      http://ingenweb.org/infountain/vitals/bios/
    • Mike Breiding
      Message 2 of 12 , Apr 23 7:11 AM
        Jeff Scism wrote:
        > How about Selecting the section and using Ctrl+J (Join lines)?
        >
        > Mike Breiding wrote:
        >
        >> Greetings,
        >> I have OCR scanned docs where line lengths vary.
        >> I would like to have each paragraph be unbroken with.
        >> Is this a clip solution?
        Hi Jeff,
        I did not know about Ctrl+J (Join lines).
        This works! But, is there a way to automate the process so it joins each
        paragraph separately?
        When there is a block of text like below it is easy for me to
        distinguish paragraphs, but how would NT find them for processing I wonder?
        Thanks,
        -Mike

        "I have come to rely upon, or more aptly put, resorted to, are: (1)
        cemeteries and: (2)
        railroad right-of-ways.
        For instance, when I was in military service during World War II, I
        learned a good
        place to look for birds was in the bigger and older cemeteries of the
        larger towns and
        cities. Many of the larger cemeteries are like and oasis surrounded by
        all types of
        urbanization. The older ones usually attract birds because of the
        variety and stages of
        plant life.
        The older cemeteries are usually in or near the better residential
        sections which
        generally are landscaped with some types of trees and shrubs that
        provide food and cover
        for birds and other wildlife."
      • Don - HtmlFixIt.com
        Message 3 of 12 , Apr 23 7:15 AM
          Mike Breiding wrote:
          > Jeff Scism wrote:
          >> How about Selecting the section and using Ctrl+J (Join lines)?
          >>

          select paragraph
          join
          jump next paragraph (may need to be with a jump select end)
          may need to see if it is a blank line as that isn't an end of paragraph

          with a little fiddling easy to do I think

          I use Control + J often.

          It works well on emailed content that gets line wrapped with hard
          returns inserted.
        • loro
          Message 4 of 12 , Apr 23 7:46 AM
            Mike Breiding wrote:
            >Jeff Scism wrote:
            > > How about Selecting the section and using Ctrl+J (Join lines)?

            >This works! But, is there a way to automate the process so it joins each
            >paragraph separately?

            Ctrl+A Ctrl+J

            As long as there is at least one blank line between the blocks, that is.

            Lotta
          • hsavage
            Message 5 of 12 , Apr 23 7:46 AM
              Mike Breiding wrote:
              > Greetings,
              >
              > I have OCR scanned docs where line lengths vary.
              > I would like to have each paragraph be unbroken with.
              >
              > Is this a clip solution?
              >
              > Thanks,
              > -Mike

              Mike,

              Here's a short clip that should solve your problem if it's formatted as
              in your example.

              -------------------
              H="FormatLines"
              ^!Set %ww%=^$IsWordWrap$
              ^!SetWordWrap 0
              ;
              ^!Replace "^p^p" >> "zxzx" TIWSA
              ^!Select ALL
              ^!Menu Modify/Lines/Join Lines
              ^!Replace "zxzx" >> "^p^p" TIWSA
              ;
              ^!SetWordWrap ^%ww%
              -------------------
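For anyone following along without NoteTab, the placeholder trick in the clip above can be sketched in Python. This is only an illustration, not part of the original clip: the function name `join_paragraph_lines` is hypothetical, and it assumes that joined lines should be separated by a single space and that paragraphs are separated by exactly one blank line.

```python
import re


def join_paragraph_lines(text: str, placeholder: str = "zxzx") -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks.

    Assumption: the placeholder string does not occur in the text itself.
    """
    # Normalize Windows line endings so "\n\n" reliably means a blank line.
    text = text.replace("\r\n", "\n")
    # Step 1: protect paragraph breaks by swapping them for a placeholder.
    protected = text.replace("\n\n", placeholder)
    # Step 2: join every remaining line break with a single space.
    joined = re.sub(r" *\n *", " ", protected)
    # Step 3: put the paragraph breaks back.
    return joined.replace(placeholder, "\n\n")


sample = "West Virginia's allotment\nfrom the fund\n\nDuring 50 years\nof bird watching"
print(join_paragraph_lines(sample))
```

As with the clip, this only helps when a blank line already separates the paragraphs.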


              ·············································
              ºvº SL_114 created_2008.04.23_02.14.25

              Measure of SUCCESS:
              • At age 50 is...
              Having money.
              € hrs € hsavage € pobox € com
            • Mike Breiding
              Message 6 of 12 , Apr 23 8:00 AM
                hsavage wrote:
                > Mike Breiding wrote:
                > > Greetings,
                > >
                > > I have OCR scanned docs where line lengths vary.
                > > I would like to have each paragraph be unbroken with.
                > >
                > > Is this a clip solution?
                > >
                > > Thanks,
                > > -Mike
                >
                > Mike,
                >
                > Here's a short clip that should solve your problem if it's formatted as
                > in your example.
                > -------------------
                > H="FormatLines"
                > ^!Set %ww%=^$IsWordWrap$
                > ^!SetWordWrap 0
                > ;
                > ^!Replace "^p^p" >> "zxzx" TIWSA
                > ^!Select ALL
                > ^!Menu Modify/Lines/Join Lines
                > ^!Replace "zxzx" >> "^p^p" TIWSA
                > ;
                > ^!SetWordWrap ^%ww%
                > -------------------
                This works, but only on docs with a blank line between paragraphs. I was
                afraid this might be a problem.

                Thanks for sending the clip!
                -Mike
              • Mike Breiding
                Message 7 of 12 , Apr 23 8:01 AM
                  loro wrote:
                  > Mike Breiding wrote:
                  >
                  >> Jeff Scism wrote:
                  >>
                  >>> How about Selecting the section and using Ctrl+J (Join lines)?
                  >> This works! But, is there a way to automate the process so it joins each
                  >> paragraph separately?
                  >>
                  >
                  > Ctrl+A Ctrl+J
                  > As long as there is at least one blank line between the blocks, that is.
                  > Lotta
                  Unfortunately, no blank lines between paragraphs.
                  Thanks,
                  -Mike
                • Don - HtmlFixIt.com
                  Message 8 of 12 , Apr 23 8:02 AM
                    Mike Breiding wrote:
                    > loro wrote:
                    >> Mike Breiding wrote:
                    >>
                    >>> Jeff Scism wrote:
                    >>>
                    >>>> How about Selecting the section and using Ctrl+J (Join lines)?
                    >>> This works! But, is there a way to automate the process so it joins each
                    >>> paragraph separately?
                    >>>
                    >> Ctrl+A Ctrl+J
                    >> As long as there is at least one blank line between the blocks, that is.
                    >> Lotta
                    > Unfortunately, no blank lines between paragraphs.
                    > Thanks,
                    > -Mike

                    Mike can you send me a text file directly with a sample in it.

                    What distinguishes a paragraph? A return followed by a capital in
                    most/all cases??

                    Having a look at the sample may do it.
                  • Jeff Scism
                    Message 9 of 12 , Apr 23 8:15 AM
                      If all your paragraphs end in a period followed by the line break (.^P),
                      you can have the replace command replace all .^P with .^P^P. That makes
                      two "returns" follow each paragraph; then run Ctrl+A and Ctrl+J to join
                      them all.


                      ^!REPLACE ".^P" >> ".^P^P" BW
                      ^!KEYBOARD CTRL+A CTRL+J

                      The BW code at the end of the first line indicates that the search
                      starts from the BOTTOM of the doc and goes UP, and the W tells it to do
                      all it finds.
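The same period-then-line-break heuristic can be tried outside NoteTab. The Python sketch below is an illustration only: the function name `mark_and_join` is hypothetical, and it assumes that any line ending in a period closes a paragraph and that joined lines should be separated by a single space.

```python
import re


def mark_and_join(text: str) -> str:
    """Treat a period at end of line as a paragraph end, then join the rest."""
    # Normalize Windows line endings first.
    text = text.replace("\r\n", "\n")
    # Step 1: add a second line break after every period that ends a line.
    marked = re.sub(r"\.\n", ".\n\n", text)
    # Step 2: join single line breaks, leaving the doubled ones untouched.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", marked)


sample = (
    "cemeteries and: (2)\n"
    "railroad right-of-ways.\n"
    "For instance, when I was in military service\n"
    "I learned a good place to look for birds."
)
print(mark_and_join(sample))
```

The heuristic is imperfect: a wrapped line that happens to end on a sentence-ending period in the middle of a paragraph gets split there too, so some manual cleanup should be expected.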


                      Jeff

                      Mike Breiding wrote:
                      >
                      > Jeff Scism wrote:
                      > > How about Selecting the section and using Ctrl+J (Join lines)?
                      > >
                      > > Mike Breiding wrote:
                      > >
                      > >> Greetings,
                      > >> I have OCR scanned docs where line lengths vary.
                      > >> I would like to have each paragraph be unbroken with.
                      > >> Is this a clip solution?
                      > Hi Jeff,
                      > I did not know about Ctrl+J (Join lines).
                      > This works! But, is there a way to automate the process so it joins each
                      > paragraph separately?
                      > When there is a block of text like below it is easy for me to
                      > distinguish paragraphs, but how would NT find them for processing I
                      > wonder?
                      > Thanks,
                      > -Mike
                      >
                      > "I have come to rely upon, or more aptly put, resorted to, are: (1)
                      > cemeteries and: (2)
                      > railroad right-of-ways.
                      > For instance, when I was in military service during World War II, I
                      > learned a good
                      > place to look for birds was in the bigger and older cemeteries of the
                      > larger towns and
                      > cities. Many of the larger cemeteries are like and oasis surrounded by
                      > all types of
                      > urbanization. The older ones usually attract birds because of the
                      > variety and stages of
                      > plant life.
                      > The older cemeteries are usually in or near the better residential
                      > sections which
                      > generally are landscaped with some types of trees and shrubs that
                      > provide food and cover
                      > for birds and other wildlife."
                      >
                      >


                    • Mike Breiding
                      Message 10 of 12 , Apr 23 8:38 AM
                        Jeff Scism wrote:
                        > If all your paragraphs end in a period followed by the line break (.^P),
                        > you can have the replace command replace all .^P with .^P^P. That makes
                        > two "returns" follow each paragraph; then run Ctrl+A and Ctrl+J to join
                        > them all.
                        >
                        >
                        > ^!REPLACE ".^P" >> ".^P^P" BW
                        > ^!KEYBOARD CTRL+A CTRL+J
                        >
                        > The BW code at the end of the first line indicates that the search
                        > starts from the BOTTOM of the doc and goes UP, and the W tells it to do
                        > all it finds. Jeff
                        Ah-ha!! I missed an obvious S&R opportunity there.
                        I did the S&R ( replace ".^P" with ".^P^P") and then ran the
                        FormatLines clip from "hsavage" ( what is your first name "hsavage"?)
                        and it got 90% of them.
                        There are some chopped paragraphs from the sloppy OCR, but I can get
                        those manually.

                        The OCRs I have are from all kinds of documents of all ages, fonts,
                        paper qualities, etc. So there is a mixed bag of how the docs ended up
                        being formatted. Some are going to be easy, some a pain in the a**.
                        With the solutions I have now and maybe more from Don on the way this
                        will hopefully get most of it cleaned up.

                        As always, thanks for the help!!
                        -Mike
                      • hsavage
                        Message 11 of 12 , Apr 23 8:41 AM
                          Mike Breiding wrote:
                          > "hsavage" ( what is your first name "hsavage"?)
                          >
                          > -Mike
                          >
                          Harvey
