
Re: [PBML] PDF to TEXT

  • Mr. Shawn H. Corey
    Message 1 of 19, Jun 3, 2006
      On Fri, 2006-02-06 at 23:36 -0700, Prasanna Goupal wrote:
      > Hi,
      >
      > I have to extract text from a PDF file using Perl. If anyone has any idea about this, please reply to this mail.
      >
      > Also, is there any Unix command for the same?
      >
      > Thanks.
      > Regards,
      > Prasanna A. Goupal

      You could try PDF::API2, available on CPAN:
      http://search.cpan.org/~areibens/PDF-API2-0.51/
      It even has its own group: http://groups.yahoo.com/group/perl-text-pdf-modules/


      --
      __END__

      Just my 0.00000002 million dollars worth,
      --- Shawn

      "For the things we have to learn before we can do them, we learn by doing them."
      Aristotle

      * Perl tutorials at http://perlmonks.org/?node=Tutorials
      * A searchable perldoc is at http://perldoc.perl.org/
    • Prasanna Goupal
      Message 2 of 19, Jun 3, 2006
        Hi,

        There are 30,000 PDFs which have to be converted to text.
        I found a solution for this: the ps2ascii Unix command.

        Thanks for your reply.

        Regards,
        Prasanna A. Goupal
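
For a batch of that size, one way to drive ps2ascii from Perl is sketched below. This is only an illustration, not what the poster actually ran: it assumes ps2ascii is on the PATH and accepts an input file followed by an output file, and the helper names (text_path_for, convert_pdf) and the IN_DIR/OUT_DIR arguments are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Map "dir/report.pdf" to "out_dir/report.txt" (hypothetical naming scheme).
sub text_path_for {
    my ($pdf, $out_dir) = @_;
    my ($name) = fileparse($pdf, qr/\.pdf/i);   # strip directory and suffix
    return File::Spec->catfile($out_dir, "$name.txt");
}

# Convert one PDF; returns true if ps2ascii exited with status 0.
sub convert_pdf {
    my ($pdf, $txt) = @_;
    # The list form of system() bypasses the shell, so odd file names are safe.
    return system('ps2ascii', $pdf, $txt) == 0;
}

# Usage: perl pdf2txt_batch.pl IN_DIR OUT_DIR
if (@ARGV == 2) {
    my ($in_dir, $out_dir) = @ARGV;
    opendir my $dh, $in_dir or die "Unable to open $in_dir: $!";
    for my $file (sort grep { /\.pdf\z/i } readdir $dh) {
        my $pdf = File::Spec->catfile($in_dir, $file);
        my $txt = text_path_for($pdf, $out_dir);
        convert_pdf($pdf, $txt) or warn "ps2ascii failed for $pdf\n";
    }
    closedir $dh;
}
```

A failed conversion only produces a warning, so one bad file does not stop the remaining files from being processed.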

        Damien Carbery <daymobrew@...> wrote:
        --- In perl-beginner@yahoogroups.com, Prasanna Goupal
        <perl_developer@...> wrote:
        >
        > Hi,
        >
        > I have to extract text from a PDF file using Perl. If anyone has any idea about this, please reply to this mail.
        >
        > Also is there any unix command for the same?
        >
        I had a very quick look at http://search.cpan.org (searched "pdf to
        text") but didn't find anything useful. I might not have looked hard
        enough.

        It might be easier to run Acrobat Reader and choose File/Save as Text.





        Unsubscribing info is here: http://help.yahoo.com/help/us/groups/groups-32.html




        [Non-text portions of this message have been removed]
      • Hetal Modi
        Message 3 of 19, Jun 3, 2006
          There is a program called pdftotext that you can download and use on Unix. It does a pretty good job.

          If you want to know more options, let me know.

          -Hetal
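
If the text is needed inside a Perl program rather than in a file, pdftotext can be read through a pipe. The sketch below assumes a pdftotext binary (from Xpdf or Poppler) is installed, and relies on its documented behaviour of writing to standard output when the output name is "-". The function names and the layout option are hypothetical, chosen just for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for pdftotext. Passing "-" as the output
# name asks pdftotext to write the extracted text to standard output.
sub pdftotext_command {
    my ($pdf, %opt) = @_;
    my @cmd = ('pdftotext');
    push @cmd, '-layout' if $opt{layout};   # try to keep the page layout
    push @cmd, $pdf, '-';
    return @cmd;
}

# Run pdftotext and return the extracted text as one string.
sub pdf_text {
    my ($pdf, %opt) = @_;
    # List form of a pipe open: no shell is involved.
    open my $fh, '-|', pdftotext_command($pdf, %opt)
        or die "Unable to run pdftotext: $!";
    my $text = do { local $/; <$fh> };      # slurp everything
    close $fh or die "pdftotext failed for $pdf (status $?)\n";
    return defined $text ? $text : '';
}

# Usage: perl pdf_text.pl file1.pdf [file2.pdf ...]
if (@ARGV) {
    print pdf_text($_, layout => 1) for @ARGV;
}
```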

          Prasanna Goupal <perl_developer@...> wrote:
          Hi,

          I have to extract text from a PDF file using Perl. If anyone has any idea about this, please reply to this mail.

          Also, is there any Unix command for the same?

          Thanks.
          Regards,
          Prasanna A. Goupal



        • Peter Dominey
          Message 4 of 19, Jun 4, 2006
            You may already have had a reply to this; if so, sorry.

            Anyway, a PDF file is just a text file, so you can parse it for text just as you
            would any text file.

            Thanks

            Peter


            On Saturday 03 June 2006 01:36, Prasanna Goupal wrote:
            > Hi,
            >
            > I have to extract text from pdf file using perl. If anyone have any idea
            > about this, then please reply to this mail.
            >
            > Also is there any unix command for the same?
            >
            > Thanks.
            > Regards,
            > Prasanna A. Goupal

            --
            +-------------------------------------------------------------------+
            | P J Dominey |
            | Independent UNIX Contractor |
            | |
            | E-Mail: pdominey@... |
            | Web Site: www.pdrinformationsolutions.com (www.pdris.com) |
            | |
            | Tel: 817-488-5957 |
            | Yahoo IM: pdominey |
            | AOL IM: peterdominey |
            +-------------------------------------------------------------------+
            ----------------------------------------
            Scanned for Viruses! mail.dominey.biz


          • Mike Southern
            Message 5 of 19, Jun 4, 2006
              Unless it's compressed.

              On 6/4/06 9:49 PM, Peter Dominey at pdominey@... wrote:

              > You may already have had a reply to this; if so, sorry.
              >
              > Anyway, a PDF file is just a text file, so you can parse it for text just as you
              > would any text file.
              >
              > Thanks
              >
              > Peter
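
To see which of the two cases applies to a given file, a rough check like the one below can help. It is only a heuristic sketch: it looks for the /FlateDecode filter name that PDF uses for zlib-compressed streams, ignores other filters and encryption, and the function name pdf_stream_report is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report a few facts about the raw bytes of a (possible) PDF file.
sub pdf_stream_report {
    my ($bytes) = @_;
    # Count "stream" keywords; \b keeps "endstream" from matching.
    my $streams = () = $bytes =~ /\bstream\r?\n/g;
    # Count mentions of the zlib/deflate filter.
    my $flate   = () = $bytes =~ m{/FlateDecode\b}g;
    return {
        is_pdf  => ($bytes =~ /\A%PDF-\d\.\d/ ? 1 : 0),
        streams => $streams,
        flate   => $flate,
    };
}

# Usage: perl pdf_check.pl file.pdf [more.pdf ...]
if (@ARGV) {
    for my $path (@ARGV) {
        open my $fh, '<:raw', $path or die "Unable to open $path: $!";
        my $bytes = do { local $/; <$fh> };
        close $fh or die $!;
        my $r = pdf_stream_report(defined $bytes ? $bytes : '');
        printf "%s: pdf=%d streams=%d flate=%d\n",
            $path, @{$r}{qw(is_pdf streams flate)};
    }
}
```

When the flate count is above zero, a plain text search of the file will miss whatever sits inside those streams, which is where a tool such as pdftotext or ps2ascii earns its keep.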
            • RR MISHRA
              Message 6 of 19, Jun 5, 2006
                Hi Everybody,
                Can anybody give me an idea about the use of subroutines and bugs? If anybody has tutorials or sample examples about them, then please send them to me. I also want to know how subroutines and bugs are helpful in bioinformatics work. Please guide me. I need your guidance.
                Thanks in advance.
                Regards.



              • DigiDoc
                Message 7 of 19, Jun 5, 2006
                  I need to write some code that will take an ID from File-1 and see if it
                  exists on File-2. If it does, then I want to write out the record from
                  File-2 to another file.

                  File-1 is just IDs, and File-2 is the ID plus a bunch of other fields
                  comma delimited. The ID however is variable length. The files are
                  ASCII. File-1 will only contain a small number of records (about 1K),
                  while File-2 will be about 100K records.

                  I have no idea how to do this (I'm a total Perl novice) and could
                  greatly use help.


                  Thanks!


                  ~~Kevin~


                • a_z0_9_blah
                  Message 8 of 19, Jun 5, 2006
                    --- In perl-beginner@yahoogroups.com, DigiDoc <DigiDoc@...> wrote:
                    >
                    > I need to write some code that will take an ID from File-1 and see if it
                    > exists on File-2. If it does, then I want to write out the record from
                    > File-2 to another file.
                    >
                    > File-1 is just IDs, and File-2 is the ID plus a bunch of other fields
                    > comma delimited. The ID however is variable length. The files are
                    > ASCII. File-1 will only contain a small number of records (about 1K),
                    > while File-2 will be about 100K records.
                    >
                    > I have no idea how to do this (I'm a total Perl novice) and could
                    > greatly use help.
                    >
                    >
                    > Thanks!
                    >
                    >
                    > ~~Kevin~
                    >


                    Could you show some sample lines from the first file and from the
                    second file?
                  • Charles K. Clarkson
                    Message 9 of 19, Jun 5, 2006
                      RR MISHRA wrote:

                      :Can anybody give me an idea about the use of subroutines

                      Read the 'perlsub' file in the Perl documentation.
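
As a small taste of what perlsub covers, here is a sketch of a subroutine of the kind that turns up in bioinformatics scripts. The name gc_content and the sample sequences are invented for this example; they do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the fraction of G and C bases in a DNA string (0 for empty input).
sub gc_content {
    my ($seq) = @_;                  # arguments arrive in @_
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GCgc//);    # tr with an empty replacement only counts
    return $gc / length($seq);
}

# The same subroutine is reused for every sequence.
printf "%-8s %.2f\n", $_, gc_content($_) for qw(ATGC GGCC ATAT);
```

The point of the subroutine is reuse: the counting logic is written once and called for each sequence, so a fix to gc_content applies everywhere it is used.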


                      : and bugs.

                      Go to http://rt.perl.org/perlbug/ and click on the
                      Current Perl 5 Issues link.



                      Charles K. Clarkson
                      --
                      Mobile Homes Specialist
                      Free Market Advocate
                      Web Programmer

                      254 968-8328

                      Don't tread on my bandwidth. Trim your posts.
                    • DigiDoc
                      Message 10 of 19, Jun 5, 2006
                        File-1
                        -------
                        abc
                        bg
                        cd1234


                        File-2
                        -------
                        abc,john,doe,1234,9999
                        addathk,kathy,smith,3453,5629
                        bg,joe,shmo,4532,5343
                        cd1234,jane,madle,5432,0932
                        dkk32,marge,hasbro,2345,1234


                        Note: Data layout in File-2 may vary from time to time, but will always
                        start with the ID.


                        ~~Kevin~


                      • a_z0_9_blah
                        Message 11 of 19, Jun 5, 2006
                          --- In perl-beginner@yahoogroups.com, DigiDoc <DigiDoc@...> wrote:
                          >
                          >
                          > Note: Data layout in File-2 may vary from time to time, but will always
                          > start with the ID.
                          >
                          >
                          > ~~Kevin~
                          >
                          >
                          > File-1
                          > -------
                          > abc
                          > bg
                          > cd1234
                          >
                          >
                          > File-2
                          > -------
                          > abc,john,doe,1234,9999
                          > addathk,kathy,smith,3453,5629
                          > bg,joe,shmo,4532,5343
                          > cd1234,jane,madle,5432,0932
                          > dkk32,marge,hasbro,2345,1234
                          >

                          You could try the following code on your sample data.

                          If you will be 'massaging' the data in file-2,
                          you might consider treating your second
                          file as a database (using DBD::CSV).


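                          #!/usr/bin/perl
                          use strict;
                          use warnings;

                          # Remember every ID listed in the ID file (one ID per line).
                          my %ids;

                          open my $id, "<", "o33.txt" or die "Unable to open o33.txt: $!";

                          while (<$id>) {
                              chomp;
                              $ids{$_} = 1;
                          }

                          close $id or die $!;

                          open my $data, "<", "o44.txt" or die "Unable to open o44.txt: $!";
                          open my $out, ">", "o55.txt" or die "Couldn't write results: $!";

                          # Copy each data record whose first comma-separated field is a known ID.
                          while (<$data>) {
                              my $key = (split /,/)[0];
                              # $key is undefined for an empty line, so check before the lookup.
                              if (defined $key && $ids{$key}) {
                                  print $out $_;
                              }
                          }

                          close $data or die $!;
                          close $out or die $!;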
                        • DigiDoc
                          Message 12 of 19, Jun 5, 2006
                            Great, thanks for the reply.

                            I think I understand the majority of this code. I'll research the
                            DBD::CSV as well. I definitely would not have known about that without
                            your help. THANK YOU!!!!!

                            I can read a fair amount of Perl code, but am just not up to quickly
                            putting code together yet (and probably not for some time). It takes me
                            forever.

                            This helps me out immensely.

                            Dumb question. If I read this correctly, you've hard coded the files,
                            how would I set it up to make these variables?

                            Thanks!

                            ~~Kevin~
                          • Alan_C
                            Message 13 of 19, Jun 6, 2006
                              On Monday 05 June 2006 19:16, DigiDoc wrote:
                              [ . . ]
                              > Dumb question. If I read this correctly, you've hard coded the files,
                              > how would I set it up to make these variables?

                              http://groups.google.com/group/perl.beginners/browse_thread/thread/69865c81985c57f4/7d45c2b46cb7ba2b#7d45c2b46cb7ba2b

                              Take a look at the code there; it uses shift.

                              A command line example:

                              perlscript file1

                              In the command line example above, file1 will be opened as filehandle OLD.

                              If you need to, it can be set up to shift more than one file off the command line.

                              Further search topics: shift, commandline, command line, ARGV, @ARGV,
                              $ARGV, $ARGV[0], $ARGV[1]

                              Try searching at <http://learn.perl.org/first-response> too; it will likely
                              turn up even more.

                              --
                              Alan.
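
Applied to the lookup script posted earlier in this thread, the shift approach might look like the sketch below. It is an adaptation for illustration, not the code from the linked thread: the subroutine name filter_by_id and the three-argument usage are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every record of $data_file whose first comma-separated field
# appears as a line in $id_file. Returns the number of records written.
sub filter_by_id {
    my ($id_file, $data_file, $out_file) = @_;

    open my $id, '<', $id_file or die "Unable to open $id_file: $!";
    my %ids;
    while (my $line = <$id>) {
        chomp $line;
        $ids{$line} = 1;
    }
    close $id or die $!;

    open my $data, '<', $data_file or die "Unable to open $data_file: $!";
    open my $out,  '>', $out_file  or die "Unable to write $out_file: $!";
    my $count = 0;
    while (my $line = <$data>) {
        my ($key) = split /,/, $line, 2;    # only the first field matters
        next unless defined $key && exists $ids{$key};
        print {$out} $line;
        $count++;
    }
    close $data or die $!;
    close $out  or die $!;
    return $count;
}

# Usage: perl lookup.pl ID_FILE DATA_FILE OUT_FILE
if (@ARGV) {
    @ARGV == 3 or die "Usage: $0 id_file data_file out_file\n";
    my $id_file   = shift @ARGV;    # each shift removes the next argument
    my $data_file = shift @ARGV;
    my $out_file  = shift @ARGV;
    my $written   = filter_by_id($id_file, $data_file, $out_file);
    print "Wrote $written matching records to $out_file\n";
}
```

With the file names taken from the command line, the same script can be run against different ID and data files without editing the code.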
                            • DigiDoc
                              Message 14 of 19, Jun 6, 2006
                                Thanks for all the help to this question.

                                I've got it working with the $ARGV stuff. I've got more tweaks I want
                                to make, but the base code you provided really helps me out.

                                I'm thinking I'll want to make a GUI out of it now. I was doing some
                                research and found "The GUI Loft". It's a front end that uses
                                Win32::GUI. I haven't used it yet, but it seems like what I need. If
                                anyone has a better suggestion, please let me know.

                                I can't tell you how much time you've saved me, not to mention that I've
                                learned a few things as well. Code examples help me out a great deal, I
                                tend to learn faster from them.

                                Thanks!



                              • Ken Shail
                                Message 15 of 19, Jun 6, 2006
                                  ----- Original Message -----
                                  From: DigiDoc ; DigiDoc
                                  To: perl-beginner@yahoogroups.com
                                  Sent: Tuesday, June 06, 2006 8:45 PM
                                  Subject: Re: [PBML] Re: File lookup?


                                  Thanks for all the help to this question.

                                  I've got it working with the $ARGV stuff. I've got more tweaks I want
                                  to make, but the base code you provided really helps me out.

                                  I'm thinking I'll want to make a GUI out of it now. I was doing some
                                  research and found "The GUI Loft". It's a front end that uses
                                  Win32::GUI. I haven't used it yet, but it seems like what I need. If
                                  anyone has a better suggestion, please let me know.

                                  I can't tell you how much time you've saved me, not to mention that I've
                                  learned a few things as well. Code examples help me out a great deal, I
                                  tend to learn faster from them.

                                  Thanks!

                                  Perl Tk - works on Linux and windoze
                                  http://dc.pm.org/talks/tk/home.html
                                  http://www.perltk.org/index.php?option=com_frontpage&Itemid=1
                                  http://phaseit.net/claird/comp.lang.perl.tk/ptkFAQ.html
                                • DigiDoc
                                  Message 16 of 19, Jun 6, 2006
                                    Thanks for the links.

                                    I was looking at that, but couldn't find a nice front end that lets you
                                    work visually. The GUI Loft lets you do things visually, then
                                    generates the code for you. Is there something in tk that does that, or
                                    am I missing something? Or at the very least, some application that has
                                    some kind of code snippets that are easy to insert?

                                    My goal is to develop code as rapidly as possible since I'll have a lot
                                    to crank out.


                                  • Alan_C
                                    Message 17 of 19, Jun 6, 2006
                                      On Tuesday 06 June 2006 13:24, DigiDoc wrote:
                                      > Thanks for the links.

                                      [ > > Perl Tk - works on Linux and windoze ]

                                      (If you quote a bit of context, it makes it easier for me (+ probably some
                                      others) to track this thread and then offer any help if I have any to
                                      offer.)

                                      > I was looking at that, but couldn't find a nice front end that lets you
                                      > work visually. The GUI Loft lets you do things visually, then
                                      > generates the code for you. Is there something in tk that does that, or
                                      > am I missing something? Or at the very least, some application that has
                                      > some kind of code snippets that are easy to insert?
                                      >
                                      > My goal is to develop code as rapidly as possible since I'll have a lot
                                      > to crank out.

                                      There's wxWidgets. It has drag and drop (for the builder). It's cross
                                      platform, i.e. Win, Linux, etc. There's an article on wxWidgets in the
                                      latest "The Perl Review" (for subscribers). It's reasonable enough and I
                                      look forward to each issue's arrival. (I've no affiliation, just a
                                      subscriber.)

                                      http://www.theperlreview.com/

                                      The summer issue has the article on wxWidgets.

                                      Other than what I've shared, I know nothing about wxWidgets. I want to try it, but
                                      it'll be a few weeks until I do, due to other business currently.

                                      > > I'm thinking I'll want to make a GUI out of it now. I was doing some
                                      > > research and found "The GUI Loft". It's a front end that uses
                                      > > Win32::GUI. I haven't used it yet, but it seems like what I need. If
                                      > > anyone has a better suggestion, please let me know.

                                      --
                                      Alan.