
Need Help to Modify Perl Script (VERY LONG)

  Gorden-Ozgul, Patricia E
    Message 1 of 1, Dec 29, 2000
      I am a VERY novice perl-er. I'm still learning grep/sed/nawk.
      I have code that was written by another that I need to modify.
      In the output file, I need to:

      #1 - remove every empty record, i.e. any record that looks like this:
      <doc>
      <pmed>
      </pmed>
      </doc>
      (I suspect the runs of blank lines between records are producing these?)
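One way to check that suspicion is to replay only the blank-line counter from test.pl (listed further down) on made-up input. The sketch below is hypothetical: `simulate_flushes` and the three-record sample are invented for illustration, with seven blank lines after each record as in the listing of test_in.dat. For every point where test.pl would call print_last_record(), it records 1 if the record had content and 0 if it was empty.

```perl
#!/usr/bin/perl -w
use strict;

# Replays only the blank-line bookkeeping of test.pl (no tag parsing).
# Returns one flag per flush: 1 = record had data, 0 = empty record.
sub simulate_flushes {
    my @lines       = @_;
    my $blank_lines = 0;
    my $have_data   = 0;    # stands in for "%record is non-empty"
    my @flushes;
    for my $line (@lines) {
        if ($blank_lines == 4) {       # same test, same place as test.pl
            push @flushes, $have_data;
            $have_data   = 0;
            $blank_lines = 0;
        }
        if ($line =~ /^\s*$/) { $blank_lines++; next; }
        $have_data = 1;     # as in test.pl, $blank_lines is NOT reset here
    }
    return @flushes;
}

# Hypothetical sample: three records, each followed by seven blank lines.
my @input;
for my $n (1 .. 3) {
    push @input, "UI - $n", "TI - title $n", ('') x 7;
}
print join(' ', simulate_flushes(@input)), "\n";    # prints: 1 1 0 1 0
```

Because `$blank_lines` is reset only when a record is printed, never when a data line is read, leftover counts carry into the next record, and a long enough run of blank lines fires a second print on the already-cleared hash. With eight blank lines per gap the same code gives `1 0 1 0 1`. The exact sequence in test_out.dat depends on how many blank lines really separate each pair of records, which the mail listing may not show faithfully.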

      #2 - ensure that EACH and EVERY record contains a line for EACH
      tag listed in the 'my @keys = qw/<tags>.../' statement.

      For example, EVERY record should be constructed as follows:
      <doc>
      <pmed>
      <AU>blah blah; blah2 blah2</AU>
      <TI>blah blah blah</TI>
      <DP>blah blah blah</DP>
      <IS>....</IS>
      <TA>...</TA>
      <PG>...</PG>
      <IP>...</IP>
      <VI>...</VI>
      <MH>...</MH>
      <PT>...</PT>
      <UR>...</UR>
      </pmed>
      </doc>
      next record...

      NOTE (IMPORTANT!): If the input file has no value for one of these tags
      in a given record, still output the tag, with an empty string as its
      value (for example, <MH></MH>).
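A minimal sketch of this requirement, assuming the record is kept in a hash keyed by tag, as test.pl does. `format_record` is a hypothetical helper name. Instead of skipping undefined tags, it substitutes an empty string. It uses `defined ... ? ... : ''` rather than the `//` operator, which only exists from Perl 5.10 onward.

```perl
#!/usr/bin/perl -w
use strict;

# Tags in output order, copied from test.pl's @keys.
my @keys = qw/AU TI DP IS TA PG IP VI MH PT UR/;

# Build one <doc> block from a hashref of tag => value.
# Every tag in @keys is emitted; a missing tag gets an empty string
# instead of being skipped.
sub format_record {
    my ($record) = @_;
    my $out = "<doc>\n<pmed>\n";
    foreach my $key (@keys) {
        my $value = defined $record->{$key} ? $record->{$key} : '';
        $out .= "<$key>$value</$key>\n";
    }
    $out .= "</pmed>\n</doc>\n\n";
    return $out;
}

# Example: only AU and DP are known; the other nine tags come out empty.
print format_record({ AU => 'Shanklin J', DP => '2000 Apr' });
```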


      Below I have included how I run the code, followed by the contents
      of the input file, the program file, and the output file.


      Here is how I run the code:
      cat test_in.dat | perl test.pl > test_out.dat

      (I 'cat' the input file, pipe it through the Perl script, and redirect
      the output to a file.)


      Here are the contents of the input file (test_in.dat):
      ---------- bof ----------
      UI - 20199869
      AU - Ding YS
      AU - Logan J
      AU - Bermel R
      AU - Garza V
      AU - Rice O
      AU - Fowler JS
      AU - Volkow ND
      TI - Dopamine receptor-mediated regulation of striatal cholinergic
      activity:
      positron emission tomography studies with
      norchloro[18F]fluoroepibatidine
      [In Process Citation]
      LA - Eng
      ID - NS-15380/NS/NINDS
      ID - DA-06278/DA/NIDA
      DA - 20000327
      DP - 2000 Apr
      IS - 0022-3042
      TA - J Neurochem
      PG - 1514-21
      SB - M
      CY - UNITED STATES
      IP - 4
      VI - 74
      JC - JAV
      AA - AUTHOR
      AD - Chemistry Department, Brookhaven National Laboratory, Upton, New York
      11973-5000, USA. ding@...
      RO - O:099
      PMID- 0010737608
      MHDA- 2000/03/29 09:00
      EDAT- 2000/03/29 09:00
      SO - J Neurochem 2000 Apr;74(4):1514-21







      UI - 20199629
      AU - Volkow ND
      AU - Wang GJ
      AU - Fowler JS
      AU - Franceschi D
      AU - Thanos PK
      AU - Wong C
      AU - Gatley SJ
      AU - Ding YS
      AU - Molina P
      AU - Schlyer D
      AU - Alexoff D
      AU - Hitzemann R
      AU - Pappas N
      TI - Cocaine abusers show a blunted response to alcohol intoxication in
      limbic
      brain regions [In Process Citation]
      LA - Eng
      ID - DA06891/DA/NIDA
      DA - 20000327
      DP - 2000 Feb 11
      IS - 0024-3205
      TA - Life Sci
      PG - PL161-7
      SB - M
      SB - X
      CY - ENGLAND
      IP - 12
      VI - 66
      JC - L62
      AA - AUTHOR
      AD - Brookhaven National Laboratory, Upton, New York 11973, USA.
      volkow@...
      RO - O:099
      PMID- 0010737368
      MHDA- 2000/03/29 09:00
      EDAT- 2000/03/29 09:00
      SO - Life Sci 2000 Feb 11;66(12):PL161-7







      UI - 0
      AU - Shanklin J
      TI - Overexpression and Purification of the Escherichia coli Inner Membrane
      Enzyme Acyl-Acyl Carrier Protein Synthase in an Active Form.
      LA - ENG
      PT - JOURNAL ARTICLE
      DA - 20000327
      DP - 2000 Apr
      IS - 1046-5928
      TA - Protein Expr Purif
      PG - 355-360
      IP - 3
      VI - 18
      JC - BJV
      AD - Department of Biology, Brookhaven National Laboratory, Upton, New
      York,
      11786
      PMID- 0010733890
      PID - prep.2000.1206
      DOI - 10.1006/prep.2000.1206
      PST - ppublish
      MHDA- 2000/03/29 09:00
      EDAT- 2000/03/29 09:00
      SO - Protein Expr Purif 2000 Apr;18(3):355-360







      UI - 20196200
      AU - Volkow ND
      AU - Fowler JS
      TI - Addiction, a disease of compulsion and drive: involvement of the
      orbitofrontal cortex [In Process Citation]
      LA - Eng
      DA - 20000324
      DP - 2000 Mar
      IS - 1047-3211
      TA - Cereb Cortex
      PG - 318-25
      SB - M
      CY - UNITED STATES
      IP - 3
      VI - 10
      JC - BI9
      AA - AUTHOR
      AD - Medical and Chemistry Departments, Brookhaven National Laboratory,
      Upton,
      NY 11973 and Department of Psychiatry, SUNY-Stony Brook, Stony Brook,
      NY
      11794, USA.
      RO - O:099
      PMID- 0010731226
      UR - http://cercor.oupjournals.org/cgi/content/full/10/3/318
      UR - http://cercor.oupjournals.org/cgi/content/abstract/10/3/318
      PST - ppublish
      MHDA- 2000/03/24 09:00
      EDAT- 2000/03/24 09:00
      SO - Cereb Cortex 2000 Mar;10(3):318-25







      UI - 0
      AU - Hainfeld JF
      AU - Powell RD
      TI - New Frontiers in Gold Labeling.
      LA - ENG
      PT - JOURNAL ARTICLE
      DA - 20000321
      DP - 2000 Apr
      IS - 0022-1554
      TA - J Histochem Cytochem
      PG - 471-480
      IP - 4
      VI - 48
      JC - IDZ
      AD - Biology Department, Brookhaven National Laboratory, Upton, New York.
      PMID- 0010727288
      UR - http://www.jhc.org/cgi/content/full/48/4/471
      UR - http://www.jhc.org/cgi/content/abstract/48/4/471
      PST - ppublish
      SO - J Histochem Cytochem 2000 Apr;48(4):471-480







      UI - 0
      AU - Hainfeld JF
      AU - Robinson JM
      TI - New Frontiers in Gold Labeling. Symposium overview.
      LA - ENG
      PT - JOURNAL ARTICLE
      DA - 20000321
      DP - 2000 Apr
      IS - 0022-1554
      TA - J Histochem Cytochem
      PG - 459-460
      IP - 4
      VI - 48
      JC - IDZ
      AD - Brookhaven National Laboratory, Department of Biology, Upton, New
      York.
      PMID- 0010727286
      UR - http://www.jhc.org/cgi/content/full/48/4/459
      UR - http://www.jhc.org/cgi/content/abstract/48/4/459
      PST - ppublish
      SO - J Histochem Cytochem 2000 Apr;48(4):459-460







      UI - 0
      AU - Shu F
      AU - Ramakrishnan V
      AU - Schoenborn BP
      TI - Enhanced visibility of hydrogen atoms by neutron crystallography on
      fully
      deuterated myoglobin.
      LA - ENG
      PT - JOURNAL ARTICLE
      DA - 20000321
      DP - 2000 Mar 21
      IS - 0027-8424
      TA - Proc Natl Acad Sci U S A
      JC - PV3
      AD - Biology Department, Brookhaven National Laboratory, Upton, NY 11973;
      and
      Los Alamos National Laboratory, Los Alamos, NM 87545.
      PMID- 0010725379
      PID - 060024697
      DOI - 10.1073/pnas.060024697
      UR - http://www.pnas.org/cgi/content/full/060024697
      PST - aheadofprint
      SO - Proc Natl Acad Sci U S A 2000 Mar 21;:







      UI - 20185845
      AU - Hankes LV
      AU - Schmaeler M
      AU - Jansen CR
      AU - Brown RR
      TI - Vitamin effects on tryptophan-niacin metabolism in primary hepatoma
      patients [In Process Citation]
      LA - Eng
      DA - 20000317
      DP - 1999
      IS - 0065-2598
      TA - Adv Exp Med Biol
      PG - 283-7
      SB - M
      CY - UNITED STATES
      VI - 467
      JC - 2LU
      AA - AUTHOR
      AD - Biochemistry Div., Brookhaven National Laboratory, Upton New York
      11973,
      USA.
      RO - O:099
      PMID- 0010721067
      SO - Adv Exp Med Biol 1999;467:283-7







      UI - 20179210
      AU - Zema MJ
      TI - Gemfibrozil, nicotinic acid and combination therapy in patients with
      isolated hypoalphalipoproteinemia: a randomized, open-label, crossover
      study.
      LA - Eng
      MH - Adult
      MH - Aged
      MH - Aged, 80 and over
      MH - Antilipemic Agents/*therapeutic use
      MH - Atherosclerosis/blood/drug therapy
      MH - Comparative Study
      MH - Cross-Over Studies
      MH - Drug Therapy, Combination
      MH - Female
      MH - Gemfibrozil/*therapeutic use
      MH - Human
      MH - Hypolipoproteinemia/blood/*drug therapy
      MH - Lipoproteins, HDL Cholesterol/*blood
      MH - Lipoproteins, LDL Cholesterol/blood
      MH - Male
      MH - Middle Age
      MH - Niacin/*therapeutic use
      MH - Prospective Studies
      MH - Treatment Outcome
      RN - 0 (Antilipemic Agents)
      RN - 0 (Lipoproteins, HDL Cholesterol)
      RN - 0 (Lipoproteins, LDL Cholesterol)
      RN - 25812-30-0 (Gemfibrozil)
      RN - 59-67-6 (Niacin)
      PT - CLINICAL TRIAL
      PT - JOURNAL ARTICLE
      PT - RANDOMIZED CONTROLLED TRIAL
      DA - 20000324
      DP - 2000 Mar 1
      IS - 0735-1097
      TA - J Am Coll Cardiol
      PG - 640-6
      SB - A
      SB - M
      CY - UNITED STATES
      IP - 3
      VI - 35
      JC - H50
      AA - Author
      EM - 200006
      AD - Department of Medicine, Brookhaven Memorial Hospital Medical Center,
      Patchogue, New York 11772, USA.
      PMID- 0010716466
      MHDA- 2000/04/01 09:00
      EDAT- 2000/03/15 09:00
      SO - J Am Coll Cardiol 2000 Mar 1;35(3):640-6







      UI - 20177824
      AU - Lacks SA
      AU - Ayalew S
      AU - De La Campa AG
      AU - Greenberg B
      TI - Regulation of competence for genetic transformation in streptococcus
      pneumoniae: expression of dpnA, a late competence gene encoding a DNA
      methyltransferase of the DpnII restriction system [In Process
      Citation]
      LA - Eng
      DA - 20000323
      DP - 2000 Mar
      IS - 0950-382X
      TA - Mol Microbiol
      PG - 1089-98
      SB - M
      CY - ENGLAND
      IP - 5
      VI - 35
      JC - MOM
      AA - AUTHOR
      AD - Department of Biology, Brookhaven National Laboratory, Upton, NY
      11973,
      USA.
      RO - O:099
      PMID- 0010712690
      PID - mmi1777
      PST - ppublish
      MHDA- 2000/03/11 09:00
      EDAT- 2000/03/11 09:00
      SO - Mol Microbiol 2000 Mar;35(5):1089-98
      ---------- eof ----------


      Here are the contents of the program file (test.pl):
      ---------- bof ----------
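      #!/usr/bin/perl -w

      my $blank_lines = 0;   # blank lines counted since the last print

      my %records = ();      # never used: the code below fills the global %record
      my $last_key = '';     # tag of the most recent "TAG - value" line

      my @keys = qw/AU TI DP IS TA PG IP VI MH PT UR/;   # tags to print, in order

      while (<>) {
          # After four blank lines, print whatever has been collected.
          if ($blank_lines == 4) { &print_last_record(); $blank_lines = 0; }

          if (/^[\t\s]*$/) { $blank_lines++; next; }

          chomp;

          # "TAG - value": start a new value, or append to a repeated tag.
          if (/^([A-Z\s]+)\-\s(.+)/) {
              $last_key = $1; $value = $2;
              $last_key =~ s/\s+$//g;
              if (defined($record{$last_key})) {
                  $record{$last_key} .= '; ' . $value;
              } else {
                  $record{$last_key} = $value;
              }
              next;
          }

          # Any other line continues the previous tag's value.
          $record{$last_key} .= $_;
          $record{$last_key} =~ s/\s+/ /g;
      }

      # Print the tags present in the record (absent ones are skipped), then clear it.
      sub print_last_record () {
          print "<doc>\n<pmed>\n";
          foreach $key (@keys) {
              print "<$key>$record{$key}</$key>\n" if
                  (defined($record{$key}));
          }
          %record = ();
          print "</pmed>\n</doc>\n\n";
      }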
      ---------- eof ----------
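Putting both requirements together, a revised script might look like the sketch below. It is an illustration under stated assumptions, not a drop-in replacement: it assumes a record ends at a blank line and contains no blank lines of its own (true for the sample input above), and that tag lines have the form `TAG - value` with a two- to four-letter upper-case tag. Instead of counting to four blank lines, it treats every blank line as a possible record end, skips records that are empty, and prints the last record at end of input even if no blank line follows it. `convert` and the file name test_fixed.pl are hypothetical; it would be run the same way as before: cat test_in.dat | perl test_fixed.pl > test_out.dat

```perl
#!/usr/bin/perl -w
# Hypothetical revision of test.pl (a sketch, not a tested replacement).
# Assumes records are separated by blank lines and contain none themselves.
use strict;

# Tags to print, in order (same list as test.pl).
my @keys = qw/AU TI DP IS TA PG IP VI MH PT UR/;

# Turn MEDLINE-style lines into <doc><pmed>...</pmed></doc> blocks.
sub convert {
    my @lines    = @_;
    my %record   = ();
    my $last_key = '';
    my $out      = '';

    # Append the current record to $out (every tag, '' when missing),
    # then clear it. Empty records are skipped, so extra blank lines
    # never produce an empty <doc>.
    my $flush = sub {
        return unless %record;
        $out .= "<doc>\n<pmed>\n";
        foreach my $key (@keys) {
            my $value = defined $record{$key} ? $record{$key} : '';
            $out .= "<$key>$value</$key>\n";
        }
        $out .= "</pmed>\n</doc>\n\n";
        %record   = ();
        $last_key = '';
    };

    foreach my $line (@lines) {
        chomp $line;
        $line =~ s/\s+$//;                       # trailing blanks, stray CR
        if ($line eq '') { $flush->(); next; }   # blank line ends a record

        # "TAG - value" line: 2-4 capital letters, a dash, then the value
        # (the value may be missing, e.g. "AU -").
        if ($line =~ /^\s*([A-Z]{2,4})\s*-(?:\s+(.*))?$/) {
            my ($key, $value) = ($1, defined $2 ? $2 : '');
            if (defined $record{$key}) {
                $record{$key} .= '; ' . $value;  # repeated tag (AU, MH, PT...)
            } else {
                $record{$key} = $value;
            }
            $last_key = $key;
        } elsif ($last_key ne '') {
            # Wrapped line: continue the previous tag's value.
            $record{$last_key} .= ' ' . $line;
            $record{$last_key} =~ s/\s+/ /g;
        }
    }
    $flush->();    # last record, even if no blank line follows it
    return $out;
}

# Read STDIN (or files named on the command line); skipped under require/do.
print convert(<>) unless caller;
```

Design note: using any blank line as the boundary and then ignoring empty records removes the counter altogether, so it no longer matters how many blank lines separate records. If real records can contain blank lines, a different boundary (for example, starting a new record at each `UI -` line) would be needed.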


      Here are the contents of the output file (test_out.dat):
      <doc>
      <pmed>
      <AU>Ding YS; Logan J; Bermel R; Garza V; Rice O; Fowler JS; Volkow ND</AU>
      <TI>Dopamine receptor-mediated regulation of striatal cholinergic activity:
      positron emission tomography studies with norchloro[18F]fluoroepibatidine
      [In Process Citation]</TI>
      <DP>2000 Apr</DP>
      <IS>0022-3042</IS>
      <TA>J Neurochem</TA>
      <PG>1514-21</PG>
      <IP>4</IP>
      <VI>74</VI>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Volkow ND; Wang GJ; Fowler JS; Franceschi D; Thanos PK; Wong C; Gatley
      SJ; Ding YS; Molina P; Schlyer D; Alexoff D; Hitzemann R; Pappas N</AU>
      <TI>Cocaine abusers show a blunted response to alcohol intoxication in
      limbic brain regions [In Process Citation]</TI>
      <DP>2000 Feb 11</DP>
      <IS>0024-3205</IS>
      <TA>Life Sci</TA>
      <PG>PL161-7</PG>
      <IP>12</IP>
      <VI>66</VI>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Shanklin J</AU>
      <TI>Overexpression and Purification of the Escherichia coli Inner Membrane
      Enzyme Acyl-Acyl Carrier Protein Synthase in an Active Form.</TI>
      <DP>2000 Apr</DP>
      <IS>1046-5928</IS>
      <TA>Protein Expr Purif</TA>
      <PG>355-360</PG>
      <IP>3</IP>
      <VI>18</VI>
      <PT>JOURNAL ARTICLE</PT>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Volkow ND; Fowler JS</AU>
      <TI>Addiction, a disease of compulsion and drive: involvement of the
      orbitofrontal cortex [In Process Citation]</TI>
      <DP>2000 Mar</DP>
      <IS>1047-3211</IS>
      <TA>Cereb Cortex</TA>
      <PG>318-25</PG>
      <IP>3</IP>
      <VI>10</VI>
      <UR>http://cercor.oupjournals.org/cgi/content/full/10/3/318;
      http://cercor.oupjournals.org/cgi/content/abstract/10/3/318</UR>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Hainfeld JF; Powell RD</AU>
      <TI>New Frontiers in Gold Labeling.</TI>
      <DP>2000 Apr</DP>
      <IS>0022-1554</IS>
      <TA>J Histochem Cytochem</TA>
      <PG>471-480</PG>
      <IP>4</IP>
      <VI>48</VI>
      <PT>JOURNAL ARTICLE</PT>
      <UR>http://www.jhc.org/cgi/content/full/48/4/471;
      http://www.jhc.org/cgi/content/abstract/48/4/471</UR>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Hainfeld JF; Robinson JM</AU>
      <TI>New Frontiers in Gold Labeling. Symposium overview.</TI>
      <DP>2000 Apr</DP>
      <IS>0022-1554</IS>
      <TA>J Histochem Cytochem</TA>
      <PG>459-460</PG>
      <IP>4</IP>
      <VI>48</VI>
      <PT>JOURNAL ARTICLE</PT>
      <UR>http://www.jhc.org/cgi/content/full/48/4/459;
      http://www.jhc.org/cgi/content/abstract/48/4/459</UR>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Shu F; Ramakrishnan V; Schoenborn BP</AU>
      <TI>Enhanced visibility of hydrogen atoms by neutron crystallography on
      fully deuterated myoglobin.</TI>
      <DP>2000 Mar 21</DP>
      <IS>0027-8424</IS>
      <TA>Proc Natl Acad Sci U S A</TA>
      <PT>JOURNAL ARTICLE</PT>
      <UR>http://www.pnas.org/cgi/content/full/060024697</UR>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Hankes LV; Schmaeler M; Jansen CR; Brown RR</AU>
      <TI>Vitamin effects on tryptophan-niacin metabolism in primary hepatoma
      patients [In Process Citation]</TI>
      <DP>1999</DP>
      <IS>0065-2598</IS>
      <TA>Adv Exp Med Biol</TA>
      <PG>283-7</PG>
      <VI>467</VI>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Zema MJ</AU>
      <TI>Gemfibrozil, nicotinic acid and combination therapy in patients with
      isolated hypoalphalipoproteinemia: a randomized, open-label, crossover
      study.</TI>
      <DP>2000 Mar 1</DP>
      <IS>0735-1097</IS>
      <TA>J Am Coll Cardiol</TA>
      <PG>640-6</PG>
      <IP>3</IP>
      <VI>35</VI>
      <MH>Adult; Aged; Aged, 80 and over; Antilipemic Agents/*therapeutic use;
      Atherosclerosis/blood/drug therapy; Comparative Study; Cross-Over Studies;
      Drug Therapy, Combination; Female; Gemfibrozil/*therapeutic use; Human;
      Hypolipoproteinemia/blood/*drug therapy; Lipoproteins, HDL
      Cholesterol/*blood; Lipoproteins, LDL Cholesterol/blood; Male; Middle Age;
      Niacin/*therapeutic use; Prospective Studies; Treatment Outcome</MH>
      <PT>CLINICAL TRIAL; JOURNAL ARTICLE; RANDOMIZED CONTROLLED TRIAL</PT>
      </pmed>
      </doc>

      <doc>
      <pmed>
      </pmed>
      </doc>

      <doc>
      <pmed>
      <AU>Lacks SA; Ayalew S; De La Campa AG; Greenberg B</AU>
      <TI>Regulation of competence for genetic transformation in streptococcus
      pneumoniae: expression of dpnA, a late competence gene encoding a DNA
      methyltransferase of the DpnII restriction system [In Process Citation]</TI>
      <DP>2000 Mar</DP>
      <IS>0950-382X</IS>
      <TA>Mol Microbiol</TA>
      <PG>1089-98</PG>
      <IP>5</IP>
      <VI>35</VI>
      </pmed>
      </doc>

      ---------- eof ----------


      Any and all help will be appreciated.
      Please email me directly.

      Thank you so much. And Happy New Year!

      __________________________________________________
      Pat Gorden-Ozgul BNL-ISD Systems
      gorden@... 631-344-5159