
Re: [PBML] re: Reading in external file, strip out duplicates, sort then save as ext. file

  • Dave Gray
    Message 1 of 11 , Sep 29, 2005
      On 9/21/05, evh90210 <evh90210@...> wrote:
      > I have a file that I would like to read in then do the following:
      >
      > - Read in each line and remove any duplicate text with tags
      > - Sort the file so all tag IDs are in sequential order
      > - Save the results to a different file name.

      This sounds like homework.

      > Input:
      >
      > <tag id=1>Data 1</tag>
      > <tag id=2>Data 2</tag>
      > <tag id=3>Data 3</tag>
      > <tag id=4>Data 4</tag>
      > <tag id=2>Data 2</tag>
      > <tag id=5>Data 5</tag>
      > <tag id=13>Data 13</tag>
      > <tag id=6>Data 6</tag>
      > <tag id=7>Data 7</tag>
      > <tag id=8>Data 8</tag>
      > <tag id=9>Data 9</tag>
      > <tag id=13>Data 13</tag>
      > <tag id=10>Data 10</tag>
      > <tag id=11>Data 11</tag>
      > <tag id=12>Data 12</tag>
      > <tag id=14>Data 14</tag>
      > <tag id=15>Data 15</tag>
      > <tag id=16>Data 16</tag>
      > <tag id=17>Data 17</tag>
      >
      >
      > Output:
      >
      > <tag id=1>Data 1</tag>
      > <tag id=2>Data 2</tag>
      > <tag id=3>Data 3</tag>
      > <tag id=4>Data 4</tag>
      > <tag id=5>Data 5</tag>
      > <tag id=6>Data 6</tag>
      > <tag id=7>Data 7</tag>
      > <tag id=8>Data 8</tag>
      > <tag id=9>Data 9</tag>
      > <tag id=10>Data 10</tag>
      > <tag id=11>Data 11</tag>
      > <tag id=12>Data 12</tag>
      > <tag id=13>Data 13</tag>
      > <tag id=14>Data 14</tag>
      > <tag id=15>Data 15</tag>
      > <tag id=16>Data 16</tag>
      > <tag id=17>Data 17</tag>

      You can do this (mostly) without perl, assuming you're allowed to
      modify the data file beforehand. The issue is that the standard unix
      command sort compares lines as text rather than as numbers, so it
      will sort 1, 2, 10 as 10, 1, 2. If you run

      perl -pi -e 's/=(\d+)/sprintf("=%02d", $1)/e' input

      to turn id=1 into id=01, then you can do (from the unix command prompt):

      sort input | uniq > output

      If your ids go above 99, you'll need to use %03d, or above 999 %04d,
      and so on. Note that the output keeps the zero-padded ids (id=01
      rather than id=1).
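
If modifying the data file is not an option, a possible alternative is to let sort compare the ids numerically instead of padding them. The sketch below is only an illustration: the file names input and output and the shortened sample data are assumptions, and it relies on every line containing exactly one "=" right before the id, as in the question.

```shell
# Sample data in the same shape as the question (hypothetical subset).
printf '%s\n' \
  '<tag id=1>Data 1</tag>' \
  '<tag id=2>Data 2</tag>' \
  '<tag id=13>Data 13</tag>' \
  '<tag id=2>Data 2</tag>' \
  '<tag id=10>Data 10</tag>' \
  '<tag id=13>Data 13</tag>' \
  '<tag id=3>Data 3</tag>' > input

# Step 1: sort -u drops lines that are exact duplicates.
# Step 2: -t= splits each line at "=", and -k2,2n compares the second
#         field numerically, so "13>Data 13</tag>" is read as 13.
sort -u input | sort -t= -k2,2n > output

cat output
```

The ids are left unpadded here, so the result matches the requested output format. Lines that share an id but differ in their text would both be kept, since only exact duplicates are dropped in the first step.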