
RE: [ARRL-LOTW] Wonder what they're choking on now?

  • Dave AA6YQ
    Message 1 of 7 , Oct 6, 2013

      As I understand it, a user submitted a digitally signed log that was “corrupted” in a way that causes the LotW Server to die – but without removing the corrupted log from the processing queue.  The result was a continuous “process and die” loop.

       

      This corrupted log has been manually removed; the LotW Server defect that prevented it from discarding the offending log will be corrected.

       

            73,

       

                  Dave, AA6YQ

       

      From: ARRL-LOTW@yahoogroups.com [mailto:ARRL-LOTW@yahoogroups.com] On Behalf Of Jim Miller
      Sent: Sunday, October 06, 2013 12:26 PM
      To: arrl-lotw
      Subject: [ARRL-LOTW] Wonder what they're choking on now?

       

       

      The queue status shows quite a bit of bogging down. I wonder if something is wrong in the server or some hams have dumped their whole logs on the queue again.

       

      jim ab3cv



    • Rick Murphy
      Message 2 of 7 , Oct 7, 2013
        To further expand on Dave's comment - the log in question was quite large and quite scrambled. This caused the LoTW log import processing to abort, since it couldn't understand the log. The LoTW server itself didn't crash, but the log reader process crashed because it could not decode the scrambled log. Other logs were still processed while this was going on.

        Since the log in question wasn't properly processed, it was re-entered into the end of the queue, which allowed other logs to work - until it got back to the front of the line, causing the same crash.
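To make the "process and die" loop concrete, here is a minimal Python sketch of a queue that puts a failed log back at the end of the line, next to one that discards it. The names `drain`, `process`, and `requeue_on_failure`, and the `max_attempts` cap that stops the demonstration, are all hypothetical; this is not the LoTW server's code.

```python
from collections import deque


def process(log):
    """Stand-in for the log reader: fails on a log it cannot decode."""
    if log.startswith("scrambled"):
        raise ValueError("cannot decode log")


def drain(queue, requeue_on_failure, max_attempts=20):
    """Work through the queue; return (processed, rejected, attempts)."""
    processed, rejected, attempts = [], [], 0
    while queue and attempts < max_attempts:
        log = queue.popleft()
        attempts += 1
        try:
            process(log)
            processed.append(log)
        except ValueError:
            if requeue_on_failure:
                queue.append(log)      # back to the end of the line
            else:
                rejected.append(log)   # discard the offending log
    return processed, rejected, attempts


# With re-queueing, the good logs get through but the bad one never leaves.
old = drain(deque(["a", "scrambled", "b"]), requeue_on_failure=True)
# With rejection, the queue empties after one pass.
new = drain(deque(["a", "scrambled", "b"]), requeue_on_failure=False)
```

In the first run the attempt counter hits the cap with the scrambled log still queued; in the second it stops after three attempts with that log in the rejected list.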

        Unfortunately, the person who submitted this log re-submitted it when it was apparently not processed, adding to the apparent backlog. It's good that the faulty files didn't block everyone, but they do seem to linger on the queue status page, since that page records when each log was originally submitted. As ARRL staff see these logs, they're now manually deleting them while they try to contact the source.

        Hopefully, once ARRL contacts the station in question, they will stop using the computer that submitted these logs. Based on how messed up the log appeared, that computer is pretty likely not working properly.

        For the tech geeks who care (most of you can stop reading now), here are the details on the symptom and what is probably causing it.

        LOTW reads a log and writes signed log records into a memory buffer. This will periodically need to get resized for large logs, meaning that it gets copied around to new heap areas, potentially multiple times. Once the entire log is signed, that buffer is read, compressed using zlib to save space, then written to disk (this is a 1.13 user, so it isn't uploaded but that's not important here). The compressed file has a hash appended to verify that it has not been changed since the .TQ8 was written.
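A rough Python sketch of the compress-then-hash step may help show why the hash alone cannot catch this kind of damage. The choice of SHA-256, the 32-byte trailer layout, and the names `pack` and `unpack` are assumptions for illustration; the actual hash and .TQ8 layout are not described here.

```python
import hashlib
import zlib

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes (assumed hash choice)


def pack(signed_log: bytes) -> bytes:
    """Compress the signed log and append a hash of the compressed bytes."""
    compressed = zlib.compress(signed_log)
    return compressed + hashlib.sha256(compressed).digest()


def unpack(blob: bytes) -> bytes:
    """Check the appended hash, then decompress."""
    compressed, digest = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    if hashlib.sha256(compressed).digest() != digest:
        raise ValueError("integrity check failed")
    return zlib.decompress(compressed)


# Damage that happens in memory BEFORE compression is hashed along with
# everything else, so the integrity check passes on scrambled content.
scrambled = b"<CXLL:4>K1MU"
roundtrip = unpack(pack(scrambled))

# Damage AFTER the hash is written is what the check actually catches.
damaged = bytearray(pack(b"<CALL:4>K1MU"))
damaged[0] ^= 0xFF
```

Calling `unpack(bytes(damaged))` raises `ValueError`, while `roundtrip` comes back as the scrambled bytes unchanged.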

        The log as received passes the integrity check and is decompressed. The resulting signed log has characters missing. It's an ADIF-like format with <TAG:size> format. In some cases, the tags are incomplete. In some cases, characters are switched: "1" becomes "E", etc. Software defects don't usually cause "<CALL:4>K1MU" to sometimes appear as "<CXLL:4>K1MU" and sometimes CALK1MU.  My opinion is that the multiple copying of the signed buffer around in memory is causing part of it to get lost and altered. This is most likely caused by bad RAM or a bad processor (i.e. bad capacitors on the motherboard).
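For illustration, a small Python parser for a `<TAG:size>value` record shows how both kinds of damage can be detected up front. The `KNOWN_TAGS` set and the function name `parse_record` are hypothetical, and the real signed-log format has more to it than this sketch handles.

```python
import re

FIELD = re.compile(r"<([A-Z_]+):(\d+)>")
KNOWN_TAGS = {"CALL", "BAND", "MODE"}  # hypothetical subset of field names


def parse_record(text: str) -> dict:
    """Parse '<TAG:size>value...' into a dict, raising ValueError on damage."""
    fields, pos = {}, 0
    while pos < len(text):
        match = FIELD.match(text, pos)
        if match is None:
            raise ValueError(f"malformed tag at offset {pos}")
        name, size = match.group(1), int(match.group(2))
        if name not in KNOWN_TAGS:
            raise ValueError(f"unknown tag {name!r}")
        value = text[match.end():match.end() + size]
        if len(value) != size:
            raise ValueError(f"field {name!r} is shorter than declared")
        fields[name] = value
        pos = match.end() + size
    return fields


good = parse_record("<CALL:4>K1MU<BAND:3>20M")
```

Here `"<CXLL:4>K1MU"` fails the known-tag check and `"CALK1MU"` fails at offset 0 because no tag can be matched, so a reader built this way can reject the record instead of crashing on it.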
        73,
            -Rick






        --
        Rick Murphy, CISSP-ISSAP, K1MU/4, Annandale VA USA
      • Peter Laws
        Message 3 of 7 , Oct 7, 2013
          On Mon, Oct 7, 2013 at 5:12 AM, Rick Murphy <k1mu@...> wrote:

          >
          > LOTW reads a log and writes signed log records into a memory buffer. This will periodically need to get resized for large logs, meaning that it gets copied around to new heap areas, potentially multiple times. Once the entire log is signed, that buffer is read, compressed using zlib to save space, then written to disk (this is a 1.13 user, so it isn't uploaded but that's not important here). The compressed file has a hash appended to verify that it has not been changed since the .TQ8 was written.
          >
          > The log as received passes the integrity check and is decompressed. The resulting signed log has characters missing. It's an ADIF-like format with <TAG:size> format. In some cases, the tags are incomplete. In some cases, characters are switched: "1" becomes "E", etc. Software defects don't usually cause "<CALL:4>K1MU" to sometimes appear as "<CXLL:4>K1MU" and sometimes CALK1MU. My opinion is that the multiple copying of the signed buffer around in memory is causing part of it to get lost and altered. This is most likely caused by bad RAM or a bad processor (i.e. bad capacitors on the motherboard).
          >

          ** This is way down in the weeds, so most of you will probably want to
          tune out ... **

          Thanks, Rick. Do you mean LOTW, the server, in the first paragraph I
          quoted above or TQSL?

          We make hashes of the content of the message to ensure that the
          message has not changed since it was signed and, from the signature,
          who created the file, right? Please correct me if I'm confused.
          Wait, I'm often confused so better correct me if I've got things
          wrong. :-)

          If I am correct, at what point does the file become corrupt? Before
          the hash is generated, I'm assuming, because if it was corrupted after
          the hash was made it wouldn't pass the authentication check, would it?

          If the bad ADIF was put out by the logging program, why does TQSL
          accept it in the first place?






          --
          Peter Laws | N5UWY | plaws plaws net | Travel by Train!
        • Rick Murphy
          Message 4 of 7 , Oct 7, 2013
            On Mon, Oct 7, 2013 at 1:13 PM, Peter Laws <plaws0@...> wrote:
             

            > Thanks, Rick. Do you mean LOTW, the server, in the first paragraph I
            > quoted above or TQSL?

            The LoTW server.
             
            > We make hashes of the content of the message to ensure that the
            > message has not changed since it was signed and, from the signature,
            > who created the file, right? Please correct me if I'm confused.
            > Wait, I'm often confused so better correct me if I've got things
            > wrong. :-)

            No, you're right. However, the signed content became scrambled after signing. It's a bit more subtle than that, though:
            each QSO in a signed log has its own canonical-form data, which is individually signed.
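As a rough illustration of per-QSO signing, the Python sketch below signs each record's canonical form separately, so damage to one record does not invalidate the others. HMAC-SHA256 is only a stand-in for a real digital signature, and the key, the canonical form (sorted `name=VALUE` pairs), and the function names are assumptions made for this example.

```python
import hashlib
import hmac

KEY = b"demo-key-not-a-real-credential"  # hypothetical; stand-in for a private key


def canonical(qso: dict) -> bytes:
    """Assumed canonical form: fields sorted by name, values upper-cased."""
    return ";".join(f"{k}={qso[k].upper()}" for k in sorted(qso)).encode()


def sign(qso: dict) -> bytes:
    return hmac.new(KEY, canonical(qso), hashlib.sha256).digest()


def verify(qso: dict, signature: bytes) -> bool:
    return hmac.compare_digest(sign(qso), signature)


log = [{"CALL": "K1MU", "BAND": "20M"}, {"CALL": "N5UWY", "BAND": "40M"}]
signed = [(qso, sign(qso)) for qso in log]

# Scramble one record after signing; only that record fails verification.
signed[1][0]["CALL"] = "N5UWX"
results = [verify(qso, sig) for qso, sig in signed]
```

With this layout, the first record still verifies and only the altered second record is flagged.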
             
            > If I am correct, at what point does the file become corrupt? Before
            > the hash is generated, I'm assuming, because if it was corrupted after
            > the hash was made it wouldn't pass the authentication check, would it?

            It wasn't possible to verify anything about the file since there wasn't any valid content. Or, at least not much valid content.

            > If the bad ADIF was put out by the logging program, why does TQSL
            > accept it in the first place?

            There wasn't anything wrong with the input file. It's just that after TQSL signed the log, it was passed through a meat grinder before being stored for transmittal to LoTW. 

            Oh, and HQ have fixed the error that caused this log to abort. A log like this will now be rejected in its entirety instead of looping back to the end of the queue.
            73,
                -Rick
            -- 
            Rick Murphy, CISSP-ISSAP, K1MU/4, Annandale VA USA
          • Peter Laws
            Message 5 of 7 , Oct 7, 2013
              On Mon, Oct 7, 2013 at 2:20 PM, Rick Murphy <k1mu@...> wrote:


              >
              > Oh, and HQ have fixed the error that caused this log to abort. It'll now have everything rejected and not loop back to the end of the queue.

              OK, this is exactly where I was going. It just seemed to me that it
              would be pretty simple for the ingest algorithm to recognize a bogus
              file without getting all worked up about it. I'll be glad once TQSL
              2.x is well on the way and more attention can be turned to the LOTW
              processes themselves.

              Love the 2.x - got notified that there was a new config, clicked
              "here" to update, not another thought about it. Good stuff.

              --
              Peter Laws | N5UWY | plaws plaws net | Travel by Train!