SlugOS/BE 4.8-beta large file corruption

  • Jon
    Message 1 of 7 , Mar 4, 2008
      Hi folks
      I have two slugs: slug1 has been running uNSLUng for about 9 months
      with two 300GB drives, Samba-ed out for backup and media storage across
      my home network. Slug2 is relatively new, running SlugOS/BE 4.8-beta
      off a 2GB USB stick with a 500GB drive for storage. All drives are
      ext3 and both slugs are de-underclocked. Both were bought 2nd hand so
      I can't be sure of their actual ages.

      I've been copying media across from slug1 to slug2 and running md5sum
      on the files, just cos I'm paranoid that way. And I've noticed that
      most of the large files are reporting different checksums on the
      source file and the copy! Small stuff (e.g. MP3 files) seems to be OK,
      but larger stuff (100MB+ video files) is getting corrupted about 80%
      of the time.

      I did my initial copies via drag-and-drop on a PC (so Samba to the PC
      & immediately Samba out again). Lots of md5 differences.

      Second attempt was sharing out the slug2 drive with nfs, mounting it
      R/W on slug1 and copying onto it. Still lots of md5 differences. (I
      can't find the nfs client on SlugOS, but that's for a different post...)

      Third attempt was plugging a USB hub into slug2 so that it could host
      both the drive from slug1 and the slug2 drive, then copy straight
      across. STILL lots of md5 differences.

      In all cases, I'm running md5sum at the command prompt, logged into
      the machine that the drive is physically attached to. The md5 for the
      source and destination file is frequently different but it's always
      consistent - if I run md5sum again I get the same result. However if I
      copy the file again I get a different md5 - after a couple of attempts
      I can usually get a copy which md5s the same as the source file.
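The source-versus-copy check described above can be scripted so that each file is read the same way every time. This is a minimal sketch, assuming a Python interpreter is available on the machine the drive is attached to; the function names `md5_of_file` and `copies_match` are illustrative and not part of SlugOS or any tool mentioned in this thread.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """True when the source and the copy produce the same MD5 digest."""
    return md5_of_file(source_path) == md5_of_file(copy_path)
```

A single mismatch does not say whether the write or the read went wrong; repeating the read of the same file helps separate the two cases.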

      To approach the problem from a different angle, I pointed my PC's
      BitTorrent client at the Samba shares from the two slugs and
      downloaded some 300MB torrents onto the slugs. After the torrent
      completes, I select "Force Recheck" to check that the hashes of the
      downloaded data match the piece hashes in the torrent file. On slug1
      (the uNSLUng system) this verifies 100%. On slug2 (the SlugOS/BE
      system) I'll always get about half a dozen blocks failing the hash
      check - and each time I recheck, they're different blocks, even though
      the data on the slug's drive hasn't changed.
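A torrent recheck works block by block (BitTorrent verifies fixed-size pieces against SHA-1 hashes stored in the torrent file), and the same idea can help show where in a file two reads disagree. The sketch below is an illustration under assumptions: the 256 KB block size and the function names are arbitrary choices, not taken from any BitTorrent client.

```python
import hashlib


def block_digests(path, block_size=256 * 1024):
    """Return one SHA-1 hex digest per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha1(block).hexdigest())
    return digests


def differing_blocks(first, second):
    """Return the block indices at which two digest lists disagree.
    A block that exists in only one of the lists counts as differing."""
    longest = max(len(first), len(second))
    return [
        index
        for index in range(longest)
        if index >= len(first)
        or index >= len(second)
        or first[index] != second[index]
    ]
```

Calling `block_digests` twice on the same file and passing both results to `differing_blocks` lists the blocks that changed between reads. For files small enough to stay in the page cache, the second read may be served from memory and hide a problem.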

      To me, this looks like data corruption on both write and read. The
      fact that a directly connected drive also shows the errors seems to
      show it isn't the fault of Samba or nfs. The fact that the md5 results
      are consistent for a given file seems to show it isn't a drive problem.

      Has anybody else seen anything like this?
      Are there some SlugOS settings I should be tweaking?
      Or log files I should be looking in?
      Or something obvious I've done wrong?

      Apologies for the long post, but better to post all the info up front
      than to have it slowly extracted from me :-)

      Many thanks

      Jon
    • Thomas Reitmayr
      Message 2 of 7 , Mar 4, 2008
        Hi Jon,
        I saw something very similar with SlugOS/BE 4.9 and SlugOS/LE(EABI) 4.9 which right now makes my slug pretty much unusable as a file server. You can find my descriptions of the symptoms at
          http://tech.groups.yahoo.com/group/nslu2-linux/message/20857
        for the LE system, and for the BE system at
          http://tech.groups.yahoo.com/group/nslu2-linux/message/21083

        Please also check your syslog (run "dmesg" on your slug) to see if you get the same errors. As described in those messages, I suspect there is a problem in the ethernet driver. I found similar reports from a few months ago, and I am not sure whether the issue was fixed back then or is still in the code. From looking at the ethernet driver, I don't have much hope of finding the problem myself... [The issue could be buried somewhere else too, of course.]
        Regards,
        -Thomas


        ----- Original Message ----
        From: Jon <jon_canada100@...>
        To: nslu2-linux@yahoogroups.com
        Sent: Tuesday, March 4, 2008, 20:08:53
        Subject: [nslu2-linux] SlugOS/BE 4.8-beta large file corruption

      • Mike (mwester)
        Message 3 of 7 , Mar 4, 2008
          I too would be very interested in any more information that can be found
          about these situations. I've been quite unable to duplicate Thomas'
          issue with SlugOS 4.9 to date, so there must be something else that is
          triggering this problem.
          Mike (mwester)

        • Jon
          Message 4 of 7 , Mar 4, 2008
            --- In nslu2-linux@yahoogroups.com, "Mike (mwester)" <mwester@...> wrote:
            >
            > I too would be very interested in any more information that can be found
            > about these situations. I've been quite unable to duplicate Thomas'
            > issue with SlugOS 4.9 to date, so there must be something else that is
            > triggering this problem.
            > Mike (mwester)

            Thanks for the replies Mike & Thomas. I remember your conversations
            last month & decided they didn't apply to me because they were with
            4.9 & 4.10 using different ethernet drivers. Maybe I'm wrong!

            Anyway, I'm not seeing anything new appearing in the output of dmesg.
            I also don't get any errors or failures reported to the terminal.

            The crazy thing is that one of my tests copies between two drives
            that are both directly connected to the same slug - which seems to
            rule out the network drivers. And still I see md5 differences.

            Another point from testing I've done since my earlier post...
            I copied 1GB of MP3 files across (via the PC so
            slug1->Samba->PC->Samba->slug2) and ran md5 on the resulting 185 files
            and found 1 difference. So my theory that it only affected large files
            is incorrect - it's just more apparent in large files because there's
            more data to look at.

            If there's anything I can do to help you pin this down Mike, then just
            shout.

            Jon
          • Drew Gibson
            Message 5 of 7 , Mar 5, 2008


              I'm not experiencing your issues but might I suggest a couple of things to try?

              Have you tried copying with SSH? It will checksum the data as it goes. If the copy goes OK but the resultant files still have an MD5SUM mismatch, then you should look at the drives. If the copies take forever, then it may be doing many retries on the transfer, so look at the network, physical and drivers.

              The ethernet driver in SlugOS may be suspect (as per Thomas). Do you have a USB ethernet dongle you could try?

              regards,

              Drew


            • Thomas Reitmayr
              Message 6 of 7 , Mar 5, 2008
                Hi,
                I noticed that the network traffic stopped from time to time for 0.5 to 2 seconds when the errors happened. For NFS write transfers (PC->Slug) the NFS syslog entries occurred during the transfer while the PC reported an error at the end of the file transfer.
                I will try SSH as you suggested and will report the results. Unfortunately I have no USB ethernet dongle at hand.
                Thanks,
                -Thomas

                ----- Original Message ----
                From: Drew Gibson <aggibson@...>
                To: nslu2-linux@yahoogroups.com
                Sent: Wednesday, March 5, 2008, 16:03:47
                Subject: Re: [nslu2-linux] Re: SlugOS/BE 4.8-beta large file corruption



              • Jon
                Message 7 of 7 , Mar 5, 2008
                  --- In nslu2-linux@yahoogroups.com, Drew Gibson <aggibson@...> wrote:
                  > I'm not experiencing your issues but might I suggest a couple of things
                  > to try?
                  > Have you tried copying with SSH? It will checksum the data as it goes.
                  > If the copy goes OK but the resultant files still have an MD5SUM
                  > mismatch, then you should look at the drives. If the copies take
                  > forever, then it may be doing many retries on the transfer, so look at
                  > the network, physical and drivers.
                  >
                  > The ethernet driver in SlugOS maybe suspect (as per Thomas), do you have
                  > a USB ethernet dongle you could try?

                  Hi Drew,
                  thanks for the ideas.

                  Good idea about the USB ethernet dongle. Unfortunately I don't have a
                  USB dongle around to test it with. Unless SlugOS supports USB WiFi -
                  and from the problems I've had in the "conventional" Linux world with
                  my Linksys adapter, I don't think that's somewhere I want to go :-)

                  I ran more tests overnight... ran multiple md5sum passes on
                  directories containing several GB of files. And I got varying results.
                  This was with me logged into the SlugOS slug and I got differences
                  checksumming files on the 500GB drive that lives on that slug,
                  differences checksumming files on the 300GB drive that normally lives
                  on the uNSLUng slug and also differences checksumming files that I
                  copied onto a portion of the USB flash drive that contains SlugOS. So
                  it would appear that the physical drive is not the factor.

                  I tried a copy using scp. I copied the same 350MB file twice from the
                  uNSLUng slug, pushing it to the flash drive on the SlugOS slug. Then I
                  md5'ed each copy five times. The first copy gave the right checksum 2
                  out of 5 times, the second copy gave the right checksum 3 out of 5
                  times. A copy I did through a plain nfs copy gave variable answers on
                  each md5sum but the correct answer never appeared and there would
                  usually be a wrong answer that appeared more than once. My guess is
                  that the repeated answer is the actual MD5 of the copied file, so the
                  file copied with a straight copy is wrong, but the file copied with
                  scp is correct - it's just that md5sum is having a hard time telling it.
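The guess above, that the digest seen most often across repeated runs is the file's real MD5, amounts to a simple majority vote. Here is a minimal sketch of that heuristic; `most_common_digest` is a hypothetical helper name, and the result is only as trustworthy as the assumption that read errors are rarer than clean reads.

```python
from collections import Counter


def most_common_digest(digests):
    """Return (digest, count) for the digest that appears most often
    in a list of results from repeated checksum runs on one file."""
    if not digests:
        raise ValueError("need at least one digest")
    counts = Counter(digests)
    digest, count = counts.most_common(1)[0]
    return digest, count
```

If no digest repeats at all, as with the plain nfs copy where the correct answer never appeared, the vote is inconclusive and says nothing about the file's true contents.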

                  I did for a while wonder if the SlugOS build of md5sum might be
                  faulty! So I copied the same file over Samba to a Windows PC four
                  times & ran two different md5 programs on it there. Three of those
                  copies gave consistent results which matched the result I got most
                  frequently on the SlugOS slug itself, but one of the copies gave me a
                  different result.

                  So it appears to me that the actual file is being misread in some way.
                  Could the ext3 driver be causing the problems? The variable md5
                  results makes it sound like there's a read problem on the slug, before
                  the file even goes out onto the network. The incorrect md5 on a file
                  copied ONTO the SlugOS slug makes it sound like the error might also
                  affect writes. I'm tempted to try a different filesystem type - any
                  recommendation? No way of eliminating the USB driver though.

                  Thomas, could you find a couple of GB of 100+MB files on your SlugOS
                  slug and run a couple of md5sum passes over them, pipe the outputs to
                  files & then diff them to look for changes? See if you're seeing the
                  same thing as me.
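The multi-pass test requested above (checksum everything, checksum it again, diff the results) could also be done in one script. This is a sketch under assumptions: it presumes Python is installed on the slug, and `checksum_pass` and `unstable_files` are illustrative names rather than existing tools.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_pass(root):
    """Checksum every file under root; return {relative_path: digest}."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            results[os.path.relpath(full_path, root)] = md5_of_file(full_path)
    return results


def unstable_files(root, passes=2):
    """Run several checksum passes over root and return the sorted list of
    files whose digest differed from the first pass at least once."""
    runs = [checksum_pass(root) for _ in range(passes)]
    baseline = runs[0]
    return sorted(
        path
        for path in baseline
        if any(run.get(path) != baseline[path] for run in runs[1:])
    )
```

An empty list from `unstable_files` on unchanged data is the expected result on a healthy system; any file it reports was read differently on at least one pass.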

                  Thanks everyone

                  Jon