Re: [nslu2-general] Re: Life after NSLU2?

  • Rob Lockhart
    Message 1 of 18 , Mar 31, 2008
      On Mon, Mar 31, 2008 at 10:09 AM, docbillnet <yahoo@...> wrote:

      > --- In nslu2-general@yahoogroups.com, "bloedmann999"
      > <Brian_Dorling@...> wrote:
      > >
      > > --- In nslu2-general@yahoogroups.com, "Paul Bonsor" <pb89552@> wrote:
      > > I posted my test results about this yesterday. But, I am running rsync
      > > between the two USB drives with --size-only specified, therefore there
      > > should be very little extra work for rsync to do, apart from copying
      > > the files. Still, my CPU is pegged at 100% and the xfer rate goes down
      > > to the approx. 3MByte/Sec seen.
      > >
      > > I wonder why?
      >
      > No matter what options you specify, rsync always computes checksums
      > for files that it transfers. So if you want to transfer files between
      > two USB drives with rsync, you are best off to at least do the initial
      > sync with the drives plugged into a desktop machine, or have lots of
      > patience.
      >
      > From the rsync manual page:
      >
      > Note that rsync always verifies that each transferred file was
      > correctly reconstructed on the receiving side by checking its
      > whole-file checksum, but that automatic after-the-transfer
      > verification has nothing to do with this option's before-the-transfer
      > "Does this file need to be updated?" check.
      >

      Interesting... I use rsync between two USB HDDs on NSLU2, and use it with
      cygwin at work and at home. When specifying "rsync -vca $SRC $DST" it takes
      a whole lot longer to do the synchronization than if I specified "rsync -va
      $SRC $DST". Because of the transfer speeds, there's almost no way that
      without the "-c" switch, a CRC checksum is being generated.

      From what I gathered here:

      http://samba.org/rsync/how-rsync-works.html

      "The file list includes not only the pathnames but also ownership, mode,
      permissions, size and modtime. If the --checksum option has been specified
      it also includes the file checksums."

      Further on down, it states:
      "The file's checksum is generated as the temp-file is built. At the end of
      the file, this checksum is compared with the file checksum from the sender.
      If the file checksums do not match the temp-file is deleted."

      I'm wondering if that is correct, as surely the NSLU2 can't generate
      checksums "quickly" (compared with multi-GHz processors). The exact
      command I use for mirroring from an NFS server is:

      nice -n 10 /usr/bin/rsync -va --delete --force --stats \
      --exclude=System\ Volume\ Information --exclude=lost+found \
      ${SRC}/ ${DST}/ >>/var/log/rsyncbackup.log

      with $SRC and $DST being the source and destination locations. If you
      specify "--progress" it'll also show you the transfer stats as they are
      happening. As was mentioned, however, if the "-c" or "--checksum"
      parameter is given, files existing on both $SRC and $DST are compared by
      MD4 checksum rather than by size and mod time only (below from man
      page):

      "-c, --checksum skip based on checksum, not mod-time & size"
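To make the two comparison modes concrete, here is a minimal sketch of the "does this file need updating?" decision as described above. It is not rsync's code: `needs_update` is a hypothetical helper, `md5sum` stands in for the MD4 checksum mentioned in this thread, and GNU `stat` and `touch -r` are assumed.

```shell
#!/bin/sh
# Sketch only: needs_update is a made-up helper, not part of rsync.
# md5sum stands in for rsync's MD4; GNU coreutils (stat -c) is assumed.

needs_update() {
    # usage: needs_update SRC DST MODE   (MODE is "quick" or "checksum")
    src=$1; dst=$2; mode=$3
    [ -e "$dst" ] || { echo transfer; return; }
    if [ "$mode" = checksum ]; then
        # like -c: compare size, then the file contents via a checksum
        if [ "$(stat -c %s "$src")" = "$(stat -c %s "$dst")" ] &&
           [ "$(md5sum < "$src")" = "$(md5sum < "$dst")" ]; then
            echo skip
        else
            echo transfer
        fi
    else
        # default: compare size and modification time only
        if [ "$(stat -c '%s %Y' "$src")" = "$(stat -c '%s %Y' "$dst")" ]; then
            echo skip
        else
            echo transfer
        fi
    fi
}

tmp=$(mktemp -d)
printf 'aaaa' > "$tmp/src"
printf 'bbbb' > "$tmp/dst"          # same size, different contents
touch -r "$tmp/src" "$tmp/dst"      # force identical mod times

quick_result=$(needs_update "$tmp/src" "$tmp/dst" quick)
checksum_result=$(needs_update "$tmp/src" "$tmp/dst" checksum)
echo "size+mtime check: $quick_result"
echo "checksum check:   $checksum_result"
rm -rf "$tmp"
```

With identical size and mod time but different contents, the size-and-mtime check reports "skip" and the checksum check reports "transfer".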

      In cygwin, I use "rsa" and "rsac" depending on whether I want to transfer
      files based on mod times or on a CRC check. This is especially useful as
      sometimes (for whatever reason) people open and close MS-Excel or MS-Word
      files (with no apparent modifications made), and the file date isn't
      modified but the CRC checks are different.

      $ alias rsa
      alias rsa='rsync -vrlpt --delete --force --progress --stats
      --exclude=System\ Volume\ Information --exclude=lost+found -e "ssh -p 23" '

      $ alias rsac
      alias rsac='rsync -vrlptc --delete --force --progress --stats
      --block-size=8192 --exclude=System\ Volume\ Information --exclude=lost+found
      -e "ssh -p 23" '

      As an experiment, I tried the following. I generated a 100MB file from
      /dev/urandom on each of a linux box and the NSLU2, so the two files have
      different contents. Then I forced the mod time to be the same on both via
      "touch". Then I ran rsync with "-vca" and with "-va", with the NSLU2 as
      SRC and the LINUXbox as DST. Note that I used the "-n" parameter, which
      means "dry run" (no file transfer, just go through the motions as if it
      did).

      =======================================
      [rob@Linux ~]$ dd if=/dev/urandom of=100MiB_file bs=100k count=1024
      1024+0 records in
      1024+0 records out
      104857600 bytes (105 MB) copied, 44.3161 seconds, 2.4 MB/s
      [rob@Linux ~]$ touch -t 200804010026.00 100MiB_file
      [rob@Linux ~]$ ls -la 100MiB_file
      -rw-r--r-- 1 rob rob 104857600 Apr 1 00:26 100MiB_file


      rob@NSLU2:~$ dd if=/dev/urandom of=100MiB_file bs=100k count=1024
      1024+0 records in
      1024+0 records out
      104857600 bytes (105 MB) copied, 264.031 seconds, 397 kB/s
      rob@NSLU2:~$ touch -t 200804010026.00 100MiB_file
      rob@NSLU2:~$ ls -la 100MiB_file
      -rw-r--r-- 1 rob rob 104857600 Apr 1 00:26 100MiB_file

      [rob@Linux ~]$ date; rsync -van rob@NSLU2:100MiB_file 100MiB_file ; date
      Tue Apr 1 00:33:05 EDT 2008
      receiving file list ... done
      sent 20 bytes received 82 bytes 40.80 bytes/sec
      total size is 104857600 speedup is 1028015.69
      Tue Apr 1 00:33:07 EDT 2008

      [rob@Linux ~]$ date; rsync -vacn rob@NSLU2:100MiB_file 100MiB_file ; date
      Tue Apr 1 00:33:49 EDT 2008
      receiving file list ... done
      100MiB_file
      sent 26 bytes received 104 bytes 7.43 bytes/sec
      total size is 104857600 speedup is 806596.92
      Tue Apr 1 00:34:06 EDT 2008

      [rob@Linux ~]$ date; rsync -vacn rob@NSLU2:100MiB_file 100MiB_file ; date
      Tue Apr 1 00:34:34 EDT 2008
      receiving file list ... done
      100MiB_file
      sent 26 bytes received 104 bytes 7.43 bytes/sec
      total size is 104857600 speedup is 806596.92
      Tue Apr 1 00:34:51 EDT 2008

      [rob@Linux ~]$ date; rsync -van rob@NSLU2:100MiB_file 100MiB_file ; date
      Tue Apr 1 00:35:38 EDT 2008
      receiving file list ... done
      sent 20 bytes received 82 bytes 29.14 bytes/sec
      total size is 104857600 speedup is 1028015.69
      Tue Apr 1 00:35:41 EDT 2008

      rob@NSLU2:~$ date; md5sum 100MiB_file ; date
      Tue Apr 1 00:46:33 EDT 2008
      62b2c27a678fd3bf07418c9f4f161c93 100MiB_file
      Tue Apr 1 00:46:42 EDT 2008

      [rob@Linux ~]$ date; md5sum 100MiB_file ; date
      Tue Apr 1 00:47:25 EDT 2008
      2ed67109979cec75d1b155581cdb7cf4 100MiB_file
      Tue Apr 1 00:47:26 EDT 2008
      =======================================

      Observations:
      1. Note that the byte counts differ with the "-c" option: 6 extra bytes
      sent (26 vs. 20) and 22 extra bytes received (104 vs. 82).
      2. Note that with different file contents but identical
      UID/GID/moddate/size, the file doesn't transfer. The start/stop time is
      2-3s. This proves a checksum is not being performed, as the file contents
      are different.
      3. Note that with the "-c" checksum option, the file is listed for
      transfer (as UID/GID/moddate/size are ignored per man page; this was a
      dry run, so nothing is actually copied), but after a significantly
      longer time of 17s. This must predominantly be the time it takes to
      generate the checksum on both ends.
      4. Note from the MD5 checksums (rsync uses MD4), it is hard to believe that
      the NSLU2 can read a 100MiB file and generate an MD5 sum in less than 10
      seconds (>10MiBps); some caching must be occurring? I had heard throughput
      of ~3-5MiBps for the NSLU2.
      5. If I understand the process correctly, it appears that the MAN page is
      wrong in regards to checksum always being calculated for rsync. It won't be
      the first time a man page is wrong or clear as mud. :-)
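The arithmetic behind the timings in these observations can be worked through as follows. This is a rough sketch: `rate_mib` is a hypothetical helper, and the elapsed times are read off the 1-second `date` stamps in the transcript, so each figure is only good to about one second either way.

```shell
#!/bin/sh
# Throughput arithmetic for the transcript above (sketch; rate_mib is made up).
rate_mib() {
    # usage: rate_mib BYTES SECONDS  -> MiB/s, two decimals
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f", b / s / 1048576 }'
}

size=104857600                               # the 100 MiB test file

nslu2_md5_rate=$(rate_mib "$size" 9)         # md5sum on NSLU2: 00:46:33 to 00:46:42
rsync_c_rate=$(rate_mib "$size" 17)          # rsync -vacn: 00:33:49 to 00:34:06
nslu2_dd_rate=$(rate_mib "$size" 264.031)    # dd from /dev/urandom on NSLU2

echo "md5sum on NSLU2:   $nslu2_md5_rate MiB/s"
echo "rsync -c dry run:  $rsync_c_rate MiB/s"
echo "dd of /dev/urandom: $nslu2_dd_rate MiB/s"
```

The md5sum figure comes out at about 11 MiB/s, which is the ">10MiBps" that observation 4 finds hard to believe.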


    • docbillnet
      Message 2 of 18 , Apr 1, 2008
        --- In nslu2-general@yahoogroups.com, "Rob Lockhart" <rlockhar@...> wrote:
        >
        > On Mon, Mar 31, 2008 at 10:09 AM, docbillnet <yahoo@...> wrote:
        > > Note that rsync always verifies that each transferred file was
        > > correctly reconstructed on the receiving side by checking its
        > > whole-file checksum, but that automatic after-the-transfer
        > > verification has nothing to do with this option's before-the-transfer
        > > "Does this file need to be updated?" check.
        >
        > Interesting... I use rsync between two USB HDDs on NSLU2, and use it with
        > cygwin at work and at home. When specifying "rsync -vca $SRC $DST" it takes
        > a whole lot longer to do the synchronization than if I specified "rsync -va
        > $SRC $DST". Because of the transfer speeds, there's almost no way that
        > without the "-c" switch, a CRC checksum is being generated.

        No. That is correct. With the -c option, transfers will be much
        slower still. The reason being with the -c option the checksum is
        generated by reading each file prior to transfer. This effectively
        doubles your IO time, because each file will be read in twice.
        Without the -c, rsync computes the checksum while transferring. So
        the amount of CPU time is larger than say "cp", but there is no extra
        IO time. However, since the NSLU2 is bottlenecked by CPU
        speed, any additional computations slow down the copy operation.
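The difference described here (a checksum computed while the data streams past, versus an extra full read beforehand) can be sketched with ordinary tools. This is an illustration, not how rsync is implemented: `md5sum` stands in for rsync's checksum, and the file names are made up.

```shell
#!/bin/sh
# Sketch: checksum-while-copying versus a separate pre-read.
tmp=$(mktemp -d)
head -c 1048576 /dev/urandom > "$tmp/src"    # 1 MiB of test data

# One pass over the source: tee writes the copy while md5sum hashes the
# same stream, so the checksum costs CPU but no extra read of the source.
stream_sum=$(tee "$tmp/copy" < "$tmp/src" | md5sum | cut -d' ' -f1)

# Receiving-side verification: hash what actually landed on disk.
copy_sum=$(md5sum < "$tmp/copy" | cut -d' ' -f1)

# The -c style pre-check is an additional full read of the file, done
# before any transfer decision is made.
pre_sum=$(md5sum < "$tmp/src" | cut -d' ' -f1)

if [ "$stream_sum" = "$copy_sum" ]; then verdict=ok; else verdict=mismatch; fi
echo "copy verified: $verdict"
rm -rf "$tmp"
```

On a machine where the CPU rather than the disk is the limit, the hashing in the first pipeline is what slows the copy relative to a plain `cp`.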
      • docbillnet
        Message 3 of 18 , Apr 1, 2008
          --- In nslu2-general@yahoogroups.com, "Rob Lockhart" <rlockhar@...> wrote:
          >
          > On Mon, Mar 31, 2008 at 10:09 AM, docbillnet <yahoo@...> wrote:
          >
          > > --- In nslu2-general@yahoogroups.com, "bloedmann999"
          > > <Brian_Dorling@> wrote:
          > 5. If I understand the process correctly, it appears that the MAN page is
          > wrong in regards to checksum always being calculated for rsync. It won't be
          > the first time a man page is wrong or clear as mud. :-)

          The man page says the checksums are always computed to compare the
          transferred files. Since you used -n, there was no transfer, so there
          was no checksum. If it doesn't write the temp file, it has nothing to
          compute the checksum on.

          Bill
        • Rob Lockhart
          Message 4 of 18 , Apr 1, 2008
            On Tue, Apr 1, 2008 at 1:42 PM, docbillnet <yahoo@...> wrote:

            > --- In nslu2-general@yahoogroups.com, "Rob Lockhart" <rlockhar@...> wrote:
            > >
            > > On Mon, Mar 31, 2008 at 10:09 AM, docbillnet <yahoo@...> wrote:
            > >
            > > > --- In nslu2-general@yahoogroups.com, "bloedmann999"
            > > > <Brian_Dorling@> wrote:
            > > 5. If I understand the process correctly, it appears that the MAN page is
            > > wrong in regards to checksum always being calculated for rsync.
            >
            > The man page says the checksums are always computed to compare the
            > transferred files. Since you used -n, there was no transfer, so there
            > was no checksum. If it doesn't write the temp file, it has nothing to
            > compute the checksum on.
            >

            I just tried it again, and with "touch -t 200804010026.00 100MiB_file" on
            both sides, with *different CRCs*, there was no transfer, even without the
            "-n" switch. No temp file created, no checksum generated, as there was no
            transfer.

            rob@NSLU2:~$ md5sum -b 100MiB_file
            62b2c27a678fd3bf07418c9f4f161c93 *100MiB_file
            rob@NSLU2:~$ touch -t 200804010026.00 100MiB_file

            [rob@Linux ~]$ md5sum -b 100MiB_file
            2ed67109979cec75d1b155581cdb7cf4 *100MiB_file
            [rob@Linux ~]$ touch -t 200804010026.00 100MiB_file

            [rob@Linux ~]$ date; rsync -va --delete --progress --stats rob@NSLU2:100MiB_file 100MiB_file ; date
            Tue Apr 1 22:05:01 EDT 2008
            receiving file list ...
            1 file to consider

            Number of files: 1
            Number of files transferred: 0
            Total file size: 104857600 bytes
            Total transferred file size: 0 bytes
            Literal data: 0 bytes
            Matched data: 0 bytes
            File list size: 62
            File list generation time: 0.012 seconds
            File list transfer time: 0.000 seconds
            Total bytes sent: 20
            Total bytes received: 82

            sent 20 bytes received 82 bytes 40.80 bytes/sec
            total size is 104857600 speedup is 1028015.69
            Tue Apr 1 22:05:03 EDT 2008

            [rob@Linux ~]$ md5sum -b 100MiB_file
            2ed67109979cec75d1b155581cdb7cf4 *100MiB_file

            rob@NSLU2:~$ md5sum -b 100MiB_file
            62b2c27a678fd3bf07418c9f4f161c93 *100MiB_file

            Assuming the checksum is generated while the file is being transferred, that
            doesn't seem to be the determining factor for whether to transfer files of
            identical UID/date/time/size (but different content). It appears that
            without the "-c" switch, the CRC is not calculated on files for which there
            is no pending transfer. I'm not sure what would be gained in calculating
            the CRC on a file during transfer, perhaps only to ensure that it matches
            the destination file CRC when being created? Indeed, the man page is
            correct regarding CRC of transferred files (based on criterion given to
            rsync with which to compare the files).
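The experiment above can also be repeated on a single machine. The following is a minimal sketch, assuming rsync is installed and that GNU `touch -r` is available; the directory and file names are made up for illustration, and the run is local rather than over ssh as in the transcript.

```shell
#!/bin/sh
# Local re-run of the same-size, same-mtime, different-content experiment.
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dst"
printf 'new contents' > "$tmp/src/file"
printf 'old contents' > "$tmp/dst/file"      # same length, different data
touch -r "$tmp/src/file" "$tmp/dst/file"     # force identical mod times

if command -v rsync >/dev/null 2>&1; then
    # Size and mod time match, so the default check skips the file.
    rsync -a "$tmp/src/file" "$tmp/dst/file"
    after_quick=$(cat "$tmp/dst/file")

    # With -c the contents are compared, so the file is transferred.
    rsync -ac "$tmp/src/file" "$tmp/dst/file"
    after_checksum=$(cat "$tmp/dst/file")

    echo "after rsync -a:  $after_quick"
    echo "after rsync -ac: $after_checksum"
else
    echo "rsync not installed; nothing to demonstrate"
fi
rm -rf "$tmp"
```

The destination keeps its old contents after the plain `-a` run and only picks up the new contents once `-c` is added.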

            -Rob

            P.S. Apologies for the diversion (thread hijacking), folks. Should've
            changed the subject to rsync or similar.


          • docbillnet
            Message 5 of 18 , Apr 2, 2008
              --- In nslu2-general@yahoogroups.com, "Rob Lockhart" <rlockhar@...> wrote:
              >
              > On Tue, Apr 1, 2008 at 1:42 PM, docbillnet <yahoo@...> wrote:
              >
              > > --- In nslu2-general@yahoogroups.com, "Rob Lockhart" <rlockhar@> wrote:
              > > The man page says the checksums are always computed to compare the
              > > transferred files. Since you used -n, there was no transfer, so there
              > > was no checksum. If it doesn't write the temp file, it has nothing to
              > > compute the checksum on.
              > >
              >
              > I just tried it again, and with "touch -t 200804010026.00 100MiB_file" on
              > both sides, with *different CRCs*, there was no transfer, even without the
              > "-n" switch. No temp file created, no checksum generated, as there was no
              > transfer.

              As predicted. Exactly what I said, the checksum is always computed
              for transferred files. I am glad you could validate that the manual
              page is correct.

              > is no pending transfer. I'm not sure what would be gained in calculating
              > the CRC on a file during transfer, perhaps only to ensure that it matches
              > the destination file CRC when being created? Indeed, the man page is

              I am assuming that the same code is used to compute checksums while
              reading regardless of whether the file transfer is local or remote. The
              same code is also probably used to compute checksums when saving,
              regardless of transfer mode. I can see absolutely no advantage in
              computing a checksum as you read a file and a checksum as you write a
              file locally. If the checksum is changing in your memory blocks, then
              no amount of sanity checking will help... I can see the advantage
              across a socket or a pipe. TCP-IP packets only use a very small
              checksum (16 bits). Which means if you have enough
              errors occur you are almost guaranteed that one of them will be
              received as valid data. When you consider people will use rsync to
              copy many terabytes of data, that means they would regularly corrupt
              files with even the rarest of network errors if not for the additional
              checksums.
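The point about small network checksums can be illustrated with a short sketch. The TCP checksum is 16 bits wide (a ones'-complement sum of 16-bit words), so some corruptions, such as two 16-bit words arriving swapped, leave it unchanged, while a whole-file hash catches them. `inet_checksum` below is a hypothetical helper that sums only the payload words given to it (real TCP also covers a pseudo-header), and `md5sum` stands in for rsync's whole-file checksum.

```shell
#!/bin/sh
# Sketch: a 16-bit ones'-complement sum cannot see reordered 16-bit words.
inet_checksum() {
    # usage: inet_checksum WORD...   (each WORD is 4 hex digits)
    sum=0
    for w in "$@"; do
        sum=$(( sum + 0x$w ))
    done
    # fold any carries back into the low 16 bits
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    printf '%04x' $(( ~sum & 0xFFFF ))
}

# "Hello!" as 16-bit words: "He"=4865 "ll"=6c6c "o!"=6f21
good=$(inet_checksum 4865 6c6c 6f21)
# The same words with the first two swapped, i.e. "llHeo!"
swapped=$(inet_checksum 6c6c 4865 6f21)

good_md5=$(printf 'Hello!' | md5sum | cut -d' ' -f1)
swapped_md5=$(printf 'llHeo!' | md5sum | cut -d' ' -f1)

echo "16-bit sums:  $good vs $swapped"
echo "md5 digests:  $good_md5 vs $swapped_md5"
```

The two 16-bit sums are identical even though the data differs, whereas the two digests differ, which is the kind of error the additional whole-file checksum is there to catch.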

              Bill