Rsync - can slug really work on a 40GB tree?

  • lpcarmed
    Message 1 of 10, Dec 30, 2008
      Hello,

      I'm planning to have an incremental rsync-based backup for my
      archive partition - it's about 40GB in 15K directories and 100K files.
      Is it feasible on the slug or do I need something like a freenas box
      with gigs of memory?

      Thanks,
      Dave
    • jl.050877@gmail.com
      Message 2 of 10, Dec 30, 2008
        On Wed, Dec 31, 2008 at 12:12:38AM -0000, lpcarmed wrote:
        > Hello,
        >
        > I'm planning a to have an incremental rsync based backup for my
        > archive partition - it's about 40GB in 15K directories and 100K files.
        > Is it feasible on the slug or do I need something like a freenas box
        > with gigs of memory?

        Maybe.

        I tried an rsync-based backup program (rsnapshot) on 80
        to 100 GB of data. The nightly data transfers were fast enough.
        However, before transferring data, most backup programs create a
        hard-linked copy of the previous backup. I found that this took
        many hours on a slug.

        Also, on a file system with 100+ GB of files, and many
        hard-links, e2fsck took many hours.

        I ended up upgrading to a Thecus N2100. Another possible
        choice now is a QNAP. Like the N2100, it runs Debian-arm.

        Since you have a smaller amount of data, a slug may work for
        you.

        John
      • lpcarmed
        Message 3 of 10, Dec 31, 2008
          --- In nslu2-linux@yahoogroups.com, jl.050877@... wrote:
          >
          > On Wed, Dec 31, 2008 at 12:12:38AM -0000, lpcarmed wrote:
          > > Hello,
          > >
          > > I'm planning a to have an incremental rsync based backup for my
          > > archive partition - it's about 40GB in 15K directories and 100K files.
          > > Is it feasible on the slug or do I need something like a freenas box
          > > with gigs of memory?
          >
          > Maybe.
          >
          > I tried that an rsync-based backup program (rsnapshot) on 80
          > to 100 GB of data. The nightly data transfers were fast enough.
          > However, before transferring data, most backup programs create a
          > hard-linked copied of the previous backup. I found that this took
          > many hours on a slug.
          >
          > Also, on a file system with 100+ GB of files, and many
          > hard-links, e2fsck took many hours.
          >
          > I ended up upgrading to a Thecus N2100. Another possible
          > choice now is a QNAP. Like the N2100, it runs Debian-arm.
          >
          > Since you have a smaller amount of data, a slug may work for
          > you.
          >
          > John
          >

          I can confirm that memory does not seem to be the bottleneck. The
          dry-run switch (-n) gives an idea of how much memory the backup will
          require on the slug. In my 40G example with 132,000 files, the figure
          was just 3.66M with the new rsync 3.0.4 on both ends (3.0.0 made good
          progress on this, from what I have read):

          sent 3.66M bytes received 411.81K bytes 28.59K bytes/sec
          total size is 40.03G speedup is 9827.05 (DRY RUN)
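
For anyone puzzling over the two summary lines: as far as I understand rsync's output, "sent" and "received" count bytes that crossed the network, and "speedup" is the total size of the tree divided by that traffic. The sketch below redoes the arithmetic in Python. The function names are made up for illustration, and the decimal K/M/G multipliers are an assumption, chosen because they reproduce the reported figure most closely.

```python
import re

# Assumption: K, M and G are decimal multipliers (1000-based).
UNITS = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def to_bytes(text):
    """Convert a figure such as '3.66M' or '411.81K' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def traffic_and_speedup(sent, received, total_size):
    """Return (bytes on the wire, total size divided by bytes on the wire)."""
    traffic = to_bytes(sent) + to_bytes(received)
    return traffic, to_bytes(total_size) / traffic


# Figures copied from the dry-run summary quoted above.
traffic, speedup = traffic_and_speedup("3.66M", "411.81K", "40.03G")
print(f"traffic: {traffic / 1e6:.2f} MB, speedup: {speedup:.0f}")
```

With these inputs the traffic comes to about 4.07 MB, and the ratio lands within a fraction of a percent of the 9827.05 that rsync printed; the small gap is rounding in the displayed figures. In a dry run most of that traffic is presumably the file list, which is why it is a reasonable hint about the size of the list.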
        • Rudy Moore
          Message 4 of 10, Dec 31, 2008
            I've had no problems rsyncing hundreds of gigabytes (450GB at last count). The only caveat is that it's slow. I just leave it running all night and have no problems.
          • redholm
            Message 5 of 10, Dec 31, 2008
              Why are a few hours a problem for your rsnapshots?

              I use rsync from 4 laptops (two Macs) to a slug, about 40 GB on each
              laptop. No problem, but the delta is small except for a large PST
              file (do not ask, not my machine).
              I also use rsnapshot from slug drive1 (laptop backups) to slug drive2,
              about 160 GB. The laptops are only accessed during the rsync, which
              takes less than 20 minutes; the longer rsnapshot run only affects the slug.

              /Hagar


              --- In nslu2-linux@yahoogroups.com, jl.050877@... wrote:
              >
              >
              > Maybe.
              >
              > I tried that an rsync-based backup program (rsnapshot) on 80
              > to 100 GB of data. The nightly data transfers were fast enough.
              > However, before transferring data, most backup programs create a
              > hard-linked copied of the previous backup. I found that this took
              > many hours on a slug.
              >
              > Also, on a file system with 100+ GB of files, and many
              > hard-links, e2fsck took many hours.
              >
              > I ended up upgrading to a Thecus N2100. Another possible
              > choice now is a QNAP. Like the N2100, it runs Debian-arm.
              >
              > Since you have a smaller amount of data, a slug may work for
              > you.
              >
              > John
              >
            • jl.050877@gmail.com
              Message 6 of 10, Dec 31, 2008
                On Wed, Dec 31, 2008 at 02:34:04PM -0800, Rudy Moore wrote:
                > I've had no problems rsyncing hundreds of gigabytes (450GB at last count). The only caveat is that it's slow. I just leave it running all night and have no problems.

                Interesting. How long does e2fsck take?

                Also, are you using rsync directly? If not, which backup program?
                Does it use hard-linking?

                John
              • bloedmann999
                Message 7 of 10, Jan 2, 2009
                  --- In nslu2-linux@yahoogroups.com, "redholm" <redholm@...> wrote:
                  >
                  > Why are a few hours a problem for your rsnapshots?
                  >
                  > I use Rsync from 4 laptops (two macs) to a slug. About 40 GB on each
                  > laptop. No problem but the delta is small except for a large PST
                  > file (do not ask not my machine).
                  > I also use rsnapshot from slug drive1 (laptop backups) to slug drive2
                  > about 160 GB. The laptops are only access during the
                  > rsync less than 20 min and the longer rsnapshot only affects the slug.
                  >
                  > /Hagar
                  >
                  >
                  > --- In nslu2-linux@yahoogroups.com, jl.050877@ wrote:
                  > >
                  > >
                  > > Maybe.
                  > >
                  > > I tried that an rsync-based backup program (rsnapshot) on 80
                  > > to 100 GB of data. The nightly data transfers were fast enough.
                  > > However, before transferring data, most backup programs create a
                  > > hard-linked copied of the previous backup. I found that this took
                  > > many hours on a slug.
                  > >
                  > > Also, on a file system with 100+ GB of files, and many
                  > > hard-links, e2fsck took many hours.
                  > >
                  > > I ended up upgrading to a Thecus N2100. Another possible
                  > > choice now is a QNAP. Like the N2100, it runs Debian-arm.
                  > >
                  > > Since you have a smaller amount of data, a slug may work for
                  > > you.
                  > >
                  > > John
                  > >
                  >
                  In my experience the size of the tree is not of the slightest
                  importance. More important is the number of files in that tree. The
                  slug has to be able to build and keep the file list in memory, and if
                  that list is too large, the slug starts paging, and then it
                  takes a really long time to do anything. There is no
                  solution to that except more RAM.

                  Newer versions of rsync process this list in chunks (I have heard), so
                  it's never required to keep the whole list in RAM. Apparently this
                  provides relief from the RAM problem.
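
To put a rough number on the file-list argument, one can count the entries in a tree and multiply by a per-entry cost. This is only a back-of-the-envelope sketch: the 100 bytes per entry is an assumed rule of thumb rather than a measured value, and the function names are invented for this example.

```python
import os

# Assumption: roughly 100 bytes of file-list state per entry. The real cost
# depends on the rsync version, the options used and the path lengths.
ASSUMED_BYTES_PER_ENTRY = 100


def count_entries(root):
    """Count the files and directories below root (root itself is not counted)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def estimate_file_list_mb(entries, bytes_per_entry=ASSUMED_BYTES_PER_ENTRY):
    """Estimate the memory needed to hold the whole file list, in megabytes."""
    return entries * bytes_per_entry / 1_000_000


# The tree from the first message: about 15K directories plus 100K files.
print(estimate_file_list_mb(15_000 + 100_000))
```

Running `estimate_file_list_mb(count_entries("/path/to/archive"))` against the real tree (the path is a placeholder) gives a figure to compare with whatever RAM is free on the slug. If the list is processed in chunks, as described above for newer versions, the whole-list estimate is an upper bound rather than the working requirement.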

                  Cheers Brian
                • Rudy Moore
                  Message 8 of 10, Jan 5, 2009
                    >>On Wed, Dec 31, 2008 at 02:34:04PM -0800, Rudy Moore wrote:
                    >> I've had no problems rsyncing hundreds of gigabytes (450GB at last count). The only caveat is that it's slow. I just leave it running all night and have no problems.

                    >Interesting. How long does e2fsck take?

                    >Also, are you using rsync directly? If not, which backup program?
                    >Does it use hard-linking?

                    I'm not sure how long e2fsck takes... I haven't run it manually on the drive... but I would probably plug the drive into a Linux box if I were doing something like that, simply for speed.

                    I do use rsync directly. Sorry, I don't have the parameters in front of me. I wrote a script that mounts the drive and does the rsync from within Cygwin on my Windows machines, so I rarely see them.

                    I'm not sure about hard-linking... can you explain a little more about the concept? I'm familiar with soft links; I use them all over the place to make it easier to navigate directories. I generally thought you only want one hard link to a file, but the term has come up in other contexts recently, so I'm curious when you would use it...

                    Thanks,
                    Rudy
                  • jl.050877@gmail.com
                    Message 9 of 10, Jan 5, 2009
                      On Mon, Jan 05, 2009 at 10:28:39AM -0800, Rudy Moore wrote:
                      > I'm not sure about hard-linking... can you explain a little more
                      >about the concept? I'm familiar with soft-links - use them all
                      >over the place to make it easier to navigate directories. I
                      >generally thought you only want one hard-link to a file, but the
                      >term has come up in other contexts recently, so I'm curious when
                      >you would use it...

                      You are right that hard links are rarely useful but, IMO, this is
                      an exception.

                      Many rsync-based backup programs will do a backup one day, e.g.:

                      rsync -a valuable:/data Monday_backup/

                      Before the next day's backup, they create a hard-linked copy of it:

                      cp -a --link Monday_backup Tuesday_backup
                      rsync -a valuable:/data Tuesday_backup/

                      If, say, an important file was deleted or corrupted on Tuesday, it
                      will be missing/bad in the Tuesday night backup. However, to find
                      a good copy, all you have to do is go back to the directory
                      Monday_backup.

                      I use rsnapshot, which is one of several scripts that automate
                      this process, offering, if you want, hourly, daily, weekly, and monthly
                      rotations.

                      Like soft links, hard links conserve disk space. More importantly,
                      because hard links are used, each of the backup directories looks
                      exactly like the source directory: You can correctly restore a
                      system simply by copying from any one of them.
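
The Monday/Tuesday scheme can be tried out on a scratch directory. The sketch below is a Python stand-in, not what rsnapshot itself runs: `link_copy` plays the role of `cp -a --link`, and `replace_file` imitates what rsync does by default when a file has changed, namely writing the new version under a temporary name and renaming it into place. Both helper names and the file names are made up for the illustration.

```python
import os
import tempfile


def link_copy(src, dst):
    """Recreate the directory tree of src under dst, hard-linking every file."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target, name))


def replace_file(path, data):
    """Write data to a new file and rename it over path; other links are untouched."""
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        handle.write(data)
    os.replace(tmp, path)


with tempfile.TemporaryDirectory() as work:
    monday = os.path.join(work, "Monday_backup")
    tuesday = os.path.join(work, "Tuesday_backup")
    os.makedirs(monday)
    for name, text in [("kept.txt", "unchanged"), ("edited.txt", "good copy")]:
        with open(os.path.join(monday, name), "w") as handle:
            handle.write(text)

    # Tuesday starts as a hard-linked copy of Monday, then one file changes.
    link_copy(monday, tuesday)
    replace_file(os.path.join(tuesday, "edited.txt"), "corrupted")

    # Unchanged files are still one file on disk with two names.
    same = os.path.samefile(os.path.join(monday, "kept.txt"),
                            os.path.join(tuesday, "kept.txt"))
    # The changed file was replaced in Tuesday only; Monday keeps the old data.
    with open(os.path.join(monday, "edited.txt")) as handle:
        monday_text = handle.read()
    print(same, monday_text)
```

The unchanged file takes up space only once, while the file that changed on Tuesday leaves Monday's copy intact. The price, as noted earlier in the thread, is that creating one link per file per snapshot is slow on a slug when the tree holds a great many files.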

                      Regards,

                      John


                    • Stanley P. Miller
                      Message 10 of 10, Jan 5, 2009
                        I liked bare rsync but found this to be pretty slick and am now using it.

                        http://www.rsnapshot.org/


