
Highly compressed disk as tape replacement

  • mlugassy
    Message 1 of 10 , Jun 3, 2006
      Hey All,
      Have any of you created a high-compression drive which you
      use/dedicate to backup/synchronize data from all your other drives
      (whether other drives on the slug or other computers in the house)?

      I'm just thinking that with some of the compression rates we can get
      these days, it could make for a great replacement for a tape drive.

      The need basically comes from the fact that now, the Slug supports
      multiple drives on the 1st slot. So in my case, I have a bunch of
      NTFS-formatted drives connected via USB-hub on port 1 and my
      native-formatted (ext3) drive on port 2. Therefore, the basic
      Disk-2-Disk backup on the web interface no longer does the trick since
      it's designed to just back up drive1(port1) to drive2(port2). Moreover,
      one could be a lot more efficient by backing up 2 or 3 full drives to 1
      single compressed drive (depending on the type of data of course).

      I would like to be able to dedicate a drive, doesn't matter which,
      with super-compression (e.g. >400%) so that if any of my other drives
      die, I'll be able to access/restore the data from there (doesn't
      matter how long it takes to restore, since really, it's acting as a
      tape drive replacement..). I'm just thinking of the cost savings
      compared to buying a tape library..

      The problem with RAID on the slug is that to use RAID6, you need at
      least 5 drives for it to make sense (not sure what the performance
      ramifications might be since some people have expressed data
      corruption problems with putting that many drives on a single slug).
      For RAID1, you're wasting 100% of your space. So really, RAID 4 or 5
      is the way to go, but there doesn't seem to be any detailed/easy
      explanation anywhere as to how to enable something like RAID-5 on a
      slug. So that's why the next best thing might be something like rsync
      with massive-compression on the destination..

      Let me know if you have any ideas!..might be nice to officially
      document this procedure somewhere..
      Mike
    • Marcel Nijenhof
      Message 2 of 10 , Jun 6, 2006
        On Sat, 2006-06-03 at 19:40 +0000, mlugassy wrote:

        > The need basically comes from the fact that now, the Slug supports
        > multiple drives on the 1st slot. So in my case, I have a bunch of
        > NTFS-formatted drives connected via USB-hub on port 1 and my
        > native-formatted (ext3) drive on port 2.

        I wouldn't use NTFS in such a setup. There are some problems with
        that driver which cause hangs on the slug. Also, you aren't able
        to store POSIX ownership and permissions on NTFS.

        > For RAID1, you're wasting 100% of your space. So really, RAID 4 or 5
        > is the way to go, but there doesn't seem to be any detailed/easy
        > explanation anywhere as to how to enable something like RAID-5 on a
        > slug.

        1) You don't waste 100% with raid1. You are using it to protect
        your data.
        2) If you want a reasonable ratio between raw space and available
        space you need at least 4 disks.
        If you're going to spend your money on 4 disks, you definitely want
        more performance back than the slug can offer.
        In that case I would spend some more and buy a cheap PC.
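
To make the "ratio between raw space and available space" concrete, here is a minimal sketch of the usual capacity arithmetic. It assumes equal-size disks and the textbook layouts (RAID 1 as an n-way mirror, RAID 4/5 with one disk's worth of parity, RAID 6 with two); the function name `usable_fraction` is made up for illustration and is not part of any slug tool.

```python
def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity left for data, assuming equal-size disks."""
    if level == "raid1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return 1 / disks            # every disk holds the same data
    if level in ("raid4", "raid5"):
        if disks < 3:
            raise ValueError("RAID 4/5 needs at least 3 disks")
        return (disks - 1) / disks  # one disk's worth of parity
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) / disks  # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


for level, disks in [("raid1", 2), ("raid5", 3), ("raid5", 4), ("raid6", 5)]:
    print(f"{level} with {disks} disks: {usable_fraction(level, disks):.0%} usable")
```

With 4 disks in RAID 5, three quarters of the raw space is usable, which is presumably the kind of ratio meant above.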

        --
        Marceln
      • Rick DeNatale
        Message 3 of 10 , Jun 6, 2006
          On 6/6/06, Marcel Nijenhof <nslu2@...> wrote:
          > On Sat, 2006-06-03 at 19:40 +0000, mlugassy wrote:
          >
          > > For RAID1, you're wasting 100% of your space. So really, RAID 4 or 5
          > > is the way to go, but there doesn't seem to be any detailed/easy
          > > explanation anywhere as to how to enable something like RAID-5 on a
          > > slug.
          >
          > 1) You don't waste 100% with raid1. You are using it to protect
          > your data.
          > 2) If you want a reasonable ratio between raw space and available
          > space you need at least 4 disks.
          > If you're going to spend your money on 4 disks, you definitely want
          > more performance back than the slug can offer.
          > In that case I would spend some more and buy a cheap PC.

          Since the OP is looking for a replacement for tapes, I'm guessing that he's interested in backup.

          Raid, no matter which type, is NOT backup, it provides improved reliability and availability in the face of hardware failures by keeping data in a redundant form. If a drive fails, then having it as part of a raid array allows for easier recovery, but...

          Raid does not protect against loss of data due to carelessness, malfeasance or other causes.  For a blunt example, if you do:

          sudo rm -rf /

          You're going to be just as sorry whether / is on a raid array or not. 
          --
          Rick DeNatale

        • Nicola Fankhauser
          Message 4 of 10 , Jun 6, 2006
            hi mlugassy

            first, if you believe in super-compression, then I believe in
            resurrection. :)

            second, as you write, rsync is a powerful tool, you can even use it to
            do incremental backups.

            if you really care about your data, forget RAID and look for off-site
            backups - be it manually (every week you copy your data on a 2nd disk and
            afterwards put the disk in a safe place elsewhere) or automatically
            (with cron, you run rsync over the internet to your rented shell-account
            / server in a data center).
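
One common way to do incremental backups with rsync is dated snapshot directories combined with `--link-dest`, which hard-links unchanged files against the previous snapshot so only changed files take new space. The sketch below only builds the command line and does not run it; the paths and the helper name `snapshot_command` are assumptions for illustration, and nothing here has been tested on a slug.

```python
from datetime import date
from typing import List, Optional


def snapshot_command(source: str, backup_root: str, today: date,
                     previous: Optional[date] = None) -> List[str]:
    """Build an rsync command for a dated snapshot under backup_root."""
    root = backup_root.rstrip("/")
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps perms/times)
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot.
        cmd.append(f"--link-dest={root}/{previous.isoformat()}")
    # Trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", f"{root}/{today.isoformat()}/"]
    return cmd


# Hypothetical paths, for illustration only.
print(" ".join(snapshot_command("/share/hdd/data", "/mnt/backup",
                                date(2006, 6, 7), date(2006, 6, 6))))
```

The printed line could go into a cron job; for a remote target, adding `-z` would compress data in transit, though it does not change how much space is used at the destination.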

            as for why RAID (except RAID1) is even less secure than one single drive
            is that the chances your array experiences a disk failure are higher
            than the chances that a single drive fails.

            regards
            nicola
          • Frederic Wenzel
            Message 5 of 10 , Jun 6, 2006
              Nicola Fankhauser schrieb:
              > as for why RAID (except RAID1) is even less secure than one single drive
              > is that the chances your array experiences a disk failure are higher
              > than the chances that a single drive fails.

              I disagree. Why should it be more likely that one disk in a disk array
              fails than one single disk failing? Does the disk "know" it is in an
              array and change its behavior? Probably not.

              Fred
            • Edward Bertsch, CISSP
              Message 6 of 10 , Jun 6, 2006
                good luck

                unless your disk is full of text files that say something along the
                lines of "all work and no play makes jack a dull boy"

                you will not see so much compression. certainly not 400% if you are
                storing mp3 files, jpeg, avi movies, etc...
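
A quick way to see this for yourself: the sketch below compares highly repetitive text with random bytes, which stand in for already-compressed media such as mp3 or jpeg. It uses Python's standard `zlib` module purely as an example compressor (the thread does not name one), so the exact ratios are illustrative only.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


# Repetitive text compresses extremely well...
text = b"all work and no play makes jack a dull boy\n" * 10_000
# ...while random bytes (a stand-in for mp3/jpeg/avi payloads) barely shrink.
noise = os.urandom(len(text))

print(f"repetitive text: {compression_ratio(text):.1f}x")
print(f"random bytes:    {compression_ratio(noise):.2f}x")
```

Real media files are not perfectly random, but they usually behave much closer to the second case than the first.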



                --- In nslu2-linux@yahoogroups.com, "mlugassy" <mikelugassy@...> wrote:


                > I would like to be able to dedicate a drive, doesn't matter which,
                > with super-compression (e.g. >400%) so that if any of my other drives
              • Nicola Fankhauser
                Message 7 of 10 , Jun 6, 2006
                  Frederic Wenzel wrote:
                  > I disagree. Why should it be more likely that one disk in a disk array
                  > fails than one single disk failing? Does the disk "know" it is in an
                  > array and change its behavior? Probably not.

                  statistics, man. have a look at this paper [1] - for example: drives
                  rated at 1'000'000 hours MTBF each in a 112 drive compound have a
                  combined MTBF rating of only 8,929 hours!

                  regards
                  nicola

                  [1]:
                  https://ilcsupport.desy.de/cdsagenda/askArchive.php?base=agenda&categ=a0533&id=a0533s1t12/moreinfo
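
The 8,929 figure is simply the per-drive MTBF divided by the number of drives. A minimal sketch of that arithmetic, assuming independent drives with a constant failure rate (the helper name is made up for illustration):

```python
def combined_mtbf_hours(per_drive_mtbf_hours: float, drives: int) -> float:
    """Expected time until the FIRST failure among `drives` identical drives.

    Assumes independent drives with a constant failure rate, so the
    failure rates add up and the combined MTBF is the per-drive MTBF / n.
    """
    if drives < 1:
        raise ValueError("need at least one drive")
    return per_drive_mtbf_hours / drives


print(round(combined_mtbf_hours(1_000_000, 112)))  # 8929
```

Note that this is the time until some drive fails, not the time until data is lost; those differ as soon as the array has redundancy.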
                • Rick DeNatale
                  Message 8 of 10 , Jun 7, 2006
                    On 6/7/06, Nicola Fankhauser <nicola.fankhauser@...> wrote:
                    > Frederic Wenzel wrote:
                    > > I disagree. Why should it be more likely that one disk in a disk array
                    > > fails than one single disk failing? Does the disk "know" it is in an
                    > > array and change its behavior? Probably not.
                    >
                    > statistics, man. have a look at this paper [1] - for example: drives
                    > rated at 1'000'000 hours MTBF each in a 112 drive compound have a
                    > combined MTBF rating of only 8,929 hours!

                    Yep, statistics, lies, and damned lies.

                    Interesting that you picked a paper which actually comes to a much
                    different conclusion about raid. Some of the points that I get from
                    reading that paper are:

                    1) Although having n disks raises the probability that one will fail
                    in time t, putting them into one or more raid arrays increases the
                    MTBF of the array since the array itself can continue to function in
                    the face of that failure. Also that probability of one drive failing
                    out of n is the same whether those n disks are part of a raid array or
                    not.

                    2) In many cases, availability, or the probability that the system is
                    operational at any given time, is more important than MTBF of
                    components. Raid increases availability, because redundancy allows
                    continued operation in the face of a component failure, and
                    replacement of the failed component restores the raid array to full
                    protection. If the components can be hot-swapped then downtime can be
                    reduced to zero. If individual drives are used, the lack of
                    redundancy means that recovering from a drive failure requires
                    restoration from the last backup, and may not bring the overall system
                    back to a consistent state, since the recovered drive will be "behind"
                    the others in the system, which may or may not be important.

                    So RAID has a positive effect both on MTBF, which increases for the
                    whole array even though the probability that 1 drive out of n fails
                    is higher as n increases; and more importantly on availability.
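
The two probabilities being argued about can be separated with a small binomial sketch. It assumes independent drives that each fail with probability `p` over some period, and it ignores replacing a failed drive during that period (which would make the array look even better). The 5% figure is a made-up illustration, not a measured failure rate.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent drives fail,
    when each drive fails with probability p over the period."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p = 0.05  # hypothetical per-drive failure probability for the period
n = 4     # drives in the array

print(f"single drive fails:            {p:.4f}")
print(f"some drive of {n} fails:         {p_at_least(1, n, p):.4f}")
print(f"2+ of {n} fail (RAID 5 is lost): {p_at_least(2, n, p):.4f}")
```

Under these assumptions the chance that some drive fails goes up with n, while the chance of losing a single-parity array (two or more failures) stays below the single-drive figure.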

                    Having said all that RAID ONLY provides protection against component
                    failures. It does nothing to protect against software or user
                    failures (aka oops why did I do THAT). Using raid without a backup
                    strategy which allows the system to go back in time before one of
                    those failures is a trap all too many fall into.

                    In short, RAID is valuable if you desire reliability and availability,
                    but RAID IS NOT BACKUP.
                    --
                    Rick DeNatale

                    IPMS/USA Region 12 Coordinator
                    http://ipmsr12.denhaven2.com/

                    Visit the Project Mercury Wiki Site
                    http://www.mercuryspacecraft.com/
                  • John Soward
                    Message 9 of 10 , Jun 7, 2006

                      Well, that's assuming that the RAID type you are using exercises the R-for-redundant  in RAID. A simple RAID level 0 array, or stripe, will increase your chance of data loss since if any drive in the array fails your data is lost. Otherwise, well, that was the whole point of RAID, preventing data loss since eventually every drive will fail.

                    • Nicola Fankhauser
                      Message 10 of 10 , Jun 7, 2006
                        Rick DeNatale wrote:
                        > Yep, statistics, lies, and damned lies.

                        :)

                        > Interesting that you picked a paper which actually comes to a much
                        > different conclusion about raid.

                        do you mean that I mistakenly quoted a paper that showed the opposite of
                        what I was trying to prove? :)

                        my statement was:
                        > as for why RAID (except RAID1) is even less secure than one single drive
                        > is that the chances your array experiences a disk failure are higher
                        > than the chances that a single drive fails.

                        so in short: RAID makes your data storage as a whole safer, but the
                        probability that any drive in your array fails is higher than the
                        probability of a single drive failing. this makes a difference if you
                        make a big array of smaller drives instead of a smaller array of bigger
                        drives.

                        I have to admit that the discussion is now most probably (statistics
                        anyone? :) of no use to the original poster anymore. :)

                        regards
                        nicola