How to delete a mega-spew of files in a directory

  • Thad Floryan
    Message 1 of 6 , Jul 27, 2013
      The not-so-good IT service guy I mentioned from Usenet's ba.internet
      in Joan's thread about an HP "Aloe" motherboard has been trying to
      delete what is presumed to be multiple millions of files that were
      apparently created by some malware his customer had downloaded [onto
      a Windows system], and he has already spent 2 days deleting the files
      because the disk was full. Once you finish laughing, you can see the
      correct solution for Windows at the END of this article. :-)

      The correct solution for Linux is a wee bit more interesting.

      'rm -rf *' doesn't work well to clean up mega-spew. The problem is
      that the shell either bogs down expanding the '*' into millions of
      arguments or the expanded command fails outright with "Argument list
      too long". A workaround is something along the lines of:

      ls -1 | xargs -E \n -I x rm "x"
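
      (If your xargs is the GNU one, here's a sketch of the same idea that
      batches many names into each rm invocation instead of forking one rm
      per file; the -d '\n' assumes no filename contains a newline:

      ls -1 | xargs -d '\n' rm --

      GNU xargs then packs as many arguments as the kernel allows into each
      rm, so only a handful of processes get spawned.)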

      A problem arises when you want to blow away the files in a directory,
      but don't want to delete the directory. /tmp is a good example,
      because you need the /tmp directory itself, and 'rm -rf *' inside it
      runs you straight into the same shell-expansion problem.

      This method is subtle:

      find . | perl -ple unlink

      This avoids both shell expansion and calling a new process for every
      rm. It's up to you to be very sure you want to delete every file
      under the directory [ the "." in the above example ], noting you can
      specify any directory. The one drawback is that, unlike 'rm -rf *',
      this won't remove the directories themselves (they're left behind,
      emptied of files), but I still like the Perl solution because I'm one
      of the folks in the Perl AUTHORS file:

      http://cpansearch.perl.org/src/NWCLARK/perl-5.8.9/AUTHORS :-)
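
      If you do want the leftover directories gone as well, here's a sketch
      in the same Perl spirit (assuming GNU find for -mindepth; rmdir only
      succeeds once a directory is empty, and -depth lists a directory's
      contents before the directory itself):

      find . -depth -mindepth 1 | perl -nle 'unlink or rmdir'

      Like the one-liner above, it assumes no filename contains a newline.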

      And for the folks still needing to use or support Windows at work,
      there's this:

      http://superuser.com/questions/19762/mass-deleting-files-in-windows

      whose final example, tested against 28.3GB comprising 1,159,211 files
      in 146,918 folders, showed a substantial improvement (roughly 3x)
      over any of the conventional deletion techniques.

      Thad
    • thad_floryan
      Message 2 of 6 , Jul 27, 2013
        --- In linux@yahoogroups.com, Thad Floryan <thad@...> wrote:
        > [...]
        > ls -1 | xargs -E \n -I x rm "x"
        > [...]

        For those with bad eyesight, crap fonts, dingleberried monitors, or
        trying to read this on a stupid "smart" phone, look CLOSELY at the
        argument to ls above. That's a dash followed by the digit 1 (one),
        which forces output of one filename per line.

        I have a good font here (Courier) for fixed-width and all characters
        have serifs so distinguishing between "1" and "l" (lower-case "L") is
        very easy, but this is going to become more of a problem, with
        possibly catastrophic repercussions, as time goes on and more folks
        use teeny tiny itty-bitty tablets and/or dumb smartphones -- typos
        at the wrong time or in an inauspicious location (e.g., the control
        panel of a nuke power reactor, landing an airplane at SFO, texting
        while driving a train too fast in Spain, etc.) can be really bad
        news, even fatal.

        Thad
      • ed
        Message 3 of 6 , Jul 27, 2013
          On Sat, Jul 27, 2013 at 02:34:49AM -0700, Thad Floryan wrote:
          > The not-so-good IT service guy I mentioned from Usenet's ba.internet
          > in Joan's thread about an HP "Aloe" motherboard has been trying to
          > delete a presumed multi-millions of files that were apparently created
          > by some malware his customer had downloaded [onto a Windows system]
          > and he has already spent 2 days deleting the files because the
          > disk was full. Once you finish laughing, you can see the correct
          > solution for Windows at the END of this article. :-)

          It's entirely easy to do when scripts don't housekeep. You set a task
          running five years ago, forget about it, and come back when your
          backups are taking too long, only to find you've used 70% of your
          inodes in one directory.

          This is also when you realise that ls sorts its output (possibly
          even slower on Solaris, but that's a topic for another thread). To
          avoid the sorting, -f is your friend.
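
          For the record, here's a sketch of the two commands I reach for when
          that happens (both are standard on a GNU/Linux box; the path is just
          an example):

          $ df -i /var     # inode usage (IUsed/IFree) per filesystem
          $ ls -f | wc -l  # count directory entries without the sort

          df -i shows the 70% figure coming, and ls -f skips the sort (and
          includes dotfiles), so it returns quickly even on a directory with
          millions of entries.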

          > This method is subtle:
          >
          > find . | perl -ple unlink
          >
          > This avoids both shell expansion and calling a new process for every
          > rm. It's up to you to be very sure you want to delete every file
          > under the directory [ the "." in the above example ], noting you can
          > specify any directory. The one drawback is, unlike 'rm -rf *', this
          > won't remove empty directories, but I still like the Perl solution
          > because I'm one of the folks in the Perl AUTHORS file:
          >
          > http://cpansearch.perl.org/src/NWCLARK/perl-5.8.9/AUTHORS :-)

          To remove directories too, this is what I've found to be the simplest
          method:

          $ find . -depth -print0 | xargs --null rm -r

          The -depth is important here: it makes find print a directory's name
          on the way out (nearer the closedir than the opendir), i.e. after the
          directory's contents.

          The -print0 is important too, as it prints items with \0 separators;
          equally important is the --null (or -0) argument to xargs, which tells
          it that the names on stdin are separated by a null character rather
          than by whitespace. The -r lets rm remove the directories themselves,
          which by that point are already empty.

          This is reasonably portable, however default Solaris installs don't
          have findutils, they have the Solaris counterparts.
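
          Where GNU findutils is available there's an even shorter route that
          skips xargs and rm entirely (a sketch; -delete implies -depth, and
          -mindepth 1 stops find trying to delete the starting directory):

          $ find . -mindepth 1 -delete

          That also covers Thad's /tmp case, since the top-level directory
          itself is left alone.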

          'find' is truly an excellent tool to get to know well, as it answers
          many a sysadmin question like 'where is file such-and-such?'. If you
          can offer up other information, such as a partial directory name,
          then -iregex is helpful here too:

          $ find . -iregex '.*flar.*'

          'find' also offers up logical operators:

          $ find . \( -iregex '.*abc.*' -a \
                      \( -iregex '.*def.*' -o -iregex '.*ghi.*' \) \)

          so we'll match anything whose path contains abc and ( def or ghi ).
          Helpful. You could do this with grep too, but only because this
          particular example is a plain text match; find has a wealth of other
          options available: -(a|m|c)time, -size, -printf, etc. etc.
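
          For instance, a sketch of the sort of question those answer in one
          line (the path and the thresholds here are made up):

          $ find /var/log -type f -mtime +30 -size +100M -printf '%s %p\n'

          i.e. regular files under /var/log not modified for more than 30 days
          and larger than 100MB, printed with their size in bytes.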

          GNU find is definitely a good friend to keep near :)

          The perl option I had running in a script. Because some of the systems
          I look after have an ancient perl (5.004_04) installed, the unlink
          failed on files >2G! This was news to me. The script builds its
          shopping list long before the rm is actually done, so a -f file
          presence check is needed first, and it's that check's stat which was
          called on the >2G files and caused the failure. Luckily we had perl
          5.8.1 installed locally, so that was a simple cure.

          --
          Best regards,
          Ed http://www.s5h.net/
        • thad_floryan
          Message 4 of 6 , Jul 27, 2013
            --- In linux@yahoogroups.com, ed <ed@...> wrote:
            > [...]
            > This is when you also realise that ls sorts (possibly slower on
            > Solaris though, that's something for another thread though). To
            > avoid sorting, -f is your friend.
            > [...]
            > To remove directories too this is what I've found to be the simplest
            > method:
            >
            > $ find . -depth -print0 | xargs --null rm -r
            >
            > The -depth is important here as it prints the directory name when
            > leaving (nearer the closedir than the opendir) the list routine.
            >
            > The -print0 is important too as it prints items with \0 separators,
            > equally important is the --null (or -0) argument to xargs as that
            > tells it that the files on stdin are going to be separated with a null
            > character.
            >
            > This is reasonably portable, however default Solaris installs don't
            > have findutils, they have the Solaris counterparts.
            >
            > 'find' is truly an excellent tool to get to know well as it answers
            > many sysadmin questions 'where is file such and such?' if you can
            > offer up other information such as partial directory names then
            > -iregex is helpful here too:
            >
            > $ find . -iregex '.*flar.*'
            >
            > 'find' also offers up logical operators:
            >
            > $ find . \( -iregex '.*abc.*' -a \
            >             \( -iregex '.*def.*' -o -iregex '.*ghi.*' \) \)
            >
            > so we'll match anything that contains abc and ( def or ghi ).
            > Helpful.
            > You could do this with grep too, but only because this example is
            > a plain text match; find has a wealth of other options available:
            > -(a|m|c)time, -size, -printf, etc. etc.
            >
            > GNU find is definitely a good friend to keep near :)
            > [...]

            Hi Ed,

            Some great tips; thank you for sharing them!

            Your mention of find being the solution to 'where is file such&such'
            is something I've been doing for decades across UNIX and Linux
            systems, per this example, with the 'cut -c 12-' adjusted for
            different OSs and such (the find line is folded at '\' due to line
            length). The shell script is just these two lines:

            cd { treetop at which to begin }
            find `pwd` -type f -o -type l | cut -c 12- > \
            ~/thad_files_`date +%Y%m%d'_'%H%M`

            I generally would run the above once a day, and then I can simply
            grep the 'thad_files_YYYYMMDD_HHMM' file to almost instantly find a
            file (or files), for example:

            grep -i foo thad_* | grep -i bar

            sometimes suffixed with "| less" or "| wc -l" depending on what
            I'm doing.
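
            For the "once a day" part, a crontab entry is the obvious way.
            Assuming the two lines above are saved as an executable script
            (the name and the 03:30 run time here are just placeholders),
            the entry would be something like:

            30 3 * * * $HOME/bin/build_file_index.sh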

            Thad
          • ed
            Message 5 of 6 , Jul 28, 2013
              On Sat, Jul 27, 2013 at 10:30:44PM -0000, thad_floryan wrote:
              > [...]
              > cd { treetop at which to begin }
              > find `pwd` -type f -o -type l | cut -c 12- > \
              > ~/thad_files_`date +%Y%m%d'_'%H%M`
              >
              > I generally would run the above once a day and then I can simply
              > grep the 'thad_files_YYYYMMDD_HHMM' file to almost instantly find a
              > file(s), for example:

              I do the same on a shared system where I don't have root, so it's not
              possible to run locate, which is another damn fine tool, but obviously
              not as featureful as find. The benefit of locate is that it does keyed
              lookups, whereas a flat file has to be scanned end to end. The
              disadvantage is that a keyed database takes more effort to build and
              occupies more space.

              Even if running a find into a plain text file takes a day, and you do
              it every week but only need it once a year, I think it's a time saver.
              Would you really want to wait a day for your results? I doubt it...

              An OS hook which updates a list would be useful, something like
              inotify; surely someone has done this already... It would probably
              need oodles of memory though. Perhaps it could keep one index per
              directory and be accessed via /proc. I can imagine race conditions
              though.
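
              inotify-tools gets you part of the way from a shell, if it's
              installed. Just a sketch (the watch root and log file are
              placeholders, and since inotify needs one watch per directory a
              huge tree really will eat memory):

              $ inotifywait -m -r -e create -e delete -e moved_to -e moved_from \
                    --format '%e %w%f' /srv/data >> ~/file_events.log

              Feed that log into whatever rebuilds the index and you avoid the
              daily full scan.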

              --
              Best regards,
              Ed http://www.s5h.net/
            • Allen D. Tate
              Message 6 of 6 , Jul 28, 2013
                > On Sat, Jul 27, 2013 at 4:34 AM, Thad Floryan <thad@...> wrote:
                >
                > And for the folks still needing to use or support Windows at work,
                > there's this:
                >
                > http://superuser.com/questions/19762/mass-deleting-files-in-windows
                >
                > whose final example, tested against 28.3GB comprising 1,159,211
                > files in 146,918 folders, showed a substantial improvement
                > (roughly 3x) over any of the conventional deletion techniques.
                >
                > Thad

                Now this little bit of advice will come in very handy at work. Muchos
                grassy ass, Thad!

