Re: [PBML] free source search engine ## comments?

  • Robin
    Message 1 of 6 , Apr 30, 2004
      I don't have access to these files....thank you anyway...
      -Robin

      ----- Original Message -----
      From: "Emanuel G Calso" <eman@...>
      To: <perl-beginner@yahoogroups.com>
      Sent: Friday, April 30, 2004 8:32 PM
      Subject: Re: [PBML] free source search engine ## comments?


      > On Saturday 2004 May 01 07:37, Robin wrote:
      > > > I'm curious why you feel the need to write everything from scratch,
      > >
      > > simply because it's fun, and I enjoy it. :-)
      > > -Robin
      > Da! It is fun, I just wish I could have time to do it again...
      > Still haven't had time to do a line-by-line or even a function-by-function
      > read, but just some advice: it's better to put long comments at the
      > start/end of each function, rather than on every line that you think will
      > be hard to understand... just my opinion, of course - I just find it
      > easier to read sources like that.
      > Also, don't you think it's better to have some "standards" in your
      > variable names? Me, I capitalize the first letter of each global (not env,
      > but not in a sub) variable... believe me, it's easier that way.
      > HTH
      >
      > ps
      > BTW, regarding your other code (actually, it's not regarding your code)...
      > about making the cgi-bin directory globally writeable, I just found a flaw
      > that's serious enough to tell you: if an anonymous ftp account is enabled
      > and can access that directory, it's a big problem. So please state
      > explicitly in your documentation and at the start of your code that
      > anonymous ftp shouldn't have access to cgi-bin.
      > There are other ways to make the machine vulnerable using your code, but
      > they require editing some config files, so people who do that probably
      > know what they're doing, and would take some preventive measures. One way
      > would be editing httpd.conf and allowing "ordinary" access to the cgi-bin
      > directory.
      >
      > --
      > eman calso
      > http://www.bloodpet.tk/
      > Government spending? I don't know what it's all about. I don't know
      > any more about this thing than an economist does, and, God knows, he
      > doesn't know much.
      > -- Will Rogers
    • Charles K. Clarkson
      Message 2 of 6 , May 9, 2004
        Robin <robin@...> wrote:
        :
        : Everyone, let me know what you think!


        My assumptions:

        I assume the HERE terminator should have started in the
        first column of each line and not in the second as shown.
        (Perhaps it is just a mistake from posting the code.)

        I assume from the file test (-f) that we are searching
        plain files only.

        On my localhost this test eliminated every file but
        one. Including all my html and text files.

        On my linux host it fails all files but the .htaccess
        file which I really don't want searched.

        Perhaps a mechanism to check only files with certain
        suffixes would be better.
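
A minimal sketch of the suffix idea, not taken from the script under review. The suffix list and the name wanted_file() are assumptions made up for illustration; swap in whatever your site actually serves.

```perl
use strict;
use warnings;

# Hypothetical list of suffixes worth searching; adjust to taste.
my @suffixes = qw( html htm txt );

# Build one case-insensitive pattern, e.g. \.(?:html|htm|txt)\z
my $alternatives = join '|', map { quotemeta($_) } @suffixes;
my $suffix_re    = qr/\.(?:$alternatives)\z/i;

# True if the file name ends in one of the accepted suffixes.
sub wanted_file {
    my $name = shift;
    return $name =~ $suffix_re ? 1 : 0;
}

print "$_: ", ( wanted_file($_) ? 'search' : 'skip' ), "\n"
    for qw( index.html notes.TXT .htaccess search.cgi );
```

A whitelist like this also keeps .htaccess and the CGI scripts themselves out of the results, which a bare -f test cannot do.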

        I assume you didn't test adequately on HTML and XML files
        and that you are unaware that s/<.*>//g wipes out
        everything from <HTML to /HTML> and <?xml to /HTML>. You
        probably wanted s/<.*?>//g or s/<[^>]*>//g.
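
To see the difference, here is a small sketch on a made-up HTML string (the sample markup is an assumption, not one of the files from the test above). It compares the greedy pattern with the negated character class.

```perl
use strict;
use warnings;

my $html = '<html><body><p>Hello, <b>world</b>!</p></body></html>';

# Greedy: .* runs from the first '<' to the last '>', so nothing is left.
( my $greedy = $html ) =~ s/<.*>//g;

# Negated character class: each match stops at the first '>'.
( my $per_tag = $html ) =~ s/<[^>]*>//g;

print "greedy:  '$greedy'\n";     # greedy:  ''
print "per tag: '$per_tag'\n";    # per tag: 'Hello, world!'
```

Neither pattern is a real HTML parser (a '>' inside an attribute value or a comment will still confuse it), but the second at least leaves the text between tags alone.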

        I assume you are unaware that the program stops whenever
        a directory can't be opened. In my test, the cgi-bin
        directory is forbidden. Searching the root directory will
        never return a successful search.

        I assume you did not do rigorous tests on your script.

        I assume you are unaware that HERE documents return
        scalars, not arrays.
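
A short sketch of the HERE document point, using made-up text. Assigning a HERE document to an array gives an array with one element holding the whole string; splitting is one way to get a line per element, if that is what was intended.

```perl
use strict;
use warnings;

# A here-document is a single (multi-line) string.
my @lines = <<HERE;
first line
second line
HERE

# Assigning it to an array gives a one-element array.
my $count = @lines;
print "elements: $count\n";          # elements: 1

# To get one element per line, split the string explicitly.
my @split_lines = split /\n/, $lines[0];
my $split_count = @split_lines;
print "after split: $split_count\n"; # after split: 2
```

Note that the HERE terminator sits in the first column, as assumed at the top of this message.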


        : Charles, I'm too lazy to use and modify your code for the
        : header and footer, but in the next version I will. Thanks
        : again.

        That's okay. You use them in every script you write, and
        I'm sure you'll go back and change those other scripts too. :)



        : #!/usr/bin/perl
        :
        : use strict;
        : use warnings;
        :
        : use Fcntl qw (:flock);
        :
        : use CGI qw(:all);

        Tsk! Tsk!


        : $CGI::POST_MAX=1024 * 100; # max 100K posts
        : $CGI::DISABLE_UPLOADS = 1; # no uploads
        :
        : $" = '';
        : $ENV{'PATH'} = '/bin:/usr/bin:/usr/local/bin';
        :
        : my @directories = ("./", "../"); # change this to the
        : # directories you want
        : # to have searched.
        : # Include the slash at
        : # the end of the
        : # directory.
        : my $action = url_param ('action');
        : my $rootfile = url (relative=>1);

        $rootfile is not used in this script.


        : my $headerfile = "searchheader.txt";
        : my $footerfile = "searchfooter.txt";
        : my $errorfile = "ERR.txt";
        : my @head = getheader ($headerfile);
        : my @foot = getfooter ($footerfile);
        : my $date = getdate ();

        A nicer way to write these would align the
        equal signs and not use trailing comments.

        # change @directories to the directories you
        # want to have searched. Include the slash at
        # the end of the directory.
        my @directories = ( '../charles/' );

        my $headerfile = 'searchheader.txt';
        my $footerfile = 'searchfooter.txt';
        my $errorfile  = 'ERR.txt';

        my @head = getheader( $headerfile );
        my @foot = getfooter( $footerfile );
        my $date = getdate();


        : my @errors;
        : my @finaldirs;

        Why is @finaldirs file scoped (Global)? It is only used
        in the search() subroutine.


        : checkerrors();
        :
        : if ($action eq "search")
        : {
        : search ();
        : }
        :
        : else
        : {
        : newsearch ();
        : }

        There's no check to be certain $action is defined. The
        script has an uninitialized warning because of this. I also
        don't see the merit of using the $action variable.

        Why does @directories have to be file scoped (global)?
        Hold on to your seat, folks. I'm going to pass it!

        if (    url_param('action')
            and url_param('action') eq 'search' ) {

            search( @directories );

        } else {
            newsearch();
        }


        : sub search
        : {
        : print header;
        : print (@head);
        : #code for parsing results

        my @finaldirs;

        : foreach my $dir (@directories)

        Since we are now passing the directories:

        foreach my $dir ( @_ )


        : {
        : opendir (DIR, $dir);

        How do you know it opened?

        next unless opendir DIR, $dir;


        : my @files_from_dir = readdir (DIR);

        Congratulations! Yeah! Yeah!

        Is this your first variable with underscores?

        Next thing you know you'll be using 4 space indents!

        NOTE: For those who find this abusive. Please stop
        reading and search for your sense of humor.


        : closedir (DIR);
        : foreach my $filefromdir (@files_from_dir)
        : {
        : if (! -d $filefromdir)
        : {
        : push (@finaldirs, "$dir$filefromdir")
        : }

        push @finaldirs, "$dir$filefromdir" unless -d $filefromdir;


        : }

        Or a little shorter:

        foreach my $dir (@directories) {
            next unless opendir DIR, $dir;
            foreach my $file ( readdir DIR ) {
                push @finaldirs, "$dir$file" unless -d $file;
            }
        }
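
One caveat worth sketching: readdir returns bare names, so -d $file is checked against the current working directory rather than $dir. The version below tests the full path instead. The name list_files() and the lexical directory handle are illustrative choices, not part of the original script, and each directory is assumed to end in a slash as the script's comments require.

```perl
use strict;
use warnings;

# Hypothetical helper: return "$dir$file" for every non-directory entry
# in the given directories. Each directory is assumed to end in a slash.
sub list_files {
    my @found;
    foreach my $dir (@_) {
        # Skip directories we cannot open instead of stopping.
        opendir my $dh, $dir or next;
        foreach my $file ( readdir $dh ) {
            # Test the full path; a bare name would be checked against
            # the current working directory, not $dir.
            push @found, "$dir$file" unless -d "$dir$file";
        }
        closedir $dh;
    }
    return @found;
}

print "$_\n" for list_files('./');
```

Passing the directories in, as above with search( @directories ), means nothing here needs to be file scoped.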


        : }
        : my $query = param ('query');
        : my @finalresults;
        : foreach my $file (@finaldirs)
        : {
        : open (FILE, $file) or push (@errors, "A file open error occured on $file: $!.");
        : flock (FILE, LOCK_SH) or push (@errors, "A file lock error occured on $file: $!.");
        : checkerrors ();

        This halted the script and returned an error when
        it got to files in my cgi-bin directory. I assume you
        never tested this on your root directory, though I
        can't understand why.
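
A sketch of one alternative, assuming you would rather note the failure and move on to the next file than halt the whole search. The name slurp_readable() and its return values are invented for illustration, and locking is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper: read each plain file, collecting open failures
# as messages rather than halting on the first one.
sub slurp_readable {
    my @files = @_;
    my ( %contents, @skipped );
    foreach my $file (@files) {
        # Only plain files; test before opening, not after.
        next unless -f $file;
        my $fh;
        unless ( open $fh, '<', $file ) {
            push @skipped, "Could not open $file: $!";
            next;
        }
        local $/;    # slurp mode for the rest of this iteration
        $contents{$file} = <$fh>;
        close $fh;
    }
    return ( \%contents, \@skipped );
}

my ( $contents, $skipped ) = slurp_readable( $0, '/no/such/file' );
print "read: ",    scalar( keys %$contents ), "\n";
print "skipped: ", scalar(@$skipped),         "\n";
```

The messages in @$skipped could still be shown at the bottom of the results page, so a forbidden cgi-bin no longer hides every other match.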


        : my @contents;
        : my $contents;
        : @contents = <FILE> if (-f $file);
        : close (FILE);
        : chomp (@contents);
        : $contents = join ('', @contents);
        : $contents =~ s/<.*>//g;
        : my $result;
        : if ($contents =~ m/$query/ and $query)
        : {
        : $result = "<a href=\"$file\">$file</a><br>";
        : push (@finalresults, $result);
        : }
        : }


        Let's say the query is for the word 'chin'.
        Here's my text file:

        Joe,

        Thanks for the letter on ING. I think
        I will take your advice. I will search
        ing first. Their solution is the best.
        Tell June I said hello.

        Thanks,

        BOB

        Here's the file after your algorithm:

        Joe,Thanks for the letter on ING. I thinkI will take your
        advice. I will searching first. Their solution is the
                           ^^^^
        best.Tell June I said hello.Thanks,BOB

        This algorithm shows success for "chin", even
        though it is not in the original letter. This
        algorithm also fails if the query is for
        "search ing". Looks like you need a better search
        algorithm.
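
One possible direction, sketched under assumptions: join the lines with a space instead of an empty string, and quote the query so it is matched as literal text. The name text_matches() is hypothetical, and this is only a starting point, not a full search algorithm.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $query occurs in the text of @lines.
sub text_matches {
    my ( $query, @lines ) = @_;
    return 0 unless defined $query and length $query;
    chomp @lines;
    # Join with a space so words on adjacent lines stay separate.
    my $text = join ' ', @lines;
    # \Q...\E treats the query as literal text, not as a regex.
    return $text =~ /\Q$query\E/ ? 1 : 0;
}

my @letter = (
    "I will take your advice. I will search\n",
    "ing first. Their solution is the best.\n",
);

print "chin:       ", text_matches( 'chin',       @letter ), "\n";    # 0
print "search ing: ", text_matches( 'search ing', @letter ), "\n";    # 1
```

Quoting the query also stops a visitor from typing regex metacharacters into the search box and having them interpreted as a pattern.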


        I stopped here as much of the rest would be a
        rehash of my old posts.


        HTH,

        Charles K. Clarkson
        --
        Mobile Homes Specialist
        254 968-8328