
Try this link extractor for web pages that I made

  • thisistrinath
    Message 1 of 2 , Jan 23, 2009
      Hello friends,

      I made this URL extractor a while back and posted a message about it
      here, but nobody replied, so I am posting again. It might be useful to
      you. It extracts all the links in a web page, and I have put it at
      http://yerra.trinadh.googlepages.com/Linkfinder1.html

      Please test it; you only need to change the file_get_contents
      parameters at the top. Its features: it looks for links in A and FRAME
      tags; rejects links whose description is empty or a single character;
      rejects URLs longer than 200 characters; rejects URLs that point to
      files such as PDF or MPEG (26 such extensions); and rejects image
      links, because in my view most image links are only advertisements
      and thus useless.
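
      The filtering rules listed above can be sketched roughly as follows.
      This is not the posted PHP code, only a minimal Python illustration
      under stated assumptions: the names LinkExtractor, url_is_acceptable
      and extract_links are hypothetical, and SKIPPED_EXTENSIONS is a small
      stand-in, since the post does not list the 26 extensions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed subset: the post mentions 26 extensions but does not list them.
SKIPPED_EXTENSIONS = {".pdf", ".mpeg", ".mpg", ".zip"}
MAX_URL_LENGTH = 200


def url_is_acceptable(url):
    """Apply the URL-only rules: length limit and file-extension filter."""
    if len(url) > MAX_URL_LENGTH:
        return False
    path = urlparse(url).path.lower()
    return not any(path.endswith(ext) for ext in SKIPPED_EXTENSIONS)


class LinkExtractor(HTMLParser):
    """Collect (url, description) pairs from A and FRAME tags."""

    def __init__(self):
        super().__init__()
        self.links = []          # accepted (url, description) pairs
        self._href = None        # href of the <a> currently open, if any
        self._text = []          # text fragments seen inside that <a>
        self._has_image = False  # True if the open <a> wraps an <img>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href, self._text, self._has_image = attrs["href"], [], False
        elif tag == "img" and self._href is not None:
            self._has_image = True
        elif tag == "frame" and attrs.get("src"):
            # Frames carry no link text, so only the URL rules apply.
            if url_is_acceptable(attrs["src"]):
                self.links.append((attrs["src"], ""))

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            description = "".join(self._text).strip()
            keep = (
                not self._has_image           # reject image links
                and len(description) > 1      # reject empty or 1-char text
                and url_is_acceptable(self._href)
            )
            if keep:
                self.links.append((self._href, description))
            self._href = None


def extract_links(html):
    """Return the accepted links found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

      Calling extract_links() on the text of a downloaded page returns only
      the links that pass every rule; each rule is a separate condition, so
      it is easy to switch one off while testing.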

      Use it and give me your comments and feedback. My next step is to
      convert relative URLs into absolute URLs, and then to use streams so
      that it starts looking for links while the web page is still
      downloading instead of waiting for the whole download to finish.
      This should reduce the running time significantly.
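
      One possible shape for those two next steps, again as a hedged Python
      sketch and not the planned PHP implementation: urljoin from the
      standard library resolves relative URLs against the page address, and
      the parser is fed chunk by chunk so links are reported before the
      whole page has arrived. StreamingLinkParser and links_from_chunks are
      hypothetical names, and the chunks are assumed to be already decoded
      text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StreamingLinkParser(HTMLParser):
    """Collect absolute URLs from A and FRAME tags as data is fed in."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # address of the page being parsed
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            target = attrs.get("href")
        elif tag == "frame":
            target = attrs.get("src")
        else:
            target = None
        if target:
            # Resolve relative references such as "../x/page.html".
            self.links.append(urljoin(self.base_url, target))


def links_from_chunks(base_url, chunks):
    """Yield absolute links as soon as each chunk of HTML has been parsed."""
    parser = StreamingLinkParser(base_url)
    emitted = 0
    for chunk in chunks:
        # A tag split across two chunks is buffered by the parser
        # until the rest of it arrives.
        parser.feed(chunk)
        yield from parser.links[emitted:]
        emitted = len(parser.links)
    parser.close()
    yield from parser.links[emitted:]
```

      In real use the chunks would come from reading the HTTP response a
      few kilobytes at a time and decoding each piece; here any iterable of
      strings works, which also makes the function easy to test offline.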

      Please test it!
      Thanks,
      Trinadh Yerra
    • ankur
      Message 2 of 2 , Jan 24, 2009
        Wow,

        nice code. But how about writing some comments in your code?
        Right now it is full of a-z letters and for loops. :)

        Or else you could replace the variable names with more specific ones.

        That would be good, and the code would be easier to understand.

        -=-=
        jai ho
        ~ ankur ~

        --

        Fred Allen - "Television is a medium because anything well done is rare."

