
Re: [PBML] help parse xxx.yyy.com

  • Gordon Stewart
    Message 1 of 2, Aug 31, 2003
      At 01:49 AM 8/31/03 +0000, you wrote:
      >What is the best parsing to get the base domain?
      >
      >I need the base domain part like yahoo.com, ebay.com, perl.org, etc.
      >
      >The string to work on can be of any level like:
      >
      >ebay.com - expecting ebay.com
      >www.yahoo.com - expecting yahoo.com
      >www.house.station.fire.com - expecting fire.com
      >
      >thanks

      Why don't you parse the URL in three steps:

      :- Chop off the http:// (if it exists).
      :- Find the next / sign and chop off it and everything after it.
      :- Find the last dot of what remains, then the dot before that one (if
      there is one), and chop off everything up to and including it.
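      The three steps above can be put together into a single subroutine. This is a minimal sketch, assuming the input is either a bare host name or an http:// URL; the name base_domain is made up for illustration. The s{...}{} delimiters avoid having to escape the slashes, and the last step grabs the final two dot-separated labels with one anchored regex instead of two passes.

```perl
use strict;
use warnings;

# Hypothetical helper: the three steps applied to one URL or host name.
sub base_domain {
    my ($text) = @_;
    $text =~ s{^http://}{}i;   # step 1: chop off http:// if present
    $text =~ s{/.*}{};         # step 2: chop off the first / and everything after it
    # step 3: keep only the last two dot-separated labels, e.g. "fire.com"
    my ($domain) = $text =~ /([^.]+\.[^.]+)$/;
    return defined $domain ? $domain : $text;   # no dot at all: return the host unchanged
}

for my $input ('ebay.com', 'www.yahoo.com', 'www.house.station.fire.com',
               'http://www.perl.org/index.html') {
    print "$input -> ", base_domain($input), "\n";
}
```

      For the three examples in the question this gives ebay.com, yahoo.com and fire.com, and the full URL gives perl.org.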

      $text =~ s/^http:\/\///i; # chops off the http://
      $text =~ s/\/.*//;        # chops off the first / and everything after it (if it exists)

      This leaves you with something like:

      subdomain.domain.com
      www.domain.com
      www.subdomain.domain.com etc

      $text =~ m/([^.]+)$/;    # gather everything after the last dot
      my $g = $1;
      $text =~ s/\.?\Q$g\E$//; # chops off the .com, .org etc.

      $text =~ m/([^.]+)$/;    # gather the part just before it (yahoo, ebay, fire ...)
      my $f = $1;

      print "$f.$g\n"; # Resulting domain
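      One limitation of keeping only the last two labels: it is right for suffixes like .com, .org and .net, but under two-level country suffixes such as .co.uk it returns co.uk for www.bbc.co.uk. Below is a rough sketch of one workaround, assuming a small hand-maintained list of two-label suffixes. The list contents and the name base_domain_ext are hypothetical and far from complete; a real program would use a full list such as the Public Suffix List.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny list of two-label suffixes.
# A real application would load a complete list instead.
my %TWO_LABEL_SUFFIX = map { $_ => 1 } qw(co.uk org.uk ac.uk com.au co.jp);

# Return the base domain of a bare host name (no http://, no path).
sub base_domain_ext {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    return $host if @labels < 2;                  # nothing to trim
    my $last_two = join '.', @labels[-2, -1];
    # Keep three labels when the last two form a known suffix, else two.
    my $keep = ($TWO_LABEL_SUFFIX{$last_two} && @labels >= 3) ? 3 : 2;
    return join '.', @labels[-$keep .. -1];
}

print base_domain_ext($_), "\n" for qw(www.yahoo.com www.bbc.co.uk ebay.com);
```

      With this list, www.bbc.co.uk gives bbc.co.uk while www.yahoo.com still gives yahoo.com.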


      PS: the above is untested (I haven't needed to use the ? feature much),
      but I think it works.

      G