Re: [PBML] help parse xxx.yyy.com
At 01:49 AM 8/31/03 +0000, you wrote:
>What is the best parsing to get the base domain?
>I need the base domain part like yahoo.com, ebay.com, perl.org, etc.
>The string to work on can be of any level like:
>ebay.com - expecting ebay.com
>www.yahoo.com - expecting yahoo.com
>www.house.station.fire.com - expecting fire.com

Why don't you parse the URL in steps:
:- Chop off the http:// (if it exists).
:- Find the next / sign and chop off everything after it.
:- Find the last dot of the URL, then find the dot before that one, and chop off
everything before it.
$text =~ s/^http:\/\///i;   # chops off the http://
$text =~ s/\/.*//;          # chops off the first / and everything after it (if it exists)
# This will leave you with just the host name, e.g. www.yahoo.com
$text =~ m/.*\.(.*?)$/;     # greedy .* runs to the last dot; (.*?) gathers everything after it
$g = $1;                    # com, org etc.
$text =~ s/\.\Q$g\E$//i;    # chops off the .com .org etc
$text =~ m/([^.]+)$/;       # gather what follows the last remaining dot (or the whole string)
$f = $1;
print "$f.$g\n";            # Resulting domain
PS - the above is untested (I haven't needed to use the ? feature much),
but I think it works.
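
For what it's worth, here is a rough sketch of the same steps wrapped in a sub and run against the three examples from the question. The name base_domain is made up for illustration, and the last two steps are folded into a single match for "the last two dot-separated pieces". It only strips http://, and it always keeps the last two pieces, so a host like bbc.co.uk would come out as co.uk.

```perl
use strict;
use warnings;

# base_domain() is a hypothetical helper name, not part of any module.
sub base_domain {
    my ($text) = @_;
    $text =~ s{^http://}{}i;   # chop off the http:// (if it exists)
    $text =~ s{/.*}{}s;        # chop off the first / and everything after it
    # Keep the last two dot-separated pieces; if there are fewer than two,
    # hand back whatever is left unchanged.
    return $text =~ /([^.]+\.[^.]+)$/ ? $1 : $text;
}

for my $url ('ebay.com',
             'www.yahoo.com',
             'http://www.house.station.fire.com/index.html') {
    print base_domain($url), "\n";
}
# prints:
# ebay.com
# yahoo.com
# fire.com
```

Using {} as the regex delimiters avoids having to backslash every / in the pattern.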