RE: [cightml] Munged e-mail
Technically, a spider downloads a page to a local machine, where it can
run any test it wants.
So in theory it could send your page through a browser engine that
supports JS and unicode extensions, and if the browser could make the link
work, it would grab your email, and no amount of encoding and/or
The simple fact is that for the mailto: to work, the address has to be a
link... and the spiders will just look for links and check what the browser
engine returns as the link address and parse that.. (like what you see in
the status bar when you mouseover a mailto link...)
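
To make that concrete, here is a rough sketch of what such a spider might do once a browser engine has already run the page's scripts. The function names and the example.com address are made up for illustration, and a real harvester could work quite differently.

```javascript
// Hypothetical harvester step: it runs on markup AFTER a browser engine has
// executed the page's scripts, so any document.write output is already there.

// Decode numeric character references such as &#109; or &#x6d;.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

// Collect the address part of every mailto: link in the rendered markup.
function harvestMailtoLinks(renderedHtml) {
  const found = [];
  const hrefPattern = /href\s*=\s*["']([^"']*)["']/gi;
  let match;
  while ((match = hrefPattern.exec(renderedHtml)) !== null) {
    const href = decodeNumericEntities(match[1]);
    if (href.toLowerCase().startsWith('mailto:')) {
      found.push(href.slice('mailto:'.length));
    }
  }
  return found;
}

// Example input: what the link looks like once the script has run.
const rendered = '<a href="&#109;ailto:user&#64;example.com">mail me</a>';
console.log(harvestMailtoLinks(rendered)); // [ 'user@example.com' ]
```

Once the link works for a browser, the encoding makes no difference to this kind of spider.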
You have to weigh up the benefit compared to the time spent.. if spiders
get complicated enough to get the JS and unicode encoding methods, there is
nothing you can do bar break the href link and make the user do something
to get the email address.
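
A minimal sketch of the "make the user do something" idea, assuming the page holds only fragments and no href at all. The helper name and the element ids in the comment are hypothetical, and a spider that can simulate clicks would still get through.

```javascript
// Hypothetical helper: nothing is joined until a user action calls reveal().
function makeReveal(user, domain) {
  return function reveal() {
    return user + '@' + domain;
  };
}

const reveal = makeReveal('user', 'example.com');

// In a browser this might be wired to a button (the ids are assumptions):
// document.getElementById('show-mail').onclick = function () {
//   document.getElementById('mail-slot').textContent = reveal();
// };
console.log(reveal()); // user@example.com
```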
Otherwise, anything that works in the browser display is fair game to a
spider that has a browser engine running at its home base.
Since the full source code of about half a dozen suitable browser engines
is freely available on the net.. if that hasn't happened already it won't
(actually the source code is not even necessary, you can pass code to the IE
DLLs and they will return results as well.)
>From: Gordon Reeder [mailto:greeder@...]
>Sent: Monday, 1 September 2003 12:48 PM
>Subject: Re: [cightml] Munged e-mail
>Good point, re encoding the mailto:.
>However a spam crawler could also just look for unicode
>characters and assume it is a munged e-mail address. That
>is why I swapped parts 2 and 4. But a good spambot would
>look for the dot in the domain name and could re-assemble
>the address. So it's not foolproof yet.
>At the risk of getting too complicated, I'm thinking of encrypting
>the strings as arrays of numbers. This would make the
>document.write procedure a bit more complicated since it would
>have to decrypt the strings too.
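
One way the arrays-of-numbers idea might look, as a sketch only. It assumes a plain character-code shift (the SHIFT value and function names are my own), which is obfuscation rather than real encryption, and the address is a placeholder.

```javascript
// Hypothetical key: any small integer offset would do.
const SHIFT = 7;

// Turn a string into an array of shifted character codes.
function encodeToNumbers(text) {
  return text.split('').map((ch) => ch.charCodeAt(0) + SHIFT);
}

// Reverse the shift and rebuild the string.
function decodeFromNumbers(numbers) {
  return numbers.map((n) => String.fromCharCode(n - SHIFT)).join('');
}

// Only the number array would appear in the page source.
const hidden = encodeToNumbers('user@example.com');
const address = decodeFromNumbers(hidden);

// In the page, the decoded string would then be written out, for example:
// document.write('<a href="mailto:' + address + '">' + address + '</a>');
console.log(address); // user@example.com
```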
>> Hi Gordon (and everyone)..
>> Quite clever to use both the common methods together to protect against
>> spiders that can do only one or the other..
>> There is a tool to generate the unicode at:
>> One thing I would also do, is encode the mailto: as well..
>> like so:
>> var addr1 = 'mailto:';
>> var addr2 = 'xprt.net';
>> var addr3 = '@';
>> var addr4 = 'greeder';
>> document.write( '<a href="' + addr1 + addr4 + addr3 + addr2 +'">');
>> document.write( addr4 + addr3 + addr2 + ' </a>');
>> The reason is pretty simple,
>> If I were to write a script to dig email addresses out of a web page, I'd
>> start by searching the HTML for the string "mailto:" and then grab the
>> text between that and the following closing inverted comma.
>> (I'll bet that's how most spambots do it too.)
>> If you encode the "mailto:" as well, the script would fail to find the
>> email at all let alone decode it.
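
The naive search described above could be sketched roughly like this. The function name and sample markup are invented for illustration, and it only guesses at how a simple spambot might work.

```javascript
// Naive harvester: find "mailto:" and take the text up to the next quote mark.
function naiveHarvest(html) {
  const results = [];
  const marker = 'mailto:';
  let start = html.indexOf(marker);
  while (start !== -1) {
    const from = start + marker.length;
    // Position of the nearest closing single or double quote, if any.
    const ends = [html.indexOf('"', from), html.indexOf("'", from)].filter((i) => i !== -1);
    if (ends.length === 0) break;
    const end = Math.min(...ends);
    results.push(html.slice(from, end));
    start = html.indexOf(marker, end);
  }
  return results;
}

const plain = '<a href="mailto:user@example.com">mail</a>';
// Same link with "mailto:" written as numeric character references.
const encoded = '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;user@example.com">mail</a>';

console.log(naiveHarvest(plain));   // [ 'user@example.com' ]
console.log(naiveHarvest(encoded)); // []
```

With the literal string "mailto:" gone from the source, this kind of search has nothing to latch onto.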
>> >-----Original Message-----
>> >From: Gordon Reeder [mailto:greeder@...]
>> >Sent: Monday, 1 September 2003 6:48 AM
>> >To: CIGHTML Mailing List
>> >Subject: [cightml] Munged e-mail
>> >There are a few tricks to munging an e-mail address. But
>> >the spam crawlers are getting smarter so I decided to change
>> >the way I munge my e-mail address. I thought you would find
>> >this useful.
>> >The following script will write my e-mail address onto my web page.
>> >It is a modification of Paul's method shown in CIGHTML. There
>> >are three things that this code does.
>> >1) It uses unicode for the text strings instead of regular text.
>> >2) It scrambles the order of the substrings. Note that addr4 is
>> >written before addr3.
>> >var addr1 = "mailto:"
>> >var addr2 = "xprt.net"
>> >var addr3 = "@"
>> >var addr4 = "greeder"
>> >document.write( '<A HREF="' + addr1 + addr4 + addr3 + addr2 +'">')
>> >document.write( addr4 + addr3 + addr2 + ' </A>')
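
For the "unicode" step, a small helper along these lines could generate the encoded strings. This assumes "unicode" here means HTML numeric character references like &#64; which is my reading and not something the post spells out, and the function name is hypothetical.

```javascript
// Convert each character of a string into a numeric character reference.
function toNumericEntities(text) {
  return text.split('').map((ch) => '&#' + ch.charCodeAt(0) + ';').join('');
}

console.log(toNumericEntities('@'));       // &#64;
console.log(toNumericEntities('mailto:')); // &#109;&#97;&#105;&#108;&#116;&#111;&#58;
```

The output strings could then be pasted in as the values of addr1 to addr4.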
>> >Gordon & Juanita Reeder
>> >greeder@... (Gordon)
>> >jmr2@... (Juanita)
>> >greeder@... (Both)
Gordon & Juanita Reeder
Check out my web site:
To unsubscribe from the Yahoo Group, send an email to:
This list is for readers of Paul McFedries' CIG to Creating a Web Page.
Messages prior to 2002 are also archived at http://www.mcfedries.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/