Reliable Answers - News and Commentary

JavaScript Email Cloaking

Updated November 9, 2007
by: Shawn K. Hall

For webmasters, spam is far more than a simple nuisance. Since web-spiders exist with the sole purpose of harvesting email addresses from websites, webmasters have come up with some very interesting (and often quite useless) means of attempting to prevent their addresses from being scalped by those scum-sucking filth that denigrate the Internet. Since a webmaster's purpose for having a website is generally to enable customers to contact them, removing the email address altogether is hardly an option.

The solutions offered up to now have been anything but useful:

Related information

The code on this page serves as a solution that actually works to reduce, if not completely eliminate, spam that is sourced from the email address(es) exposed on your website. This solution is not guaranteed or warranted in any way, and the only support I offer is to paid users of my internet hosting services or via my "DesignAdvice" discussion group. There are several reasons why this method is not warranted, such as the fact you are obtaining it for free, I have no means of knowing whether the email address(es) you are cloaking are being used/posted/relayed elsewhere (which opens them up to more extensive opportunities for exploitation), and the simple fact that any method used to cloak email addresses will inevitably be cracked. The beauty of this method is that it takes that simple fact into account and provides wrapping functions for both server and client that make enhancing or replacing the encoding/decoding algorithm a simple activity (assuming you know a little script).

Once set up, this system is relatively painless to maintain. It is more difficult to use if you're not using a server-side scripting language of any kind (that is so '90s), but even then it's still relatively painless.

Client-Side Code

I'll take you from theory to implementation and give you the scripts necessary to make it work with ASP and PHP.

A simple link looks like this, before conversion:

<a href="/contact/" title="Contact Me!">Contact Me!</a>

We're not going to get rid of the current href, since this way, even if they don't have javascript or an email client on their system, they can still contact you using a contact form (at "/contact/").

All we need to do is add onmouseover and onfocus handlers to change the actual address to the 'real' email address when they interact with it:

  <a href="/contact/" title="Contact Me!"
   onmouseover="javascript:this.href=mailMe('example%23com','me');"
   onfocus="javascript:this.href=mailMe('example%23com','me');"
   >Contact Me!</a>

(Note: HTML attribute names are not case sensitive, so "onMouseOver" and "onmouseover" both work in HTML; XHTML requires the lowercase "onmouseover" and "onfocus". Most browsers don't 'care' but some strict parsers do, so lowercase is the safe choice.)

Now we'll break apart the script:

javascript:this.href=mailMe('example%23com','me');

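javascript: tells the browser what language we're using. Most browsers assume JavaScript in event handlers and simply treat the prefix as a label, but some older browsers that support more than one scripting language (Internet Explorer with VBScript, for example) use it to pick the language, so I include it to be safe.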

this.href= means that it will change the current anchor tag (<a />) href attribute (where it goes) to the result of the following function: mailMe().

mailMe() is just my own pet name for the function. I highly suggest you rename it for your own use to something even more obscure, if possible. The idea is that the more unusual your own method is (following these simple guidelines), the less likely Joe Blow's Email Scalping Spider will be able to collect the email addresses from your site.

My mailMe() javascript function looks like this:

function mailMe(sDom, sUser){
  return("mail"+"to:"+sUser+"@"+sDom.replace(/%23/g,"."));
}

What this does is swap the input positions for user and domain and replace the 'munged' text with the correct text (using a Regular Expression replace operation).
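To see the decoding step on its own, the function can be run outside the browser. The sketch below repeats mailMe() as defined above and calls it with a single-dot domain and a multi-dot domain; nothing beyond the function itself is assumed.

```javascript
// mailMe() exactly as defined above: the domain comes first, the user second.
function mailMe(sDom, sUser){
  return("mail"+"to:"+sUser+"@"+sDom.replace(/%23/g,"."));
}

// Each "%23" in the munged domain is turned back into a "."
const single = mailMe('example%23com', 'me');
const multi  = mailMe('yahoo%23co%23uk', 'admin');

console.log(single); // mailto:me@example.com
console.log(multi);  // mailto:admin@yahoo.co.uk
```

The /g flag on the regular expression matters: without it only the first "%23" would be replaced, and a domain like yahoo.co.uk would come out broken.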

You may find it tempting to add a target attribute to these links in order to better control the environment for new windows should the user not have scripting enabled. I recommend against this, as some browsers treat mailto links as a natural browsing action and it leaves an orphaned window or tab after the email client finally intercepts the requests and generates a compose message window.

Munging the text is a "good idea." Otherwise it just takes "the right RegEx pattern" to turn the pieces on your page back into the original addresses. In the script above, every "." in the domain has been replaced with "#" and the result 'urlencoded' (so the "#" appears as "%23") to make it slightly less obvious. Since "#" is an invalid character within a domain, it is likely to break most regex-based spiders scalping the site, if not immediately, then when they actually attempt to send the email.

Now how do you make the script part? You can hand-code it (yuck!) using this pattern:

mailMe('example%23com','webmaster');

So to convert the code above to send to "admin@yahoo.co.uk" you would use:

mailMe('yahoo%23co%23uk','admin');

Great! You can do it by hand now.

Yuck! By hand!?!

Ok, let's make it easier on ourselves. Seriously - what would we do if we found out spiders were suddenly capable of parsing this method and we wanted to change the encoding mechanism globally across our entire site all at once? We'd have to use our own regex to fix it (which introduces untold numbers of cans of worms) or we just make two changes:

  1. We change the function reference in our include file to encode it differently.
  2. We change the function in our referenced javascript file to decode it differently.

Sweet.
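As a rough illustration of those two changes, here is a JavaScript sketch of a matched pair: an encoder standing in for the server-side include and a decoder standing in for the client-side file. The names cloakDomain and mailMe2, and the idea of reversing the domain on top of the "#" swap, are made up for this example and are not part of the scripts on this page.

```javascript
// Hypothetical replacement scheme: swap "." for "#", reverse the string,
// then URL-encode it. Only these two functions need to change together.

// Stand-in for the server-side encoder (would live in the include file).
function cloakDomain(domain) {
  const munged = domain.replace(/\./g, '#').split('').reverse().join('');
  return encodeURIComponent(munged);
}

// Stand-in for the client-side decoder (would live in the referenced .js file).
function mailMe2(sDom, sUser) {
  const domain = decodeURIComponent(sDom)
    .split('').reverse().join('')
    .replace(/#/g, '.');
  return 'mail' + 'to:' + sUser + '@' + domain;
}

const encoded = cloakDomain('example.com');
const decodedHref = mailMe2(encoded, 'me');

console.log(encoded);     // moc%23elpmaxe
console.log(decodedHref); // mailto:me@example.com
```

Every page that calls the encoder picks up the new scheme automatically, which is the whole point of keeping the algorithm in exactly two places.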

Server-Side Code

Mind you, for server-side functions I use the same function name since the server-side code in no way interferes with the client side code. Maybe it's strange to some, but it relieves me from having to remember two function names. :)

So, for ASP:

<%
  Function mailMe(sAddress, sCaption, sTitle)
  '=mailMe("user@example.com","Display","Title")
    Dim sBuild, sSplit, sSplit2, sMailMe
    ' sSplit(0) = user, sSplit(1) = domain plus optional "?query"
    sSplit  = Split(sAddress, "@", 2)
    If InStr(1, sSplit(1), "?", 1) > 0 Then
      ' sSplit2(0) = domain, sSplit2(1) = query string
      sSplit2 = Split(sSplit(1), "?", 2)
      sMailMe = "mailMe('" _
        & Server.URLEncode( Replace( sSplit2(0), ".", "#") ) & "?" _
        & Replace( sSplit2(1), "'", "\'") & "','" _
        & Server.URLEncode( sSplit(0) ) _
        & "')"
    Else
      ' munge the domain: "." becomes "#", which URLEncode turns into "%23"
      sMailMe = "mailMe('" _
        & Server.URLEncode( Replace( sSplit(1), ".", "#") ) _
        & "','" _
        & Server.URLEncode( sSplit(0) ) _
        & "')"
    End If
    ' the same call is attached to both the mouse and keyboard events
    sBuild = "onmouseover=""javascript:this.href=" _
      & sMailMe & ";"" " _
      & "onfocus=""javascript:this.href=" _
      & sMailMe & ";"""
    sBuild = "<a href=""/contact/"" " _
      & sBuild _
      & " title=""" & sTitle & """>"
    ' fall back to the user name when no caption is given
    If sCaption = "" Then
      sBuild = sBuild & sSplit(0)
    Else
      sBuild = sBuild & sCaption
    End If
    mailMe = sBuild & "</a>"
  End Function
%>

This is called using:

<% =MailMe("Webmaster@example.com","Contact Me!","Send me email") %>

...where "Webmaster@example.com" is the target email address, "Contact Me!" is the display text (DO NOT USE THE EMAIL ADDRESS HERE!!!) and "Send me email" is the title attribute (what is displayed on hover or read out by voice interfaces). The function returns the complete HTML anchor tag (breaks added for readability):

<a href="/contact/" 
  onmouseover="javascript:this.href=mailMe('example%23com','Webmaster');"
  onfocus="javascript:this.href=mailMe('example%23com','Webmaster');"
  title="Send me email">Contact Me!</a>
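For readers who want to check that output without an ASP server, the same steps can be sketched in JavaScript. buildCloakedLink is a hypothetical name used only for this illustration; it mirrors the steps described here (split at "@", split off any "?query", munge and encode the domain, build the two handlers) and is not a replacement for the server-side function.

```javascript
// Hypothetical JavaScript mirror of the server-side link builder.
function buildCloakedLink(address, caption, title) {
  const at = address.indexOf('@');
  const user = address.slice(0, at);
  const rest = address.slice(at + 1);

  // Separate an optional "?query" from the domain.
  const q = rest.indexOf('?');
  const domain = q === -1 ? rest : rest.slice(0, q);
  const query = q === -1 ? '' : rest.slice(q); // keeps the leading "?"

  // "." becomes "#", which encodeURIComponent turns into "%23".
  const call = "mailMe('" + encodeURIComponent(domain.replace(/\./g, '#'))
    + query.replace(/'/g, "\\'")
    + "','" + encodeURIComponent(user) + "')";

  const handler = 'javascript:this.href=' + call + ';';
  return '<a href="/contact/" onmouseover="' + handler
    + '" onfocus="' + handler
    + '" title="' + title + '">' + (caption || user) + '</a>';
}

const html = buildCloakedLink('Webmaster@example.com', 'Contact Me!', 'Send me email');
console.log(html);
```

The printed string is the same anchor tag shown above, on a single line.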

The PHP code to do the exact same thing:

<?php
function mailMe($saddress,$scaption,$stitle){
//begin parsing: split "user@domain?query" into its parts
  list($eaddress, $sdomain) = explode('@', $saddress, 2);
  //array_pad() leaves $aextra empty when there is no "?query" part
  list($sdomain, $aextra) = array_pad(explode('?', $sdomain, 2), 2, '');
  //munge the domain: "." becomes "#", which urlencode() turns into "%23"
  $sdomain = str_replace('.', '#', $sdomain);

//create the js address
  $smailme = "mailMe('".urlencode( $sdomain );
  if($aextra != "" ){
    $smailme .= "?" . $aextra;
  }
  $smailme .= "','" . urlencode( $eaddress ) . "')";

//build the js events
  $sbuild =" onmouseover=\"javascript:this.href=$smailme;\"";
  $sbuild.=" onfocus=\"javascript:this.href=$smailme;\"";

//return
  return "<a href=\"/contact/\"$sbuild title=\"$stitle\">$scaption</a>";
}
?>

Lastly, it should be noted that both of these server-side functions also provide the courtesy of carrying query values, such as a specific subject line or what-not, through to the address as well (the ASP version also escapes any apostrophes in them). You could readily use something like this and it would come out correctly:

<% =mailMe("Webmaster@example.com?subject=hey out there!","Contact Me!","Send me email") %>

or

<?php echo mailMe("Webmaster@example.com?subject=hey out there!","Contact Me!","Send me email") ?>

Well, now you can do it all on the server. Add a <script> tag on the client that references your decoding javascript file with the mailMe() function in it and you're golden.

Can it be broken by intelligent spiders? Of course. You sit Joe Cracker down with the code for an hour or so and he'll correct his RegEx-capable parsing spider to decode even this. But by using a server-side mechanism like this one, you can change the encoding/decoding algorithm month-to-month, week-to-week, or even hour-to-hour and effectively avoid 100% of even the "intelligent" spiders out there. I have, however, used this method on my site for years (changed ever-so-slightly over the time period) and have not once had a single spam message go to any of the addresses that use this mechanism (that are solely used for those aspects of the site, mind you).

It's not too late to change over. Even if you're already getting spam to your email addresses you can switch to this method (or similar) and avert the inevitable increase in spam that will occur when more and more spiders collect your addresses.

Why bother? Who cares? It's just email!

Why should you be concerned about email obfuscation and cloaking?

Because there are some really unethical people out there that literally do nothing all day long but collect email addresses using web-spiders. No joke. The method described here not only prevents your email address from being obvious simple text (which every email spider will get, regardless of where it is on the page), but it also performs levels of obfuscation and abstraction with event-driven decoding that make it useable for nearly every "real" browser in the world, and gracefully degrade for those wacko no-script people that refuse to allow simple javascripts to function. Using the scripts here you should have no reason for anyone to be able to say they couldn't contact you through your website, while also having very low likelihood of your email address ever being scalped from this code and used for spam.

I've seen dozens of 'solutions' which are, for want of a better term, garbage. From hex-encoding the string or char-encoding the URL or other javascript solutions that just split it using "' + '"... well, they're all garbage. I have demonstrated how they all fail using very simple regular expressions (which are a primary component of every basic operating system over the last 19 years). This method provides you with the potential to change your algorithms at any point, globally across your site, by changing only 2 files. Talk about fast updates. :)

While I have no fantasy that I'll be able to prevent every email harvester in the world, I'll do my best to prevent as many from getting my email address as possible. Email is a time-sink, and even if I use the best anti-spam products and services in the world, spam wastes time and bandwidth that could be best used doing, well, anything else. And don't forget - it's not just about preventing spam, it's about ensuring the ability to receive legitimate contacts. That's at least equally important.

Regards,

Shawn K. Hall
http://12PointDesign/
http://ReliableAnswers.com/



[Examples of Flawed Methods]

Javascript Text Combining

One of the common email obfuscation methods is to use a document.write statement in your javascript with text similar to this:

document.write('webm' + "aster" + '@' + 'exam' + 'ple.com');

The method to defeat this is as simple as removing the four quote-and-plus sequences ' + ', " + ", ' + " and " + ' (along with the spaces around the plus sign) before sending the text through the email harvesting script. The script above becomes this simple text:

document.write('webmaster@example.com');

A trained monkey could, no doubt, find the email address within that text.
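To make the point concrete, here is a minimal sketch of that clean-up in JavaScript. The variable names and the address pattern are assumptions chosen for this illustration; a real harvester would use its own, probably more thorough, pattern.

```javascript
// The obfuscated statement, held as page text the way a spider would see it.
const source = `document.write('webm' + "aster" + '@' + 'exam' + 'ple.com');`;

// Remove every quote-plus-quote join, with or without surrounding spaces.
const joined = source.replace(/['"]\s*\+\s*['"]/g, '');

// A deliberately simple address pattern; real harvesters will differ.
const found = joined.match(/[\w.+-]+@[\w-]+(\.[\w-]+)+/);

console.log(joined);   // document.write('webmaster@example.com');
console.log(found[0]); // webmaster@example.com
```

One replace and one match are all it takes, and neither depends on knowing how the string was originally split up.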

Javascript Array Values

Another oft-touted method of email obfuscation is to use a document.write statement or other javascript call which combines the ASCII text values within an array:

var emailriddlerarray=[121,111,117,64,101,120,97,109,112,108,101,46,99,111,109];

The method to defeat this is similar to the above, replacing the ASCII values with their text representations. The code above becomes this simple text:

document.write('mailto:you@example.com');

Again, trained monkeys...
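A sketch of that replacement in JavaScript, working from the page text rather than from running the script. The variable names pageText, numbers and address are assumptions for this example.

```javascript
// The array declaration as it appears in the page source.
const pageText = 'var emailriddlerarray=[121,111,117,64,101,120,97,109,112,108,101,46,99,111,109];';

// Pull out the comma-separated list between the square brackets.
const numbers = pageText.match(/\[([\d,\s]+)\]/)[1];

// Convert each character code back to its character.
const address = String.fromCharCode(...numbers.split(',').map(Number));

console.log(address); // you@example.com
```

The spider never needs to execute the page's own script; reading the numbers out of the source is enough.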

HTML Entity and URL-Encoded Values

Randomly or completely replacing characters within the email address with HTML entities or URL encoded values is also referred to as "masking". It might make direct reading of the text more difficult to a casual user or internet neophyte, but you would have to be incredibly naive to think that spammers go to the trouble of individually collecting email addresses from each page they visit. For spammers, it's not about quality, it's about volume. And they score very high marks in persisting with their deluge of spam to every address in the world they can collect through their spiders, compromised mail servers and scumbots.

So how effective is the following email obfuscation?

&#119;eb%4da&#115;%74er%40e&#088;a%4dp%4ce%2eco%4d

To the human eye, it's very very effective. It's hard to discern what it represents directly, but spiders and other computer analysis programs do not suffer from the same liabilities as our naked eye. They're designed to process thousands upon thousands of instructions per second. And this type of thing - a simple text replacement - is a prime application for their use.

For example, your browser obviously interprets the text correctly, otherwise the use of replacement operations like that wouldn't be so prevalent. If your browser directly interprets the replaced text, how sturdy is it against a spider?

It's not. In a single second on hardware that is years old, you could convert thousands of these to their actual valid counterparts. Obfuscation like this serves to complicate matters for legitimate users only. The actual programmatic decoding is child's play. Simple stuff for anyone technically inclined enough to create their own "scum" spider.
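Here is roughly what that decoding looks like in JavaScript, applied to the masked sample above. It is a sketch that handles only decimal numeric entities and percent-encoded values, which is all this sample uses; the variable names are assumptions for the example.

```javascript
// The masked address from the example above.
const masked = '&#119;eb%4da&#115;%74er%40e&#088;a%4dp%4ce%2eco%4d';

// Decimal HTML entities (&#NNN;) back to characters.
const noEntities = masked.replace(/&#(\d+);/g,
  (match, code) => String.fromCharCode(Number(code)));

// Percent-encoded values (%XX) back to characters.
const plain = decodeURIComponent(noEntities);

console.log(plain);               // webMaster@eXaMpLe.coM
console.log(plain.toLowerCase()); // webmaster@example.com
```

Two lines of actual work, both using functions that ship with every browser.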

There are also existing services out there specifically designed to help "de-obfuscate" both email addresses and URLs using these methods. For example, the URL deobfuscator from DNS Stuff is capable of easily interpreting several common obfuscation techniques, and the Scumple below can do far more than that.

"NOSPAM" and Other Hurdles

The problem with "hurdles" like NOSPAM, AT, spaces, [dot] and image and span interruption insertion is roughly the same as using entities and other encoded values. Anything common enough to be used as a hurdle (like NOSPAM) is also likely to be included in the steps to sanitize text before sending it through the email harvesting filter.

Instead of wasting processing time reading all the text on a page through an email parser, it's far more likely that hurdles will be replaced first. NOSPAM with an empty string, AT with @, <img...> with an empty string. Other tags that have specific meanings are also likely to be stripped before processing. Some of these pre-processors actually help to reduce processing requirements during the harvesting, and were likely used long before these methods were ever intended to help cloak email addresses.
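A sketch of such a pre-processing pass in JavaScript follows. The sample text, the order of the steps and the address pattern are all assumptions invented for this illustration, not taken from any real harvester.

```javascript
// Hypothetical page text using several common hurdles at once.
const text = 'Contact: webmaster AT NOSPAMexample [dot] com, '
  + 'or sales<span></span>@<img src="dot.gif">example.com';

const cleaned = text
  .replace(/<[^>]*>/g, '')          // strip tags such as <img> and <span>
  .replace(/NOSPAM/g, '')           // drop the NOSPAM marker
  .replace(/\s*\[dot\]\s*/gi, '.')  // "[dot]" back to "."
  .replace(/\s+AT\s+/g, '@');       // uppercase " AT " back to "@"

// A deliberately simple address pattern, applied after the clean-up.
const addresses = cleaned.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g);

console.log(cleaned);   // Contact: webmaster@example.com, or sales@example.com
console.log(addresses); // [ 'webmaster@example.com', 'sales@example.com' ]
```

Each hurdle costs the spider one extra replace, which is why anything common enough to be widely recommended stops being a hurdle at all.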

Image Replacement

While image replacement has been quite effective in preventing harvesting in many cases (as long as the email address doesn't actually appear anywhere else on the page - like a link), it is also equally effective in preventing legitimate visitors (and potential customers!) from sending you email. Image replacement is an unrealistic solution to email harvesting for that very reason.

But "just" preventing legitimate users from contacting you isn't the only reason to avoid the image replacement method. The last few years have produced very capable and low-cost (in some cases, free) OCR software. OCR is "Optical Character Recognition", and is the process by which a computer "reads" text within an image. Any image legible enough for users to read easily is clear enough for an OCR program to automatically parse the results as well. Gimpy and PWNtcha are examples of how computer automated image assessment succeeds far better for text analysis than many people might yet believe.

The Sample Scumple Decoder

The following form enables you to test your email obfuscation method against a rather simple javascript-based decoder, most of which I wrote over two years ago. I recently updated it to demonstrate reversal of masked values. That is the only change made to the sample parser here in two years. Since I'm a white-hat, just think of what the black-hats have at their disposal. This sample focuses primarily on hurdles and encoded values.

Known issues: It is very very eager with uppercase "AT", and if a resultant email potentially includes this character combination, it may be overlooked or accidentally discarded. This also means that filenames that include these two characters in succession are more likely to be falsely interpreted as an email address.

Finally, this is a sample only, this is not intended to be the end-all be-all demonstration of what the most crafty scumbag is capable of doing to get your email address. Oh, and for those poor souls out there who were convinced by their ISP that their email addresses are case sensitive, unfortunately, you're wrong: Internet email addresses are not case sensitive. Please read RFC 822 before arguing with me.



Oh, and if you use this "Scumple" against this very page, it will, of course, find the EXAMPLE email addresses used here for EXAMPLE purposes (client-side displayed SERVER-side code, the Scumple default demo, and poor obfuscation examples). That is by design - they're EXAMPLE addresses, folks. I tell ya, one more person incapable of grasping the obvious...mumble mumble.
