The battle against junk e-mail (SPAM)

1. What is SPAM?
2. Why is junk e-mail called SPAM?
3. How do Spammers get my email address?
4. What is a SPAM Filter?

Some filters are rules-based. A rules-based SPAM filter analyzes each message and assigns points to characteristics that typically mark a message as SPAM. If a message's total exceeds a given threshold, the SPAM filter classifies it as SPAM and deletes it. Here’s an excerpt from an actual message advising me to quit smoking, as reported by SAproxyPro:

Content analysis details: (7.9 points, 5.0 required)

| pts | rule name | description |
|---|---|---|
| 0.4 | TO_ADDRESS_EQ_REAL | To: repeats address as real name |
| 0.1 | FREE_TRIAL | BODY: Free Trial |
| 0.9 | OFFERS_ETC | BODY: Stop the offers, coupons, discounts |
| 0.1 | HTML_LINK_CLICK_HERE | BODY: HTML link text says "click here" |
| 1.5 | HTML_IMAGE_ONLY_06 | BODY: HTML: images with 400-600 bytes of |
| 1.1 | HTML_WEB_BUGS | BODY: Image tag intended to identify you |
| 0.2 | HTML_MESSAGE | BODY: HTML included in message |
| 0.4 | HTML_70_80 | BODY: Message is 70% to 80% HTML |
| 0.2 | MIME_QP_LONG_LINE | RAW: Quoted-printable line longer than 76 |
| 2.8 | RATWARE_STORM_URI | URI: Bulk email fingerprint (StormPost) found |
| 0.1 | CLICK_BELOW | Asks you to click below |

As you can see, this message scored 7.9 points against a threshold of 5.0, so the SPAM filter caught it. The following message got past my SPAM filter by limiting the number of rule violations in its text and headers.

Content analysis details: (4.4 points, 5.0 required)

| pts | rule name | description |
|---|---|---|
| 0.0 | OPT_IN | BODY: Talks about opting in (lowercase) |
| 0.7 | EXCUSE_19 | BODY: Claims you opted-in or registered |
| 0.4 | EXCUSE_1 | BODY: Gives a lame excuse about why msg sent |
| 0.3 | EXCUSE_14 | BODY: Tells you how to stop further spam |
| 0.8 | HTML_30_40 | BODY: Message is 30% to 40% HTML |
| 0.2 | HTML_MESSAGE | BODY: HTML included in message |
| 0.4 | HTML_TITLE_EMPTY | BODY: HTML title contains no text |
| 1.1 | MIME_HTML_NO_CHARSET | RAW: Message text in HTML without charset |
| 0.4 | NORMAL_HTTP_TO_IP | URI: Uses a dotted-decimal IP address in URL |
| 0.1 | CLICK_BELOW | Asks you to click below |

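
To make the arithmetic concrete, here is a minimal Python sketch of how a rules-based filter arrives at its verdict. The rule names and points are copied from the two reports above; the `classify` function, the dictionary names, and the choice to treat a total equal to the threshold as SPAM are assumptions made for illustration, not SAproxyPro's actual code. Note that the points listed for the first message add up to 7.8 rather than the reported 7.9, presumably because each rule's score is rounded for display.

```python
# Points per rule, copied from the two "Content analysis details" reports.
CAUGHT = {
    "TO_ADDRESS_EQ_REAL": 0.4,
    "FREE_TRIAL": 0.1,
    "OFFERS_ETC": 0.9,
    "HTML_LINK_CLICK_HERE": 0.1,
    "HTML_IMAGE_ONLY_06": 1.5,
    "HTML_WEB_BUGS": 1.1,
    "HTML_MESSAGE": 0.2,
    "HTML_70_80": 0.4,
    "MIME_QP_LONG_LINE": 0.2,
    "RATWARE_STORM_URI": 2.8,
    "CLICK_BELOW": 0.1,
}
MISSED = {
    "OPT_IN": 0.0,
    "EXCUSE_19": 0.7,
    "EXCUSE_1": 0.4,
    "EXCUSE_14": 0.3,
    "HTML_30_40": 0.8,
    "HTML_MESSAGE": 0.2,
    "HTML_TITLE_EMPTY": 0.4,
    "MIME_HTML_NO_CHARSET": 1.1,
    "NORMAL_HTTP_TO_IP": 0.4,
    "CLICK_BELOW": 0.1,
}

THRESHOLD = 5.0  # "5.0 required" in both reports


def classify(rule_hits, threshold=THRESHOLD):
    """Return (total_points, is_spam) for a dict of rule name -> points.

    Assumption: a total equal to the threshold already counts as SPAM.
    """
    total = round(sum(rule_hits.values()), 1)
    return total, total >= threshold


for name, hits in (("quit-smoking message", CAUGHT), ("second message", MISSED)):
    total, is_spam = classify(hits)
    verdict = "SPAM" if is_spam else "not SPAM"
    print(f"{name}: {total} points -> {verdict}")
```

Calling `classify(MISSED, threshold=4.0)` flags the second message too, which is exactly the temptation of lowering the threshold.
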
The second message is obviously SPAM, but its 4.4 points fell short of my 5.0 threshold, so it was not classified as such. I know what you’re thinking: “lower your threshold, idiot”. There’s just one problem with lowering the threshold: almost all messages have rule violations. Simply creating a message with some HTML tags breaks a rule. A message that contains a link to a graphic image breaks another. Determining the optimum threshold value is what makes rules-based SPAM filters difficult to configure.

A good SPAM filter will also check the message against a blacklist of known bad senders, and allow you to build your own whitelist of known good senders. Keep in mind that most professional Spammers spoof their sender information to look like a legitimate sender.

Another method of SPAM filtering is called “Bayesian” filtering, named after Thomas Bayes, an 18th-century cleric who formulated Bayes’ theorem. Bayes’ theorem is used in statistical inference to estimate the probability that a hypothesis is true given the evidence. In other words, the current message is compared against patterns found in e-mail messages known to be good and in messages known to be bad, to estimate how likely it is to be junk. What makes the Bayesian filter so powerful is its ability to learn from the e-mail messages that you’ve received. The only drawback is that Bayesian filters require your input as to which messages are good and which are bad. Bayesian filters are becoming more and more popular.

There are other types of SPAM filters; however, most work similarly to the two examples I’ve given above. All filters, regardless of the theory used to create them, evaluate messages to determine the likelihood that the messages are junk.

5. How does junk e-mail get through my SPAM filter?

As the scenario above shows, Spammers are clever. They know how SPAM filters work, and they create messages that can sneak through your filter. They rephrase words like Viagra as Vi@gra, and so on. They “spoof” the message header with known valid sender e-mail addresses and servers. They have many tools in their toolbox and they aren’t afraid to use them.
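
The Bayesian idea from section 4, and the Vi@gra trick just mentioned, can be sketched in a few lines of Python. This is a toy naive Bayes filter under stated assumptions: the class name `TinyBayesFilter`, the whitespace tokenizer, the add-one smoothing, the choice to ignore words never seen in training, and the four training messages are all invented for illustration. Real Bayesian filters are more elaborate.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, then split on whitespace.
    return text.lower().split()


class TinyBayesFilter:
    """Toy naive Bayes filter; hypothetical, for illustration only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    def train(self, text, label):
        # This is the "your input" step: you say which messages are good or bad.
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        if 0 in self.message_counts.values():
            raise ValueError("train on at least one spam and one good message")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["good"])
        spam_total = sum(self.word_counts["spam"].values())
        good_total = sum(self.word_counts["good"].values())
        # Start from the prior odds: how much of the training mail was spam.
        log_odds = math.log(self.message_counts["spam"] / self.message_counts["good"])
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # never seen in training, so it carries no evidence
            # Add-one smoothing keeps a word seen in only one class from
            # producing a zero probability.
            p_spam = (self.word_counts["spam"][word] + 1) / (spam_total + len(vocabulary))
            p_good = (self.word_counts["good"][word] + 1) / (good_total + len(vocabulary))
            log_odds += math.log(p_spam / p_good)
        # Convert log-odds back into a probability between 0 and 1.
        return 1 / (1 + math.exp(-log_odds))


spam_filter = TinyBayesFilter()
spam_filter.train("buy viagra now", "spam")
spam_filter.train("cheap viagra click here", "spam")
spam_filter.train("meeting notes attached", "good")
spam_filter.train("lunch at noon tomorrow", "good")

plain = spam_filter.spam_probability("viagra for lunch")
disguised = spam_filter.spam_probability("vi@gra for lunch")
print(f"plain: {plain:.2f}, disguised: {disguised:.2f}")
```

With this training data the plain message scores 0.60 and the disguised one about 0.33. The filter has learned that “viagra” is a SPAM word, but it has never seen “vi@gra”, so the disguised word contributes nothing until you mark such a message as bad and the filter learns it.
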

Copyright © 2007 Imagine Systems, Inc.