Spam-Spam-Spam-Spam
Anatomy of a spam message:
I got this one in my inbox. It didn’t display properly because I’m not using a Microsoft mail reader, but it did slip past the junk mail filter. Most filters work on a point system: if a message accrues enough spam points, it’s flagged as junk.
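To make the point system concrete, here is a minimal Python sketch of a rule-based scorer. The two rules, their weights, and the threshold of 5 are my own hypothetical choices for illustration; real filters ship many more rules with tuned weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One spam rule: a name, its point value, and a test on the headers."""
    name: str
    points: int
    matches: Callable[[Dict[str, str]], bool]


# Hypothetical example rules; weights chosen for illustration only.
RULES: List[Rule] = [
    Rule("high priority", 1, lambda h: h.get("X-Priority", "").startswith("1")),
    Rule("The Bat! mailer", 1, lambda h: "The Bat!" in h.get("X-Mailer", "")),
]


def score(headers: Dict[str, str], rules: List[Rule]) -> int:
    """Sum the points of every rule that fires."""
    return sum(rule.points for rule in rules if rule.matches(headers))


def is_junk(headers: Dict[str, str], rules: List[Rule], threshold: int = 5) -> bool:
    """Flag the message once its score reaches the (hypothetical) threshold."""
    return score(headers, rules) >= threshold


headers = {"X-Priority": "1", "X-Mailer": "The Bat! (v1.52f) Business"}
print(score(headers, RULES), is_junk(headers, RULES))  # 2 False
```

With only two rules this message scores 2 and stays under the threshold, which is exactly how spam slips through: each signal is weak on its own, and the filter needs enough of them to add up.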
It’s a good news, bad news thing. Bad news: spam still gets through my filter. Good news: there’s still plenty of fertile ground for intelligent spam filtering.
Let’s do a line-by-line HUMAN analysis and see how many spam points we get. (Answer: 81 points.)
1: From: "Ewing Lakisha" <mbexf@china.com>
Normally, crap from overseas senders who aren’t in my address book is spam. Add one spam point.
In this case, Mr. Lakisha works for china.com. Wait a minute! Lakisha? China? +1 point.
China.com is a legitimate business. My guess is that their email system was hacked and/or the header was spoofed. I’m sure their IT guy is really happy about the resulting spam rage.
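A rough sketch of that first rule, using the standard library’s `email.utils.parseaddr` to pull the address out of the From header. The address book and the set of "overseas" domains are hypothetical inputs: the address itself doesn’t say where a sender is (china.com is a .com), so that list has to be maintained by hand.

```python
from email.utils import parseaddr


def sender_points(from_header: str, address_book: set, overseas_domains: set) -> int:
    """+1 for an unknown sender at a domain on a (hypothetical) overseas list."""
    _display_name, address = parseaddr(from_header)
    address = address.lower()
    domain = address.rpartition("@")[2]
    if address in address_book:
        return 0  # known correspondents never earn this point
    return 1 if domain in overseas_domains else 0


print(sender_points('"Ewing Lakisha" <mbexf@china.com>', set(), {"china.com"}))  # 1
```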
2: Date: November 2, 2003 9:00:57 AM PST
3: To: kelly@redleopard.com
4: Subject: Re: VF, job would have
“Re: VF, job would have”?? OK, so the broken grammar of ‘job would have’ might fit the domain ‘china.com’, but such grammar from Mr. Lakisha? +1 point.
5: Reply-To: "Lakisha" <mbexf@china.com>
6: Return-Path: <mbexf@china.com>
7: Received: from TCLHZTEST ([61.235.105.220])
by typhoon.he.net for <kelly@redleopard.com>;
Sun, 2 Nov 2003 21:06:25 -0800
8: Received: from 52.6.164.222 by 61.235.105.220;
Sun, 02 Nov 2003 15:03:57 -0200
Usual header stuff, nothing really suspicious here.
9: Message-Id: <DVXLFXZOOAYCWZNSAKQQXC@canada.com>
Wait a minute. The root message came from canada.com? Sent to china.com? And I was copied on the reply? +1 point.
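The canada.com/china.com mismatch can be checked mechanically by comparing the domain in the Message-Id with the domain in the From address. This is a sketch, not a verdict: plenty of legitimate mail (mailing lists, hosted mail services) has mismatched domains, which is why it’s only worth one point. The function names are hypothetical.

```python
from email.utils import parseaddr


def domain_of(addr_spec: str) -> str:
    """Return the lowercased text after the last '@', or '' if there is none."""
    _local, at, domain = addr_spec.rpartition("@")
    return domain.lower() if at else ""


def message_id_mismatch(from_header: str, message_id: str) -> bool:
    """True when the Message-Id domain differs from the From domain."""
    _name, from_addr = parseaddr(from_header)
    msgid_domain = domain_of(message_id.strip().strip("<>"))
    return msgid_domain != domain_of(from_addr)


print(message_id_mismatch('"Ewing Lakisha" <mbexf@china.com>',
                          "<DVXLFXZOOAYCWZNSAKQQXC@canada.com>"))  # True
```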
10: X-Mailer: The Bat! (v1.52f) Business
A lot of spam carries “X-Mailer: The Bat! (v1.52f) Business”. +1 point.
Spammers forge some of the header information, but not always consistently. The Bat! is a legitimate MUA (mail user agent), so on its own it doesn’t indicate spam. However, many spammers use The Bat!, so it gets the point. Let’s look further.
Take a look at line 14: The Bat! never uses this header, so the X-Mailer line is probably forged. +2 points.
There is a spam company operating out of Asia that offers dedicated servers with a mail system tweaked to hide the originating IP address (i.e., the servers act as blind relays). This message came out of China. +2 points.
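Here is a sketch of the forged-mailer check, taking as given the observation above that The Bat! never emits an X-Msmail-Priority header. The `forged_bat_points` name is hypothetical. Python’s `email.message.Message` looks headers up case-insensitively, so the capitalization differences between messages don’t matter.

```python
from email import message_from_string

HEADERS = (
    "X-Mailer: The Bat! (v1.52f) Business\n"
    "X-Priority: 1\n"
    "X-Msmail-Priority: High\n"
)


def forged_bat_points(msg) -> int:
    """Score a claimed The Bat! mailer, with extra points if the claim looks forged."""
    points = 0
    if (msg.get("X-Mailer") or "").startswith("The Bat!"):
        points += 1  # many spammers use (or claim) The Bat!
        if msg.get("X-MSMail-Priority") is not None:
            points += 2  # assumption from the article: The Bat! never sends this header
    return points


print(forged_bat_points(message_from_string(HEADERS)))  # 3
```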
11: Mime-Version: 1.0
12: Content-Type: multipart/alternative;
boundary="--59060433960038391"
Any Content-Type that isn’t text/plain; charset="iso-8859-1" gets a point. +1 point.
The Content-Type declaration is poorly formed. +1 point.
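The first Content-Type rule is easy to sketch with the standard `email` package, whose `get_content_type()` and `get_content_charset()` methods parse the header (the charset comes back lowercased). Deciding that a declaration is "poorly formed" takes more judgment than this, so the sketch covers only the first point; the function name is hypothetical.

```python
from email import message_from_string


def content_type_points(raw_headers: str) -> int:
    """+1 unless the message is text/plain with charset iso-8859-1."""
    msg = message_from_string(raw_headers)
    is_plain = msg.get_content_type() == "text/plain"
    is_latin1 = msg.get_content_charset() == "iso-8859-1"  # returned lowercased
    return 0 if is_plain and is_latin1 else 1


spam_headers = (
    "Content-Type: multipart/alternative;\n"
    '\tboundary="--59060433960038391"\n'
)
print(content_type_points(spam_headers))  # 1
```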
13: X-Priority: 1
14: X-Msmail-Priority: High
All messages marked high priority are suspect. +1 point.
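Both priority headers can be checked in a few lines. This sketch (the function name is hypothetical) treats an X-Priority value starting with 1 or an X-MSMail-Priority of High as high priority.

```python
from email import message_from_string


def priority_points(raw_headers: str) -> int:
    """+1 if either common priority header marks the message as high priority."""
    msg = message_from_string(raw_headers)
    x_priority = (msg.get("X-Priority") or "").strip()  # 1 = highest
    ms_priority = (msg.get("X-MSMail-Priority") or "").strip().lower()
    return 1 if x_priority.startswith("1") or ms_priority == "high" else 0


print(priority_points("X-Priority: 1\nX-Msmail-Priority: High\n"))  # 1
```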
15:
The message format does not match the Content-Type declaration: the boundary markers are missing. Even if line 12 is poorly formed, there should be corresponding (equally poorly formed) markers separating the alternative representations. Each marker is a line made of two hyphens followed by the boundary string. For example
----59060433960038391
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
to mark the beginning of the plain text part,
----59060433960038391
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
to mark the end of the plain text and the beginning of the HTML, and
----59060433960038391--
to mark the end of the HTML.
Very suspicious. +3 points.
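The missing-boundary test can be sketched by pulling the boundary parameter out of the headers and looking for a delimiter line ("--" plus the boundary) in the body. This assumes a raw message with a blank line between headers and body; the regular expression is a simplification, not a full RFC 2046 parser, and the names are hypothetical.

```python
import re

# Simplified pattern for the boundary parameter (quoted or bare).
BOUNDARY_RE = re.compile(r'boundary="?([^";\r\n]+)"?', re.IGNORECASE)


def missing_boundary(raw: str) -> bool:
    """True if the headers declare a boundary that the body never uses."""
    raw = raw.replace("\r\n", "\n")
    headers, _blank, body = raw.partition("\n\n")
    match = BOUNDARY_RE.search(headers)
    if match is None:
        return False  # no boundary declared, so this rule does not apply
    delimiter = "--" + match.group(1).strip()
    # Each part starts with "--boundary"; the last one is closed by "--boundary--".
    return not any(line.rstrip() in (delimiter, delimiter + "--")
                   for line in body.splitlines())


SPAM = (
    "Mime-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    '\tboundary="--59060433960038391"\n'
    "\n"
    "<HTML><HEAD><TITLE></TITLE>\n"
)
print(missing_boundary(SPAM))  # True
```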
16: <HTML><HEAD><TITLE></TITLE>
The HTML doesn’t include a DOCTYPE declaration naming a DTD to validate against, for example
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
For good reason: the HTML is poorly formed and would fail validation. Almost all spam I’ve seen has really crappy HTML. The fact that there is no DTD reference gets +1 point.
The fact that the HTML is not valid HTML gets +2 points.
The fact that the title tag is included but empty gets +1 point.
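The DOCTYPE and empty-title checks can be sketched with Python’s `html.parser.HTMLParser`, which reports `<!DOCTYPE ...>` through `handle_decl`. Validating the HTML itself (the +2 point rule) needs a real validator and isn’t attempted here; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadChecker(HTMLParser):
    """Notes whether a DOCTYPE is present and whether <title> is empty."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.saw_title = False
        self.in_title = False
        self.title_text = ""

    def handle_decl(self, decl):
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # HTMLParser lowercases tag names
            self.saw_title = self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_text += data


def head_points(html: str) -> int:
    checker = HeadChecker()
    checker.feed(html)
    checker.close()
    points = 0 if checker.has_doctype else 1  # no DTD reference
    if checker.saw_title and not checker.title_text.strip():
        points += 1  # title present but empty
    return points


print(head_points("<HTML><HEAD><TITLE></TITLE></HEAD></HTML>"))  # 2
```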
17: <META http-equiv=Content-Type
content="text/html;
charset=windows-1251">
The meta tag is OK except for the Windows character set (windows-1251, a Cyrillic code page), which is a red flag. Most spammers use Windows-based tools. +1 point.
18: <META content="MSHTML 6.00.2800.1141"
name=GENERATOR>
And here’s the culprit: the GENERATOR is MSHTML, Microsoft’s HTML engine. +1 point.
19: <STYLE></STYLE>
And an empty style tag. +1 point.
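The same parser approach covers lines 17 to 19: a Windows charset in a meta tag, an MSHTML GENERATOR, and an empty style tag, one point each. The string matching below is my own simplification, and the names are hypothetical.

```python
from html.parser import HTMLParser


class MetaChecker(HTMLParser):
    """One point each for a Windows charset, an MSHTML generator, an empty <style>."""

    def __init__(self):
        super().__init__()
        self.points = 0
        self.in_style = False
        self.style_text = ""

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if tag == "meta":
            content = a.get("content", "").lower()
            if "charset=windows-" in content:
                self.points += 1
            if a.get("name", "").lower() == "generator" and content.startswith("mshtml"):
                self.points += 1
        elif tag == "style":
            self.in_style, self.style_text = True, ""

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
            if not self.style_text.strip():
                self.points += 1  # style tag present but empty

    def handle_data(self, data):
        if self.in_style:
            self.style_text += data


def meta_points(html: str) -> int:
    checker = MetaChecker()
    checker.feed(html)
    checker.close()
    return checker.points


HEAD = (
    '<META http-equiv=Content-Type content="text/html; charset=windows-1251">\n'
    '<META content="MSHTML 6.00.2800.1141" name=GENERATOR>\n'
    "<STYLE></STYLE>\n"
)
print(meta_points(HEAD))  # 3
```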
20: </HEAD>
21: <BODY bgColor=#ffffff>
22: <font color="white">duel snuggle arise merchandise
madeleine hickman fascicle puberty hall pizzeria
intestine gland attenuate ferromagnet houston affront
augustus canaveral </font>
OK. Line 21 sets the background to white and line 22 sets the font color to white. This text is invisible when rendered. +5 points.
The text is complete gibberish. It could be a list, but I’m giving it a point. +1 point.
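Here is a sketch of an invisible-text detector: it remembers the body background color and collects any text inside a `<font>` whose color matches it. The tiny color-name table and the white default background are assumptions, and real pages can also hide text with CSS, which this ignores.

```python
from html.parser import HTMLParser

# Tiny, hypothetical color table; a real filter would know all named colors.
NAMED_COLORS = {"white": "#ffffff", "black": "#000000"}


def normalize(color):
    value = (color or "").strip().lower()
    return NAMED_COLORS.get(value, value)


class InvisibleTextFinder(HTMLParser):
    """Collects text inside <font> tags whose color matches the body background."""

    def __init__(self):
        super().__init__()
        self.background = "#ffffff"  # assumption: default white background
        self.font_stack = []  # one "hidden?" flag per open <font>
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "body" and a.get("bgcolor"):
            self.background = normalize(a["bgcolor"])
        elif tag == "font":
            inherited = self.font_stack[-1] if self.font_stack else False
            color = a.get("color")
            hidden = normalize(color) == self.background if color else inherited
            self.font_stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            self.font_stack.pop()

    def handle_data(self, data):
        if self.font_stack and self.font_stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


BODY = ('<BODY bgColor=#ffffff>\n'
        '<font color="white">duel snuggle arise merchandise</font>\n'
        '<p>Visible text</p></BODY>')
finder = InvisibleTextFinder()
finder.feed(BODY)
finder.close()
print(finder.hidden_text)  # ['duel snuggle arise merchandise']
```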
23:
24: <p>Ban</gresham>ned C</adult>D Gov</aborning>ernment
d</enthalpy>on’t wan</elder>t m</hog>e t</bestirring>o
s</proximity>ell i</saloon>t. Se</apostate>e N</loblolly>ow +</p>
Here’s why there’s no W3C DTD: there are bogus tags in the HTML. Why? To fool the dictionary test for probable spam words, of course. One point for every bogus tag and one point for every missing tag pair: the paragraph has 11 bogus closing tags, none with a matching opening tag, so 11 + 11 = 22 points.
Of course, the spammer could have written his own DTD and hosted it on the same ehostzz.com server used below, but if he were that clever, he wouldn’t be a spammer.
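The bogus-tag count can be sketched with a regular expression over the tags and a hypothetical, deliberately short whitelist of known HTML elements. Each unknown tag earns a point, and each tag without a partner (a closer with no opener, or an opener never closed) earns another; void elements like `<br>` are exempt. Legitimate optional end tags, such as an unclosed `<p>`, would also be penalized, which is a simplification.

```python
import re

# Hypothetical, deliberately short whitelist of real HTML elements.
KNOWN_TAGS = {"html", "head", "title", "meta", "style", "body",
              "font", "p", "a", "img", "br"}
VOID_TAGS = {"meta", "img", "br"}  # elements that never take a closing tag

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def bogus_tag_points(html: str) -> int:
    """+1 per unknown tag, +1 per tag that has no partner."""
    points = 0
    open_tags = []
    for slash, name in TAG_RE.findall(html):
        name = name.lower()
        if name not in KNOWN_TAGS:
            points += 1  # bogus tag
        if not slash:
            if name not in VOID_TAGS:
                open_tags.append(name)
        elif name in open_tags:
            # close the most recent matching opener
            del open_tags[len(open_tags) - 1 - open_tags[::-1].index(name)]
        else:
            points += 1  # closer with no opener
    return points + len(open_tags)  # openers never closed


SNIPPET = ("<p>Ban</gresham>ned C</adult>D Gov</aborning>ernment\n"
           "d</enthalpy>on't wan</elder>t m</hog>e t</bestirring>o\n"
           "s</proximity>ell i</saloon>t. Se</apostate>e N</loblolly>ow +</p>")
print(bogus_tag_points(SNIPPET))  # 22
```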
When the browser renders the paragraph, it ignores the bogus tags and the text becomes clear:
Banned CD Government don’t want me to sell it. See Now +
Yeah. Right. 5 points.
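Recovering what the reader actually sees is roughly a matter of dropping the tags and collapsing whitespace. The point of the sketch: the word "Banned" never appears in the raw source, only in the rendered text, so a word-based filter should tokenize the rendered text. Regex stripping is a simplification of what a browser does, and the function name is hypothetical.

```python
import re


def rendered_text(html: str) -> str:
    """Approximate what a reader sees: drop every tag, collapse whitespace."""
    without_tags = re.sub(r"<[^>]*>", "", html)
    return " ".join(without_tags.split())


SNIPPET = ("<p>Ban</gresham>ned C</adult>D Gov</aborning>ernment\n"
           "d</enthalpy>on't wan</elder>t m</hog>e t</bestirring>o\n"
           "s</proximity>ell i</saloon>t. Se</apostate>e N</loblolly>ow +</p>")
print("Banned" in SNIPPET)     # False: a bogus tag splits the word in the source
print(rendered_text(SNIPPET))  # Banned CD Government don't want me to sell it. See Now +
```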
25: <a href="http://www.ehostzz.com/cd/">
26: <img border="0" src="http://www.ehostzz.com/cd/ads1.jpg"></a>
The link and image are highly suspect.
The link points to a known spam ad host. +10 points.
The image comes from a known spam image host. +10 points.
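A sketch of the host check, assuming hypothetical blocklists of known spam hosts (a real filter would consult a maintained list). It pulls quoted `href` and `src` URLs out with a regular expression and compares hostnames using `urllib.parse.urlparse`.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklists; a real filter would consult a maintained list.
SPAM_AD_HOSTS = {"www.ehostzz.com"}
SPAM_IMAGE_HOSTS = {"www.ehostzz.com"}

URL_ATTR_RE = re.compile(r'\b(href|src)\s*=\s*"([^"]+)"', re.IGNORECASE)


def host_points(html: str) -> int:
    """+10 for a link to a spam ad host, +10 for an image from a spam image host."""
    points = 0
    for attr, url in URL_ATTR_RE.findall(html):
        host = (urlparse(url).hostname or "").lower()
        blocklist = SPAM_AD_HOSTS if attr.lower() == "href" else SPAM_IMAGE_HOSTS
        if host in blocklist:
            points += 10
    return points


LINK = ('<a href="http://www.ehostzz.com/cd/">\n'
        '<img border="0" src="http://www.ehostzz.com/cd/ads1.jpg"></a>')
print(host_points(LINK))  # 20
```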
27: <br>
28: <font color="white">inescapable edelweiss girth crises
may hillmen deportation tow levee delivery leadsmen
adequacy blenheim </font>
Again, line 21 sets the background to white and line 28 sets the font color to white, so this text is invisible when rendered. +5 points.
The text is complete gibberish. +1 point.
29: </BODY>
30: </HTML>
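As a sanity check on the arithmetic, here is a short Python tally of every award made in this walkthrough, in order; the labels are my own shorthand. The individual awards add up to 81 points.

```python
# Points awarded in the walkthrough above, in order (labels are shorthand).
AWARDS = [
    ("overseas sender not in address book", 1),
    ("Lakisha at china.com", 1),
    ("subject grammar", 1),
    ("Message-Id from canada.com", 1),
    ("X-Mailer: The Bat!", 1),
    ("header The Bat! never uses", 2),
    ("blind-relay servers in Asia", 2),
    ("Content-Type not text/plain", 1),
    ("poorly formed Content-Type", 1),
    ("high priority", 1),
    ("missing boundary markers", 3),
    ("no DTD reference", 1),
    ("invalid HTML", 2),
    ("empty title", 1),
    ("windows-1251 charset", 1),
    ("MSHTML generator", 1),
    ("empty style", 1),
    ("invisible text (line 22)", 5),
    ("gibberish (line 22)", 1),
    ("bogus tags and missing pairs", 22),
    ("rendered sales pitch", 5),
    ("spam ad host", 10),
    ("spam image host", 10),
    ("invisible text (line 28)", 5),
    ("gibberish (line 28)", 1),
]

total = sum(points for _label, points in AWARDS)
print(total)  # 81
```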