Sam's place: August 2004

Tuesday, August 31, 2004

A little rant about Microsoft Internet Explorer's color parsing

Over the past few days I have had a painful introduction to how Microsoft Internet Explorer (IE hereafter) parses colors in HTML documents.

A warning to all readers: things get strange and very geeky from here on in. Enjoy.

As my profile says, I am one of POPFile's developers. The POPFile team is committed to finding new spammer "tricks" and making sure POPFile is capable of decoding them. John Graham-Cumming, author and lead developer of POPFile, also maintains the spammer's compendium, a catalogue of such spammer tricks.

The newest trick in the compendium is "Flex Hex", which was reported to John in July. POPFile's CVS code learned how to handle this trick shortly thereafter.

The essence of Flex Hex is that IE is very flexible in how it will interpret hexadecimal RGB values in any HTML attribute (I'm not sure about CSS) that expects color data. John sums it up well in the spammer's compendium:

Missing digits are treated as 0[...]. An incorrect digit is simply interpreted as 0. For example the values #F0F0F0, F0F0F0, F0F0F, #FxFxFx and FxFxFx are all the same.

Though the above generalization would have been good enough in 99% of cases, we found some cases where IE deviated from the fairly simple approach of zero-padding the field and zeroing invalid hex characters.
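The simple model quoted above (drop the hash mark, zero out invalid digits, zero-pad short values) could be sketched in Python roughly as follows. The function name is made up for illustration, and the sketch deliberately says nothing about values longer than 6 digits, since the quoted rule does not cover them.

```python
import string


def naive_flex_hex(value: str) -> str:
    """Simple model only: zero out bad digits and right-pad to 6 digits.

    Hypothetical helper for illustration; it does not handle the odd
    cases (very short or very long strings) discussed in the post.
    """
    # Drop a single leading hash mark, if there is one.
    digits = value[1:] if value.startswith("#") else value
    # Any character that is not a hexadecimal digit is treated as 0.
    digits = "".join(c if c in string.hexdigits else "0" for c in digits)
    # Missing digits are treated as 0.
    return digits.ljust(6, "0").upper()


for example in ("#F0F0F0", "F0F0F0", "F0F0F", "#FxFxFx", "FxFxFx"):
    print(example, "->", naive_flex_hex(example))
```

All five values from the compendium's example come out as the same color under this model.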

When color strings longer than 8 characters or shorter than 4 characters are used, things start to get strange. Things getting strange, particularly where undocumented, is always to a spammer's advantage. I will lay out here what IE does with unusual, unpredictable, or invalid color data.

If email filtering software isn't aware of how common HTML-enabled email readers will display HTML, malformed or otherwise, it becomes much easier for spammers to hide text within emails in a way that may fool statistical filters or otherwise evade filters.

As an interesting note, IE does this unusual parsing regardless of the doctype declaration, ignoring "standards mode". Mozilla performs similar parsing, differing only in how long strings are handled. However, in standards mode, invalid color notation is completely ignored by Mozilla and the default or parent color is allowed to set the color of the element.

The iframe below contains a slight variation on the DHTML page I used while determining how IE parses colors. I have gone out of my way to make it cross-platform, so other browsers can be tested with it. The two fields can be used to set the foreground and background colors of some text, and then the DOM of the page is sniffed to display the colors, as interpreted by the browser.

Throughout this explanation I will use a notation similar to CSS's RGB( RR, GG, BB) syntax to show how a value is split into red, green, and blue components. This isn't correct CSS RGB() syntax, but I am using it for clarity.

IE's non-CSS color parsing algorithm appears to behave as follows, in order to get to a 6 digit hexadecimal value from any string:

These steps may not be performed in the same order or using exactly the same criteria as IE, but the end result is identical as far as I can tell.

First, remove any hash marks, then replace every character that is not a hexadecimal digit (0-9 and a-f, in either case) with a 0.

Eg: #zqbttv becomes 00b000.
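As a rough Python sketch of this first step (the name `sanitize` is an assumption, not anything from IE or POPFile):

```python
import string


def sanitize(color: str) -> str:
    """Remove hash marks, then turn every non-hex character into '0'."""
    without_hashes = color.replace("#", "")
    # string.hexdigits contains 0-9, a-f and A-F.
    return "".join(c if c in string.hexdigits else "0" for c in without_hashes)


print(sanitize("#zqbttv"))  # 00b000
```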

For lengths 1-2, right pad to 3 characters with 0's.

Eg: "0F" becomes "0F0", "F" becomes "F00".

For length 3, take each digit as a value for red, green, or blue, and prepend a 0 to that value.

Eg: "0F0" becomes RGB( 0, F, 0), which becomes RGB( 00, 0F, 00) or 000F00.

Any value shorter than 4 digits is done at this point.
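The short-string rule can be sketched like this, assuming the input has already had its hash marks removed and its invalid characters zeroed. The function name is hypothetical.

```python
def expand_short(digits: str) -> str:
    """Expand 1 to 3 cleaned-up hex digits into a 6-digit color.

    Sketch of the rule described above: right-pad to 3 digits, then
    prepend a 0 to each digit to form the red, green and blue values.
    """
    if not 1 <= len(digits) <= 3:
        raise ValueError("this rule only covers strings of length 1 to 3")
    padded = digits.ljust(3, "0")
    return "".join("0" + digit for digit in padded)


print(expand_short("0F"))   # 000F00
print(expand_short("F"))    # 0F0000
print(expand_short("0F0"))  # 000F00
```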

For lengths 4 and longer, the field is right-padded with 0's to the next full multiple of 3. This step is important for longer fields.

Eg: "0F0F" becomes "0F0F00", "0F0F0F0" becomes "0F0F0F000" and "00FF00FF00FF00FF" becomes "00FF00FF00FF00FF00"

Next, the string is broken into three even parts, representing red, green and blue, from left to right.

"0F0F00" behaves as expected, becoming RGB(0F, 0F, 00). Any string of 6 characters is done at this point.

Longer strings, such as "1234567890ABCDE" become RGB(12345, 67890, ABCDE). Extremely long strings are split similarly. "1234567890ABCDE1234567890ABCDE" becomes RGB( 1234567890, ABCDE12345, 67890ABCDE).
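Padding and splitting could look something like the sketch below. It assumes a cleaned-up string of 4 or more hex digits, and `pad_and_split` is an illustrative name only.

```python
def pad_and_split(digits: str) -> tuple:
    """Right-pad with 0's to a multiple of 3, then cut into three equal parts."""
    remainder = len(digits) % 3
    if remainder:
        digits += "0" * (3 - remainder)
    size = len(digits) // 3
    # Red, green and blue, from left to right.
    return digits[:size], digits[size:2 * size], digits[2 * size:]


print(pad_and_split("0F0F"))             # ('0F', '0F', '00')
print(pad_and_split("1234567890ABCDE"))  # ('12345', '67890', 'ABCDE')
```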

At this point, the RGB values are truncated individually.

If the individual RGB values are over 8 characters long, they are truncated to 8 characters by removing characters from the left. This, in particular, was unexpected.

RGB( 1234567890, ABCDE12345, 67890ABCDE) becomes RGB( 34567890, CDE12345, 890ABCDE), and so forth.

Once the individual RGB values are 8 characters or shorter, they are truncated to their first 2 characters by removing characters from the right.

RGB( 34567890, CDE12345, 890ABCDE) becomes RGB( 34, CD, 89) or #34CD89, in more traditional notation.
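The two truncations are easy to express with slices. This is only a sketch of the behavior described above, and the function name is invented for the example.

```python
def truncate_component(component: str) -> str:
    """Keep the rightmost 8 characters, then the leftmost 2 of those."""
    last_eight = component[-8:]  # drops characters from the left if longer than 8
    return last_eight[:2]        # drops characters from the right


parts = ("1234567890", "ABCDE12345", "67890ABCDE")
print("#" + "".join(truncate_component(p) for p in parts))  # #34CD89
```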

Any string should be transformed into a 6-digit hexadecimal color by the above steps.

For instance, <font color="6db6ec49efd278cd0bc92d1e5e072d68"> (yes, that is random hexadecimal data) will result in IE displaying text in the color "6ecde0", a rather pleasant light blue. This isn't at all what I would have expected before studying IE's behavior. I might have expected a truncation to "6db6ec", or to "072d68" (also a shade of blue, coincidentally). However, if you look closely inside the random hexadecimal string, the components that make up the final RGB value are present, and in sequential order: "6db6ec49efd278cd0bc92d1e5e072d68"

To continue decoding this value, it first needs to be padded:
6db6ec49efd278cd0bc92d1e5e072d680

Then split into three even parts:
RGB( 6db6ec49efd, 278cd0bc92d, 1e5e072d680)

Then those parts are left-trimmed to 8 digits:
RGB( 6ec49efd, cd0bc92d, e072d680)

Then right-trimmed to the 2 most significant digits:
RGB( 6e, cd, e0)

And there you have it, the same color that IE will display if you enter 6db6ec49efd278cd0bc92d1e5e072d68 into one of the fields in the test applet above.
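Putting the steps together, a minimal Python sketch of the whole procedure might look like this. It follows the steps as written in this post rather than anything known about IE's internals, the function name is made up, the output is lower-cased purely for convenience, and the empty-string case is rejected because the post does not describe it.

```python
import string


def ie_legacy_color(value: str) -> str:
    """Approximate the color parsing described in this post (a sketch only)."""
    # Step 1: remove hash marks and zero out anything that is not a hex digit.
    digits = "".join(
        c if c in string.hexdigits else "0" for c in value.replace("#", "")
    )
    if not digits:
        raise ValueError("empty color strings are not covered by the post")

    # Lengths 1 to 3: right-pad to 3 digits, then prepend a 0 to each digit.
    if len(digits) <= 3:
        return "".join("0" + d for d in digits.ljust(3, "0")).lower()

    # Lengths 4 and up: right-pad with 0's to the next multiple of 3.
    if len(digits) % 3:
        digits += "0" * (3 - len(digits) % 3)

    # Split into three equal parts: red, green, blue.
    size = len(digits) // 3
    parts = [digits[i:i + size] for i in range(0, len(digits), size)]

    # Keep the rightmost 8 characters of each part, then the leftmost 2.
    return "".join(part[-8:][:2] for part in parts).lower()


print(ie_legacy_color("6db6ec49efd278cd0bc92d1e5e072d68"))  # 6ecde0
```

A filter could, for example, run both a text color and a background color through a function like this before comparing them, so that two differently spelled values that render as the same color are recognized as equal.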

# posted by Sam @ 5:54 PM 7 comments

Monday, August 30, 2004

The SphereXP

This looks like a very powerful and intuitive way to supplement your Windows desktop. It places your monitor in the center of a virtual sphere and lets you move and store windows within that sphere. Windows can be moved farther from or closer to the user, or brought right to the front. I'm not sure how usable it really is, but the videos of it in use present a fairly different UI experience.

# posted by Sam @ 11:06 PM 0 comments

Sunday, August 29, 2004

An Illustrated Guide to Cryptographic Hashes

If you've been following the news about potential collisions in the MD5 and SHA-1 hashing algorithms, but don't understand exactly what a hash is, or why this is a big deal, this is an excellent explanation.

An Illustrated Guide to Cryptographic Hashes

# posted by Sam @ 11:16 AM 0 comments

GmailFS - Gmail Filesystem

This rates an 8.5 on my personal geek scale. Linux only at this point.

GmailFS - Gmail Filesystem

GmailFS provides a mountable Linux filesystem which uses your Gmail account as its storage medium. GmailFS is a Python application and uses the FUSE userland filesystem infrastructure to help provide the filesystem, and libgmail to communicate with Gmail.

# posted by Sam @ 10:24 AM 1 comments

Thursday, August 26, 2004

How to Chat Via Netcat

This geeky nugget showed up in a newsgroup I frequent. Credit goes to pchelp.

I'm not sure how useful it would be, given the prevalence of things like IM, SSH, and telnet, but printing to a remote user's terminal and allowing them to print back strikes a minimalist bone somewhere in my body.

I'm sure this is possible on *nix (nc originated there), but the poster has limited his instructions to Windows. I think this will work on any platform if you substitute all mentions of DOS with console or terminal and give it a try.

-for Windows platforms

Requires Netcat (NC.EXE).

Parties at both ends must know one another's IP address.

Open two (2) command (DOS Prompt) windows. Place one above the other on
your desktop.

Select the upper window. Type this command:

nc [-u] -l -p port

where port = listening port number, known to your remote friend.

(Square brackets indicate an optional parameter. The "-u" switch is
optional. It causes Netcat to use the UDP protocol, which is sometimes
desirable for its relative "stealth," as when tunneling thru a
firewall.)

Now select the lower window. Type this command:

nc [-u] ipaddress port

where ipaddress = the remote IP address
port = the remote port number

(The -u switch is necessary if your chat partner is using UDP.)

The same commands may be entered in the Start...Run dialog, which will
cause the DOS windows to appear. They can also be created as shortcuts.

Assuming your chat partner has opened corresponding DOS windows and used
corresponding commands, you may now each type text in the lower window.
You will each see the other's transmissions in the upper window. Lines
of text are transmitted each time the Enter key is struck.

When sending via UDP, I've found that the sender must hit Enter once
before display begins at the other end.

Ctrl-C will close the connection and exit Netcat. If you've run the
command from the Run dialog or a shortcut, the DOS window will close
immediately.

If you're using TCP and close your sending or receiving Netcat, the
corresponding Netcat at the remote end will close also. UDP doesn't
have this effect, being "connectionless."

If you're behind a NAT router, you must first set up tunneling,
sometimes termed a "virtual server"; directing the desired port/protocol
on the WAN to your own machine's IP/port on the LAN.

You can make a connection to your own machine by using your own IP
address. Handy for testing.

This also works between machines on the same LAN.
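For the curious, the TCP variant of this two-window setup can be approximated with Python's standard socket module. This is only a sketch under stated assumptions: it is not how Netcat itself is implemented, the function names are invented, and the demo talks to itself over the loopback address on a port picked by the OS instead of a port agreed with a remote friend.

```python
import socket
import threading


def listen_once(server: socket.socket, received: list) -> None:
    """Accept one TCP connection and collect its lines (like the upper window)."""
    conn, _addr = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # ends when the sender closes the connection
            received.append(line.rstrip("\n"))


def send_lines(ipaddress: str, port: int, lines: list) -> None:
    """Connect and send one line per entry (like the lower window)."""
    with socket.create_connection((ipaddress, port)) as conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))


def demo() -> list:
    """Connect to our own machine, which the post notes is handy for testing."""
    received = []
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    listener = threading.Thread(target=listen_once, args=(server, received))
    listener.start()
    send_lines("127.0.0.1", port, ["hello", "is this thing on?"])
    listener.join(timeout=5)
    server.close()
    return received


print(demo())
```

As with the TCP case described above, closing the sending side ends the listener's read loop.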

# posted by Sam @ 3:37 AM 4 comments

Wednesday, August 18, 2004

Study: Unpatched PCs compromised in 20 minutes | CNET News.com

Study: Unpatched PCs compromised in 20 minutes

If you are out there with a new computer, please don't ever connect it to the internet without being behind a NAT router or enabling your OS's firewall. Even if it is to just download a 'quick' patch.

# posted by Sam @ 11:14 PM 0 comments

Apparently I am Amiga OS

You are Amiga OS. Ahead of your time. You keep a lot of balls in the air. If only your parents had given you more opportunities to succeed.

Which OS are You?

# posted by Sam @ 12:50 AM 0 comments

The geek milkshake

This song starts:

My web log brings all the nerds to the yard,
and I'm like: "mine's better than yours".
Damn right, it's better than yours!
I can link you, but I have to charge!

And continues here

I think it's pretty funny.

# posted by Sam @ 12:27 AM 0 comments

Tuesday, August 17, 2004

Study: Spammers, Virus Writers Getting Chummy

A new MessageLabs report says more than 86% of the E-mail it sampled in June was spam--and nearly one in 10 contained a virus.
http://informationweek.com/story/showArticle.jhtml?articleID=29101653

# posted by Sam @ 9:45 PM 1 comments

Is this thing on?

Just making a little testy post.

# posted by Sam @ 4:07 AM 1 comments
