Vol. 1, No. 36
HTML Validation
Entropy is an inescapable fact about the universe. Inevitably,
there comes a time when the old homestead gets a little messy.
Maybe you've been doing some remodeling, moving furniture around
and the like; maybe you just finished building the place and
there's still sawdust in the corners; maybe it's just acquired a
bit too much of that lived-in look. Any way you look at it, it's
time to spiff up your HTML.
Browsers are forgiving by nature and tend to overlook errors in
your code, thank Heaven. When faced with freaky HTML, a browser
just grits its (metaphorical) teeth and renders the page, to the
best of its ability, the way it thinks it's supposed to look.
Unclosed tags, illegal attributes: these are the
ugly realities that your browser sees every day, and tries to
shield you from. So it's likely that the pages you build have minor
errors in them too, rarely seen but present nonetheless. It's
good form to get rid of them--what if a potential employer or mate
viewed your source? In addition, different browsers can gloss
over errors in different ways, producing unexpected results
("The page doesn't show up AT ALL in IE 5?!").
Yes, it's time to validate your code.
The hard way to do this is to just read every line, counting on
your fingers to see that every tag you open is closed somewhere
along the line, and keeping a copy of the HTML specification open
to refer back to whenever you're not 100% sure off the top of your
head what attributes the TABLE tag takes. This approach results in
remarkably clean and well-formatted code, a dazzling familiarity
with how your site works, and many hours in front of the computer
(or, if you prefer, in front of page after page of smudgy, closely
spaced source printouts). But there is an easier way.
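For the curious, the finger-counting part of the hard way can be roughed out in a few lines of Python using the standard library's html.parser module. This is only a sketch of the idea, not how any of the validators below actually work. The names TagBalanceChecker and find_unbalanced_tags are made up for illustration, and the list of tags that never take a closing tag is an assumption that is not exhaustive. Note too that HTML lets you leave off some closing tags (</P> and </LI>, for instance), so this sketch is stricter than the specification.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (illustrative, not exhaustive).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img",
             "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Keep a stack of open tags and record anything that does not pair up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # (tag name, line number) still waiting for a close
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> open and close in one step.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.open_tags]:
            self.problems.append(f"line {line}: </{tag}> has no matching open tag")
            return
        # Pop back to the matching open tag; anything above it was never closed.
        while self.open_tags:
            name, opened = self.open_tags.pop()
            if name == tag:
                break
            self.problems.append(f"line {opened}: <{name}> is never closed")


def find_unbalanced_tags(source):
    """Return a list of messages about tags that do not pair up."""
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    for name, opened in checker.open_tags:
        checker.problems.append(f"line {opened}: <{name}> is never closed")
    return checker.problems


if __name__ == "__main__":
    for message in find_unbalanced_tags("<p><b>bold</p>"):
        print(message)
```

A real validator does far more than this (it checks attributes and nesting rules against the specification), which is exactly why the easier way exists.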
The demigods of the Web have seen fit to provide us with automatic
HTML validators. Some of these are programs you download and run on
your home computer (or, as we say, "local development environment");
others offer a pleasant Web-based interface. There are a number of
different ones, each with different strengths and specialties.
The principle on which all of this is founded is this, briefly: HTML
is an application of SGML, the Standard Generalized Markup Language.
Exactly which elements and attributes are allowed, and where, is
spelled out in a DTD (Document Type Definition), the formal,
machine-readable statement of a given version of the HTML specification.
A given page of HTML is valid insofar as it conforms to the tenets
of the particular DTD it's supposed to be following. Got that? Don't
worry if you're confused: it's all downhill from here.
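To make the DTD idea a little more concrete: a page normally announces which DTD it follows in a DOCTYPE declaration on its first line, and that declaration is what a validator reads to decide which rules to apply. The Python sketch below pulls the DTD's public identifier out of a page using the standard html.parser module. The names DoctypeFinder and declared_dtd are invented for this example, and the sample page uses the HTML 4.01 Strict DOCTYPE purely as an illustration.

```python
import re
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Remember the first DOCTYPE declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">".
        if self.doctype is None and decl.upper().startswith("DOCTYPE"):
            self.doctype = decl


def declared_dtd(source):
    """Return the public identifier of the declared DTD, or None."""
    finder = DoctypeFinder()
    finder.feed(source)
    finder.close()
    if finder.doctype is None:
        return None
    match = re.search(r'PUBLIC\s+"([^"]+)"', finder.doctype, re.IGNORECASE)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
            '"http://www.w3.org/TR/html4/strict.dtd">\n'
            "<html><head><title>Hi</title></head><body></body></html>")
    print(declared_dtd(page))
```

A page with no DOCTYPE at all gives None here, which is a hint that a validator will have to guess (or be told) which DTD to check it against.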
The grandmamma of all online HTML validators comes from the
same place that the idea of valid HTML itself does: the World
Wide Web Consortium, or W3C. The W3C is a body that develops
specifications for various Web protocols, to foster standardization
and interoperability. The W3C validator parses HTML code and checks
to see if it complies with the standard, as well as with the W3C's
recommendations for good HTML. This is the authoritative way to
check if your code is following the standard to the letter. The
W3C validator is located at
http://validator.w3.org
Another valuable online validator is Doctor HTML. Doctor HTML
focuses more on the style and structure of your pages. It can check
for spelling errors, broken links, and structural problems, and
provide an estimated download time for a page. It can even check
password-protected pages and directories. Doctor HTML is located at
http://www2.imagiware.com/RxHTML
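Two of the checks described above, finding the links on a page and estimating its download time, are simple enough to sketch in Python. This is not Doctor HTML's actual method, just an illustration of the idea. The function names are made up, the 56,000 bits-per-second modem speed is an assumed default, and the estimate ignores latency, images, and everything else that makes real downloads slower.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of A HREF and IMG SRC attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            wanted = (tag == "a" and name == "href") or (tag == "img" and name == "src")
            if wanted and value:
                self.links.append(value)


def collect_links(source):
    """Return every link and image URL found in the page, in order."""
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    return collector.links


def estimated_download_seconds(size_bytes, bits_per_second=56_000):
    """Rough transfer time: eight bits per byte, divided by the line speed."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    page = '<a href="about.html">About</a> <img src="logo.gif">'
    print(collect_links(page))
    print(estimated_download_seconds(len(page.encode("utf-8"))))
```

Checking whether each collected link is actually broken would mean requesting every URL in turn, which is left out here to keep the sketch offline.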
CYAN is the Web interface to HTML Tidy (which is downloadable in
its own right as an offline tool--see below). It is a terrific
weapon in the war against messy and unstreamlined code. It can
create orderly indentation patterns in your HTML code, change all
of your tags to uppercase or lowercase, change FONT tags and the
like to nice neat style sheets, and even convert HTML documents to
XML. It also checks the code for HTML standards compliance, of course.
CYAN can be found at
http://www.chamisplace.com/asp/hk.asp
Weblint is another popular syntax checking tool. It is written in
Perl to be used offline, but a number of Web-based gateways are
available so you can use it without downloading it (the offline
version takes a bit of skill to configure and use). A list of
gateways is here:
http://www.weblint.org/gateways.html
Weblint checks for a vast variety of errors and problems. You can
run the default test or check for each individually. It has full
support for the proprietary HTML extensions added by Netscape and by
Microsoft, checks cross-browser compatibility, and is continually
being improved and updated.
The preceding are all Web-based tools. They tend to be convenient,
because the Web is generally close at hand when you are working on
your site; and they don't take up any space on your hard drive. Still,
it can be a good thing to have an offline validation application.
Offline tools can be more flexible and powerful, and considerably
quicker if you have a large site.
Many HTML editors have some kind of validation and/or syntax
checking built in. If the editor you use meets your needs in this
regard, read no further. But if you don't use an editor with these
features, or if you want something more, there are a number of
standalone tools you may like.
Weblint, as mentioned earlier, is most often used offline. It runs
under Perl, so you need a copy of Perl on your system. (Perl is
available from http://www.perl.com/pub/language/info/software.html )
You can pass all kinds of options to the script at the time you run it.
The offline version can be downloaded from:
http://www.weblint.org/ftp-sites.html
HTML Tidy is also a fabulous tool when used offline. It can be
downloaded from:
http://www.w3.org/People/Raggett/tidy/
It is run from a command line. If you're running Windows, you will
probably appreciate HTML-Kit, which provides an attractive graphical
front end for HTML Tidy, along with many window-based features.
HTML-Kit is available for download from:
http://www.chami.com/html-kit/
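If you go the command-line route with HTML Tidy and have many pages to check, a small script can run it for you. The Python sketch below assumes a program named tidy is installed and on your path. It uses Tidy's -q (quiet) and -e (report errors and warnings only) options, and the function names tidy_command and run_tidy are invented for this example.

```python
import shutil
import subprocess


def tidy_command(path, executable="tidy"):
    """Build the command line for an errors-only Tidy run on one file."""
    # -q: suppress nonessential output; -e: show only errors and warnings.
    return [executable, "-q", "-e", path]


def run_tidy(path):
    """Run Tidy on a file; return (exit status, diagnostics) or None if missing."""
    if shutil.which("tidy") is None:
        return None  # Tidy is not installed, so there is nothing to run.
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    # Tidy writes its diagnostics to standard error. An exit status of 0
    # means a clean page, 1 means warnings, and 2 means errors.
    return result.returncode, result.stderr


if __name__ == "__main__":
    outcome = run_tidy("index.html")  # hypothetical file name
    if outcome is None:
        print("tidy was not found on this system")
    else:
        status, report = outcome
        print(f"exit status {status}")
        print(report)
```

Looping run_tidy over every HTML file in a directory is one way to get the speed advantage on a large site that offline tools are good for.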
There's no reason not to validate your code, and several dozen reasons
that you should. Do it today, and soon the emails will start flooding
in, complimenting you on the handsome formatting and standards-compliance
of your site.
HINTS, POINTERS, AND TIPS O' THE TRADE:
If you just want to view a page of source (yours or someone else's),
but it's all stuck together and garbled and ugly, run it through
PrettyPrinter:
http://www.selfpromotion.com/prettyprint.t
This little Web application will load any page's source code and
output it neatly indented and formatted for easy reading. It even
numbers lines.
HTMLSquisher will compress your HTML code, removing line breaks,
blank spaces, and all other unnecessary characters (changing <STRONG>
to <B>, for example). It can be found at:
http://www2.imagiware.com/toolchest/squish/
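The whitespace part of squishing is easy to picture in code. The Python sketch below is not HTMLSquisher's method, only a minimal illustration, and the function name squish is made up. It collapses every run of spaces, tabs, and line breaks to a single space, which is unsafe for anything inside PRE or TEXTAREA tags or inside scripts, so treat it as a demonstration rather than a tool.

```python
import re


def squish(source):
    """Collapse each run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", source).strip()


if __name__ == "__main__":
    before = "<p>\n    Hello     world\n</p>\n"
    after = squish(before)
    print(after)
    print(f"{len(before)} characters down to {len(after)}")
```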
Always clean behind AND inside your ears.
RESOURCES:
W3C HTML 4.0 recommendations
Webmonkey's site optimization tutorial