Vol. 2, No. 2
Getting Ready for the Future of the Web
As you may have heard, the Web is glacially moving away from kludgy,
imperfect HTML and toward the standard of the future: XML, the
eXtensible Markup Language. A lot of people are very very very excited
about this, more or less justifiably. How exactly XML works is an
exhausting topic to discuss -- check out the links below -- but
basically you can create your own markup tags to describe the kind of
information in a document. So if you're assembling a list of rock bands
and their personnel, it might include lines like:
<band>Thin Lizzy</band>
<vocalist_first_name>Phil</vocalist_first_name>
<vocalist_last_name>Lynott</vocalist_last_name>
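and so forth. Because the tags say what each piece of information is, a
program can pick out every vocalist's last name without guessing...well,
you can imagine why a system like that might be useful.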
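To make that a little more concrete, here is a minimal sketch in Python
of a program reading those tags back out. The <entry> wrapper and the
describe() function are made up for illustration (an XML document needs
a single outermost element, and the three lines above don't show one).

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: <entry> is invented here so the three lines
# from the article form one complete XML document.
SOURCE = """<entry>
  <band>Thin Lizzy</band>
  <vocalist_first_name>Phil</vocalist_first_name>
  <vocalist_last_name>Lynott</vocalist_last_name>
</entry>"""


def describe(xml_text):
    """Pull the band and vocalist out by tag name."""
    root = ET.fromstring(xml_text)
    band = root.findtext("band")
    first = root.findtext("vocalist_first_name")
    last = root.findtext("vocalist_last_name")
    return f"{band}: {first} {last} on vocals"


print(describe(SOURCE))
```

The point is that the program asks for "vocalist_last_name" by name
instead of guessing which bit of bold text happens to be a surname.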
XML is still kind of a ways off, though. But now your palms are sweaty
and you're rarin' to go, right? Well, there is some fun we can have in
the meantime. It's called XHTML: eXtensible HyperText Markup Language. Kind
of a cross between HTML and XML, it's HTML made XML-compliant, so an XML
reader can parse it. You can convert all your HTML pages to XHTML now,
so when the big day comes you'll be ready. They'll all continue to work
on HTML browsers, but they'll be compliant with the next generation too.
And really it's not a big job to move from HTML to XHTML: why not do it?
The general focus of the changes we will be making is to regularize
everything. HTML allows a fair amount of sloppiness -- XHTML does not.
For starters, every tag and attribute has to be all lowercase. HTML
doesn't care about case, but XML does, so this is a good habit to get
into. (Many WYSIWYG HTML editors can automatically convert all your
tags to lowercase for you. See, who says computers aren't useful?)
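If you'd like to see XML's fussiness about case for yourself, here is a
small sketch using the XML parser that ships with Python. The
is_well_formed() helper is a made-up name, and it only checks that the
markup parses as XML, not that it is valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An HTML browser treats <P> and </p> as a pair; an XML parser does not.
print(is_well_formed("<P>Hello</p>"))
print(is_well_formed("<p>Hello</p>"))
```

Note that a matched all-uppercase pair would still get past this check,
since the lowercase rule comes from XHTML's own definitions rather than
from XML itself. A real validator catches that; this sketch doesn't.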
Once you've made all your tags lowercase, have a look at the attributes.
Yes, the attribute names need to be lowercased too. But also every
attribute value has to be enclosed in quotation marks. In HTML,
something like this is just fine:
<table width=90% cols=3 rows=2>
In XHTML, it has to become
<table width="90%" cols="3" rows="2">
Also, there are situations where HTML allows you to have an attribute
with no value, like this:
<td nowrap>
That won't fly in XHTML. Instead, you have to repeat the attribute's
name as its value:
<td nowrap="nowrap">
just to keep the parser happy.
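If your editor won't do this for you, the cleanup can be roughed out
with Python's built-in HTML parser, which already hands back tag and
attribute names in lowercase. This is a sketch, not a finished
converter: StartTagNormalizer and normalize_start_tags() are
hypothetical names, and it only rewrites opening tags.

```python
from html import escape
from html.parser import HTMLParser


class StartTagNormalizer(HTMLParser):
    """Collects each start tag rewritten in XHTML style."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Attribute with no value: repeat the name as the value.
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.tags.append("<" + " ".join(parts) + ">")


def normalize_start_tags(html):
    parser = StartTagNormalizer()
    parser.feed(html)
    parser.close()
    return parser.tags


print(normalize_start_tags("<TABLE WIDTH=90% COLS=3 ROWS=2>"))
print(normalize_start_tags("<td nowrap>"))
```

Letting a real parser find the attributes is safer than hunting for
them with search-and-replace, since it already knows where one
attribute stops and the next begins.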
Another biggish task when turning HTML into XHTML is the issue of
closing tags. By traditional reckoning, nothing's dorkier than the HTML
coder who ends all her paragraphs with a </p> tag. Everybody knows that
simply beginning the next paragraph suffices to indicate that the last
one's done. Well, as Yoda says, you must unlearn what you have learned.
In XHTML, every tag must be closed. For every <p> there has to be a
</p>, for every <li> a </li>, and so forth. Grit your teeth and tell
yourself it'll all pay off some day.
Actually, there is a shortcut. For tags like <img> and <br>, where the
closing tag doesn't really signify anything, and would appear
immediately after the opening tag anyway, you can put the closing slash
inside the opening tag, so they look like this:
<img src="kitten.jpg" alt="Fluffy!" /> <br />
So that's not really too annoying.
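For the tags that take the shortcut, a search-and-replace can do most
of the work. Here is a rough sketch in Python; the list of tags and the
close_empty_tags() name are assumptions for illustration, and it
expects tag names that are already lowercase. A plain pattern like this
will trip over a ">" inside an attribute value, so check the results.

```python
import re

# Assumed list of tags that never have content; extend it as needed.
EMPTY_TAGS = ("br", "hr", "img", "input", "meta", "link")

# Matches e.g. <br>, <br/>, <br /> or <img src="x.jpg">.
PATTERN = re.compile(r"<(%s)\b([^<>]*?)\s*/?>" % "|".join(EMPTY_TAGS))


def close_empty_tags(html):
    """Rewrite each empty tag so it ends with ' />'."""
    return PATTERN.sub(lambda m: f"<{m.group(1)}{m.group(2)} />", html)


print(close_empty_tags('<img src="kitten.jpg" alt="Fluffy!"> <br>'))
```

Running it over markup that already uses the shortcut leaves that
markup as it was, so it's safe to apply more than once.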
Next we have to change our anchors, image maps, and suchlike: elements
that use a "name" attribute to identify themselves. For its own private
reasons, XHTML wants to replace that kind of "name" with "id". (Form
fields such as <input> keep their "name"; that's what gets sent to the
server.) Older browsers still look for "name", though, so to be
compatible with both we need to use both, with the same value. You know
anchors, those things that mark specific points on a page? In
traditional HTML they look like this:
<a name="chapter3">Chapter Three</a>
For XHTML, we want to change them to this:
<a name="chapter3" id="chapter3">Chapter Three</a>
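If you have a lot of anchors, the doubling-up can be scripted. The
sketch below is one way to do it in Python, assuming the attributes are
already lowercase and quoted; add_ids() is a made-up name, and the
patterns are deliberately simple rather than bulletproof.

```python
import re

ANCHOR = re.compile(r"<a\b([^<>]*)>")
NAME = re.compile(r'\sname="([^"]*)"')
HAS_ID = re.compile(r'\sid="')


def add_ids(html):
    """Give each named anchor an id with the same value as its name."""

    def fix(match):
        attrs = match.group(1)
        name = NAME.search(attrs)
        if name is None or HAS_ID.search(attrs):
            # No name to copy, or an id is already there: leave it alone.
            return match.group(0)
        return f'<a{attrs} id="{name.group(1)}">'

    return ANCHOR.sub(fix, html)


print(add_ids('<a name="chapter3">Chapter Three</a>'))
```

Ordinary links with an href and no name pass through untouched. Keep in
mind that every id on a page has to be unique, so two anchors that
share a name will need sorting out by hand.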
The rest of the job is basically about forming and maintaining good
habits. All along, in HTML, you've been supposed to include a Document
Type Declaration, indicate the character set encoding and so forth --
now in XHTML it's required. The declaration will look something like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
In addition, your opening <html> tag should indicate that the document
is intended to be XML-compatible, which it does by naming the XHTML
namespace, and must indicate the language the document is written in:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
You need to declare the language twice during this difficult HTML-to-XML
transition period. ("en" means English; if you use another language,
such as "ja" or "fr", adjust your code accordingly.)
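To check that the top of a page hangs together, you can feed a
bare-bones document to an XML parser and read the language back out.
The sketch below uses Python's built-in parser; the page content and
the root_languages() name are made up, and the xmlns value is the
namespace the XHTML 1.0 spec assigns. It confirms the page is
well-formed XML, which is a weaker test than full validation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical minimal page, just enough to exercise the declarations.
DOCUMENT = """<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Hypothetical test page</title></head>
  <body><p>Hello, XHTML.</p></body>
</html>
"""


def root_languages(document_bytes):
    """Return (xml:lang, lang) from the root <html> element."""
    root = ET.fromstring(document_bytes)
    if root.tag != f"{{{XHTML_NS}}}html":
        raise ValueError("root element is not an XHTML <html>")
    return root.get(f"{{{XML_NS}}}lang"), root.get("lang")


# Encode to bytes so the parser honors the declared iso-8859-1 encoding.
print(root_languages(DOCUMENT.encode("iso-8859-1")))
```

If the two values ever disagree, that's a sign somebody updated one
attribute and forgot its twin.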
And that's the gist. For the fine print, you'll want to read over the
XHTML and XML documentation, and make sure you haven't missed anything.
And so
we boldly stand with one foot in the era of HTML and with the other,
salute the rising dawn of XML. May the day arrive soon!
HINTS, POINTERS, AND TIPS O' THE TRADE:
You can and should use an XML validator to make sure your code is
compliant. There are some available online, or you can download a
standalone program.
All this sorting-through of code is a great opportunity to tidy up your
source. Check out our recent issue all about validating and cleaning
your code.
If you are talking on a pay phone and your friend is having trouble
hearing you, try giving the mouthpiece a sharp rap against a hard
surface. This agitates the carbon granules in the microphone and may
improve clarity.
RESOURCES:
Webmonkey's transition guide to XHTML
The W3C's XHTML spec
Differences between HTML and XHTML
STG's XML validator site
The W3C's validator
More than you need to know about XML
Even more -- from Webmonkey