Now that you have an idea of what (X)HTML is for, let's look at how the code actually works.
Standard Generalized Markup Language (SGML) is a language for creating markup languages, two of which are HTML and XHTML (XHTML by way of XML, which is itself a simplified subset of SGML). Because many other markup languages (a list appears at the back of the book) are also based on SGML, what you learn in this chapter will help you figure out those languages too.
A couple of notes here. First, any word in bold that I have not mentioned earlier will be defined shortly, so bear with me a bit. Second, I'm afraid this is all abstract; nothing in this chapter really does anything yet. That comes later. :-)
An element consists of—in order—its start tag, its content, and its end tag. To compare it to a book, the start tag is the front cover, the content is the pages, the end tag is the back cover, and the book as a whole is the element.
Content can be just about anything. Most often it is displayed as text, although elements can contain other things as well. Below is an example of what I'm talking about:
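Here is a minimal stand-in example, using the paragraph element p purely as an illustration (any element would do):

```html
<p>This sentence is the content.</p>
```

Here `<p>` is the start tag, `This sentence is the content.` is the content, and `</p>` is the end tag. The whole thing, from the first `<` to the final `>`, is the element.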
As I mentioned before, tags mark the boundaries of elements, and there are two major types of tags: start tags (which go at the start of their respective elements) and end tags (which go at the end).
A tag is made of several parts: two or three specific symbols and a name.
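Three symbols hold special importance in any SGML-derived markup language: `<`, `>`, and `/`.
`<` marks the beginning of a tag, and `>` marks its end.
`/` goes directly after `<` at the beginning of an end tag, like this: `</`. It also has an extra use in XML (and thus XHTML), but I'll get to that later.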
It's as simple as that.
The name of a tag (also known as the element name or element type) always starts with a letter, which can be followed by more letters, digits, and even underscores and hyphens (though no element in (X)HTML uses the last two). The name identifies its element, so it is always the first piece of information in the tag. If the element names in a start tag and an end tag do not match, the two tags belong to different elements, even if that is not your intent. The name always directly follows `<` in start tags and `</` in end tags.
I'll use an element called html to illustrate the anatomy of a pair of tags:
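A minimal sketch of that pair, with the content left empty so the two tags stand out:

```html
<html></html>
```

In the start tag `<html>`, the `<` opens the tag, `html` is the element name, and `>` closes the tag. The end tag `</html>` is built the same way, except that it opens with `</`. Both tags carry the same name, which is what makes them a pair.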
The `<`, the tag name, and the `>` are all essential. Omit any of them, and you'll run into trouble. The table below shows what happens if you do.
| Omitted | Result in HTML | Result in XML (including XHTML) |
|---|---|---|
| `<` | The tag will not be read as a tag. You will see the element name followed by a `>` symbol. | The browser will display a "Not well formed" error, indicating that the tag was not coded properly and the browser cannot read it. |
| Element name | If the element name is omitted from an end tag (that is, if you type `</>`), the browser will try to read it as a tag and therefore display nothing at all. If the element name is omitted from a start tag, the browser will not read it as a tag, but will instead display `<>`. | The same "Not well formed" error. |
| `>` | You'll likely end up with missing content as the browser tries to figure out where the tag ends, usually choosing the next `>` as the end of that tag (even though that `>` actually belongs to a different tag). | The same "Not well formed" error. |
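To make the table concrete, here is a sketch of each mistake next to a correct pair of tags, using the paragraph element p purely as an example. Each line shows one mistake on its own; in a real file the broken lines would interfere with one another, so treat them as separate cases:

```html
<p>A correct start tag and end tag.</p>
p>The less-than sign is missing from the start tag.</p>
<>The element name is missing from the start tag.</p>
<p>The element name is missing from the end tag.</>
<p The greater-than sign is missing from the start tag.</p>
```

Read as HTML, these lines should show roughly the symptoms in the table's HTML column, although exact behavior can vary between browsers. In the last line, for instance, the browser keeps reading the tag until it reaches the `>` of `</p>`, so the sentence never appears. Read as XML, any of the four broken lines produces the error instead.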
Most elements can contain other elements; putting one element inside another is known as nesting. A good illustration is setting a box inside a larger box (disregarding height for simplicity's sake). Like boxes, one element must fit completely inside another; that is, the inner element must end before the outer element ends. The following examples are correct:
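Here is a sketch of correct nesting, using the placeholder names element1 and element2 (they are not real (X)HTML elements):

```html
<element1><element2></element2></element1>

<element1><element2></element2></element1><element2></element2>
```

In the first example, element2 sits entirely inside element1. In the second, a separate instance of element2 follows element1, which is how you get element2 both inside and outside element1 without any tags overlapping.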
The following example is incorrect. Notice the order of start tags and end tags:
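A sketch of the incorrect case, where element1 and element2 are placeholder names rather than real (X)HTML elements:

```html
<element1><element2></element1></element2>
```

element2 starts inside element1, but element1 ends while element2 is still open.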
Notice that element2 is partially inside and partially outside element1. Going back to the box analogy, if you have two boxes nested like this, you're doing it wrong (and if you've managed to do it without sawing cardboard, you've pulled off a really neat trick). There is no single correct result here: such nesting forces the browser to guess at what you want, and if browsers guess, at least one is going to guess incorrectly.
Also, if you do this in an XML document (for example, an XHTML document), the browser won't even try to guess what you want; it will simply display an error message citing mismatched start and end tags.
If you want element2 both inside and outside element1, or vice versa, you'll need two separate instances of element1 or element2, as in the examples of correct nesting shown above.
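As a concrete sketch with real (X)HTML elements, suppose you want strongly emphasized text to run from the end of one paragraph into the start of the next. The elements p (a paragraph) and strong (strong emphasis) are chosen here only for illustration, and the lines between `<!--` and `-->` are comments, which browsers do not display:

```html
<!-- Incorrect: strong starts inside the first p and ends inside the second -->
<p>This part is <strong>important,</p>
<p>and so is this part.</strong></p>

<!-- Correct: each paragraph gets its own instance of strong -->
<p>This part is <strong>important,</strong></p>
<p><strong>and so is this part.</strong></p>
```

The correct version should look the way the incorrect one intended, but no browser has to guess.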
Yes, it is quite acceptable to have two or more instances of the same element in a markup document. Exceptions to this rule depend on the markup language and are uncommon (in (X)HTML, for example, there are only five single-use elements).
The way elements are nested inside one another is known as element ancestry. To demonstrate ancestry, below is a group of common (X)HTML elements.
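A minimal sketch of such a group, consistent with the relationships listed below. The text inside the elements is invented for illustration, and the Doctype that a real page needs (discussed later in this chapter) is left out to keep the focus on nesting:

```html
<html>
  <head>
    <title>Ancestry example</title>
  </head>
  <body>
    <h1>A heading</h1>
    <p>Some <em>emphasized</em> and <strong>strongly emphasized</strong> text.</p>
  </body>
</html>
```

The indentation is only there to make the nesting easier to see; it does not change which element is nested in which.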
Here is the way they're nested:
- html
  - head
    - title
  - body
    - h1
    - p
      - em
      - strong
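There are six terms dealing with element ancestry:
- The **root element** is the one element that contains all the others. It has no parent.
- An **ancestor element** is any element in which another element is nested, at any depth.
- **Parent element** is a more specific term: an element's parent element is the one in which it is immediately nested.
- A **descendant element** is any element nested within another, at any depth.
- **Child element** is a more specific term: an element's child element is one that is immediately nested within it.
- **Sibling elements** are elements that share the same parent.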
If the last five terms sound like terms used in a family tree, that is the analogy on which they were based. To clarify further, below is an explanation of which elements are which.
html is the root element of the document.
The following elements are ancestor elements:
- html is the root element, and therefore the ancestor of all the others.
- head is the ancestor element of title.
- body is the ancestor element of h1, p, em, and strong.
- p is the ancestor element of em and strong.
If an element has no elements contained within it, it is not an ancestor element. In this case, therefore, the following elements are not ancestor elements: title, h1, em, and strong.

The following elements are parent elements:
- html is the parent element of head and body.
- head is the parent element of title.
- body is the parent element of h1 and p.
- p is the parent element of em and strong.
The following list shows elements that are ancestors, but not parents, of other elements.
- html is the ancestor, but not parent, element of title, h1, p, em, and strong.
- body is the ancestor, but not parent, element of em and strong.
To reïterate what I said earlier, a parent element is an ancestor element; therefore all elements that are listed as not being ancestor elements are not parent elements either.
The following elements are sibling elements:
- head and body
- h1 and p
- em and strong
The following elements have no siblings: html and title. By definition, a root element cannot have siblings (since it cannot have a parent), and the title element is the only child element of the head element.
strong and em are child elements of p. They are also descendant elements of body and html.
p and h1 are child elements of body. They are also descendant elements of html.
title is the child element of head. It is also a descendant element of html.
body and head are child elements and descendant elements of html.
Because html is the root element, it is not a descendant (and thus not a child) element.
The Document Type Declaration (Doctype for short) is a piece of text placed at the beginning of a markup document that points the browser to a file containing information about the markup language being used. This file (known as a Document Type Definition, or DTD) contains the language's rules and codes for special characters. These codes are explained in Special Characters.
This is important: (X)HTML is only one of literally thousands of markup languages used on the Internet, and one of several that are officially recognized and usable by many browsers. (X)HTML itself has at least half a dozen versions, and if you don't tell the browser precisely which markup language is being used, the browser will try to guess, which is bad.
The Doctype is placed outside the root element; it is no more an element than a notice that a book is written in English is a chapter. However, the Doctype IS a part of the document, just as such a notice would be a part of the book. Like the root element, each markup document gets exactly one Doctype. With no Doctype, the browser must guess at the language being used; with more than one, the browser will be confused. Doctypes also have a special character sequence at their start: while most tags begin with `<`, Doctypes begin with `<!`, though they still end with `>`. As it is not an element, a Doctype does not use an end tag.
Two excellent examples of Doctypes are the HTML 4.01 Strict Doctype—which you will see often in this book and is the gold standard for HTML coding these days—and the XHTML 1.0 Strict Doctype.
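For reference, these are the standard W3C forms of those two Doctypes, shown together only for comparison (a real document uses exactly one):

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

Both begin with `<!` and end with `>`, and the URL at the end of each points to the DTD for that language. Breaking a Doctype across two lines, as here, is allowed.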
Don't worry about how the Doctype works; that's hairy hocus-pocus we don't need to get into. All that matters is what it does: define the markup language being used. Of course, if you really want to know, you can always look at Picking Apart The Doctype.
Also, never confuse a Doctype with a DTD. The latter is a Document Type Definition, a separate document (usually stored online) that lays out the rules for a markup language. It's a different beast altogether.
Documents written in an XML-based language (including XHTML) typically include another declaration, one that goes before even the Doctype. This is known as the XML Declaration. A typical one states which version of XML the document uses and which character encoding it is saved in. This declaration looks like this:
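A minimal sketch of a complete XHTML 1.0 Strict document showing where the XML Declaration sits. The UTF-8 encoding is an assumption; the value should name whatever encoding the file is actually saved in. The `xmlns` part of the html start tag is an attribute that XHTML requires on the root element; attributes are not explained in this section.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>Hello.</p>
  </body>
</html>
```

The order matters: the XML Declaration comes first, at the very start of the file with nothing (not even a blank line) before it, then the Doctype, and only then the root element.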
Of course, to edit a webpage or any other markup document, you will need a suitable program. Any text editor will do, as long as it can save a file as plain text. Such programs include Notepad (on Windows), Edit (on DOS), and whatever text editor comes with your computer.
It is unwise to use What-You-See-Is-What-You-Get (WYSIWYG) editors: they can be expensive, can have so many bells and whistles that you're not sure where to begin, often generate overly complicated code, and otherwise get in the way of learning.
Also, please—please!—do not use a word processor to create a webpage! Doing so causes a couple of serious problems:
Word processors use one type of code (it varies from processor to processor), while browsers use another ((X)HTML). A webpage created by a word processor therefore contains both. For example, in Your First Webpage, I walk you through creating a simple webpage. Out of morbid curiosity, I created that same page in a popular word processor to see how large the file would be. The difference is drastic:
| Created with | File size |
|---|---|
| Text editor | 357 bytes |
| Word processor | 3,692 bytes |
The word-processor version is more than ten times as large. If you have to pay for the bandwidth a page uses, that adds up in a hurry.
As I've said, you must never confuse a browser or let it guess at what you want. The code a word processor spits out is as confusing as it gets, resulting in inconsistent rendering. This can range from a webpage's design simply not appearing as it should to such serious consequences as hyperlinks (a very important part of webpages) not working at all (I've seen this happen).
For the website creator, a text editor is sufficient for all coding and, better yet, free.