Hyperlinks

Hyperlinks And URIs

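The very purpose of hyperlinks is to associate text with a Uniform Resource Identifier (URI for short). You will often see the term Uniform Resource Locator (URL for short) used to mean the same thing; strictly speaking, a URL is the kind of URI that says where a resource is located. For our purposes, a URI is simply text that describes where a resource is on the Internet.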

You've likely seen URIs before; they appear in the address bar of the browser, where you can type in the address of the page you want to go to. What you type in is an example of a URI. In fact, you used one in your first page:

The URI in the HTML 4.01 Doctype

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

The second quoted string (http://www.w3.org/TR/html4/strict.dtd) is an excellent example of a URI.

There are two types of URIs on the World Wide Web: absolute and relative.

Absolute URIs

An absolute URI is a web address with the exact location of the resource explicitly stated. It contains the following parts (which I will demonstrate using the URI in the Doctype):

The Protocol
The Domain
The Path
The File Name

The Protocol

In a network, a protocol is a standardized means of communication between computers. (The Internet, of course, is the largest network in the world.) The http at the beginning of the sample URI stands for HyperText Transfer Protocol, the protocol of the World Wide Web, where webpages reside. For obvious reasons, it is the most common protocol in URIs on the World Wide Web.

If you're viewing webpages on a CD, or viewing a webpage you've stored on your own computer, you may see the URI begin with file://, file:/// or just the drive letter. This is fine; it means you're getting it from your own computer's file system, not from a webserver.

The Protocol (Highlighted)

http://www.w3.org/TR/html4/strict.dtd

The Domain

The domain names the server on which the website is located.

The Domain (Highlighted)

http://www.w3.org/TR/html4/strict.dtd

Domains are actually read backwards, from right to left, by the Domain Name System (DNS), the service that translates domain names into the addresses of servers. For example, when it looks up www.w3.org, it checks the following:

If the top-level domain (in this case, .org) exists.
If the domain w3 exists within the .org top-level domain.
If the subdomain www exists within the w3.org domain.

This backwards checking is why servers are able to offer subdomains.
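
The right-to-left order can be illustrated with a few lines of Python. This is only a sketch of the order in which the names are considered, not a real DNS lookup, and the function name lookup_order is made up for this example.

```python
def lookup_order(hostname):
    """Return the names in the order they are checked, top-level domain first."""
    labels = hostname.split(".")  # "www.w3.org" -> ["www", "w3", "org"]
    # Start with the last label alone, then add one label to the left each time.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(lookup_order("www.w3.org"))  # ['org', 'w3.org', 'www.w3.org']
```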

The Path

All websites have at least one folder that stores files and other folders, much like folders on your computer. The part of the URI that describes the sequence of folders to look in is the path.

Think of a website as a filing cabinet. The root folder (which, like the root element, stores everything else) would be the cabinet itself. And while you can store all your files in the root folder itself, it would be like having a stack of paper in a filing cabinet with no drawers: the bigger the stack (or the more files in the website), the harder it is to keep track of things.

Folders in the website would be like the drawers in that filing cabinet, and the file folders in those drawers. The path for a real-life filing cabinet may be "Second drawer from the bottom, third folder from the front."

By the way, the / at the start of the path stands for the root folder, which has no name.

The Path (Highlighted)

http://www.w3.org/TR/html4/strict.dtd

Here, the path says "Start in the root folder. Go to the folder named TR/. Inside that is the folder html4/. Inside that is the desired file."

The File Name

Last comes the file name, which is, of course, the file you want.

The File Name (Highlighted)

http://www.w3.org/TR/html4/strict.dtd

In this case, it's strict.dtd. You may go to the URI http://www.w3.org/TR/html4/strict.dtd if you like—it's simply the HTML 4.01 Document Type Definition.
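
To see the four parts side by side, here is a minimal sketch that splits the sample URI with Python's standard urllib.parse module. The function name split_absolute_uri and the dictionary keys are my own labels, chosen to match the headings above; they are not part of any standard.

```python
from urllib.parse import urlsplit


def split_absolute_uri(uri):
    """Split an absolute URI into protocol, domain, path and file name."""
    parts = urlsplit(uri)
    # Everything up to the last "/" is the path; what follows is the file name.
    folders, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": folders + "/",
        "file_name": file_name,
    }


print(split_absolute_uri("http://www.w3.org/TR/html4/strict.dtd"))
```

Running it on the Doctype URI gives http as the protocol, www.w3.org as the domain, /TR/html4/ as the path and strict.dtd as the file name.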

Relative URIs

A relative URI finds files relative to the current page you are looking at. It consists of two parts:

The Path
The File Name

For this reason, it only works within the same domain. Since the URI used in the Doctype is not an actual webpage, I'll use another URI for examples: http://www.w3.org/TR/html401/about.html, which is the URI pointing to a webpage explaining the HTML specification.

The Path in a Relative URI

The paths for relative URIs use three special folder names:

./

This refers to the folder containing the webpage you are looking at. Omitting it has no effect on the URI: the browser, not the webserver, resolves relative URIs, and about.html and ./about.html point to the same file. Including it is harmless, though, and makes it obvious that the URI is relative.

With the webpage I suggested, ./ would refer to http://www.w3.org/TR/html401/.

../

This refers to the parent folder of the folder containing the current page. With the sample URI, this would refer to http://www.w3.org/TR/.

Should you want the parent folder of the parent folder of the current folder, you would repeat ../ like this: ../../, which would refer to http://www.w3.org/.

/

This refers to the root folder—in this case, http://www.w3.org/.

Parent and root folders are analogous to parent and root elements in an HTML document.
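
The three special folder names can be tried out with the urljoin function from Python's standard urllib.parse module, which resolves a relative URI against the page it appears on in much the same way a browser does. This is a small sketch; the helper name resolve and the constant CURRENT_PAGE are only for illustration.

```python
from urllib.parse import urljoin

# The page the relative URIs are written on (the sample page used above).
CURRENT_PAGE = "http://www.w3.org/TR/html401/about.html"


def resolve(relative_uri, current_page=CURRENT_PAGE):
    """Turn a relative URI into the absolute URI it stands for."""
    return urljoin(current_page, relative_uri)


for relative_uri in ("./", "../", "../../", "/"):
    print(relative_uri, "->", resolve(relative_uri))
```

The four lines printed end in http://www.w3.org/TR/html401/, http://www.w3.org/TR/, http://www.w3.org/ and http://www.w3.org/ respectively, matching the descriptions above.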

Errata

Some final notes on URIs.

When to use Absolute or Relative URIs

When working within the same domain, relative URIs are indispensable. With them, you can keep identical copies of your website on your server and on your own computer without having to change anything.

When linking to a resource on another domain, however, absolute URIs are the only type you can use.

Telling the difference

There is one, and only one, part of a URI that decides whether it is absolute or relative: the protocol. If it is present, the URI is treated as absolute; if absent, then relative. It is also a mistake to have the protocol twice in a single URI. Keep this in mind to avoid mistakes with hyperlinks.
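
The rule is simple enough to write down as a one-line test. The sketch below uses urlsplit from Python's standard library to look for a protocol; the function name is_absolute is an assumption made for this example.

```python
from urllib.parse import urlsplit


def is_absolute(uri):
    """A URI is absolute if, and only if, it starts with a protocol."""
    return urlsplit(uri).scheme != ""


print(is_absolute("http://www.w3.org/TR/html401/about.html"))  # True
print(is_absolute("../html4/strict.dtd"))                      # False
print(is_absolute("/TR/"))                                     # False
```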

URIs And Special Characters

It is possible to write out a URI using character references, which is useful when

a URI you want to link to has a character that's not normally found on the QWERTY keyboard,
you wish to use a character that has a special meaning in a URI (such as /),
or you wish to obfuscate an e-mail address to hide it from spambots.

There are two ways to do this.

Character References In URIs

You simply write the character reference as if you were going to display it on the screen, beginning with the ampersand (&) and ending with the semicolon. But this causes things to get sticky: some URIs include the ampersand itself! In such cases, write out the ampersand using its own character reference, &amp;. Otherwise, the browser might try to interpret the letters that follow the ampersand as a character reference.
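
The following sketch uses Python's standard html module to show both sides of the problem. The URI page?a=1&copy=2 is a made-up example, chosen because copy happens to be the name of a character reference; Python's unescape function only approximates what a browser does, so treat the result as an illustration rather than a guarantee about any particular browser.

```python
from html import escape, unescape

# Hypothetical URI with an ampersand followed by the letters "copy".
uri = "page?a=1&copy=2"

# Written out safely for use inside an href attribute.
in_markup = escape(uri, quote=True)
print(in_markup)            # page?a=1&amp;copy=2

# Decoding the safe version gives back the original URI.
print(unescape(in_markup))  # page?a=1&copy=2

# Decoding the unprotected version turns "&copy" into the copyright sign.
print(unescape(uri))        # page?a=1©=2
```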

Percent-Encoding

You've likely seen this in the address bar of your browser from time to time, particularly the code %20, which shows up if the URI has a space in it. Each code is a percent sign followed by two hexadecimal digits that give the value of a single byte. For ASCII characters, that value matches the Unicode code point (for example, the space is U+0020, hence %20). Two digits only reach FF₁₆ (255₁₀), so a character beyond ASCII is written as several codes in a row, one for each byte of its UTF-8 encoding: é (U+00E9), for example, becomes %C3%A9. These codes are further restricted by whether or not they are valid characters, as explained in Special Characters.
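
Percent-encoding is tedious to do by hand, so here is a short sketch using the quote and unquote functions from Python's standard urllib.parse module. The file name my page.html is invented for the example; note that quote encodes characters outside ASCII as UTF-8 bytes, one percent code per byte.

```python
from urllib.parse import quote, unquote

print(quote("my page.html"))                # my%20page.html
print(quote("/", safe=""))                  # %2F  (safe="" means "/" is encoded too)
print(quote("http://www.w3.org", safe=""))  # http%3A%2F%2Fwww.w3.org
print(quote("é"))                           # %C3%A9 (two bytes in UTF-8)
print(unquote("my%20page.html"))            # my page.html
```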

Creating A Hyperlink

There is a special element-attribute combination required for creating a hyperlink:

The a element
The href attribute

The `a` Element

To create a hyperlink, you need to use the anchor element, which has the element name a. Like em and strong, a is an inline element. Unlike em and strong, you may not nest one a element inside another, although you may nest em and strong elements inside it and vice versa.

The `href` Attribute

Above, I described URIs. This is the attribute which contains them in a hyperlink.

Usage

Technically, you can put in whatever string of text you like without raising an error while validating, but be warned: your browser will treat it like a URI, your browser will request the resource specified, and your most likely error will be the well-known 404: Page Not Found error.

A link would look like this:

A Sample Hyperlink within a paragraph:

<p>Read about the <a href="http://www.w3.org/TR/html401/about.html">HTML specification</a></p>

A paragraph with a hyperlink to the W3C page

You may be wondering why I have given the element and attribute such short shrift. The reason is simple: you've seen it all before. The a element is an inline element, the href attribute requires text in a specific format to work, and that's all there is to it.
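
If you ever generate pages with a program rather than by hand, the same element-attribute combination applies. The sketch below is one possible way to build a hyperlink in Python; make_link is a hypothetical helper, not a function from any library, and it uses the standard html.escape function so that ampersands and quotation marks in the URI or the link text cannot break the markup.

```python
from html import escape


def make_link(href, text):
    """Return an a element whose href attribute holds the given URI."""
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(text))


print(make_link("http://www.w3.org/TR/html401/about.html", "HTML specification"))
```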

Links Off The World Wide Web

There is a question that might arise, and rightly so, about links that don't use the http:// protocol. What about, for example, e-mail? Or an internet resource that has nothing to do with the World Wide Web?

Other Protocols

Hyperlinks will indeed work with other protocols, but the system of folders I described above may not apply to them; you have to know and follow the rules of each protocol you use. Such protocols can include:

https://: This is a more secure version of the http:// protocol.
ftp://: FTP stands for File Transfer Protocol, which allows files to be uploaded to a server as well as downloaded. Like http:// and https://, URIs using this protocol require that you include a path.
furc://: This protocol is used to access content that never appears on a browser! It is the protocol used by an online roleplaying game called Furcadia, which uses its own special program. A hyperlink with this protocol can be placed on a webpage as a hyperlink which, when clicked on while the Furcadia program is running, will load the particular area of the game the hyperlink points to. Because of how this game works, this protocol eschews domains, paths and filenames altogether, and uses rules of its own instead. An example URI of this type is furc://naiagreen/, which is a link to the game's help area. The furc:// protocol and Furcadia are © Dragon's Eye Productions (<http://www.furcadia.com>) and are mentioned with permission.

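In the URIs listed above, the protocol is followed by ://. Not every protocol uses the two slashes, though: mailto:, described under Email Links, ends with just the colon.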

When a user clicks on a link that uses a protocol that is not meant for a browser, the browser will tell the computer to launch the program with which the protocol was associated. If the browser can find no such program, the user will get a warning to that effect.
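
The hand-off described above can be pictured as a simple table lookup. The sketch below is a loose illustration only: the HANDLERS table and the program names in it are invented for this example, and real browsers and operating systems keep such associations in their own settings.

```python
from urllib.parse import urlsplit

# Hypothetical associations between protocols and programs.
HANDLERS = {
    "http": "web browser",
    "https": "web browser",
    "ftp": "FTP client",
    "furc": "Furcadia",
}


def handler_for(uri):
    """Return the program associated with the URI's protocol, or None."""
    return HANDLERS.get(urlsplit(uri).scheme)


for uri in ("https://www.w3.org/", "furc://naiagreen/", "gopher://example.org/"):
    program = handler_for(uri)
    if program is None:
        print(uri, "-> warning: no program is associated with this protocol")
    else:
        print(uri, "-> handled by", program)
```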

Email Links

Amongst hyperlinks, the email hyperlink is something of an oddball. An e-mail address is not exactly a URI, but the hyperlink treats it like one. The syntax is in two parts and very straightforward.

mailto:: This is the first part, and it tells the browser that this is an email link.
Email address: The second part is the complete email address you want the link to point to.

The viewer's user agent will decide how to handle the link. The usual result is to launch a program associated with e-mail.
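
The two-part syntax is easy to assemble and to take apart again. This is a minimal sketch using Python's standard urllib.parse module; the function names email_link and address_of are my own, and the address is the same placeholder used in the examples in this section.

```python
from urllib.parse import urlsplit


def email_link(address):
    """Build the value of an href attribute for an email link."""
    return "mailto:" + address


def address_of(uri):
    """Return the email address in a mailto: link, or None for other links."""
    parts = urlsplit(uri)
    return parts.path if parts.scheme == "mailto" else None


print(email_link("email@domain.com"))         # mailto:email@domain.com
print(address_of("mailto:email@domain.com"))  # email@domain.com
```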

Example Hyperlinks

I included p tags as a reminder that a elements cannot be child elements of the body element—they must be contained within a block element such as p or h1.

Various Hyperlinks And Attempted Hyperlinks

<p><a href="http://www.w3.org/TR/html401/about.html">A link to a page about the HTML specification</a></p>

<p><a href="http://www.w3.org/TR/html4/strict.dtd">A link to the HTML Document Type Definition</a></p>

<p><a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.w3.org&amp;charset=(detect+automatically)&amp;doctype=Inline&amp;group=0">A link with a URI that includes ampersands. Note that I use the character entity reference for the ampersand in all cases. This URI leads you to the W3C validator, which validates the W3C homepage itself.</a></p>

<p><a href="http://www.w3.org/Icons/w3c_main.png">A link to a picture (This is the W3C logo).</a></p>

<p><a href="mailto:email@domain.com">An email link with "email@domain.com" as the address.</a></p>

<p><a href="furc://naiagreen">A link that launches another program associated with the protocol "furc://" (that is, if such a program is associated with such a protocol)</a></p>

<p><a>Since this particular anchor element doesn't have an href attribute, it is not a hyperlink.</a></p>

Since this isn't the a element, an HTML validator will flag the use of the href attribute as an error, and the result still won't be a hyperlink.

The hyperlinks generated by the above code. The last two are NOT hyperlinks.
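
The rule behind the last two examples (a hyperlink needs both the a element and the href attribute) can be checked mechanically. The sketch below uses the HTMLParser class from Python's standard library; the class name LinkFinder is made up, the sample markup is shortened, and the href value on the p element is an assumption for the sake of the example.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect the URIs of genuine hyperlinks: a elements with an href."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        # Both conditions must hold: the right element and the right attribute.
        if tag == "a" and href is not None:
            self.hyperlinks.append(href)


sample = (
    '<p><a href="mailto:email@domain.com">An email link</a></p>'
    "<p><a>An anchor without href</a></p>"
    '<p href="http://www.w3.org/">Not an a element</p>'  # href value assumed
)

finder = LinkFinder()
finder.feed(sample)
finder.close()
print(finder.hyperlinks)  # ['mailto:email@domain.com']
```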