The table below shows the first 256 characters in UTF-8, which are in turn the characters of ISO-8859-1, arranged according to their hexadecimal values. Control characters (which are 00 to 1F and 7F to 9F in hexadecimal) cannot be displayed properly, so their acronyms are shown instead.
The first 128 are the characters of ASCII, which was first published in 1963, last updated in 1986, and finally surpassed as the most common encoding on the web by UTF-8 in 2007.
Below the table is a list of the control characters from ASCII and ISO-8859-1, showing both their abbreviations and full names. I've explained most of them, but I don't know how to explain the others without plagiarizing Wikipedia and still having no clue what the character does.
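To illustrate the idea of showing an acronym in place of an undisplayable character, here is a minimal JavaScript sketch. The names `C0_ACRONYMS` and `showControlAcronyms` are made up for this example, and the square-bracket notation is just one possible way to mark the acronyms. It only covers the ASCII control characters (00 to 1F, plus 7F).

```javascript
// Acronyms for code points 0x00 to 0x1F, in order.
const C0_ACRONYMS = [
  "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
  "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
  "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
  "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
];

// Hypothetical helper: replace every ASCII control character
// with its acronym in square brackets.
function showControlAcronyms(text) {
  return text.replace(/[\x00-\x1F\x7F]/g, (ch) => {
    const code = ch.charCodeAt(0);
    // DEL (0x7F) sits outside the 0x00-0x1F block, so handle it separately.
    const name = code === 0x7f ? "DEL" : C0_ACRONYMS[code];
    return "[" + name + "]";
  });
}

console.log(showControlAcronyms("a\tb\r\n")); // a[HT]b[CR][LF]
```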
Control Characters
Control characters are characters that send instructions to a device rather than showing up as printed characters. But, as they are characters, they can be sent right alongside printed characters, which comes in very handy in programming.
Listed below are the control characters that UTF-8 inherited from ASCII and ISO-8859-1. Some such characters date back before the invention of electronics and were used on telegraph lines. Many of these control characters are now obsolete, but because of the history of UTF-8, they're still around. Others are still used, with new meaning.
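Because control characters travel in the same string as printed ones, a program can pick them out by their code. The sketch below assumes the control ranges are 00 to 1F and 7F to 9F (hexadecimal); `isControlCharacter` and `findControlCharacters` are hypothetical helper names, not built-in functions.

```javascript
// Assumed ranges: 0x00-0x1F (ASCII controls), 0x7F (DEL),
// and 0x80-0x9F (the controls added alongside ISO-8859-1).
function isControlCharacter(ch) {
  const code = ch.codePointAt(0);
  return code <= 0x1f || (code >= 0x7f && code <= 0x9f);
}

// Report the position and two-digit hex code of each control character.
function findControlCharacters(text) {
  const found = [];
  let index = 0;
  for (const ch of text) {
    if (isControlCharacter(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(2, "0");
      found.push({ index, hex });
    }
    index += ch.length; // astral characters take two UTF-16 units
  }
  return found;
}

// A tab and a Bell character sit right alongside the printed text.
console.log(findControlCharacters("Hi\tthere\u0007!"));
// [ { index: 2, hex: '09' }, { index: 8, hex: '07' } ]
```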
I will present two lists: the original ASCII control characters and the ISO-8859-1 characters.
ASCII Control Codes
These are the original control characters, some dating back to the dawn of telecommunications (like the Bell character). None of these should be used in a webpage, as most browsers will have no idea what to do with them.
- NUL
- Null Character This was originally used to allow gaps on paper tape (an early storage medium that was basically a long strip of paper with holes punched in it) so it could be edited. It also was used for padding when a terminal encountered code that might take some time to process. Today, some programming languages use this to mark the end of a string. I explained what strings are in CSS-Generated Content and Text.
- SOH
- Start Of Header This marks where the header of a message begins.
- STX
- Start Of Text This marks where the text of a message begins and often where the header of a message ends.
- ETX
- End Of Text End of the message's text.
- EOT
- End Of Transmission This signalled the end of the communication session between the two devices.
- ENQ
- Enquiry This was a signal to a machine that basically asked if the signalled machine was available.
- ACK
- Acknowledge This was a signal in reply to ENQ, saying that the signalled machine was available and ready to receive messages.
- BEL
- Bell Character This is one of the oldest control characters, showing up in a 5-bit encoding first published in the 1870s. It would signal a machine (like a teletypewriter or a telegraph) to ring a literal, physical bell to alert the operator to an incoming message. In devices that have no bells (like a computer), this would often cause the computer to beep or a window to flash or change colours as a visual alert.
- BS
- Backspace Yes, hitting the backspace key does create a character. This character usually tells the computer to move the cursor back one space and delete the character that the cursor went back over.
- HT
- Horizontal Tabulation This was and is used to line up characters to the same horizontal position. This character is created by the TAB key on your keyboard. It is also represented in programming languages such as JavaScript by the character sequence \t.
- LF
- Line Feed This would cause a device to go to the next line. It's still used in many programming languages such as JavaScript, where it's represented by the character sequence \n.
- VT
- Vertical Tabulation This works the same as horizontal tabulation, except it sets position vertically. It is not used in (X)HTML or CSS, and only rarely in JavaScript, where it is represented by the character sequence \v.
- FF
- Form Feed This character creates a page break. It tells a printer to go to the next page, but is rarely used today, since page breaks are created with special functions on modern equipment instead.
- CR
- Carriage Return This would cause a device to return to the beginning of the line. This comes from the days of typewriters (mechanical or electromechanical devices for printing, for those of you who don't know). The carriage is the paper-holding assembly, and as (on older models) the printing mechanism couldn't move, the carriage did (usually going left, since most European languages are read left-to-right), taking the paper along with it. Therefore, "carriage return" meant returning the carriage (and the paper) to its original horizontal position, usually the far right. This is still used in several programming languages such as JavaScript, where a carriage return is represented by the character sequence \r. In the chapter on text in JavaScript, I mentioned that some browsers will consider a new line to consist of the sequence \r\n. This is why.
- SO
- Shift Out This would change to a different character set. For example, if you were writing in English but needed to include something written in, say, Russian, this would switch from a Latin character set to a Russian one.
- SI
- Shift In This would reverse the effects of Shift Out. That is, if you'd been writing in English, then used Shift Out to write in Russian, Shift In would switch back to the Latin character set. Some old printers also used these characters to change ink colours.
- DLE
- Data Link Escape This would cause the following code to be interpreted as raw code, much as the plaintext element (from Transitional, Obsolete, and Proprietary HTML) causes the following characters to be interpreted as plain text. How to go back to interpreting the code as characters and commands depended on the implementation.
- DC1
- DC2
- DC3
- DC4
- Device Control One, Two, Three, Four These characters are used for controlling peripherals (another term for devices attached to a computer) such as printers and monitors.
- NAK
- Negative Acknowledgement Basically, this was a character sent when there was an error in communication, asking for the information to be re-sent.
- SYN
- Synchronous Idle This was used to maintain synchronous connections between terminals while nothing else was being sent.
- ETB
- End of Transmission Block Sometimes information was sent in blocks, particularly when the data wouldn't fit in a single block. This character divided those blocks.
- CAN
- Cancel This character means "disregard the previous information/instructions".
- EM
- End Of Medium Magnetic and paper tapes (both were used for storage in the early days of computing) had this character at the end of their useable portions.
- SUB
- Substitute This was used to substitute characters that were garbled in transmission, thus raising an error or warning. It's been put to other uses where such errors aren't a concern.
- ESC
- Escape This character is created by pressing the ESCAPE key (which usually has ESC on it). It can be used to quit a program, or to tell the computer that what follows are commands, not just typed characters.
- FS
- GS
- RS
- US
- File, Group, Record, Unit Separator These separated various fields in data structures. What kind of fields were being separated is in the name.
- DEL
- Delete This character was used to mark deleted characters on paper tape. Nowadays, it is the character created by the BACKSPACE key (not the DELETE key).
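Several of the characters in this list have their own escape sequences in JavaScript. The sketch below pairs each sequence with the character it produces and prints that character's code; the names `ESCAPES` and `describeEscapes` are invented for this example.

```javascript
// Each entry: the sequence as typed in source code, the character it
// produces, and that character's acronym from the list above.
const ESCAPES = [
  ["\\0", "\0", "NUL"],
  ["\\b", "\b", "BS"],
  ["\\t", "\t", "HT"],
  ["\\n", "\n", "LF"],
  ["\\v", "\v", "VT"],
  ["\\f", "\f", "FF"],
  ["\\r", "\r", "CR"],
];

// Hypothetical helper: one readable line per escape sequence.
function describeEscapes() {
  return ESCAPES.map(
    ([written, ch, name]) => `${written} = ${name} (code ${ch.charCodeAt(0)})`
  );
}

console.log(describeEscapes().join("\n"));

// A Carriage Return followed by a Line Feed is two separate characters.
console.log("\r\n".length); // 2
```

Characters without a dedicated escape sequence, such as ESC or DEL, can still be written by code, for example "\x1B" and "\x7F".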
ISO-8859-1 Control Codes
ISO-8859-1 is double the size of ASCII, and it added quite a few control characters. Many of these will actually cause a character to appear, since Windows has its own variant of this character set, but it is best if you use the actual UTF-8 codes.
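The difference between the Windows variant and the real control codes can be seen with a single byte. This is only a sketch: it assumes a JavaScript environment that provides `TextDecoder` with support for the "windows-1252" label (modern browsers, and Node.js builds with full ICU data), and `decodeByte` is a made-up helper name.

```javascript
// Hypothetical helper: decode one byte using the named encoding.
function decodeByte(byte, encoding) {
  return new TextDecoder(encoding).decode(Uint8Array.of(byte));
}

// In Unicode (and so in UTF-8), code 85 is the NEL control character.
const asUnicode = String.fromCharCode(0x85);

// In the Windows variant, the same byte is a printable ellipsis.
const asWindows = decodeByte(0x85, "windows-1252");

console.log(asUnicode.charCodeAt(0).toString(16)); // 85
console.log(asWindows === "\u2026"); // true: U+2026, horizontal ellipsis
```

Writing the ellipsis as U+2026 instead of relying on byte 85 is what using the actual UTF-8 codes amounts to here.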
- PAD
- Padding Character
- HOP
- High Octet Preset
- BPH
- Break Permitted Here This character meant that a line break was possible (unlike the Newline character, which forced a line break).
- NBH
- No Break Here This character meant that a line break could not occur here.
- IND
- Index This was a command to move one full line down. It was intended to eliminate any uncertainty dealing with the Line Feed character, but has since fallen into disuse.
- NEL
- Next Line This is essentially Carriage Return and Line Feed rolled into one.
- SSA
- Start of Selected Area
- ESA
- End of Selected Area
- HTS
- Horizontal Tabulation Set (Character Tabulation Set) I mentioned that the ASCII Horizontal Tabulation character would place the cursor at the next tab stop. This character set that tab stop.
- HTJ
- Horizontal Tabulation with Justification (Character Tabulation with Justification)
- VTS
- Vertical Tabulation Set (Line Tabulation Set) This worked much like the Horizontal Tabulation set, but instead of crossways, this set a tabulation stop vertically.
- PLD
- PLU
- Partial Line Down (Partial Line Forward)/Partial Line Up (Partial Line Backward) These two characters were used to create superscript and subscript printing.
- SS2
- SS3
- Single Shift 2, 3
- DCS
- Device Control String
- PU1
- PU2
- Private Use 1, 2 Yep, you got two characters that you could define for your own use.
- STS
- Set Transmit State
- CCH
- Cancel Character This is like the backspace character, but made it clear that the character that was backspaced over was deleted.
- MW
- Message Waiting
- SPA
- Start of Protected Area
- EPA
- End of Protected Area
- SOS
- Start Of String This character started a control string.
- SGCI
- Single Graphic Character Introducer
- SCI
- Single Character Introducer
- CSI
- Control Sequence Introducer
- ST
- String Terminator As SOS started a control string, this ended it.
- OSC
- Operating System Command
- PM
- Privacy Message
- APC
- Application Program Command
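Since Next Line does the job of Carriage Return and Line Feed together, text arriving from different systems may mark its line ends in several ways. The following sketch, with a hypothetical helper named `normalizeLineBreaks`, turns all of them into a plain Line Feed. Whether NEL should count as a line end in your own data is an assumption you would need to check.

```javascript
// Hypothetical helper: treat CR+LF, a lone CR, and NEL (code 85 in
// hexadecimal) as line ends, and replace each with a single LF.
function normalizeLineBreaks(text) {
  // "\r\n" is listed first so the pair is replaced as one unit.
  return text.replace(/\r\n|\r|\u0085/g, "\n");
}

const mixed = "one\r\ntwo\u0085three\rfour";
console.log(JSON.stringify(normalizeLineBreaks(mixed)));
// "one\ntwo\nthree\nfour"
```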