The First 256 Characters Of UTF-8

The table below shows the first 256 characters in UTF-8, which are in turn the characters of ISO-8859-1, arranged according to their hexadecimal values. Control characters (hexadecimal 00-1F and 7F-9F) cannot be displayed properly, so their abbreviations are shown instead.

The first 128 are the characters of ASCII, which was first published in 1963, last updated in 1986, and finally surpassed by UTF-8 as the most common encoding on the web in 2007.
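One consequence of this lineage is easy to check: in UTF-8, each of the 128 ASCII characters takes a single byte, while each of the other 128 characters in the table takes two. Here is a minimal sketch, assuming an environment that provides the standard TextEncoder (current browsers and Node.js do). The helper name utf8Bytes is made up for illustration.

```javascript
// Hypothetical helper: returns the UTF-8 bytes of a string as hex pairs.
function utf8Bytes(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

console.log(utf8Bytes("A"));      // 41     (ASCII: one byte)
console.log(utf8Bytes("\u00E9")); // C3 A9  (é: two bytes)
console.log(utf8Bytes("\u00FF")); // C3 BF  (ÿ: two bytes)
```

The escapes \u00E9 and \u00FF are used only so the example does not depend on how the script file itself is saved.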

Below the table is a list of the control characters from ASCII and ISO-8859-1, showing both their abbreviations and full names. I've explained most of them, but I don't know how to explain the others without plagiarizing Wikipedia and still having no clue what the character does.

The Characters

  -0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F  
0- NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI 0-
1- DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US 1-
2- ! " # $ % & ' ( ) * + , - . / 2-
3- 0 1 2 3 4 5 6 7 8 9 : ; < = > ? 3-
4- @ A B C D E F G H I J K L M N O 4-
5- P Q R S T U V W X Y Z [ \ ] ^ _ 5-
6- ` a b c d e f g h i j k l m n o 6-
7- p q r s t u v w x y z { | } ~ DEL 7-
8- PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3 8-
9- DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST OSC PM APC 9-
A-   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ A-
B- ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ B-
C- À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï C-
D- Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß D-
E- à á â ã ä å æ ç è é ê ë ì í î ï E-
F- ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ F-
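
Reading the table comes down to splitting a character's code into two hexadecimal digits: the first picks the row and the second picks the column. Here is a small sketch of that lookup. The name tableCell is hypothetical, not part of any library.

```javascript
// Hypothetical helper: finds the row and column labels of a character
// in the table above (only valid for the first 256 code points).
function tableCell(char) {
  const code = char.codePointAt(0);
  if (code > 0xff) {
    throw new RangeError("Character is outside the first 256 code points");
  }
  const hex = code.toString(16).toUpperCase().padStart(2, "0");
  return { row: hex[0] + "-", column: "-" + hex[1], hex: hex };
}

const cell = tableCell("A");
console.log(cell.row, cell.column, cell.hex); // 4- -1 41
```

So "A" sits in row 4-, column -1, which matches its hexadecimal value of 41.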

Control Characters

Control characters are characters that send instructions to a device rather than showing up as printed characters. But, as they are characters, they can be sent right alongside printed characters, which comes in very handy in programming.
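
Because control characters are ordinary characters, a string can carry them right next to printable text. Here is a rough sketch, assuming the control ranges are hexadecimal 00-1F and 7F-9F. The function name is illustrative only.

```javascript
// Matches any control character in the ranges 00-1F and 7F-9F.
const CONTROL_CHARS = /[\u0000-\u001F\u007F-\u009F]/g;

// Hypothetical helper: counts the control characters in a string.
function countControlChars(text) {
  const matches = text.match(CONTROL_CHARS);
  return matches ? matches.length : 0;
}

// A tab and a line feed travel in the same string as the letters.
const message = "Name:\tAda\n";
console.log(countControlChars(message)); // 2
console.log(countControlChars("plain")); // 0
```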

Listed below are the control characters that UTF-8 inherited from ASCII and ISO-8859-1. Some such characters date back before the invention of electronics and were used on telegraph lines. Many of these control characters are now obsolete, but because of the history of UTF-8, they're still around. Others are still used, with new meaning.

I will present two lists: the original ASCII control characters and the ISO-8859-1 characters.

ASCII Control Codes

These are the original control characters, some dating back to the dawn of telecommunications (like the Bell character). None of these should be used in a webpage, as most browsers will have no idea what to do with them.

NUL
Null Character This was originally used to allow gaps on paper tape (an early storage medium that was basically a long strip of paper with holes punched in it) so it could be edited. It was also used for padding when a terminal encountered code that might take some time to process. Today, some programming languages use this to mark the end of a string. I explained what strings are in CSS-Generated Content and Text.
SOH
Start Of Header This marks where the header of a message begins.
STX
Start Of Text This marks where the text of a message begins and often where the header of a message ends.
ETX
End Of Text End of the message's text.
EOT
End Of Transmission This signalled the end of the communication session between the two devices.
ENQ
Enquiry This was a signal to a machine that basically asked if the signalled machine was available.
ACK
Acknowledge This was a signal in reply to ENQ, saying that the signalled machine was available and ready to receive messages.
BEL
Bell Character This is one of the oldest control characters, showing up in a 5-bit encoding first published in the 1870s. It would signal a machine (like a teletypewriter or a telegraph) to ring a literal, physical bell to alert the operator to an incoming message. In devices that have no bells (like a computer), this would often cause the computer to beep or a window to flash or change colours as a visual alert.
BS
Backspace Yes, hitting the backspace key does create a character. This character usually tells the computer to move the cursor back one space and delete the character that the cursor went back over.
HT
Horizontal Tabulation This was and is used to line up characters to the same horizontal position. This character is created by the TAB key on your keyboard. It is also represented in programming languages such as JavaScript by the character sequence \t.
LF
Line Feed This would cause a device to go to the next line. It's still used in many programming languages such as JavaScript, where it's represented by the character sequence \n.
VT
Vertical Tabulation This works the same as horizontal tabulation, except it sets position vertically. It is not used in (X)HTML, CSS or JavaScript.
FF
Form Feed This character creates a page break. It tells a printer to go to the next page, but is rarely used today, since page breaks are created with special functions on modern equipment instead.
CR
Carriage Return This would cause a device to return to the beginning of the line. This comes from the days of typewriters (a mechanical or electromechanical device for printing, for those of you who don't know). The carriage is the paper-holding assembly, and as (on older models) the printing mechanism couldn't move, the carriage did (usually going left, since most European languages are read left-to-right), taking the paper along with it. Therefore, carriage return meant returning the carriage (and the paper) to its original horizontal position, usually the far right. This is still used in several programming languages such as JavaScript, where a carriage return is represented by the character sequence \r. In the chapter on text in JavaScript, I mentioned that some browsers will consider a new line to consist of the sequence \r\n. This is why.
SO
Shift Out This would change to a different character set. For example, if you were writing in English but needed to include something written in, say, Russian, this would switch from a Latin character set to a Russian one.
SI
Shift In This would reverse the effects of Shift Out. That is, if you'd been writing in English, then used Shift Out to write in Russian, Shift In would switch back to the Latin character set. Some old printers also used these characters to change ink colours.
DLE
Data Link Escape This would cause the following code to be interpreted as raw code, much as the plaintext element (from Transitional, Obsolete, and Proprietary HTML) causes the following characters to be interpreted as plain text. How to go back to interpreting the code as characters and commands depended on the implementation.
DC1
DC2
DC3
DC4
Device Control One, Two, Three, Four These characters are used for controlling peripherals (another term for a computer device) such as printers and monitors.
NAK
Negative Acknowledgement Basically, this was a character sent when there was an error in communication, asking for the information to be re-sent.
SYN
Synchronous Idle This was used to maintain synchronous connections between terminals while nothing else was being sent.
ETB
End of Transmission Block Sometimes information was sent in blocks, particularly when the data wouldn't fit in a single block. This character divided those blocks.
CAN
Cancel This character means disregard the previous information/instructions.
EM
End Of Medium Magnetic and paper tapes (both were used for storage in the early days of computing) had this character at the end of their useable portions.
SUB
Substitute This was used to substitute characters that were garbled in transmission, thus raising an error or warning. It's been put to other uses where such errors aren't a concern.
ESC
Escape This character is created by pressing the ESCAPE key (which usually has ESC on it). It can be used to quit a program, or to tell the computer that what follows are commands, not just typed characters.
FS
GS
RS
US
File, Group, Record, Unit Separator These separated various fields in data structures. What kind of fields were being separated is in the name.
DEL
Delete This character was used to mark deleted characters on paper tape. Nowadays, it is the character created by the BACKSPACE key (not the DELETE key).
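
Several of the characters above have escape sequences in JavaScript. The sketch below checks their codes against the table and shows one common way to cope with the two line-ending conventions. The name normalizeNewlines is a hypothetical helper, and turning every \r\n or lone \r into \n is an assumption about what you want, not a rule.

```javascript
// Each escape sequence is a single character with the code from the table.
console.log("\0".charCodeAt(0)); // 0  (NUL)
console.log("\b".charCodeAt(0)); // 8  (BS)
console.log("\t".charCodeAt(0)); // 9  (HT)
console.log("\n".charCodeAt(0)); // 10 (LF)
console.log("\r".charCodeAt(0)); // 13 (CR)

// Hypothetical helper: converts CR LF pairs and lone CRs to a single LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n|\r/g, "\n");
}

// JSON.stringify makes the otherwise invisible characters visible.
console.log(JSON.stringify(normalizeNewlines("one\r\ntwo\rthree")));
// "one\ntwo\nthree"
```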

ISO-8859-1 Control Codes

ISO-8859-1 is double the size of ASCII, and it added quite a few control characters. Many of these will actually cause a character to appear, since Windows has its own variant of this character set, but it is best if you use the actual UTF-8 codes.

PAD
Padding Character
HOP
High Octet Preset
BPH
Break Permitted Here This character meant that a line break was possible (unlike the Newline character, which forced a line break).
NBH
No Break Here This character meant that a line break could not occur here.
IND
Index This was a command to move one full line down. It was intended to eliminate any uncertainty dealing with the Line Feed character, but has since fallen out of use.
NEL
Next Line This is essentially Carriage Return and Line Feed rolled into one.
SSA
Start of Selected Area
ESA
End of Selected Area
HTS
Horizontal Tabulation Set (Character Tabulation Set) I mentioned that the ASCII Horizontal Tabulation character would place the cursor at the next tab stop. This character was used to set that tab stop.
HTJ
Horizontal Tabulation with Justification (Character Tabulation with Justification)
VTS
Vertical Tabulation Set (Line Tabulation Set) This worked much like the Horizontal Tabulation set, but instead of crossways, this set a tabulation stop vertically.
PLD
PLU
Partial Line Down (Partial Line Forward)/Partial Line Up (Partial Line Backward) These two characters were used to create superscript and subscript printing.
SS2
SS3
Single Shift 2, 3
DCS
Device Control String
PU1
PU2
Private Use 1, 2 Yep, you got two characters that you could define for your own use.
STS
Set Transmit State
CCH
Cancel Character This is like the backspace character, but made it clear that the character that was backspaced over was deleted.
MW
Message Waiting
SPA
Start of Protected Area
EPA
End of Protected Area
SOS
Start Of String This character started a control string.
SGCI
Single Graphic Character Introducer
SCI
Single Character Introducer
CSI
Control Sequence Introducer
ST
String Terminator As SOS started a control string, this ended it.
OSC
Operating System Command
PM
Privacy Message
APC
Application Program Command
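
Like the table at the top of this page, a script can show these characters by abbreviation rather than trying to print them. Here is a minimal sketch, with the abbreviations copied from rows 8- and 9- of the table. The function name and the square-bracket notation are assumptions made for illustration.

```javascript
// Abbreviations for codes 80 to 9F, in table order.
const C1_NAMES = [
  "PAD", "HOP", "BPH", "NBH", "IND", "NEL", "SSA", "ESA",
  "HTS", "HTJ", "VTS", "PLD", "PLU", "RI", "SS2", "SS3",
  "DCS", "PU1", "PU2", "STS", "CCH", "MW", "SPA", "EPA",
  "SOS", "SGCI", "SCI", "CSI", "ST", "OSC", "PM", "APC",
];

// Hypothetical helper: replaces each of these control characters
// with its abbreviation in square brackets.
function showC1Controls(text) {
  return text.replace(/[\u0080-\u009F]/g, (ch) => {
    return "[" + C1_NAMES[ch.charCodeAt(0) - 0x80] + "]";
  });
}

console.log(showC1Controls("line one\u0085line two"));
// line one[NEL]line two
```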