Audio CSS has absolutely nothing to do with what a webpage looks like and everything to do with how it sounds. A screen reader is a program or device that reads out the text on the screen; screen readers are used by people who are illiterate, dyslexic, or vision-impaired. For this kind of output, CSS uses an entirely different set of properties dealing with sound.
The link attribute media="aural", the media type for these properties, is officially deprecated (in other words, intended to be replaced) by the W3C. CSS 2.1 reserves the media type speech as its successor but does not yet define which properties apply to it, so until that is settled, this will have to do. To gain greater insight into audio CSS, look it up on the W3C's webpage on CSS.
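As a rough sketch of how these properties can be scoped to spoken output from inside a stylesheet, the rules below are wrapped in an @media block. This assumes a user agent that implements the CSS2 aural media type (named aural in the CSS2 specification); the particular values are only illustrative.

```css
/* Rules inside this block apply only when the page is rendered as speech. */
@media aural {
  body {
    voice-family: female; /* generic voice, see voice-family below */
    speech-rate: medium;  /* see speech-rate below */
    volume: medium;       /* see volume below */
  }
}
```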
The Properties Of Audio CSS
As we have seen, most properties and values for CSS are self-explanatory, so I'm not going to go into great detail explaining them. All properties are listed in alphabetical order, and default values are highlighted and underlined.
- azimuth
- The azimuth is the side from which the voice is coming. "The voice is coming from the left" is a statement about the azimuth. The keywords are:
- left-side
- far-left
- left
- center-left
- center
- center-right
- right
- far-right
- right-side
- behind
- leftwards
- rightwards
This can also be set by stating the angle in degrees. 0 degrees and 360 degrees are the same as center, 90 degrees is the same as right-side, 180 degrees is the same as behind, and 270 degrees is the same as left-side. Negative values are also allowed.
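Here is a small sketch of azimuth in use, assuming a speech-capable user agent with stereo or surround output. The class names are made up for illustration.

```css
/* Hypothetical class names for speakers in a dialogue. */
.narrator  { azimuth: center; }           /* straight ahead */
.speaker-a { azimuth: left; }             /* keyword value */
.speaker-b { azimuth: 90deg; }            /* directly to the listener's right */
.heckler   { azimuth: far-right behind; } /* a side keyword combined with behind */
.echo      { azimuth: -90deg; }           /* negative angle: directly to the left */
```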
- cue-after
- cue-before
- cue
- The value of each of the cue properties must be either the URI of a sound file or the value none (which is the default). The file is played before or after the content of the element is read; cue is shorthand for setting both at once.
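A minimal sketch of the cue properties follows. The sound file names are placeholders, not real files.

```css
/* Play a chime before each top-level heading, and nothing after it. */
h1 {
  cue-before: url("chime.wav"); /* hypothetical file */
  cue-after: none;
}

/* Shorthand with one value: the same sound plays before and after each link. */
a {
  cue: url("pop.wav"); /* hypothetical file */
}
```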
- elevation
- As azimuth dictates side-to-side direction, elevation dictates up-and-down direction. The keywords are:
- below
- level
- above
- higher
- lower
It can also be an angle, as can azimuth.
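A short sketch of elevation, again with made-up class names. It assumes output hardware that can actually place sound vertically, which many setups cannot.

```css
/* Hypothetical class names. */
.announcement { elevation: above; } /* from overhead */
.footnote     { elevation: below; } /* from underneath */
.body-text    { elevation: level; } /* at ear height */
.floating     { elevation: 30deg; } /* an angle, measured upward from level */
```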
- pause-after
- pause-before
- pause
- This specifies a pause before or after the content of an element is read. The measurement used here is ms, which stands for millisecond; s, for second, is also accepted.
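The pause properties might be used like this. The durations are arbitrary, chosen only to show the syntax.

```css
/* Separate longhand properties. */
h1 {
  pause-before: 500ms;
  pause-after: 250ms;
}

/* Shorthand with one value: the same pause before and after. */
p { pause: 300ms; }

/* Shorthand with two values: the pause before, then the pause after. */
blockquote { pause: 400ms 200ms; }
```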
- pitch
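- This specifies the average pitch of the voice. Women's voices are noticeably higher than men's: the CSS2 specification puts a standard male voice at around 120 Hz and a female voice at around 210 Hz. The keywords are: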
- x-low
- low
- medium
- high
- x-high
You can also specify a frequency, using the measurement Hz, which stands for hertz.
- pitch-range
- This specifies the variation in average pitch. The number can be anywhere from 0 to 100. 0 is a complete monotone, 100 is an extremely animated voice.
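To show pitch and pitch-range working together, here is a brief sketch. The class names and numbers are illustrative assumptions, not recommended settings.

```css
/* Hypothetical class names. */
.narrator { pitch: medium; pitch-range: 50; } /* an ordinary speaking voice */
.robot    { pitch: low;    pitch-range: 0; }  /* flat monotone */
.excited  { pitch: 220Hz;  pitch-range: 90; } /* explicit frequency, very animated */
```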
- play-during
- This specifies a sound to be played while the content of an element is being read. Its value is either the URI of a sound file, optionally followed by mix and/or repeat, or one of the keywords auto and none on its own:
- mix
- If mix is present, then the sound of the parent element is also played alongside the background sound for the current element. If it is not specified, then the background of the parent is replaced.
- repeat
- If repeat is present, the background sound is looped.
- auto
- The background sound of the parent element continues to play.
- none
- There is no background sound whatsoever.
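The following sketch shows each form of play-during. The file names and class names are placeholders.

```css
/* Replace any inherited background with this sound. */
blockquote.sad { play-during: url("violins.wav"); } /* hypothetical file */

/* Layer a second sound on top of the parent's background. */
blockquote.sad q { play-during: url("harp.wav") mix; } /* hypothetical file */

/* Loop a short clip for as long as the element is being read. */
.rainy { play-during: url("rain.wav") repeat; } /* hypothetical file */

/* Silence the background entirely for this element. */
.quiet { play-during: none; }
```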
- richness
- This dictates how well a voice will carry, and can be any number between 0 and 100. 0 creates a very soft, mellow voice, 100 creates a very rich, strong voice.
- speak-header
- This dictates how often the content of header cells is read when a table is being read out.
- once
- The header cells are read once.
- always
- The content of a header cell is repeated each time before its respective data cell is read (this is where the headers attribute comes in really handy).
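A minimal sketch of speak-header, with made-up class names. The property is inherited, so setting it on the table is assumed here to reach the cells.

```css
/* Small table: reading each header once is enough. */
table.short { speak-header: once; }

/* Wide table: repeat the header before every data cell so the listener keeps track. */
table.wide { speak-header: always; }
```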
- speak-numeral
- Is 555 1234 to be read as two continuous numbers: "five hundred fifty-five one thousand, two hundred thirty-four"? Or are the digits to be read individually, like a Canadian phone number: "five five five one two three four"? This is what speak-numeral dictates. The keywords are:
- digits
- The digits are read individually.
- continuous
- The number is read as a whole.
- speak-punctuation
- Sometimes, you want punctuation to be spoken, especially if it's for computer code. Other times, it's not important. The speak-punctuation property dictates whether or not punctuation is spoken.
- code
- Speaks all punctuation literally.
- none
- Does not speak punctuation, but uses it to break up the words into phrases.
- speak
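- This specifies whether text is to be spoken as words, spelled out as letters, or not read at all. The keywords are:
- normal
- The text is spoken as words.
- spell-out
- The text is spelled out one letter at a time.
- none
- The text is not read at all.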
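The three speak properties can be sketched together as follows. The keyword values (digits, continuous, code, spell-out, none) are the ones defined by CSS2; the class names are hypothetical.

```css
/* Read phone numbers digit by digit, but prices as whole numbers. */
.phone { speak-numeral: digits; }
.price { speak-numeral: continuous; }

/* Speak every bracket and semicolon inside code samples. */
code { speak-punctuation: code; }

/* Spell out initialisms letter by letter. */
abbr.initialism { speak: spell-out; }

/* Skip purely decorative text altogether. */
.visual-only { speak: none; }
```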
- speech-rate
- This specifies how fast the content is to be spoken. Most people have a speech rate of around 180-200 words per minute.
- x-slow
- 80 words per minute
- slow
- 120 words per minute
- medium
- 180-200 words per minute
- fast
- 300 words per minute
- x-fast
- 500 words per minute
- slower
- Slows down by 40 words per minute
- faster
- Speeds up by 40 words per minute
The value can also be a number, specifying how many words per minute are spoken.
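A brief sketch of speech-rate with keyword, numeric, and relative values. The class names are invented for the example.

```css
p              { speech-rate: medium; } /* roughly 180 to 200 words per minute */
.fine-print    { speech-rate: fast; }   /* about 300 words per minute */
.tutorial-step { speech-rate: 140; }    /* explicit words per minute */
.emphasis      { speech-rate: slower; } /* 40 words per minute below the inherited rate */
```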
- stress
- This describes how much stress is put on certain syllables. While the value is a number between 0 and 100, the value for an English voice would be different from the value for a Japanese voice, which means that the lang attribute would play a role.
- voice-family
- This works a lot like font-family, but for voices instead of typefaces. The generic voice "fonts" are male, female, and child.
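As with font-family, a list of fallbacks can be given. In this sketch, announcer is a hypothetical specific voice name that a given speech synthesizer may or may not provide; the richness and stress numbers are arbitrary.

```css
/* Generic voice only. */
body { voice-family: female; }

/* Try a specific voice first (hypothetical name), then fall back to any male voice. */
.intro {
  voice-family: announcer, male;
  richness: 80; /* a strong voice that carries */
  stress: 60;   /* slightly more emphasis on stressed syllables */
}

.kids-corner { voice-family: child; }
```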
- volume
- This dictates how loud the voice is. It can be any number from 0 - 100, or any of the following keywords:
- silent
- x-soft
- soft
- medium
- loud
- x-loud
The keyword x-soft is the same as 0, and the keywords increase by increments of 25, so x-loud is the same as 100. The keyword silent has no numerical value, and there is no sound at all.
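To round things off, a small sketch of volume using both keywords and numbers. The class names are placeholders.

```css
body     { volume: medium; } /* same as 50 */
.whisper { volume: x-soft; } /* same as 0: the quietest audible level */
.alert   { volume: 75; }     /* same as loud */
.muted   { volume: silent; } /* no sound at all */
```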