CSS For Audio Media

Audio CSS has absolutely nothing to do with what a webpage looks like and everything to do with how it sounds. A screen reader is a tool that reads out the text on the screen. Screen readers are used by people who are illiterate, dyslexic, or vision-impaired. Audio CSS uses an entirely different set of properties, all dealing with sound.

The media type for these properties is aural, set with the link attribute media="aural". It is officially deprecated (in other words, intended to be replaced) by the W3C. CSS 2.1 reserves the media type speech as its successor, but does not define which properties apply to it. Until that's settled, aural will have to do. To gain greater insight into audio CSS, look it up on the W3C's webpage on CSS.
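As a rough sketch, aural rules can also sit inside an ordinary style sheet in an @media block rather than in a separately linked file. This assumes the CSS2 media type name aural; the selectors and values are arbitrary examples.

```css
/* Sketch only: "aural" is the CSS2 media type name.
   The selectors and values below are arbitrary examples. */
@media aural {
  h1 {
    voice-family: male;
    volume: loud;
  }
  p {
    speech-rate: medium;
  }
}
```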

The Properties Of Audio CSS

As we have seen, most CSS properties and values are self-explanatory, so I'm not going to go into great detail explaining them. All properties are listed in alphabetical order, and default values are highlighted and underlined.

azimuth
The azimuth is the horizontal direction that the voice seems to come from. A voice coming from the listener's left, for example, is described by its azimuth. The keywords are:
  • left-side
  • far-left
  • left
  • center-left
  • center
  • center-right
  • right
  • far-right
  • right-side
  • behind
  • leftwards
  • rightwards
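This can also be set by stating an angle in degrees (deg). 0deg and 360deg are the same as center, 90deg is the same as right-side, 180deg is the same as behind, and 270deg is the same as left-side. Negative values are also allowed, so -90deg is the same as 270deg. The keywords leftwards and rightwards are relative: they move the sound 20 degrees to the left or right of the inherited value.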
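A few azimuth declarations might look like this. The class names are made up for illustration; only the values come from the list above.

```css
/* Hypothetical class names; only the azimuth values matter here. */
p.narrator  { azimuth: center; }          /* straight ahead */
p.heckler   { azimuth: far-right; }       /* a keyword position */
p.whisper   { azimuth: behind far-left; } /* behind combined with a position */
p.announcer { azimuth: -30deg; }          /* an angle; negative values are allowed */
```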
cue-after
cue-before
cue
The cue-before property plays a sound before the content of the element is read, and cue-after plays one after it. The cue property is a shorthand that sets both. The value must be either the URI of a sound file or none (which is the default).
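Here is a small sketch of the cue properties in use. The sound file names are placeholders, not files that ship with anything.

```css
/* The sound file names are placeholders. */
a {
  cue-before: url("bell.wav");  /* played before the link text is read */
  cue-after: url("dong.wav");   /* played after the link text is read */
}
h1 {
  cue: url("pop.wav");          /* shorthand: the same sound before and after */
}
```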
elevation
As azimuth dictates side-to-side direction, elevation dictates up-and-down direction. The keywords are:
  • below
  • level
  • above
  • higher
  • lower
Like azimuth, it can also be given as an angle, from -90deg (directly below) to 90deg (directly overhead).
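For example, the rows of a table could be read from different heights. The class names here are invented for the example.

```css
/* Hypothetical class names for three table rows. */
tr.top    { elevation: above; }
tr.middle { elevation: level; }
tr.bottom { elevation: below; }
tr.angled { elevation: 60deg; }  /* an angle instead of a keyword */
```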
pause-after
pause-before
pause
These specify a pause before or after the content of an element is read. The pause property is a shorthand that sets both. The usual unit is ms, which stands for milliseconds; seconds (s) are also accepted.
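A sketch of pauses around headings follows. The durations are arbitrary, chosen only to show the syntax.

```css
/* The durations are arbitrary examples. */
h1 {
  pause-before: 500ms;
  pause-after: 250ms;
}
h2 {
  pause: 300ms 150ms;  /* shorthand: pause before, then pause after */
}
```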
pitch
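This specifies the average pitch of the voice. Women's voices are pitched well above men's: the CSS2 specification gives an average of about 120Hz for a standard male voice and about 210Hz for a female voice. The keywords are: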
  • x-low
  • low
  • medium
  • high
  • x-high
You can also specify a frequency, using the unit Hz, which stands for hertz.
pitch-range
This specifies the variation in average pitch. The number can be anywhere from 0 to 100: 0 is a complete monotone, and 100 is an extremely animated voice.
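The pitch and pitch-range properties tend to be set together. In this sketch the class names and numbers are illustrative guesses, not recommended settings.

```css
/* Hypothetical characters; the values are illustrative only. */
p.giant { pitch: x-low; pitch-range: 20; }  /* deep and fairly flat */
p.child { pitch: high;  pitch-range: 80; }  /* high and lively */
p.robot { pitch: 120Hz; pitch-range: 0; }   /* fixed frequency, monotone */
```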
play-during
This specifies a sound to be played in the background while the content of an element is being read. The value is either the URI of a sound file, optionally followed by mix, repeat, or both, or one of the keywords auto and none:
mix
If mix is present, the background sound of the parent element continues to play alongside the background sound of the current element. If it is not specified, the parent's background sound is replaced.
repeat
If repeat is present, the background sound is looped for as long as the element is being read.
auto
The background sound of the parent element continues to play.
none
There is no background sound whatsoever.
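The variations of play-during can be sketched as follows. The file names and class names are placeholders.

```css
/* File names and class names are placeholders. */
blockquote.sad   { play-during: url("violins.wav"); }         /* replaces the parent's sound */
blockquote.sad q { play-during: url("harp.wav") mix; }        /* plays on top of the violins */
div.storm        { play-during: url("rain.wav") repeat; }     /* loops while the div is read */
div.storm p      { play-during: auto; }                       /* the rain keeps playing */
span.quiet       { play-during: none; }                       /* no background sound */
```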
richness
This dictates how well a voice will carry, and can be any number between 0 and 100. A value of 0 creates a very soft, mellow voice; a value of 100 creates a very rich, strong voice.
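As a quick illustration, with made-up class names and values:

```css
/* Hypothetical class names; the numbers are arbitrary. */
p.lullaby { richness: 10; }  /* soft and mellow */
p.speech  { richness: 90; }  /* rich and strong, carries well */
```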
speak-header
This dictates how often the content of header cells is read when a table is being read out.
once
The header cells are read once.
always
The content of a header cell is repeated each time before its respective data cell is read (this is where the headers attribute comes in really handy).
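A possible use, assuming one short table and one long table marked with invented class names:

```css
/* Hypothetical class names for two kinds of table. */
table.short th { speak-header: once; }    /* headers are read a single time */
table.long th  { speak-header: always; }  /* headers are repeated before each data cell */
```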
speak-numeral
Is 555 1234 to be read as two continuous numbers: five hundred fifty-five one thousand, two hundred thirty-four? Or are the digits to be read individually, like a Canadian phone number: five five five one two three four? This is what speak-numeral dictates. The keywords are:
  • continuous
  • digits
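Using the phone number example above, a sketch might look like this. The class names are assumptions for illustration.

```css
/* Hypothetical class names. */
span.phone  { speak-numeral: digits; }      /* "five five five one two three four" */
span.amount { speak-numeral: continuous; }  /* read as whole numbers */
```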
speak-punctuation
Sometimes, you want punctuation to be spoken, especially if it's for computer code. Other times, it's not important. The speak-punctuation property dictates whether or not punctuation is spoken.
code
Speaks all punctuation literally.
none
Does not speak punctuation, but uses it to break up the words into phrases.
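For instance, code samples could have their punctuation read aloud while ordinary paragraphs do not:

```css
/* Read punctuation aloud only inside code samples. */
code, pre { speak-punctuation: code; }
p         { speak-punctuation: none; }
```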
speak
This specifies whether text is to be spoken as words, spelled out letter by letter, or not read at all. The keywords are:
  • normal
  • none
  • spell-out
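A sketch of all three keywords follows. The class names initials and visual-only are invented for this example.

```css
/* "initials" and "visual-only" are hypothetical class names. */
p                { speak: normal; }     /* read as ordinary words */
abbr.initials    { speak: spell-out; }  /* read one letter at a time */
span.visual-only { speak: none; }       /* skipped by the speech engine */
```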
speech-rate
This specifies how fast the content is to be spoken. Most people have a speech rate of around 180-200 words per minute.
x-slow
80 words per minute
slow
120 words per minute
medium
180-200 words per minute
fast
300 words per minute
x-fast
500 words per minute
slower
Slows down by 40 words per minute
faster
Speeds up by 40 words per minute
The value can also be a number, specifying how many words per minute are spoken.
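The keyword, relative, and numeric forms could be written like this. The class names are made up.

```css
/* Hypothetical class names. */
p.instructions { speech-rate: slow; }    /* 120 words per minute */
p.legal        { speech-rate: x-fast; }  /* 500 words per minute */
p.custom       { speech-rate: 150; }     /* an explicit words-per-minute figure */
em             { speech-rate: slower; }  /* 40 words per minute below the inherited rate */
```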
stress
This describes how much stress is put on certain syllables. While the value is a number between 0 and 100, the value for an English voice would be different than the value for a Japanese voice, which means that the lang attribute would play a role.
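One way to take the language into account is the :lang() selector from CSS2. The numbers below are guesses to show the idea, not tuned values for either language.

```css
/* The stress values are illustrative guesses, not tuned per language. */
em:lang(en) { stress: 70; }
em:lang(ja) { stress: 40; }
```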
voice-family
This works a lot like font-family, but for voices instead of typefaces. The generic voice families are male, female, and child.
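As with font-family, a specific voice can be listed first with a generic one as the fallback. The voice names announcer and juliet are hypothetical; whether they exist depends on the speech engine.

```css
/* "announcer" and "juliet" are hypothetical specific voices;
   the generic voice after the comma is the fallback. */
h1       { voice-family: announcer, male; }
p.juliet { voice-family: juliet, female; }
p.kid    { voice-family: child; }
```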
volume
This dictates how loud the voice is. It can be any number from 0 to 100, a percentage, or any of the following keywords:
  • silent
  • x-soft
  • soft
  • medium
  • loud
  • x-loud
The keyword x-soft is the same as 0, and each following keyword increases by 25, up to x-loud at 100. The keyword silent has no numerical value: it means there is no sound at all, which is not the same as 0, the quietest audible level.
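To round things off, here is a sketch of the keyword and numeric forms side by side. The class names are invented.

```css
/* Hypothetical class names. */
p.aside  { volume: soft; }    /* same as 25 */
p.alert  { volume: x-loud; }  /* same as 100 */
p.custom { volume: 60; }      /* a number between 0 and 100 */
p.muted  { volume: silent; }  /* no sound at all */
```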